Machine-learning models can help banks capture more value
By McKinsey’s Piotr Kaminski and Kate Robu
January 23, 2017
Machine-learning (ML) methods have been around for ages, but the big-data
revolution and the plummeting cost of computing power are now making
them practical and powerful analytical tools in banking across a
variety of use cases, including credit risk.
ML algorithms may sound complex and futuristic, but the way they work is
quite simple. Essentially, they combine a massive set of decision trees
(decision-making models that break out individual decisions and their
possible consequences, also known as “learners”) to create an accurate
model. By churning through these learners at high speed, ML models can
find “hidden” patterns, particularly in unstructured data, that common
statistical tools miss.
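To make the mechanics concrete, here is a minimal sketch of the idea, assuming scikit-learn and synthetic data (the article names neither): many shallow decision-tree “learners,” each fit on a bootstrap sample, are combined by majority vote into one model.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for account-level data.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

rng = np.random.default_rng(0)
learners = []
for _ in range(100):
    # Each "learner" is a shallow tree fit on a bootstrap sample.
    idx = rng.integers(0, len(X), size=len(X))
    learners.append(DecisionTreeClassifier(max_depth=4).fit(X[idx], y[idx]))

# The ensemble prediction is the majority vote across all learners.
votes = np.mean([tree.predict(X) for tree in learners], axis=0)
prediction = (votes >= 0.5).astype(int)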
A typical concern about ML is overfitting, that is, a model that
describes random noise rather than the underlying relationships.
Overfitting can be avoided by carefully choosing input variables and
algorithms. One safeguard is the popular Random Forest algorithm, an
ensemble of many intentionally “weakened” decision trees, each built
with only a partial set of the variables, which reduces the model’s
reliance on any single variable. Another safeguard is to test model
performance on a holdout sample that was not used during model
development; if performance on that sample is significantly degraded,
it’s a sign of overfitting.
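A minimal sketch of both safeguards, again assuming scikit-learn and synthetic data: a Random Forest whose trees each see only a random subset of variables, scored on both the training data and a holdout sample.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10000, n_features=50, n_informative=10, random_state=1)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.3, random_state=1)

# max_features="sqrt" weakens each tree by limiting the variables it may
# use, reducing the model's reliance on any single variable.
forest = RandomForestClassifier(n_estimators=300, max_features="sqrt", random_state=1)
forest.fit(X_train, y_train)

train_auc = roc_auc_score(y_train, forest.predict_proba(X_train)[:, 1])
hold_auc = roc_auc_score(y_hold, forest.predict_proba(X_hold)[:, 1])
# A large gap between the two scores is the overfitting signal described above.
print(f"train AUC={train_auc:.3f}  holdout AUC={hold_auc:.3f}")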
Where ML is superb is in analyzing long-tail data, which typically
account for half of a bank’s portfolio but are not well understood
through traditional statistical methods. Think of accounts with a low
share of wallet. We usually know little about them, and strategies to
engage them tend to be quite reactive. But ML can generate insights
into their behaviors, allowing banks to actively target the accounts
that are potentially profitable.
Let’s take as an example an ML project focused on optimizing
credit-line decisions for a credit-card business; that is, the company
wanted to make better decisions about where to increase and decrease
credit lines. The existing models were performing well and already had
very respectable predictive power. We used the existing traditional
account data and set up our ML model as a challenger to the existing
credit-line strategies. We also accounted for all the policy-mandated
eligibility constraints in the challenger model.
Still, the ML model (which used Random Forest and AdaBoost) dramatically
outperformed the incumbent, improving the predictive power of the model
by a factor of 1.6. This improvement can translate into significantly
increased revenue from the less risky accounts that, under the existing
models, would get a credit-line decrease, and into avoided losses from
the accounts that are given credit-line increases but are subsequently
most likely to default.
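A hypothetical sketch of this challenger setup, with a logistic regression standing in for the incumbent model (the article does not describe it) and the two ensemble methods it names; the data and model names here are illustrative assumptions:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=40, n_informative=12, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)

models = {
    "incumbent (logistic)": LogisticRegression(max_iter=1000),
    "challenger (Random Forest)": RandomForestClassifier(n_estimators=300, random_state=2),
    "challenger (AdaBoost)": AdaBoostClassifier(n_estimators=300, random_state=2),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    # Gini = 2 * AUC - 1 is the usual credit-risk measure of predictive power.
    print(f"{name}: AUC={auc:.3f}  Gini={2 * auc - 1:.3f}")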
So what prevents banks from adopting ML tools more broadly? Typically,
there are three key concerns. First, the sheer number of variables would
tax a bank’s current capacity-constrained systems. Second is compliance:
an ML model can be a black box, which makes it hard to explain outcomes
and to ensure compliance with regulations such as adverse-action
requirements. Finally, model-risk validation can be challenging given
the increased complexity, and it requires a different set of validation
techniques and approaches from those commonly used by the industry today.
While the broader industry and regulatory bodies are still getting up to
speed on the application of ML models, there are practical ways to
address these three concerns in the near term. First, start modeling
with all the available variables (e.g., 100+), but quickly prioritize
them based on their contribution to the model, leaving a manageable
number (e.g., 30–40) that won’t sacrifice the model’s predictive power.
Second, “prune the branches” of the ML decision tree to get to a set of
core linear rules that use an even smaller number of variables (e.g.,
5–12) while still retaining 70 to 80 percent of the original ML model’s
predictive power. This approach delivers a simplified set of new
“decision strategies” that banks can deploy quickly on top of existing
rules, thus assuring compliance with regulatory requirements while
making minimal changes to existing systems.
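Both steps can be sketched under the same assumptions (scikit-learn, synthetic data, illustrative variable names): rank 100+ variables by their contribution via Random Forest feature importances, keep the top 30–40, then fit a deliberately shallow tree whose branches read off as a small set of simple rules.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=10000, n_features=120, n_informative=15, random_state=3)

# Step 1: prioritize variables by their contribution to the model.
forest = RandomForestClassifier(n_estimators=300, random_state=3).fit(X, y)
top = np.argsort(forest.feature_importances_)[::-1][:35]  # keep roughly 30-40

# Step 2: "prune the branches" into an explainable rule set that uses an
# even smaller number of variables, via a depth-limited tree.
rules = DecisionTreeClassifier(max_depth=3, random_state=3).fit(X[:, top], y)
print(export_text(rules, feature_names=[f"var_{i}" for i in top]))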
Is it possible to capture more value with a more sophisticated ML model?
Yes, and that’s most certainly the future. But this approach can help
banks start capturing value from ML immediately by addressing regulatory
and system constraints and making the best use of readily available
“small” data. The key implication for banks is that their current models
are leaving a lot of value on the table, and ML offers a practical way
to capture it.