Validating Commercial Risk Grade Mappings: Why and How
Eric Falkenstein
Moody's Risk Management Services
Reprinted from the Journal of Lending and Credit Risk Management,
February 2000
Not too long ago, commercial loan grades consisted of one "pass" and four "criticized" grades. Regulators and credit committees focused on relative risk among loan grades not relevant to origination. Now that RAROC (risk-adjusted return on capital) and economic capital allocations are all the rage, lenders are very interested in instituting more rigor and granularity in their pass grades. The key to more granular pass grades is validation, which implies an estimate derived from empirical documentation. Policy manuals and target default rates are well and good, but prudent risk managers know better than to take anything at face value.
In contrast to the emphasis on elegant Markowitzian portfolio models at conferences and in journals, validation of internal risk grades is the key to measuring the riskiness of bank portfolios. Given a choice between knowing how to model a portfolio’s variability assuming expected loss and correlation data, or knowing the expected loss by internal grade, most bankers, especially line management, would choose to know the expected loss. The problem with commercial loans is that losses are so infrequent that it is very difficult to validate solely using internal data. Instead, one is often faced with the unexciting task of setting up a system so that validation can occur in three-plus years. These efforts, especially in periods when memories of losses have faded, are usually not met with great enthusiasm. Validation can still occur regardless of the state of a bank’s systems, although more information makes the process easier and more precise.
In the best scenario, a lender has comprehensive, archival, transaction-level information over a business cycle, with knowledge of who went into default or was written down, at what date, and the eventual loss given default for each exposure. This is all tied back to internal grade and various other obligor and facility information, and, to extend it fully, is tied to promotional programs offered in the region at the time of origination. With this information, straightforward tests can be used to calibrate and validate an internal ratings system. Outsiders and insiders can know what current credit quality means, both in relative terms (compared to times past and other banks) and in absolute levels (for provisioning, pricing, and internal profitability reporting). Such tests focus on the fit of the relationship (for example, R²), the intuition of the coefficients, the properties of the errors (for example, no bias between the prediction error and explanatory variables), and parameter stability. In the real world, one does not have nearly enough data to rely solely on unambiguous statistical tests, so calibration and validation involve a process of triangulation, whereby one uses several bits of information to inform a final estimate that is most consistent with these various guideposts.
Factors Pushing Validated Loan Grades
Several forces are driving the move to a more quantitative assessment of middle-market loan risk. The first impetus for validation is the advantage gained from securitization. Banks need not be originators and portfolio holders of these obligations; instead, they may allow a wider audience of potential investors the opportunity to diversify their holdings with middle-market bank debt. Middle-market loan securitization is happening now and is attractive for several reasons:
Earlier hype notwithstanding, securitization will grow, just as it has for residential mortgage, commercial real estate, and consumer lending. In fact, over the past year, several CLOs have been completed where the low-level recourse added up to well below the tier 1 "well capitalized" regulatory minimum of 6.0%. This is a clear signal that typical middle-market bank portfolios can arbitrage the more onerous regulatory capital requirements through securitization.
The regulatory authorities are also pushing heavily for greater validation of middle-market lending grades. The Basel Committee’s June 1999 consultative paper, A New Capital Adequacy Framework, outlines several leading initiatives. Most prominent in this proposal is some enhanced granularity and recognition of lower risk for various investment grade, externally (that is, agency) rated obligors. Also mentioned as a promising potential refinement was reliance on internal ratings, where validation of the ratings was the key obstacle. In part, this is driven by the fact that, to the extent that securitization arbitrages current capital regulations, an adverse selection problem is occurring in which only the most risky assets will remain on bank books. This is surely an undesirable end result for regulators entrusted with maintaining the integrity of financial institutions. Regulators are well aware of this trend and see validated credit risk models as a way to reach the "unrated" exposures that make up the majority of bank commercial balance sheets.
A final and underappreciated factor driving quantitative ratings is that expected loss on a loan is a cost. To the extent that it is better estimated, a more precise cost estimate can have a positive effect on pricing models, profitability reporting, planning, strategic decisions (that is, entry/exit), and provisioning. High-profile securitization and regulatory rules obscure the importance of these issues. However, these internal profitability reporting issues are perhaps more relevant to the optimal efficiency of lending institutions.
The problem, again, is validation of the risk grades. The words of the Basel Committee are most appropriate:
"…before a portfolio modeling approach could be used in the formal process of setting regulatory capital requirements for credit risk, regulators would have to be confident not only that models are being used to actively manage risk, but also that they are conceptually sound, empirically validated, and produce capital requirements that are comparable across institutions. At this time, significant hurdles, principally concerning data availability and model validation, still need to be cleared before these objectives can be met"
Definition of Validation
Validation is best explained by an example. Moody’s has been rating firms by numerically modified rating (that is, Ba2 as opposed to simply Ba), since 1983. It is important to note that Moody’s ratings are not simply default expectations, but also include expected loss. Nonetheless, the default experience by grade has been sufficiently stable to imply meaningful numbers that can be used for pricing and planning. Figure 1 below presents a validated rating system.
Figure 1
At the other extreme are institutions with no data on losses or transitions by internal grade, only current composition of loans by internal ratings and the intended mappings of these groupings into agency ratings.
Validation implies a confirmation; therefore, validation should pertain to an existing model’s accuracy. The first step is creating some sort of model and calibrating it to the data. In practice, many banks use internal bank grades to measure default risk as opposed to expected loss; these grades have mappings to both agency ratings (for example, S&P and Moody’s) at the upper end and to regulatory classifications at the lower end. A validation exercise would then check whether the mapping from internal grade to agency rating or default rate is appropriate. It is important to recognize that we are never gauging whether the grading system is "true" or "false," since any nontrivial model is probably false. For example, Newton’s inverse square law of gravity has been demonstrated false by Einstein’s general relativity, yet, clearly, Newton’s laws shouldn’t be "rejected." On the other hand, certain internal models can be true but so undocumented or convoluted that it is not possible to verify them. Instead, the objective should be to determine empirically documented, reasonable, and prudent loss expectations for the internal grading system.
If the default rate of the loan grade is stable yet inconsistent with the expected or mapped default rate, the model should not be invalidated, but instead recalibrated. The ordinal stability of the default experience implies the model is measuring something consistently, and all that is needed to rectify the situation is to recalibrate the mapping by moving it up or down a notch. A model is calibrated, though potentially suboptimally, if over the long run the actual "bad" rate equals the probability assigned. Thus, validation and calibration are usually a joint exercise. It would be wasteful just to validate and not use the information gained to simply recalibrate the model. A continuous process of validation and calibration, given a stable internal grading definition, should ensure validation over time.
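The distinction between recalibrating and invalidating can be illustrated with a minimal sketch. The grade numbers, mapped default rates, and obligor counts below are hypothetical, and the function names are illustrative rather than part of any Moody’s methodology. The sketch checks that observed default rates rise with grade number (ordinal stability) and reports how far each grade sits from its mapped rate.

```python
def observed_default_rates(counts):
    """counts maps grade -> (obligors, defaults); returns grade -> default rate."""
    return {grade: defaults / obligors for grade, (obligors, defaults) in counts.items()}


def is_ordinally_consistent(rates):
    """True if default rates never fall as the grade number (risk) rises."""
    ordered = [rates[grade] for grade in sorted(rates)]
    return all(low <= high for low, high in zip(ordered, ordered[1:]))


def calibration_gaps(mapped, observed):
    """Ratio of observed to mapped default rate for each grade."""
    return {grade: observed[grade] / mapped[grade] for grade in mapped}


# Hypothetical inputs, for illustration only.
mapped = {3: 0.002, 4: 0.010, 5: 0.030}                    # policy default rates
counts = {3: (5000, 20), 4: (4000, 80), 5: (2000, 120)}    # (obligors, defaults)

observed = observed_default_rates(counts)
gaps = calibration_gaps(mapped, observed)

print(is_ordinally_consistent(observed))  # the ranking holds across grades
print(gaps)                               # each grade defaults at about twice its mapped rate
```

In this made-up case the ranking is intact but every grade defaults at roughly twice its mapped rate, which argues for shifting the mapping rather than discarding the grading system.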
A good internal model must not only accurately rank risk groupings, it must do so in an understandable and convincing way. This does not imply that a successful model is understandable by everyone, or even most people. In fact, the detailed inner workings of the best risk models may be fully understood only by a minority of people, as with consumer bureau scores or swaption models. Yet there is a vast difference between a minority and virtually none.
There are reservations about the encroaching quantification that accompanies validation. It’s important to acknowledge that a poorly implemented quantitative system may be worse than the status quo. For example, it would be less than optimal to replace a subjective rating system with a mechanical system that did not allow for exceptions. This is a serious concern that demands skilled implementation and highlights the continued importance of knowledgeable, experienced credit professionals. Yet there is also the more general reservation that quantitative evidence and statistical inferences project too rigid a pattern, and many are more comfortable with close observation and narrative description. This objection is much less reasonable. The purely qualitative approach is best when there are few theories and data from which to develop reasonable expectations, but at higher levels of aggregation this method becomes intractable. Paul’s "good" may be the same as Peter’s "very good," even though they map into significantly different default rates. The limitations of a subjective system are similar to the limitations of a survey in that, while both are informative, neither is precise enough for the purposes mentioned above. The bottom line is that we want to say "bank X’s portfolio is riskier than average," or that the "level of subordination needed for a certain CLO structure for Bank X’s portfolio is 3%," or that "the required regulatory capital for bank X is 5%." These are precise, empirically testable statements that demand quantitative tools.
Internal Grade Objectives
Scoring systems must consider the intended use of the system. Is the system primarily for pricing and origination, or for provisioning and monitoring failing credits? If for pricing and originating, we want to know the expected loss of the loan over its life or through the business cycle. Then we can amortize this loss amount and, given a large sample of loans, statistically assess the expected profit margin on each loan, just as actuarial tables help insurance companies price policies for individuals even though for any specific person the company either makes or loses money. This is the lifetime expected loss, or "through the cycle" objective. It is essential in this view to examine the loss rates over a long period of time for a "static pool" or "cohort." To see why, consider the following example. Loans are issued in grade 4, but if the obligor starts to lose money, they are immediately placed in grade 7. Assume that with initial levels of book equity it takes one year to default on debt. In this environment grade 4 loans will have zero loss rates over one year, and grade 7 loans will have significant loss rates. Clearly, this type of system, in which troubled companies are placed in limbo grades not used for origination (for example, "watch"), would underestimate the loss rates of originating grades and produce perverse information for profitability reporting purposes. It is unhelpful to document that the expected loss rate on originating grades is 0%, yet 2% for the nonoriginating grades.
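The grade 4 and grade 7 example can be reproduced with a small sketch. The loan records are invented for illustration: each hypothetical loan carries its origination grade, its grade at the start of each year, and the year in which it defaults (if any). The two functions contrast a one-year default rate measured by current grade with a static pool (cohort) rate measured by origination grade.

```python
from collections import defaultdict


def one_year_rates_by_current_grade(loans):
    """Default rate per loan-year, keyed by the grade held at the start of that year."""
    exposure, defaults = defaultdict(int), defaultdict(int)
    for loan in loans:
        for year, grade in enumerate(loan["path"], start=1):
            exposure[grade] += 1
            if loan["default_year"] == year:
                defaults[grade] += 1
    return {grade: defaults[grade] / exposure[grade] for grade in exposure}


def cohort_rates_by_origination_grade(loans, horizon):
    """Cumulative default rate within `horizon` years, keyed by origination grade."""
    pool, defaults = defaultdict(int), defaultdict(int)
    for loan in loans:
        pool[loan["orig"]] += 1
        if loan["default_year"] is not None and loan["default_year"] <= horizon:
            defaults[loan["orig"]] += 1
    return {grade: defaults[grade] / pool[grade] for grade in pool}


# Hypothetical static pool: 100 loans originated in grade 4.
# 96 stay in grade 4 for three years; 4 are moved to grade 7 after
# year 1 and default during year 2.
loans = [{"orig": 4, "path": [4, 4, 4], "default_year": None} for _ in range(96)]
loans += [{"orig": 4, "path": [4, 7], "default_year": 2} for _ in range(4)]

one_year = one_year_rates_by_current_grade(loans)
cohort = cohort_rates_by_origination_grade(loans, horizon=3)

print(one_year[4], one_year[7])  # grade 4 shows no one-year defaults; grade 7 carries them all
print(cohort[4])                 # the cohort view attributes the defaults to grade 4
```

Under these assumptions the one-year view reports no defaults for the originating grade, while the cohort view shows that 4% of grade 4 originations defaulted within three years.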
For validation purposes, the lifetime loss approach, which examines the credit strength of an issuer over a three- to five-year horizon, is most relevant. This affects risk management at origination as opposed to after origination. For current regulatory, restructuring, and earnings management purposes, a one-year horizon is better suited. This is a particularly relevant distinction because one-year horizons emphasize the regulatory grades, while lifetime horizons emphasize the accuracy of the originating grades. As suggested above, this is because newly originated loans usually take nine to 18 months to degrade to a default or write-down state. Thus, to predict charge-offs next year, one need not worry about the relative proportion of 3 and 4 credits, but instead about the amount of OAEM and Substandard loans. To assign an appropriate loss rate to a loan upon origination, knowing the lifetime loss rate is essential.
Step 1: Examining the Current Rating System
The first step is to use the existing internal bank grade system. In the U.S., the first thing to know about internal grades is how they relate to the regulatory grades, which are virtually always used as the bottom four grading classifications in an internal system. Loans of quality higher than this are considered "pass grade." The pass grade loans in practice have anywhere from one to 18 groupings, and are usually split into "investment grade," "noninvestment grade," and, perhaps, a "watch" category. The watch category is a grade where loans are not originated; rather, they are placed temporarily as potential problems are resolved. The typical bank internal grading system resembles Figure 2.
Figure 2
              | Grade | Moody's/S&P Mapping | Regulatory
Inv. Grade    | 1     | Aa3/AA-             | pass
              | 2     | A3/A-               | pass
              | 3     | Baa3/BBB-           | pass
NonInv. Grade | 4     | Ba1/BB+             | pass
              | 5     | Ba3/BB-             | pass
Watch         | 6     | B3/B-               | pass
Criticized    | 7     | NA                  | OAEM*
              | 8     | NA                  | Substandard
              | 9     | NA                  | Doubtful
              | 10    | NA                  | Loss
* "Other Assets Especially Mentioned," also known as "Special Mention."
Many internal grading systems embody the following qualitative model: a loan officer sees a potential borrower and tries to put it into perspective by comparing its balance sheet and income statement with previous borrowers. A borrower that is above average goes into the higher quality pass grades; one that is lower than average goes into the lower quality pass grades. Customers with negative net income are usually put into watch categories until it appears highly probable that interest payments will not be met without extraordinary actions (for example, selling assets or infusing equity). The period of time it takes for a customer to deteriorate from its origination grade or healthy status to substandard is usually somewhere between nine and 18 months. What counts as "average," or why exceptions are made, may be unwritten or even difficult to articulate, but this does not mean that no model underlies what is happening. It is the job of a calibration/validation exercise to quantify the effects of this type of strategy and suggest specific improvements to it.
Step 2: Estimate Recovery Rates
It is useful to break the validation into two steps. First, note that charge-offs are the product of two adverse eventualities: default and loss given default (also known as LGD, or loss in the event of default, LIED). Loss given default is sometimes examined from the other end, as recovery rates, where LGD = 1 - recovery rate. Most banks have severe system limitations. Data on losses is rarely kept, as loans that migrate to the classified grades are subsequently moved to a workout area. Often one then loses track of the original grades for these assets.
In recognition of this problem, Moody’s recommends estimating recoveries in the following way. First, aggregate data may be kept for the entire commercial area or for just a line of business. For this information to be especially informative, it should exist over the business cycle. If data is not kept by line of business, it might be kept by collateral type or seniority. That is, the bank may have data indicating that recovery rates on loans secured by accounts receivable are 80%, while for unsecured lending they are 60% (on average). Even in the best of scenarios, this should be supplemented with broader industry-wide data to inform your recovery rate assumptions.
The above situation, in which a bank has detailed information through the cycle for a line of business exhibiting consistent underwriting standards, is rare. A second approach is to compare the proportion of nonaccrual loans with the charge-off rate for the same portfolio and back out the implied recovery rate. For example, assume that 5% of the portfolio is classified as nonaccrual and that over this same period, charge-off rates were 1% of assets, implying average recovery rates of 80%.
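The arithmetic behind this back-of-the-envelope estimate can be sketched as follows. The function names are illustrative, and the sketch assumes that the nonaccrual share is a reasonable proxy for the default rate over the period, which is an assumption of the example rather than a general result.

```python
def implied_recovery_rate(nonaccrual_share, charge_off_rate):
    """Back out recovery from charge-offs, treating nonaccruals as defaults.

    charge_off_rate = nonaccrual_share * LGD, and LGD = 1 - recovery rate.
    """
    if nonaccrual_share <= 0:
        raise ValueError("nonaccrual_share must be positive")
    loss_given_default = charge_off_rate / nonaccrual_share
    return 1.0 - loss_given_default


def expected_loss_rate(default_rate, recovery_rate):
    """Expected loss as the product of default rate and loss given default."""
    return default_rate * (1.0 - recovery_rate)


# The example from the text: 5% nonaccrual, 1% charge-offs.
recovery = implied_recovery_rate(nonaccrual_share=0.05, charge_off_rate=0.01)
print(round(recovery, 4))  # 0.8
```

The same relationship runs in the other direction: given a default rate and a recovery assumption, `expected_loss_rate` returns the charge-off rate one should expect to observe.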
A final approach is to use data from previous studies. It is useful to distinguish recoveries on syndicated bank loans from recoveries on other commercial loans, since syndicated bank loans tend to be dominated by large, high quality obligors, and this may not be an appropriate grouping for the portfolio in hand.
Figure 3
Authors                        Debt Type                 Average Recovery (%)   Count
Altman and Kishore (’96)       Sr. Sec. Bonds            58                     85
Asarnow and Edwards (’95)      C&I Bank Loans            65                     831
Eales and Bosworth (’98)       Small Business Loans      69                     2492
Hurt and Felsovalyi (’98)      Latin Am. Bank Loans      68                     1149
Grossman et al. (’97)          Syndicated Bank Loans     82                     60
Carty and Lieberman (’96)      Syndicated Bank Loans     71                     58
Carty and Lieberman (’96)      Bank Loans                79                     229
Carty et al. (’98)             Sr. Sec. Bank Loans       87                     178
Hamilton and Carty (’99)       Private Bankruptcies      78                     200
S&P (’98)                      Sr. Sec. Bank Loans       84                     258
Society of Actuaries (’96)     Private Placements        64                     393
Figure 3 suggests it is reasonable to assume a recovery rate for bank loans above the recovery rate on bonds. Note the first estimate refers to bonds, and is the lowest estimate at 58%. Estimated mean recovery rates on loans vary between 65% and 85%. This industry data provides empirical support for any final assumption. Further refinement within these broad bands requires more customized information. Allowing for time variation in these sample averages, prudence would suggest assuming 10-15% less than the lowest average recovery rate on a bank loan portfolio given no other supporting documentation. This highlights how additional information, in addition to adding precision, can favorably impress outside regulators, ratings agencies, and investors.
To the extent a portfolio has information on the proportion of various collateral types and the advance rates used, a similar adjustment can be applied to refine these broad averages into low and high recovery rate assumptions. Again, similar documentation from the public record can be used to inform these estimates.
Step 3: Estimate Default Rates by Grade
The first test of internal rating systems is to look directly at the agency-rated obligors and see where they actually sit in the grading system, a "concordance mapping." If grade 4 maps into a Ba1, but Ba1s tend to be evenly distributed in grades 3 and 4, appropriate adjustments are made to recognize that practice and policy are slightly askew. The biggest problem with this approach is that rated credits are somewhat unusual, as their external ratings are probably known to the underwriter and may influence the process. To take a trivial example, if, in practice, loans with agency ratings are assigned to their mapped internal grade with few exceptions, then internal grades will be consistent with their policy mapping for rated loans irrespective of the relative risk of the unrated obligors. It is important to point out that for most middle-market portfolios, the proportion of rated obligors is so small that, adjusted for sampling error and the concern above, the usefulness of a ratings concordance exercise is extremely limited. Nonetheless, it is informative, the more so the greater the proportion of credits with agency ratings.
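A concordance mapping amounts to a cross-tabulation of agency rating against internal grade. The sketch below uses invented obligor counts and a two-grade policy mapping purely for illustration; it reports, for each rating named in the policy mapping, the share of obligors with that rating that actually sit in the mapped grade.

```python
from collections import Counter, defaultdict


def concordance_table(obligors):
    """Cross-tabulate (internal grade, agency rating) pairs: rating -> grade -> count."""
    table = defaultdict(Counter)
    for grade, rating in obligors:
        table[rating][grade] += 1
    return table


def share_in_mapped_grade(obligors, policy_mapping):
    """For each mapped rating, the share of rated obligors found in the policy grade."""
    table = concordance_table(obligors)
    shares = {}
    for grade, rating in policy_mapping.items():
        total = sum(table[rating].values())
        if total:
            shares[rating] = table[rating][grade] / total
    return shares


# Hypothetical policy mapping and rated obligors, for illustration only.
policy_mapping = {3: "Baa3", 4: "Ba1"}
rated_obligors = (
    [(3, "Ba1")] * 5 + [(4, "Ba1")] * 5      # Ba1s split evenly between grades 3 and 4
    + [(3, "Baa3")] * 8 + [(4, "Baa3")] * 2
)

table = concordance_table(rated_obligors)
shares = share_in_mapped_grade(rated_obligors, policy_mapping)
print(shares)  # only half of the Ba1 obligors sit in their policy grade
```

With so few rated obligors in a typical middle-market portfolio, shares like these carry wide sampling error, so they are better read as a directional check than as a calibration.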
It’s possible to examine whether historical loss rates exist by internal bank grade. In the best-case scenario, defaults by initial grade are kept over the life of loans, so that a static pool of pass-grade loans can be followed for three or four years. If this information exists over a business cycle, many tests are available not only to estimate accuracy but also to refine the process with complementary quantitative measures. The number of defaults needed to generate sufficiently powerful tests is usually more than most banks can internally generate; thus, there is no reason to elaborate on such tests in this article, even though they can be extremely useful.
As mentioned above, origination-grade credits rarely go into default within one year, and so one-year default rates by grade would significantly understate the true annual loss rates of the pass grades. If there are not sufficient "cohort" cumulative defaults over longer periods of time, a transition matrix can be used to generate such defaults. An important caveat is that since the recessionary cycle in many countries has long passed, this data cannot be used to directly estimate a "full sample" transition matrix or four-year default rate, since without recessions averages are seriously biased. Nonetheless, given Moody’s experience with transition matrices and cumulative default rates in good and bad times, it’s possible to use information on the relative performance of credits in good and bad times to appropriately adjust internal data to reflect "through the cycle" expectations.
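One way to generate multi-year default rates from a transition matrix is to compound the one-year matrix, as in the sketch below. It assumes the same matrix applies every year (a time-homogeneous Markov chain), which is a simplification and ignores the cycle adjustment just described. The three states and their probabilities are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def cumulative_default_rates(one_year, start_state, default_state, years):
    """Cumulative default probability after 1..years, assuming the same
    one-year transition matrix applies in every year."""
    power = one_year
    rates = []
    for _ in range(years):
        rates.append(power[start_state][default_state])
        power = mat_mul(power, one_year)
    return rates


# Hypothetical one-year transition matrix over states
# [grade 4, grade 7, default]; default is absorbing.
one_year = [
    [0.95, 0.05, 0.00],  # grade 4 never defaults directly within a year
    [0.00, 0.50, 0.50],
    [0.00, 0.00, 1.00],
]

rates = cumulative_default_rates(one_year, start_state=0, default_state=2, years=3)
print([round(r, 5) for r in rates])  # [0.0, 0.025, 0.06125]
```

The one-year default rate for the originating grade is zero, yet its three-year cumulative rate is a little over 6%, which is the understatement the paragraph above warns about.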
It is also useful to look at the historical charge-off rate and proportion of nonperforming loans in the line of business. Most banks should have recorded this information for at least the last several years and, in many cases, through the last recession. It is important to zero in on the middle-market portfolio and thus exclude consumer loans, nonowner-occupied real estate, and other specialty lending areas that would cloud inference on the middle-market "through-the-cycle" loss rate. This information can be compared to industry-wide loss rates compiled by data vendors like SNL and Sheshunoff. We have further data from Moody’s database of default rates on traded debt, as well as Dun & Bradstreet failure rate data. All this information can be used to inform the final top-down loss rate estimate.
Finally, Moody’s has developed a large proprietary database of private companies. This dataset has been used to develop RiskScore™, a model that predicts default. RiskScore has been estimated and tested (calibrated and validated) on a large dataset of private defaults. Internally, Moody’s sees RiskScore as an essential tool for validation, which underscores an interesting aspect of validation. Just as credible experts can bestow credibility upon other experts, so can validated quantitative tools bestow credibility upon other quantitative tools. Moody’s hopes that in the future RiskScore will be as useful as bureau scores in consumer lending: not sufficient for underwriting, but essential given its accuracy and ease of use.
One useful exercise is to cross-tabulate internal grades against RiskScore. If we take the credits in a bank’s internal grades and sort them into low to high RiskScore buckets, we can examine whether the dispersion within a grade is "too large." If it appears that too many adverse credits appear in a given internal rating grade, we can complement this test by looking at more transparent measures, such as leverage ratios and profitability. RiskScore, after all, is simply an aggregation of financial ratios, and so it is highly probable that any systematic problem uncovered by RiskScore can also be seen through these more tangible measures. Comparisons to the distributions of these measures from our proprietary database and the publicly traded universe can highlight potential omitted variable biases. For example, suppose that cross-tabbing internal grades by leverage reveals that the proportion of highly leveraged loans is twice the national average for portfolios with similar purported expected loss. This would suggest that the grading system is not sufficiently granular and, more important, would provide explicit guidance on how to adjust a credit system to make it more accurate.
Step 4: Tying It All Together
This information is ultimately used to estimate loss rates by internal pass grade. The algorithm is explained in Moody’s extended document; briefly, it makes sure that loss rates by originating grade are consistent with the following known facts:
The intuition for the relevance of the above is the following. If all loans are originated in equal proportions in grades 4 and 5, the annual loss rate is 50 basis points, and recovery rates are assumed to be 50%, then the annual default rates for these two grades must average 100 basis points. It’s not possible to adjust one without adjusting the other. This is the top-down check on the bottom-up estimates.
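The top-down check in this example can be written out as a short sketch. The function names are illustrative, and the 60 basis point default rate assumed for grade 4 is a made-up input used only to show how fixing one grade pins down the other.

```python
def required_average_default_rate(annual_loss_rate, recovery_rate):
    """Average default rate implied by a portfolio loss rate and a recovery assumption."""
    return annual_loss_rate / (1.0 - recovery_rate)


def remaining_grade_default_rate(shares, known_rates, target_average, grade):
    """Default rate the one unknown grade must carry so that the
    share-weighted average default rate equals target_average."""
    known = sum(shares[g] * known_rates[g] for g in known_rates)
    return (target_average - known) / shares[grade]


# Figures from the text: 50 bp annual loss, 50% recovery,
# equal origination in grades 4 and 5.
average = required_average_default_rate(annual_loss_rate=0.0050, recovery_rate=0.50)
print(round(average, 6))  # 0.01, that is, 100 basis points

# Hypothetical: if grade 4 is assigned 60 bp, grade 5 must absorb the rest.
grade_5 = remaining_grade_default_rate({4: 0.5, 5: 0.5}, {4: 0.0060}, average, grade=5)
print(round(grade_5, 6))  # 0.014, that is, 140 basis points
```

Lowering the assumed rate on one grade mechanically raises the rate required on the other, which is the sense in which the bottom-up estimates cannot be adjusted independently.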
Most assumptions come with ranges, and qualitative guidelines suggest whether to use the upper or lower bound of these ranges (as with recovery rates). A policy manual should be clear, comprehensive, and precise. As an example, it should outline the range of loan advance as a percent of collateral, which should vary by collateral type. It should specify where collateral values and residual estimates on leases are procured. Financial covenants should be defined for the various relevant financial ratios. In general, greater restrictions should be placed on less familiar lending initiatives.
Conclusion
Basel has put the point succinctly: "At present, there is no commonly accepted framework for periodically verifying the accuracy of credit risk models." This comment was made in recognition of the fact that banks do not have historical data on sufficiently granular pass grade performance over a business cycle. The first priority in validation is clearly creating an archival, transaction-level database that combines obligor and facility information with payment history. As they say about bonsai trees, since they take 50 years to grow, you should plant them tomorrow. In the meantime, before it is possible to independently validate an internal risk rating system using only the bank’s own idiosyncratic data, the outline above highlights how to generate meaningful expected loss estimates by internal pass grade.
Footnote 1: It appears Basel was mainly discussing commercial, as opposed to consumer, loans, though this was not explicitly stated. Nonetheless, middle-market loans are the most unquantified lending group within a Large Complex Banking Organization (LCBO), and therefore also within the commercial loan portfolio.
References:
Eales, Robert and Edmund Bosworth, "Severity of Loss in the Event of Default in Small Business and Larger Consumer Loans," Journal of Lending and Credit Risk Management, May 1998.
Altman, Edward and Vellore Kishore, "Almost Everything You Wanted to Know about Recoveries on Defaulted Bonds," Financial Analysts Journal, Nov/Dec 1996.
Asarnow, Elliot and David Edwards, "Measuring Loss on Defaulted Bank Loans: A 24-Year Study," Journal of Commercial Lending and Credit Risk Management, March 1995.
Hurt, Lew and Akos Felsovalyi, Measuring Loss on Latin American Defaulted Bank Loans: A 27-Year Study of 27 Countries, Citibank/Portfolio Strategies. August 1998.
Grossman, Robert J., William Brennan and Jennifer Vento, Syndicated Bank Loan Recovery Study, Fitch Research, October 22, 1997.
Carty, Lea V. and Dana Lieberman. Defaulted Bank Loan Recoveries, Moody's Investors Service, Global Credit Research, Special Report. November 1996.
Hamilton, David and Lea Carty. Debt Recoveries for Corporate Bankruptcies, Moody's Investors Service, Global Credit Research, Special Report. January 1999.
Carty, Lea, et al. Bankrupt Bank Loan Recoveries, Moody's Investors Service, Global Credit Research, Special Report. June 1998.
Society of Actuaries. 1986-1992 Credit Risk Loss Experience Study: Private Placement Bonds. Society of Actuaries, Schaumburg, IL. September 1996.
Van de Castle, Karen. Recovering Your Money, Insights into Losses from Defaults. Standard and Poor's CreditWeek. June 16, 1999.