AUC, KS, Precision, and Recall: A Risk Analyst’s Guide
As analysts in the risk management industry, we live and breathe acronyms like AUC (Area Under the Curve) and KS (Kolmogorov-Smirnov). These are the gold standards for evaluating classification models in credit risk. However, if you—like me—have recently rotated into an Anti-Fraud team from a credit risk team, you’ve likely encountered two different metric kings: Precision and Recall.
How do these metrics relate to each other?
The “Two Rooms” Analogy
The standard confusion matrix terms—True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN)—can be headache-inducing to communicate in practice.
To simplify this, imagine two separate rooms:
The Bad Room: Contains only actual bad actors (Total Bad = TP + FN).
The Good Room: Contains only actual good users (Total Good = TN + FP).
Your model is a gatekeeper. You walk into each room with your classifier.
In the Bad Room, you want the model to identify everyone as bad. The percentage of people you successfully catch here is your True Positive Rate (TPR).
In the Good Room, you want the model to identify no one as bad. The percentage of people you mistakenly flag as bad here is your False Positive Rate (FPR).
A perfect model has a TPR of 100% (catches everyone in the Bad Room) and an FPR of 0% (flags no one in the Good Room).
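To make the two-room picture concrete, here is a minimal Python sketch with purely illustrative counts (the numbers are invented, not real portfolio data):

```python
# Hypothetical counts for illustration only.
tp, fn = 80, 20      # Bad Room: 100 actual bad actors, 80 caught
tn, fp = 900, 100    # Good Room: 1,000 actual good users, 100 wrongly flagged

tpr = tp / (tp + fn)  # share of the Bad Room you caught
fpr = fp / (fp + tn)  # share of the Good Room you mistakenly flagged

print(f"TPR (Bad Room coverage):   {tpr:.0%}")  # 80%
print(f"FPR (Good Room annoyance): {fpr:.0%}")  # 10%
```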
Visualizing AUC and KS
Realistically, models output a probability (e.g., “there is a 90% chance this user is a fraudster”). We use a “threshold dial” to decide who to flag.
Imagine turning this dial:
Threshold 100% (Strict): You only flag the most blatant fraud. You catch almost no one in the Bad Room (TPR≈0), but you also annoy no one in the Good Room (FPR≈0). This is the point (0,0) on the ROC plot.
Threshold 0% (Loose): You flag everyone. You catch every fraudster (TPR=100%), but you also falsely flag every good user (FPR=100%). This is the point (1,1).
AUC (Area Under the Curve) measures the model’s performance across all possible settings of this dial. It plots TPR against FPR. A random guess gives you a straight diagonal line (AUC 0.5), meaning for every extra bad person you catch, you annoy a proportional number of good people. A good model bows upward, maximizing the gap between the TPR and FPR.
KS (Kolmogorov-Smirnov) focuses on the single best point on that curve. It is simply the maximum difference between TPR and FPR. While AUC looks at the whole story, KS asks: “At the single best setting of the dial, how much separation can we get between the good and bad populations?”
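As a sketch of how both numbers fall out of the same threshold sweep, the snippet below simulates scores for the two rooms and computes AUC and KS. It assumes NumPy and scikit-learn are available, and the score distributions are made up for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(42)

# Simulated scores: fraudsters tend to score higher than good users.
bad_scores = rng.normal(0.7, 0.15, 1_000)     # the Bad Room
good_scores = rng.normal(0.4, 0.15, 10_000)   # the Good Room

y_true = np.concatenate([np.ones(1_000), np.zeros(10_000)])
y_score = np.concatenate([bad_scores, good_scores])

auc = roc_auc_score(y_true, y_score)          # area under the TPR-vs-FPR curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
ks = np.max(tpr - fpr)                        # best single-threshold separation

print(f"AUC: {auc:.3f}")
print(f"KS:  {ks:.3f}")
```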
Deep Dive: Why KS is Cumulative
KS is usually plotted using the cumulative percentages of bads and goods. The maximum difference between the two curves is the KS.
Imagine taking all the people from both rooms and lining them up in a single queue, sorted by their model score from Highest (Most Suspicious) to Lowest (Least Suspicious).
Now, imagine walking down this line from the start. This is equivalent to lowering your threshold.
Cumulative Bad (TPR): Every time you walk past a Bad Person, you add them to your count. If there are 100 Bad People total, and you have passed 50 of them, your “Cumulative Bad %” is 50%. This is exactly the True Positive Rate (TPR).
Cumulative Good (FPR): Every time you walk past a Good Person, you add them to your count. If there are 1,000 Good People total, and you have passed 100 of them, your “Cumulative Good %” is 10%. This is exactly the False Positive Rate (FPR).
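This walk translates almost line-for-line into code. A minimal sketch, reusing the simulated y_true and y_score from the previous snippet:

```python
# Sort everyone from most suspicious to least suspicious.
order = np.argsort(-y_score)
labels = y_true[order]

# Walking down the queue, accumulate the share of bads and goods seen so far.
cum_bad = np.cumsum(labels == 1) / (labels == 1).sum()   # cumulative bad %  = TPR
cum_good = np.cumsum(labels == 0) / (labels == 0).sum()  # cumulative good % = FPR

ks_manual = np.max(cum_bad - cum_good)
print(f"KS from the cumulative walk: {ks_manual:.3f}")   # matches the roc_curve answer
```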
Below is a simulation dashboard I asked Gemini to build to visualize these concepts: Link
Why Anti-Fraud Cares: Precision and Recall
In the anti-fraud world, Recall is just another word for TPR (Bad Room coverage). But Precision is different.
While TPR and FPR require you to look at the rooms separately, Precision requires you to look at who the model flagged from both rooms combined. It asks: “Of all the people the model claimed were bad, how many were actually bad?”
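In confusion-matrix terms, a quick sketch using the same hypothetical counts as the two-room example above:

```python
# Same hypothetical counts as the earlier two-room example.
tp, fn, fp = 80, 20, 100

recall = tp / (tp + fn)        # identical to TPR: Bad Room coverage
precision = tp / (tp + fp)     # purity of the flagged pool, drawn from both rooms

print(f"Recall:    {recall:.0%}")     # 80%
print(f"Precision: {precision:.0%}")  # ~44%
```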
Why is this preferred over FPR in fraud operations?
Operational Reality: In fraud, a positive flag usually triggers a manual review, an SMS alert, or a transaction block. These actions have direct costs (agent time) and customer friction (insulting good customers). Precision measures the “purity” of the alert queue.
The Class Imbalance Problem: This is the key differentiator. Precision is highly sensitive to the Bad Rate.
FPR is calculated only inside the Good Room. If you double the number of good customers, the FPR remains stable.
Precision depends on the ratio of good to bad. If the number of good customers explodes while the number of fraudsters stays the same, your Precision will plummet because the “noise” (False Positives) drowns out the signal.
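A minimal sketch of that imbalance effect, holding the model’s per-room behaviour fixed (a hypothetical TPR of 80% and FPR of 10%) while the Good Room grows:

```python
tpr, fpr, n_bad = 0.80, 0.10, 100   # the model's per-room behaviour stays fixed

for n_good in (1_000, 10_000, 100_000):
    tp = tpr * n_bad                # bads caught: unchanged
    fp = fpr * n_good               # goods flagged: grows with the Good Room
    precision = tp / (tp + fp)
    print(f"goods={n_good:>7,}  FPR={fpr:.0%}  precision={precision:.1%}")

# goods=  1,000  FPR=10%  precision=44.4%
# goods= 10,000  FPR=10%  precision=7.4%
# goods=100,000  FPR=10%  precision=0.8%
```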
In summary, the Modeling Team often focuses on AUC/KS because they measure the model’s pure ability to rank order, independent of the portfolio’s bad rate. Anti-Fraud focuses on Precision because it reflects the actual operational pain of sifting through false alarms in a sea of good transactions.
Why does the credit risk team seldom look into Precision and Recall?
This question deserves a dedicated deep dive in the next post.
Fundamentally, both Underwriting and Anti-Fraud teams share the exact same goal: maximizing profit. They simply approach the P&L equation from opposite ends (with one caveat: fraud is a dynamic process, and the environment is not stable because fraudsters evolve):
Underwriting aims to maximize disbursements. They expand approvals until the marginal cost of defaults outweighs the marginal revenue from interest. Their constraint is the Breakeven Cost of Risk.
Anti-Fraud aims to minimize losses. They expand fraud detection until the cost of friction (insulting good customers) outweighs the savings from stopping fraud. Their constraint is Breakeven Precision.
In my following note, I plan to demonstrate mathematically that these two concepts are identical: the Breakeven Cost of Risk in lending equals the Breakeven Precision in fraud prevention.
