Fraud Management and the Pricing of Tail Risk
Why tail risk is often underpriced and common fraud metrics can be misleading
As a new fraud manager, I started reflecting on the fundamental principles of fraud management. I began by looking at the metrics fraud teams commonly use, such as precision, false positive rate (FPR), and recall. However, I soon realized how difficult it is to set KPIs in fraud because of a core paradox: absence of evidence is not evidence of absence.
In this blog post, I want to go one step further and discuss what causes this paradox and why I believe it is extremely dangerous to optimize fraud management primarily around precision, FPR, and recall.
Fraud events often have the following characteristics:
The frequency of fraud is low relative to the total number of events, at least for any business that is still surviving.
The loss per fraud event is often orders of magnitude larger than the loss per event from other sources, such as credit losses. A single fraud incident can wipe out a month’s profit.
The real damage usually comes from previously unknown attack vectors.
Fraud events also happen more often than we think. “Low frequency” relative to total volume tempts us to imagine once every few years; in absolute terms, they may happen much more often.
The consequences are as follows.
First-Order Consequences
A recall of 99.99% may still not be enough, because the remaining 0.01% of missed fraud can still put the business at risk if it contains a single large loss.
The absence of observed fraud events can create a false sense of security, even though only one disconfirming event is needed to reject the proposition that we are safe.
Thinking only in terms of frequency can put us in grave danger. First, we may treat fraud events like 4-sigma events, rare and exceptional, when in fact they happen far more often, perhaps even every day. Second, even a truly rare event can hurt the business badly because of the size of the exposure.
Fraud events cannot be treated in isolation, because the business may never fully recover from a large incident: outcomes are path-dependent.
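The point about the absence of observed fraud can be quantified with a standard statistical result, the “rule of three”: if no events are seen in n independent periods, the 95% upper confidence bound on the per-period event probability is roughly 3/n. The sketch below assumes independent, identically distributed periods, which real fraud does not guarantee, and the function names are illustrative.

```python
import math


def upper_bound_rate(n_periods: int, confidence: float = 0.95) -> float:
    """Exact upper confidence bound on a per-period event probability
    after observing zero events in n_periods independent periods.

    Solves (1 - p) ** n_periods = 1 - confidence for p.
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_periods)


def rule_of_three(n_periods: int) -> float:
    """Common approximation to the 95% bound: 3 / n."""
    return 3.0 / n_periods


# Hypothetical example: 100 days with no observed fraud.
n = 100
exact = upper_bound_rate(n)
approx = rule_of_three(n)
print(f"exact bound: {exact:.4f} per day, rule of three: {approx:.4f} per day")
print(f"still consistent with roughly one incident every {1 / exact:.0f} days")
```

In other words, 100 quiet days are still consistent with an incident roughly once a month.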
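How often a “4-sigma” event occurs depends heavily on the assumed distribution. The sketch below compares a normal distribution with a Student-t distribution with 3 degrees of freedom, rescaled to the same variance; the t3 is used purely as an illustrative stand-in for fat tails, and real fraud losses need not follow either. Under the normal assumption a 4-sigma daily loss shows up about once every 86 years; under the t3 assumption, roughly once a year.

```python
import math


def normal_tail(k: float) -> float:
    """P(X > k standard deviations) for a standard normal X."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))


def student_t3_tail(k: float) -> float:
    """P(X > k standard deviations) for a Student-t with 3 degrees of
    freedom, rescaled to unit variance (the raw t3 has variance 3).

    Uses the closed-form t3 CDF:
    F(t) = 1/2 + (1/pi) * (t / (sqrt(3) * (1 + t^2 / 3)) + atan(t / sqrt(3)))
    """
    t = k * math.sqrt(3.0)   # k standard deviations expressed in raw t3 units
    u = t / math.sqrt(3.0)   # substitution that simplifies the CDF
    cdf = 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi
    return 1.0 - cdf


k = 4.0
p_normal = normal_tail(k)
p_fat = student_t3_tail(k)
print(f"normal: about 1 in {1 / p_normal:,.0f} days")
print(f"t3:     about 1 in {1 / p_fat:,.0f} days")
print(f"the fat tail makes a 4-sigma day about {p_fat / p_normal:.0f}x more likely")
```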
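Path dependency can be illustrated with a small deterministic sketch: two years with the same total profit and the same single fraud loss, differing only in when the loss lands. The capital figures and function names are hypothetical, and ruin is treated as absorbing, which is the assumption behind “may never fully recover.”

```python
from typing import List, Optional, Sequence


def first_ruin_day(initial_capital: float, daily_pnl: Sequence[float]) -> Optional[int]:
    """Return the 1-based day on which capital first drops to zero or below,
    or None if the business survives the whole sequence.

    Ruin is absorbing: once capital is gone, later profits never arrive.
    """
    capital = initial_capital
    for day, pnl in enumerate(daily_pnl, start=1):
        capital += pnl
        if capital <= 0:
            return day
    return None


def year_with_loss_on(day: int) -> List[float]:
    """Hypothetical year: +1 on every normal day, one fraud loss of 150."""
    pnl = [1.0] * 365
    pnl[day - 1] = -150.0
    return pnl


early = year_with_loss_on(10)
late = year_with_loss_on(100)

print(sum(early) == sum(late))        # True
print(first_ruin_day(100.0, early))   # 10
print(first_ruin_day(100.0, late))    # None
```

Both years have the same total P&L, yet the early loss ends the business on day 10 while the late one is absorbed. A yearly total or average hides the difference; the order of events decides the outcome.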
Second-Order Consequences
Focusing too much on precision, FPR, and recall is dangerous. Because these metrics count events rather than weigh losses, they can miss the small fraction of unknown risks that may cause losses 1,000 times larger than what we are prepared for.
Similarly, machine learning models are not sufficient for dealing with extreme unknowns, because they learn from patterns that have already been observed. Over-focusing on deploying more advanced models can shift attention away from what truly matters.
Average risk can be highly misleading. It changes dramatically once an outlier occurs, and those outliers are often both more frequent and much larger than people expect. Monitoring average risk alone can create a false sense of security.
Frequency-based caps, such as a limit on the number of transactions, are incomplete on their own, because a single event can still wipe you out.
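Precision, false positive rate, and recall, including the 99.99% recall figure, are all computed from counts: a missed fraud of 100 and a missed fraud of 100,000 move them by exactly the same amount. The sketch below builds two hypothetical portfolios (every amount and volume is invented for illustration) with identical confusion-matrix counts. The count-based metrics match while the missed loss differs by a factor of 1,000. The value-weighted recall shown alongside is one possible complement, not a metric prescribed here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Txn:
    amount: float
    is_fraud: bool
    flagged: bool


def count_metrics(txns: List[Txn]) -> Dict[str, float]:
    """Precision, recall, and FPR computed from counts only."""
    tp = sum(t.is_fraud and t.flagged for t in txns)
    fp = sum((not t.is_fraud) and t.flagged for t in txns)
    fn = sum(t.is_fraud and not t.flagged for t in txns)
    tn = sum((not t.is_fraud) and not t.flagged for t in txns)
    return {"precision": tp / (tp + fp), "recall": tp / (tp + fn), "fpr": fp / (fp + tn)}


def missed_fraud_amount(txns: List[Txn]) -> float:
    """Total value of fraud that was not flagged."""
    return sum(t.amount for t in txns if t.is_fraud and not t.flagged)


def value_recall(txns: List[Txn]) -> float:
    """Share of fraud *value* (not count) that was flagged."""
    total = sum(t.amount for t in txns if t.is_fraud)
    caught = sum(t.amount for t in txns if t.is_fraud and t.flagged)
    return caught / total


def build(missed_amount: float) -> List[Txn]:
    """Hypothetical portfolio: 9,999 caught frauds of 100, one missed fraud,
    10 false positives, and 100,000 correctly passed legitimate transactions."""
    txns = [Txn(100.0, True, True) for _ in range(9_999)]
    txns.append(Txn(missed_amount, True, False))
    txns += [Txn(50.0, False, True) for _ in range(10)]
    txns += [Txn(50.0, False, False) for _ in range(100_000)]
    return txns


small = build(missed_amount=100.0)
large = build(missed_amount=100_000.0)

print(count_metrics(small) == count_metrics(large))                 # True
print(missed_fraud_amount(large) / missed_fraud_amount(small))      # 1000.0
print(value_recall(small), value_recall(large))
```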
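The problem with monitoring average risk can be seen in a few lines. In the hypothetical loss history below (the figures are illustrative), the average gives no hint of what is coming; after a single outlier it jumps about a hundredfold, while the median does not move at all.

```python
from statistics import mean, median

# Hypothetical loss history: 1,000 events of 10 each.
losses = [10.0] * 1_000
print(mean(losses), median(losses), max(losses))  # 10.0 10.0 10.0

# One outlier of 1,000,000 replaces a single ordinary event.
losses_with_outlier = losses[:-1] + [1_000_000.0]
print(mean(losses_with_outlier))    # 1009.99
print(median(losses_with_outlier))  # 10.0

# Share of total loss coming from that single event.
print(max(losses_with_outlier) / sum(losses_with_outlier))
```

Neither summary reveals the exposure in advance: the mean was 10 right up until it was not, and the median never notices the event at all.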
Strategy
Are we doomed then? Not necessarily.
Although it is nearly impossible to predict or control exactly when fraud will happen, exposure can still be controlled. I would argue that the primary responsibility of a fraud team is to cap the downside of the business. Along the way, the team should also aim to reduce the premium paid for that protection, for example by lowering false positives. But leaving exposure uncapped is not an option.
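The difference between a frequency cap and an exposure cap can be made concrete with a hypothetical per-account guard (the class names, limits, and amounts are invented for illustration) that enforces a transaction-count cap and a cumulative value cap within one window. A count cap alone bounds the loss only by count times the largest allowed amount, which is unbounded if amounts are; the exposure cap bounds it directly.

```python
import math
from dataclasses import dataclass


@dataclass
class Limits:
    max_count: int        # frequency cap: transactions allowed per window
    max_exposure: float   # exposure cap: total value allowed per window


class ExposureGuard:
    """Hypothetical per-account guard for a single window (e.g. one day).

    A transaction is allowed only if it keeps both the transaction count
    and the cumulative value within their limits.
    """

    def __init__(self, limits: Limits) -> None:
        self.limits = limits
        self.count = 0
        self.exposure = 0.0

    def allow(self, amount: float) -> bool:
        if self.count + 1 > self.limits.max_count:
            return False
        if self.exposure + amount > self.limits.max_exposure:
            return False
        self.count += 1
        self.exposure += amount
        return True


# Hypothetical attack: a few small probes, then one large hit.
attack = [1_000.0, 1_000.0, 2_000.0, 1_000_000.0]

count_only = ExposureGuard(Limits(max_count=10, max_exposure=math.inf))
capped = ExposureGuard(Limits(max_count=10, max_exposure=5_000.0))

for amount in attack:
    count_only.allow(amount)
    capped.allow(amount)

print(count_only.exposure)  # 1004000.0
print(capped.exposure)      # 4000.0
```

The count cap never triggered, yet it let through 1,004,000; the exposure cap held the same sequence to 4,000. In practice a breach would more likely trigger step-up verification or manual review than a hard decline; the sketch only shows where the bound comes from.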
A cap may look too simple, but we are not here to impress our peers.
Finally, fraud risk practitioners deserve more respect. In many ways, the fraud team is effectively the seller of a put option, and the business is the buyer. The fraud team’s upside is capped: the best outcome it can deliver is recall close to 100%, which is impossible to achieve in practice. Yet its downside is theoretically uncapped, and when an extreme event happens, the fraud team is often the one blamed.
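The put-option analogy rests on an asymmetric payoff. The sketch below computes the textbook payoff to a put seller at expiry; the strike and premium are arbitrary illustrative numbers, and mapping the premium to the business opportunity given up and a falling spot price to a fraud loss is an interpretive assumption, not a formal model.

```python
def short_put_payoff(spot: float, strike: float, premium: float) -> float:
    """Payoff at expiry for the seller of a put option:
    keep the premium, pay out max(strike - spot, 0)."""
    return premium - max(strike - spot, 0.0)


# Hypothetical contract: strike 100, premium 2.
strike, premium = 100.0, 2.0
for spot in (150.0, 100.0, 90.0, 50.0, 0.0):
    print(f"spot {spot:>5}: seller payoff {short_put_payoff(spot, strike, premium):>6}")
```

The seller never makes more than the premium of 2 but can lose up to 98 in this example. For a vanilla put the loss is bounded by the strike minus the premium, because the underlying price cannot fall below zero; the analogy is about the asymmetry, with the upside capped at the premium and the downside many times larger.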
At the same time, the premium paid for this protection (not in salary, but in lost business opportunities) is often underpriced relative to the guarantee the fraud team is expected to provide. The business tends to focus on frequency: fraud is unlikely, so why sacrifice business volume to manage a “long-tail” event? Fraud teams, by contrast, have to focus on magnitude.
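The disagreement between frequency and magnitude comes down to which number gets priced. A minimal sketch, with every figure hypothetical: a 1% annual chance of a 50,000,000 loss looks negligible by frequency, but its expected cost of 500,000 a year exceeds an assumed 200,000 of business volume given up each year for protection.

```python
def expected_tail_loss(annual_probability: float, loss_if_hit: float) -> float:
    """Expected annual loss from a single rare event: probability x magnitude."""
    return annual_probability * loss_if_hit


# Hypothetical figures, for illustration only.
p = 0.01                 # "unlikely": 1% chance per year
loss = 50_000_000.0      # magnitude if it does happen
premium = 200_000.0      # business volume given up each year for protection

fair = expected_tail_loss(p, loss)
print(f"frequency view: {1 - p:.0%} of years nothing happens")
print(f"magnitude view: expected loss {fair:,.0f} per year vs premium {premium:,.0f}")
```

Expected loss is only a floor for a fair premium: it averages over many parallel years and ignores the possibility that a single hit ends the business, so an event large enough to cause ruin justifies paying more than p × L.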
