
Write a high-performing fraud rule with me: A real-world example

Writing a fraud rule is a task most data analysts and fraud specialists can do in their first week on the job. Writing a good fraud rule, alas, is a tad trickier.

To understand why, we first need to establish what makes a fraud rule a good one. The answer is quite simple: it needs to be effective in the moment while remaining resilient to fraudsters’ behavioral changes.

For example, rules with static amount conditions can be quite brittle: if my rule contains a condition where the amount must be >$100, fraudsters can still move $99, bypass my rule, and make the same amount of money.

In this blog post, I want to share the analytical process I followed when recently writing a fraud rule for a selection of our customers. Hopefully it can give you some ideas on how to structure yours.

Before we start, a few words about methodology

Today’s focus is the research stage of a new rule. And as with every analytical research task, my recommendation is to conduct it in an SQL environment rather than in Excel or on your rule engine itself. Of course, feel free to conduct research however you want, but in my experience the exercise will be more effective inside your data warehouse. Are you only a beginner in SQL? Fret not, that’s what we have LLMs for nowadays!

Another thing I’d like to note is the set of KPIs that guide me when I’m conducting an iterative research process.

Before approving a rule to go live, my organization has a policy that dictates the success criteria it needs to uphold. In many organizations, this is a set of goals, validations, and documentation required to release a new rule.

The final success criteria should inform the design of our iterative research process, down to which KPIs to look at and what targets to shoot for. When I’m building a rule one condition at a time, this helps me understand if I’m making the right decisions.

While this can change from use case to use case, there are two KPIs I like to keep track of throughout every step of the process: 

  • Precision: What share of my rule’s hits are actually fraud and, by extension, how many false positives it generates
  • Recall: What share of all the fraud cases my rule catches

The balancing act is pretty straightforward but not simple. For each condition I add to the rule, I want to see my precision increasing while my recall stays pretty much the same.

Lastly, a few words on the dataset I was working with. When I started my review, I focused on a specific segment: payment events of any kind, in a specific time period, from a specific set of customers.

My initial dataset included nearly 476K records, with 1,480 of them being tagged as fraudulent. This set the initial precision (or fraud rate) to 0.31%.

I always use 40% precision as my benchmark for payment auto-decline rules, so if I can reach that number with a high enough recall, I’ll be happy.
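For concreteness, here is a minimal Python sketch (illustrative only, not taken from the original analysis) of how these two KPIs are computed, checked against the dataset numbers above:

```python
def precision(fraud_hits: int, total_hits: int) -> float:
    """Share of the rule's hits that are actually fraud."""
    return fraud_hits / total_hits

def recall(fraud_hits: int, total_fraud: int) -> float:
    """Share of all fraud cases that the rule catches."""
    return fraud_hits / total_fraud

# A "rule" that flags every event has precision equal to the base fraud rate.
TOTAL_EVENTS = 476_000   # ~476K payment events in the review segment
TOTAL_FRAUD = 1_480      # events tagged as fraudulent

baseline = precision(TOTAL_FRAUD, TOTAL_EVENTS)
print(f"{baseline:.2%}")  # ~0.31%, the initial fraud rate
```

Every condition we add later gets judged against this baseline with the same two functions.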

How to write fraud rules in 3 simple steps:

I’ve spent 16 years in fraud prevention, written hundreds of rules, and overseen systems that ran thousands of fraud prevention rules.

Here’s the process I use to write high-performing rules:

  1. Find an “inciting anomaly”. Define the core suspicious behavior we’re trying to stop.
  2. Explain the anomaly. Refine the rule to clean up false positives by excluding:
    1. Anomaly explanations: Legitimate behaviors that mimic our main suspicion
    2. Trust signals: Indicators that characterize a generally low-risk population
  3. Validate and approve.

Let’s break it down.

Finding the “Inciting Anomaly”

Here I was, browsing through the dataset I described above, looking at chargebacks and searching for inspiration. Then I noticed it: several fraud cases where the IP’s geo-location (specifically, the IP state) didn’t match the address of the user.

It made me wonder. Geo-mismatch is such a hallmark fraud signal, yet quite a few of these cases were slipping through. You would expect such mismatches to be easily identified by the system, right?

The reason, as you might have guessed already, is that the “approved mismatch” population was overflowing with false positives. Out of 4,355 cases, only 370 were tagged as fraud.

I could see how these “low-risk” mismatches were not declined, but it also presented a good “hook” for researching a rule, or an “Inciting Anomaly”: an abnormal signal that describes many fraud cases, even if with low accuracy.

And indeed, a precision of 8.5% means that having an IP <> Address mismatch is already 27x riskier than the general population, and it catches exactly 25% of the fraud.

I smelled promise. Usually, any signal that is 20x riskier than the general population is a good starting point for rule research.
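That “27x riskier” figure is simply the rule’s precision divided by the baseline fraud rate. A quick sketch using the numbers above:

```python
# Risk lift: how much riskier the flagged population is vs. the general one.
baseline_rate = 1_480 / 476_000   # ~0.31% overall fraud rate
rule_precision = 370 / 4_355      # ~8.5% precision on the mismatch population

lift = rule_precision / baseline_rate
print(round(lift))                # ~27x riskier than the general population

rule_recall = 370 / 1_480
print(f"{rule_recall:.0%}")       # catches 25% of the fraud
```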

Rule draft

IP_state != address_state

Rule performance

Fraud count: 370
Event count: 4,355
Precision: 8.5%
Recall: 25%

Explaining the anomaly

Once I find an inciting anomaly, my next goal is to increase the precision without lowering the recall too much. 

What I don’t want to do is add more suspicious indicators to my rule logic. Why? Because the bad population is usually a mix of different fraud rings, and each extra indicator tends to describe only one of them.

That would not only reduce recall, but would also make the rule very specific. And remember, specific rules are easy to bypass.

Instead, I want to describe what trusted customer behavior looks like so I can exclude it from the rule’s logic. It sounds like semantics, but you’ll soon realize that the key part in rule writing is formulating false positives, not finding suspicious indicators.
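To make that structure concrete, here is how I’d sketch the “anomaly minus exclusions” shape of a rule in Python. The field names are hypothetical, purely for illustration, not an actual rule engine schema:

```python
# A rule = one inciting anomaly, minus a list of false-positive exclusions.
# All field names below are hypothetical.
def rule_hits(event: dict) -> bool:
    anomaly = event["ip_state"] != event["address_state"]
    exclusions = [
        event["ip_type"] in {"mil", "gov", "edu"},  # anomaly explanation
        event["phone_verified"],                    # trust signal
    ]
    return anomaly and not any(exclusions)

event = {"ip_state": "TX", "address_state": "NY",
         "ip_type": "residential", "phone_verified": False}
print(rule_hits(event))  # True: a mismatch with no trusted explanation
```

Each refinement below is just another entry in that exclusions list.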

There are two ways in which we can identify false positive patterns we can exclude: anomaly explanations and trust signals.

Anomaly explanations

An anomaly explanation is any signal that isn’t necessarily low-risk on its own, but one that can “resolve” the anomaly. Basically, it showcases why a good user would exhibit that behavior.

For example, whenever I deal with anomalies involving IPs, my go-to explanation is the IP type itself. If it’s a highly controlled IP (.gov, .mil, .edu), it can explain why there’s a geographic mismatch between it and the user’s address. This is partially because it can imply frequent travel, but more importantly, it often points to the use of internal VPNs.

Of course, using a highly-controlled IP is generally safe even outside of our particular anomaly. But it comes with a price: such IPs aren’t that common. And that was indeed the case here as well.

Rule draft

IP_state != address_state

AND IP_connection_type NOT IN ("mil", "gov", "edu", "org", "corp")

Rule performance

Fraud count: 370 (-)
Event count: 4,344 (-11)
Precision: 8.5% (-)
Recall: 25% (-)

We only managed to save a handful of false positives but the performance is practically the same.

Next, I wanted to look closer at IP proxy signals. It was apparent that many of the fraud incidents originated when the likelihood for proxy IP use was medium or high. This makes sense, as fraudsters often sloppily use proxy IPs that are far from their victim’s address as long as it’s in the same country.

So I focused on the events where the likelihood for proxy was low and noticed that ISP connection types were much safer than mobile/hybrid ones. That also made sense: fraudsters are less likely to use stable IP ranges that might expose them.

With that, I added a condition excluding the cases where the proxy likelihood is “low” and the connection type is ISP.

Rule draft

IP_state != address_state

AND IP_connection_type NOT IN ("mil", "gov", "edu", "org", "corp")

AND NOT (IP_proxy == "low" AND IP_connection_type == "ISP")

Rule performance

Fraud count: 324 (-46)
Event count: 2,860 (-1,484)
Precision: 11.3% (+2.8%)
Recall: 21.9% (-3.1%)

Looks pretty good, right? Maybe, but I wasn’t happy with the number of fraud cases we “lost” here. Even though we reduced many false positives, I was confident more analysis would help me keep the bad guys in.

Here’s a short clip I captured that demonstrates how I generally go about it:

You can see me reviewing the 1,484 cases that the last exclusion removed from my rule population. The first thing I do is sort the results with the fraud cases on top, so I can easily spot patterns.

Then I quickly check how many fraud cases I see overall (the excluded 46), and finally I try to spot how to differentiate the bad population from the good one.

In this case (and I obviously spotted it before I recorded the clip), I noticed that none of the bad cases had a verified email address. I quickly edited that into my rule logic.
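That review step is easy to reproduce anywhere you can sort: take the rows the new exclusion removed, put the fraud cases on top, and look for a differentiator. A sketch with hypothetical fields:

```python
# Review the population a new exclusion removed: fraud cases on top.
# Toy records with hypothetical fields, for illustration only.
excluded = [
    {"id": 1, "is_fraud": False, "email_verified": True},
    {"id": 2, "is_fraud": True,  "email_verified": False},
    {"id": 3, "is_fraud": True,  "email_verified": False},
]

review = sorted(excluded, key=lambda r: r["is_fraud"], reverse=True)
print([r["id"] for r in review])           # fraud cases listed first
print(sum(r["is_fraud"] for r in review))  # how many fraud cases we excluded
```

In the real review, the pattern that jumped out from exactly this kind of sort was the unverified email address.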

Rule draft

IP_state != address_state

AND IP_connection_type NOT IN ("mil", "gov", "edu", "org", "corp")

AND NOT (IP_proxy == "low" AND IP_connection_type == "ISP" AND cust_email_verified == TRUE)

Rule performance (deltas ignore the discarded iteration above)

Fraud count: 370 (-)
Event count: 3,279 (-1,065)
Precision: 11.3% (+2.8%)
Recall: 25% (-)

Notice that while the precision is the same and I didn’t exclude quite as many false positives, the recall is fully recovered. That’s a far better rule version.

Trust signals

At this point, I spent several more hours trying to figure out if I can find other good explanations for why the IP and the address should mismatch. Failing to find more, I switched to my second tactic: identifying general trust signals.

Here, the idea is not to explain why this anomaly makes sense, but to narrow down the population by excluding general low-risk segments from it.

The email verification signal we used above is a perfect example. Within the context of this dataset, it wasn’t particularly strong on its own, until we coupled it with the proxy and connection type conditions.

On the other hand, having a verified phone proved to be a robust enough signal on its own, so I excluded it as my next layer.

Rule draft

IP_state != address_state

AND IP_connection_type NOT IN ("mil", "gov", "edu", "org", "corp")

AND NOT (IP_proxy == "low" AND IP_connection_type == "ISP" AND cust_email_verified == TRUE)

AND cust_phone_verified == FALSE

Rule performance

Fraud count: 366 (-4)
Event count: 1,862 (-1,417)
Precision: 19.7% (+8.4%)
Recall: 24.7% (-0.3%)

Finally, I noticed that many good customers caught by my rule version had business emails. Therefore, I opted to only keep “free” email domains in my dataset. 

This isn’t necessarily a strong indicator on its own (fraudsters can steal your email with no problem), but it might explain more travel or VPN stories that we couldn’t identify directly. 

Rule draft

IP_state != address_state

AND IP_connection_type NOT IN ("mil", "gov", "edu", "org", "corp")

AND NOT (IP_proxy == "low" AND IP_connection_type == "ISP" AND cust_email_verified == TRUE)

AND cust_phone_verified == FALSE

AND email_domain_type == "free"

Rule performance

Fraud count: 296 (-70)
Event count: 769 (-1,093)
Precision: 38.5% (+18.8%)
Recall: 20% (-4.7%)

As you can see, even though our recall dropped by nearly a fifth, we also doubled our precision to a point where, with some tweaking, the rule can be released to production.

Preparing the rule for launch

Once we lock down the rule’s logic, there are two steps left to take before we can launch it live in production. The first is to approve the overall rule’s performance and its expected impact on the business.

To provide a complete performance overview when I submit it for approval, I’ll share the following:

Description: This rule targets payments for customer segment X, where the IP state mismatches the user's address state.
Expected hits/month: ±1,250 payments / month
Precision (count / amount): 38.5% / 39.6%
Recall (count / amount): 20% / 20.2%
Monthly / Yearly losses saved: $80.5k / $966k
FPR (count / amount): 0.1% / 0.1%
Monthly / Yearly revenue lost: $203k / $2.44mm
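The FPR line, for instance, falls straight out of the numbers we already have: the final version’s false positives divided by the good population in the segment. A quick sanity check:

```python
# False-positive rate (by count) of the final rule version.
total_events, total_fraud = 476_000, 1_480  # review segment
rule_events, rule_fraud = 769, 296          # final rule version

false_positives = rule_events - rule_fraud    # good customers the rule hits
good_population = total_events - total_fraud  # all good events in the segment

fpr = false_positives / good_population
print(f"{fpr:.1%}")  # ~0.1%
```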

This should give the approving manager all they need to confidently sign off on my request.

But the approval is just half the battle. The second thing we need to do is complete the full process of rule validation. The good news? We just completed the first step (out of six)!

About the author
Chen Zamir
Head of Fraud Strategy
