Fraud Detection by Stacking Cost-Sensitive Decision Trees

Recently, we published a research paper showing how it is possible to detect fraudulent credit card transactions with a high level of accuracy and a low number of false positives. By using ensembles of cost-sensitive decision trees, we can save up to 73 percent of losses stemming from fraud. Here’s how.

Classification, in the context of machine learning, deals with the problem of accurately sorting examples in a dataset into sub-groups or classes. Traditionally, classification methods aim to minimize the misclassification of examples, where the predicted class of an example is different from its true class. Such a traditional framework assumes that all misclassification errors carry the same cost. However, this is not the case in many real-world applications. Methods that take into account variations in misclassification costs are known as cost-sensitive classifiers.

A credit card fraud-detection model is normally based on a machine-learning algorithm that attempts to predict the class of each transaction (fraudulent or legitimate) based on its specific variables.

Traditionally, such systems are evaluated using a standard binary classification measure, such as misclassification error, total fraud detected, or the F1 score. However, these measures may not be the most appropriate criteria for evaluating fraud detection models because they assume that all hits and misses carry the same value. This assumption does not hold in fraud prevention: incorrectly predicting a fraudulent transaction as legitimate carries a significantly different financial cost than a false positive.

In order to create a model that takes this into account, each type of misclassification error is assigned a different cost. The following table presents the cost matrix, with the costs associated with two types of correct classification (true positives and true negatives), and the costs associated with two types of misclassification errors (false positives and false negatives).

In the case of false positives and true positives, the costs are equal to the administrative costs of analyzing the transaction and contacting the cardholder. When a false negative occurs, meaning that fraud is not detected, the losses are equal to the specific amount stolen in the transaction.
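To make the cost structure concrete, here is a minimal Python sketch of the costs described above. The function names, the sample transactions, and the administrative cost of 5 are hypothetical and chosen only for illustration. The text does not state the cost of a true negative, so the sketch assumes it is zero.

```python
def transaction_cost(is_fraud, flagged, amount, admin_cost):
    """Cost of one prediction under the cost structure described in the text."""
    if flagged:
        # True positive or false positive: the transaction is analyzed
        # and the cardholder is contacted.
        return admin_cost
    if is_fraud:
        # False negative: the amount of the transaction is lost.
        return amount
    # True negative: assumed to cost nothing (an assumption, not from the text).
    return 0.0


def total_cost(labels, predictions, amounts, admin_cost):
    """Sum the per-transaction costs for a full set of predictions."""
    return sum(
        transaction_cost(y, p, a, admin_cost)
        for y, p, a in zip(labels, predictions, amounts)
    )


# Hypothetical data: two frauds of very different size, two legitimate transactions.
labels = [1, 1, 0, 0]
amounts = [1000.0, 10.0, 50.0, 20.0]
admin_cost = 5.0  # hypothetical review cost per flagged transaction

# Both sets of predictions contain exactly one mistake,
# so their misclassification error is identical.
misses_large_fraud = [0, 1, 0, 0]
misses_small_fraud = [1, 0, 0, 0]

cost_a = total_cost(labels, misses_large_fraud, amounts, admin_cost)
cost_b = total_cost(labels, misses_small_fraud, amounts, admin_cost)
print(cost_a, cost_b)
```

Both prediction sets have the same error rate, yet missing the large fraud is far more expensive than missing the small one. That difference is what an error-based measure cannot see.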

Cost-Sensitive Decision Trees

Introducing the cost into the training of a decision tree has been widely studied as a way of making classifiers cost-sensitive. However, in most cases, approaches that have been proposed only deal with the problem when the cost difference between false positives and false negatives is constant.

In a previous research paper, we proposed an example-dependent, cost-sensitive decision tree (CSDT) algorithm that takes example-dependent costs into account when training and pruning a tree.

The CSDT method uses a new splitting criterion during the construction of a decision tree. Instead of using a traditional splitting criterion such as Gini, entropy, or misclassification, the example-dependent cost, as defined in the cost equation, is calculated for each tree node. Then, the gain of using each different split is evaluated as the decrease in total cost.
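The idea of a cost-based split can be sketched as follows. This is an illustrative reading of the description above, not the published CSDT implementation. It assumes that a node's cost is the cheaper of two options: labeling every example in the node as legitimate, or labeling every example as fraud. It also assumes a true negative costs nothing. All function names and data are hypothetical.

```python
def node_cost(labels, amounts, admin_cost):
    """Cost of a node that assigns the cheaper single class to all its examples."""
    # Predict "legitimate" for all: every fraud in the node is a false negative.
    cost_all_negative = sum(a for y, a in zip(labels, amounts) if y == 1)
    # Predict "fraud" for all: every example incurs the administrative cost.
    cost_all_positive = admin_cost * len(labels)
    return min(cost_all_negative, cost_all_positive)


def split_gain(feature, threshold, labels, amounts, admin_cost):
    """Decrease in total cost obtained by splitting on feature <= threshold."""
    left = [i for i, x in enumerate(feature) if x <= threshold]
    right = [i for i, x in enumerate(feature) if x > threshold]
    parent = node_cost(labels, amounts, admin_cost)
    children = sum(
        node_cost([labels[i] for i in side], [amounts[i] for i in side], admin_cost)
        for side in (left, right)
    )
    return parent - children


def best_threshold(feature, labels, amounts, admin_cost):
    """Pick the candidate threshold with the largest cost reduction."""
    candidates = sorted(set(feature))[:-1]  # the largest value would leave one side empty
    return max(
        candidates,
        key=lambda t: split_gain(feature, t, labels, amounts, admin_cost),
    )


# Hypothetical node with one numeric feature.
feature = [1, 2, 3, 4]
labels = [0, 0, 1, 1]
amounts = [30.0, 40.0, 500.0, 800.0]
admin_cost = 5.0

threshold = best_threshold(feature, labels, amounts, admin_cost)
gain = split_gain(feature, threshold, labels, amounts, admin_cost)
print(threshold, gain)
```

In this toy node, splitting at 2 separates the two legitimate transactions from the two frauds, so only the frauds need to be reviewed and the total cost drops from 20 to 10.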

Stacking Cost-Sensitive Decision Trees

The main idea behind the ensemble methodology is to combine several individual classifiers, referred to as base classifiers, in order to create a classifier that outperforms the individuals. Typically, the process involves creating random subsets of the database and then training a different classifier on each of them. The classifiers are then combined in order to make a decision.
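As a rough illustration of the first step, the sketch below draws random subsets of example indices, one per base classifier. Sampling with replacement (as in bagging) is an assumption here, since the text only says "random subsets". The function name, sizes, and seed are hypothetical.

```python
import random


def random_subsets(n_examples, n_subsets, subset_size, seed=0):
    """Draw one list of example indices (with replacement) per base classifier."""
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    return [
        [rng.randrange(n_examples) for _ in range(subset_size)]
        for _ in range(n_subsets)
    ]


# Five subsets of 200 indices each, drawn from a dataset of 1,000 examples.
subsets = random_subsets(n_examples=1000, n_subsets=5, subset_size=200)
print(len(subsets), len(subsets[0]))
```

Each index list would then be used to select the rows on which one base classifier is trained.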

The most common way to combine classifiers is to use majority rule: If more than 50 percent of the classifiers predict a given example to be positive then the prediction is positive.
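A minimal sketch of majority voting, assuming each base classifier outputs 0 (legitimate) or 1 (fraud) for every example. The function name and the sample votes are hypothetical.

```python
def majority_vote(predictions_per_classifier):
    """Combine 0/1 predictions: positive if more than half the classifiers say so."""
    n_classifiers = len(predictions_per_classifier)
    return [
        1 if sum(votes) > n_classifiers / 2 else 0
        for votes in zip(*predictions_per_classifier)  # one tuple of votes per example
    ]


# Three classifiers, three examples each.
votes = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]
combined = majority_vote(votes)
print(combined)
```

Note that every vote counts equally here, regardless of the amount at stake in the transaction.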

Through our research, we have expanded that methodology to include a novel way to combine the individual classifiers: We created a second model that takes the predictions of the base classifiers as inputs and fits a cost-sensitive logistic regression to generate the final prediction.

This new model takes into account the financial costs when making a prediction.
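The sketch below shows one way such a second-stage model could look. It is not the implementation from the paper. It assumes the second stage is a logistic regression trained by plain gradient descent to minimize the expected financial cost, and that a true negative costs nothing. The function names, the learning rate, the number of steps, and the data are all hypothetical.

```python
import math


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, row):
    # weights[0] is the intercept; the rest weight each base classifier's vote.
    return sigmoid(weights[0] + sum(w * x for w, x in zip(weights[1:], row)))


def expected_cost(weights, rows, labels, amounts, admin_cost):
    total = 0.0
    for row, y, amount in zip(rows, labels, amounts):
        p = predict_proba(weights, row)
        loss_if_missed = amount if y == 1 else 0.0
        # Flagging costs admin_cost; not flagging a fraud costs its amount.
        total += p * admin_cost + (1.0 - p) * loss_if_missed
    return total


def fit_cost_sensitive_lr(rows, labels, amounts, admin_cost, lr=0.01, steps=500):
    weights = [0.0] * (len(rows[0]) + 1)
    best_cost = expected_cost(weights, rows, labels, amounts, admin_cost)
    best_weights = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for row, y, amount in zip(rows, labels, amounts):
            p = predict_proba(weights, row)
            loss_if_missed = amount if y == 1 else 0.0
            # d(cost)/dp = admin_cost - loss_if_missed; dp/dz = p * (1 - p).
            dz = (admin_cost - loss_if_missed) * p * (1.0 - p)
            grad[0] += dz
            for j, x in enumerate(row):
                grad[j + 1] += dz * x
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
        cost = expected_cost(weights, rows, labels, amounts, admin_cost)
        if cost < best_cost:  # keep the cheapest weights seen so far
            best_cost, best_weights = cost, list(weights)
    return best_weights


# Each row holds the 0/1 votes of three hypothetical base trees for one transaction.
rows = [[1, 1, 0], [0, 1, 1], [0, 0, 0], [1, 0, 0], [0, 0, 0]]
labels = [1, 1, 0, 0, 0]
amounts = [400.0, 250.0, 30.0, 60.0, 20.0]
admin_cost = 5.0

initial_cost = expected_cost([0.0] * 4, rows, labels, amounts, admin_cost)
weights = fit_cost_sensitive_lr(rows, labels, amounts, admin_cost)
trained_cost = expected_cost(weights, rows, labels, amounts, admin_cost)
print(initial_cost, trained_cost)
```

Unlike majority voting, the combination is learned from the data, and the objective is the money lost rather than the number of errors, so a base classifier that catches expensive frauds can receive more weight.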


As an example, let’s look at a dataset provided by a large European card processing company.

The dataset consists of fraudulent and legitimate transactions made with credit and debit cards between January 2012 and June 2013, totaling 750,000 individual transactions. The dataset also includes a fraud label indicating whether a transaction was identified as fraud. This label was created internally by the card processing company and is considered to be highly accurate. In the dataset, 0.467 percent of the transactions are fraudulent, and the total financial losses due to fraud equal €866,410. The algorithms are compared using the F1 score and the financial cost.

The results show that stacking cost-sensitive decision trees is the most effective approach for minimizing financial cost. It is also interesting to see just how different the results are between a standard decision tree and the cost-sensitive decision tree.

In conclusion, by using a cost-sensitive model and combining a number of decision trees, we can better detect, assess, and neutralize credit card fraud, helping credit card issuers avoid the financial losses associated with a data breach.

Article reposted with permission from Easy Solutions. Check the original piece.
