Random Forests are supervised ensemble-learning models used for classification and regression. Ensemble learning models aggregate multiple machine learning models, allowing for overall better performance. The logic behind this is that each of the models used is weak when employed on its own, but strong when put together in an ensemble. In the case of Random... Continue Reading →
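The "weak alone, strong together" idea in the excerpt above can be illustrated with a small calculation. This is only a sketch under a strong assumption that does not come from the post: the voters are independent and each is correct with the same probability `p`. Trees in a real Random Forest are correlated, so the gain in practice is smaller. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent voters is correct.

    Assumes each voter is correct with probability p, independently of
    the others, and that n is odd so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    # Sum the binomial probabilities of every outcome in which more than
    # half of the voters are correct.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


# Compare a single weak model with ensembles of increasing size.
for n_voters in (1, 5, 25, 101):
    print(n_voters, round(majority_vote_accuracy(0.6, n_voters), 3))
```

Under these assumptions the ensemble accuracy rises with the number of voters as long as each voter is better than chance (`p > 0.5`).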
Fraud Detection by Stacking Cost-Sensitive Decision Trees
Recently, we published a research paper showing how it is possible to detect fraudulent credit card transactions with a high level of accuracy and a low number of false positives. By using ensembles of cost-sensitive decision trees, we can save up to 73 percent of losses stemming from fraud. Here’s how. Classification, in the context... Continue Reading →
Machine Learning Algorithms Explained – Decision Trees
A Decision Tree is a supervised predictive model that can learn to predict discrete or continuous outputs by answering a set of simple questions based on the values of the input features it receives. To get a better understanding of how a DT works, we will illustrate the concept with a real-world dataset. This... Continue Reading →
From Real-Time Learning to Reinforcement Learning with Asynchronous Feedback
Online, or real-time, transactional fraud detection systems have recently created quite the buzz in the info security industry. They are an appealing concept: Because we know that fraud patterns change over time, the ability to use machine-learning algorithms to automatically learn new patterns instantly allows us to have a stronger defense system. We often find... Continue Reading →
Building AI Applications Using Deep Learning
Recently, we have seen a huge boom around the field of deep learning; it is currently being implemented in a wide variety of fields, from driverless cars to product recommendation. In their most primitive form, deep learning algorithms originated in the 1960s. If the concept has been around for decades, why is it that widespread... Continue Reading →
Classifying Phishing URLs Using Recurrent Neural Networks
In a recent research paper, we showed how we are able to detect with a high level of accuracy if a website is a phish just by looking at the URL. This post lays out in greater detail how, by using a deep recurrent neural network, we’re able to accurately classify more than 98 percent... Continue Reading →
Machine Learning Explained
Machine learning models are often dismissed on the grounds of lack of interpretability. There is a popular story about modern algorithms that goes as follows: Simple linear statistical models such as logistic regression yield interpretable models. On the other hand, advanced models such as random forest or deep neural networks are black boxes, meaning... Continue Reading →
Benefits of Anomaly Detection Using Isolation Forests
One of the newest techniques to detect anomalies is called Isolation Forests. The algorithm is based on the fact that anomalies are data points that are few and different. As a result of these properties, anomalies are susceptible to a mechanism called isolation. This method is highly useful and is fundamentally different from all existing... Continue Reading →
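The isolation mechanism mentioned above can be sketched in one dimension: pick random split points and count how many splits it takes before a point is alone. This is a simplified illustration, not the Isolation Forest algorithm itself, which builds many trees over subsamples and several features and normalizes the path lengths. The data, the function names `isolation_depth` and `mean_depth`, and the parameters are all assumptions made for the example.

```python
import random


def isolation_depth(point, data, rng):
    """Count the random splits needed until `point` is alone.

    `point` must be one of the values in `data`.
    """
    depth = 0
    remaining = list(data)
    while len(remaining) > 1:
        low, high = min(remaining), max(remaining)
        if low == high:
            break  # only duplicates left; they cannot be separated
        split = rng.uniform(low, high)
        # Keep only the side of the split that contains the point.
        if point < split:
            remaining = [x for x in remaining if x < split]
        else:
            remaining = [x for x in remaining if x >= split]
        depth += 1
    return depth


def mean_depth(point, data, trials=200, seed=0):
    """Average isolation depth of `point` over repeated random splits."""
    rng = random.Random(seed)
    total = sum(isolation_depth(point, data, rng) for _ in range(trials))
    return total / trials


# Hypothetical data: a dense cluster around 0 plus one distant value.
data_rng = random.Random(42)
cluster = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
outlier = 10.0
data = cluster + [outlier]
inlier = sorted(cluster)[len(cluster) // 2]  # a point in the middle of the cluster

print("outlier:", mean_depth(outlier, data))
print("inlier: ", mean_depth(inlier, data))
```

Because the distant value is "few and different", a random split is likely to separate it early, so its average depth is lower than that of a point inside the cluster.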
The Technical Side of Phishing and How to Prevent It
Phishing, by definition, is the act of defrauding an online user and tricking them into clicking on a malicious link in order to obtain personal information by posing as a trustworthy institution or entity. That is why users have a hard time differentiating between a legitimate and a malicious site. Although one might think the... Continue Reading →
Phishing Attack Analysis: Estimating Key Cluster Features and Why It’s Important
First, let’s quickly review the clusters we built to understand phishing attacks. Using data we collected over the course of a year spent tracking and taking down phishing cases for a major U.S. financial institution, we extracted features from four categories: similarity analysis, structure analysis, phishing visitors tracking and domain registration. Then, using the expectation-maximization... Continue Reading →
Fraud Detection That Accounts for Misclassification Using Cost-Sensitive Logistic Regression
Fraud detection is a cost-sensitive problem, in the sense that falsely flagging a transaction as fraudulent carries a significantly different financial cost than missing an actual fraudulent transaction. In order to take these costs into account, companies should use a more business-oriented measure such as “Cost,” which allows companies to make decisions that are better aligned... Continue Reading →
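A minimal sketch of such a cost measure follows. The excerpt does not give the exact cost matrix, so the one used here is an assumption: every flagged transaction incurs a fixed review cost (`admin_cost`), and every missed fraud costs the full transaction amount. The function name `total_cost` and the sample values are hypothetical.

```python
def total_cost(y_true, y_pred, amounts, admin_cost):
    """Sum the financial cost of a set of fraud predictions.

    Assumed cost matrix (not taken from the post):
      - flagged transaction (true or false positive): admin_cost
      - missed fraud (false negative): the transaction amount
      - correctly ignored legitimate transaction: 0
    """
    cost = 0.0
    for actual, predicted, amount in zip(y_true, y_pred, amounts):
        if predicted == 1:
            cost += admin_cost  # someone has to review the alert
        elif actual == 1:
            cost += amount  # the fraud went through undetected
    return cost


# Hypothetical transactions: 1 = fraud, 0 = legitimate.
y_true = [0, 0, 1, 1]
amounts = [50.0, 20.0, 300.0, 1000.0]

# Two models with the same number of errors (one false positive and one
# false negative each) but very different financial impact.
model_a = [1, 0, 1, 0]  # misses the 1000.0 fraud
model_b = [1, 0, 0, 1]  # misses the 300.0 fraud

print(total_cost(y_true, model_a, amounts, admin_cost=5.0))
print(total_cost(y_true, model_b, amounts, admin_cost=5.0))
```

Both models have the same accuracy on this sample, yet under the assumed cost matrix their costs differ widely, which is the motivation for evaluating fraud models on cost instead of error counts.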
Clustering of Phishing Attacks
In a recent report, we showed how we are able to gain a better understanding of phishing attacks and attackers by using cluster analysis. This post lays out in greater detail how to create those clusters by examining the features and methods used. For the study, we used the data collected over the course of more than a year... Continue Reading →