By Alejandro Correa and Maria Fernanda Cortes
This post is part of a series in which I discuss several parts of my AI_at_Rappi presentation. In a previous post, I discussed a particular algorithm for recommending restaurants called rest2vec. In a follow-up, I discussed how to include financial costs when analyzing a churn model.
This time I have the pleasure of writing this post with one of our Data Scientists, Maria Fernanda Cortes. We are going to discuss how to improve your marketing campaigns using uplift models.
Companies use different marketing actions such as monetary incentives, discounts, and cashbacks, among other types of offers. The business decision for a company is, therefore, to select which users to contact with these strategies. To make this decision, we must take certain key points into account. First, these strategies have a cost for the company, and we need to select the target users wisely in order to reduce campaign costs. Second, the campaigns must generate an incremental impact, that is, additional sales and revenue that would not have come without them. Finally, we do not want to send campaigns unnecessarily to users who could be bothered by them.
The conventional response approach
The conventional approach to selecting customers for marketing campaigns is to predict which clients are “good” and which are “bad”. This approach consists of carrying out a pilot marketing campaign on a sample of users and, with its results, building a model to predict the purchase probability given that the user was contacted by the campaign. Then, the users with a higher purchase probability are selected as the target audience. This methodology is summarized in Figure 1.
However, this methodology has some weaknesses. First of all, it leads to unnecessary costs, as we may be targeting users who would have bought without being contacted. For instance, let’s assume that the red users in Figure 2 are those with a higher probability of buying with the campaign, according to the conventional model. Some of these users may be so likely to buy that they would do so even if they were not contacted by any campaign. Therefore, money is spent on campaigns for users who do not need them to make their purchases.
Moreover, this approach is designed to maximize the response rate, not the incremental effect or uplift. Consequently, the campaign’s response rate may be high while the incremental improvement gained from it is low, and the campaign may not be worth the cost invested in it. In addition, with this approach we may end up bothering users who were already very likely to buy by contacting them unnecessarily.
Let’s explain the danger of this approach with a different example. Let’s assume that we are deciding in which states to spend the budget of a Democratic electoral campaign in the United States. According to the conventional response approach, we should calculate the probability that a person will vote Democrat given that we communicate the campaign to them. This probability could be around 99% in California, whereas it could be around 1% in Texas.
With the conventional approach, the suggested target would be California’s population. However, what we should really look for is a state where the incentive, in this case the electoral campaign, can generate a change in voting intention.
This is the main reason why we do not use this methodology but instead use the Uplift Modeling methodology.
Uplift modeling is a methodology that predicts the influence of a marketing treatment on a customer’s purchase behavior. It is also known as incremental-lift or true-lift modeling. The objective is to target users who are likely to buy when given a treatment but are unlikely to buy otherwise. Thus, for this methodology, we need to predict the buying behavior of customers when they are given the treatment, and when they are not.
As illustrated in Figure 4, the users who buy when contacted but do not do so when not contacted are classified as “persuadable”. Uplift modeling aims to target these users because the incentive has a clear impact on their buying behavior; therefore, they are worth the cost of the marketing campaign. In contrast, users classified as “safe” are not contacted because, although they are likely to purchase with the treatment, they would also buy without it, and we would be spending the budget needlessly. Moreover, uplift modeling seeks to avoid the cost of contacting “lost” users, who will not purchase whether contacted or not. Finally, this methodology avoids bothering users classified as “do not disturb”, who have a higher probability of purchasing without marketing treatments.
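As a rough illustration of the four groups, the sketch below labels a user from two predicted purchase probabilities, one with contact and one without. The function name `classify_user`, the 0.5 cutoff, and the example probabilities are assumptions made for this illustration; they are not taken from the models described in this post.

```python
def classify_user(p_with, p_without, threshold=0.5):
    """Label a user from predicted purchase probabilities.

    p_with: probability of buying when contacted.
    p_without: probability of buying when not contacted.
    threshold: assumed cutoff for "likely to buy" (illustrative only).
    """
    buys_with = p_with >= threshold
    buys_without = p_without >= threshold
    if buys_with and not buys_without:
        return "persuadable"
    if buys_with and buys_without:
        return "safe"
    if not buys_with and not buys_without:
        return "lost"
    # Likely to buy without contact, unlikely to buy with it.
    return "do not disturb"


# Made-up users: (probability with contact, probability without contact).
users = {
    "u1": (0.80, 0.10),
    "u2": (0.90, 0.85),
    "u3": (0.05, 0.03),
    "u4": (0.20, 0.70),
}
segments = {uid: classify_user(p_w, p_wo) for uid, (p_w, p_wo) in users.items()}
print(segments)
```

In practice a hard cutoff is a simplification; the probabilities form a continuum, which is why the methodology described later ranks users instead of assigning fixed labels.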
Comparison between uplift modeling and conventional response models
We are going to compare the conventional and uplift methodologies by analyzing what would happen if we divided our target of users into two groups: an exposed group that receives the treatment and a control group to which we do not apply the treatment for measurement purposes. With the conventional methodology, the response rate of the exposed group will be high; however, this rate will be similar to that of the control group, as the target includes users with high purchase probability, even without treatments. Hence, the campaign’s incrementality, which is calculated by subtracting the lighter bars from the darker bars in Figure 5, will be low.
On the other hand, with the uplift model, the difference between the response rate of the exposed and control groups will be high and positive, as shown in Figure 6. This difference corresponds to the campaign’s incrementality and does show that marketing contact results in more purchases.
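The comparison above boils down to one subtraction per campaign. The following sketch computes incrementality as the response rate of the exposed group minus that of the control group. The helper names and the purchase outcomes (1 = purchased, 0 = did not purchase) are made up for illustration and do not come from a real campaign.

```python
def response_rate(purchases):
    """Share of users in a group who purchased (purchases is a list of 0/1)."""
    return sum(purchases) / len(purchases)


def incrementality(exposed, control):
    """Response rate of the exposed group minus that of the control group."""
    return response_rate(exposed) - response_rate(control)


# Made-up outcomes for a target chosen with the conventional response model:
# both groups buy a lot, so the campaign adds little.
conventional_exposed = [1, 1, 1, 1, 0, 1, 1, 1, 0, 1]  # 8 of 10 purchased
conventional_control = [1, 1, 1, 0, 0, 1, 1, 1, 0, 1]  # 7 of 10 purchased

# Made-up outcomes for a target chosen with an uplift model:
# fewer purchases overall, but a much larger gap between the groups.
uplift_exposed = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]  # 6 of 10 purchased
uplift_control = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]  # 2 of 10 purchased

conventional_gain = incrementality(conventional_exposed, conventional_control)
uplift_gain = incrementality(uplift_exposed, uplift_control)
print(round(conventional_gain, 2), round(uplift_gain, 2))
```

Note that in this toy data the conventional target has the higher response rate in the exposed group, yet the lower incrementality, which is exactly the situation described above.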
In summary, the main disadvantage of the conventional approach is that it is designed to maximize the campaign’s response rate rather than its incremental effect, as uplift modeling does.
Uplift Modeling – Methodology
We will now explain the uplift modeling methodology in more depth. We have a record of the users’ purchase history, with which we build a profile that we will call X from now on. With the user’s profile information, we build two machine learning models: one to predict the purchase probability given the incentive, and another to predict the purchase probability without incentives, as shown in Figure 7. The next stage is to combine the results of both models to determine which users the marketing campaign should target.
Specifically, to apply this methodology at Rappi, we used different variables to define the user profile X. We included variables related to each user’s previous orders, demographics, and variables derived from the user’s interaction with the application. Figure 8 shows an example of what the dataset records look like. Each row includes the user id, the recency (the number of days elapsed since the user’s last order), the user profile variables mentioned above, an indicator of whether or not the user received a marketing incentive during that day and, finally, the response variable that specifies whether the user purchased during the day.
The red records, where no incentives are given, are used to train the no-incentive model, while the gray records, where incentives are applied, are used to train the incentive model.
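The split described above can be sketched in a few lines. The column names (`recency`, `orders_last_30d`, `incentive`, `purchased`), the helper functions, and all values are hypothetical, chosen only to resemble the layout described for Figure 8.

```python
# Hypothetical daily records; names and values are made up for illustration.
records = [
    {"user_id": 1, "recency": 3, "orders_last_30d": 4, "incentive": 1, "purchased": 1},
    {"user_id": 2, "recency": 20, "orders_last_30d": 0, "incentive": 0, "purchased": 0},
    {"user_id": 3, "recency": 7, "orders_last_30d": 2, "incentive": 1, "purchased": 0},
    {"user_id": 4, "recency": 1, "orders_last_30d": 6, "incentive": 0, "purchased": 1},
    {"user_id": 5, "recency": 12, "orders_last_30d": 1, "incentive": 0, "purchased": 0},
]

# The profile X. The incentive flag is used to split the data, not as a feature.
FEATURES = ["recency", "orders_last_30d"]


def split_by_incentive(rows):
    """Return (rows with incentive, rows without incentive)."""
    with_incentive = [r for r in rows if r["incentive"] == 1]
    without_incentive = [r for r in rows if r["incentive"] == 0]
    return with_incentive, without_incentive


def to_xy(rows, features=FEATURES, target="purchased"):
    """Turn rows into a feature matrix X and a response vector y."""
    X = [[r[f] for f in features] for r in rows]
    y = [r[target] for r in rows]
    return X, y


incentive_rows, no_incentive_rows = split_by_incentive(records)
X_incentive, y_incentive = to_xy(incentive_rows)
X_no_incentive, y_no_incentive = to_xy(no_incentive_rows)
```

Each `(X, y)` pair can then be passed to a separate classifier, so that one model only ever sees contacted users and the other only sees users who were not contacted.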
Once the two datasets are ready, we can proceed to build the incentive and no-incentive models. Both models are Extreme Gradient Boosting models, which we chose because they produced better results in terms of area under the ROC curve and for the sake of computational efficiency. A deep explanation of the Extreme Gradient Boosting algorithm is, however, beyond the scope of this blog.
After building both machine learning models, the next stage is to combine their results in order to select the target users for the treatment. To do so, we grouped the outputs of the models into percentiles. Accordingly, the users in percentile 90 of the incentive model are the ones with the highest probability of buying with the treatment, and those in percentile 0 are the ones with the lowest probability. We subdivided the users into percentiles in the same way for the no-incentive model. Thus, the red users located at the bottom-left of Figure 9 are the most persuadable ones, as they have a higher purchase probability with the treatment and a lower purchase probability without it, and they are given contact priority. Notably, the color-coding for priority levels goes from gray, for users with a low contact priority due to their low incremental impact, up to dark red, for users with higher uplift.
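One possible way to turn the two sets of percentiles into a single contact priority is sketched below. The ranking rule used here (incentive percentile minus no-incentive percentile) is an assumption made for illustration, not necessarily the exact rule behind Figure 9, and the function names and probabilities are made up.

```python
def decile_labels(scores):
    """Map each score to a label 0, 10, ..., 90 by rank (90 = highest scores).

    Ties are broken by position in the list, which is a simplification.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = (rank * 10 // n) * 10
    return labels


def contact_priority(p_incentive, p_no_incentive):
    """Assumed priority: high incentive percentile and low no-incentive percentile."""
    with_labels = decile_labels(p_incentive)
    without_labels = decile_labels(p_no_incentive)
    return [w - wo for w, wo in zip(with_labels, without_labels)]


# Made-up model outputs for ten users.
p_incentive = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
p_no_incentive = [0.02, 0.90, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.10]

priority = contact_priority(p_incentive, p_no_incentive)
best_user = max(range(len(priority)), key=lambda i: priority[i])
print(priority, best_user)
```

Here the last user sits in the top percentile of the incentive model and near the bottom of the no-incentive model, so they receive the highest priority, while the second user, who is far more likely to buy without the incentive than with it, receives the lowest.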
Conclusions and remarks
To sum up, when deciding which users to apply a treatment, offer, communication, or campaign to, it is not enough to predict their probability of purchasing. To generate a maximal incremental impact, we should also avoid contacting users who are so prone to purchasing that they will do so even without any treatment, as the uplift modeling methodology suggests. This methodology provides an opportunity to reduce the costs of unnecessary contacts while aiming to generate maximal incremental revenue.