# Part 8: Uplift Modeling

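• Treatment Responders: customers who buy only if they receive an offer
• Treatment Non-Responders: customers who will not buy in any case
• Control Responders: customers who buy without an offer
• Control Non-Responders: customers who will not buy if they do not receive an offer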

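1. Predict, for every customer, the probability of belonging to each of the four groups: we will build a multi-class classification model for this.
2. Calculate the uplift score from those probabilities, where P(X) is the predicted probability of class X:

Uplift Score = P(TR) + P(CN) - P(TN) - P(CR)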

```python
# Jupyter magic: render matplotlib figures inline
%matplotlib inline

from datetime import datetime, timedelta, date
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import plotly.offline as pyoff
import plotly.graph_objs as go

import sklearn
import xgboost as xgb
from sklearn.cluster import KMeans
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import KFold, cross_val_score, train_test_split

warnings.filterwarnings("ignore")

# initiate plotly
pyoff.init_notebook_mode()

# function to order clusters: relabel cluster ids by the mean of the target field
def order_cluster(cluster_field_name, target_field_name, df, ascending):
    df_new = df.groupby(cluster_field_name)[target_field_name].mean().reset_index()
    df_new = df_new.sort_values(by=target_field_name, ascending=ascending).reset_index(drop=True)
    df_new['index'] = df_new.index
    df_final = pd.merge(df, df_new[[cluster_field_name, 'index']], on=cluster_field_name)
    df_final = df_final.drop([cluster_field_name], axis=1)
    df_final = df_final.rename(columns={"index": cluster_field_name})
    return df_final

# function for calculating the uplift
def calc_uplift(df):
    avg_order_value = 25

    # calculate conversions for each offer type
    base_conv = df[df.offer == 'No Offer']['conversion'].mean()
    disc_conv = df[df.offer == 'Discount']['conversion'].mean()
    bogo_conv = df[df.offer == 'Buy One Get One']['conversion'].mean()

    # calculate conversion uplift for discount and bogo
    disc_conv_uplift = disc_conv - base_conv
    bogo_conv_uplift = bogo_conv - base_conv

    # calculate order uplift
    disc_order_uplift = disc_conv_uplift * len(df[df.offer == 'Discount']['conversion'])
    bogo_order_uplift = bogo_conv_uplift * len(df[df.offer == 'Buy One Get One']['conversion'])

    # calculate revenue uplift
    disc_rev_uplift = disc_order_uplift * avg_order_value
    bogo_rev_uplift = bogo_order_uplift * avg_order_value

    print('Discount Conversion Uplift: {0}%'.format(np.round(disc_conv_uplift * 100, 2)))
    print('Discount Order Uplift: {0}'.format(np.round(disc_order_uplift, 2)))
    print('Discount Revenue Uplift: ${0}'.format(np.round(disc_rev_uplift, 2)))
```

```python
df_data = pd.read_csv('response_data.csv')
```

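• recency: months since the last purchase
• history: value of the customer's historical purchases
• used_discount/used_bogo: whether the customer has used a discount or a buy-one-get-one offer
• zip_code: class of the zip code, Rural/Suburban/Urban
• is_referral: whether the customer was acquired through a referral
• channel: channels the customer uses, Phone/Web/Multichannel
• offer: the offer sent to the customer, Discount/Buy One Get One/No Offer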

```python
calc_uplift(df_data)
```
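The per-offer conversion rates that `calc_uplift` computes one filter at a time can also be read off a single `groupby`. The following is a minimal sketch on a made-up frame: the rows and the name `toy` are illustrative assumptions, not data from `response_data.csv`. It subtracts the control conversion rate from each treatment rate and scales by group size to get incremental orders.

```python
import pandas as pd

# Hypothetical toy data; not rows from response_data.csv
toy = pd.DataFrame({
    "offer": ["No Offer"] * 4 + ["Discount"] * 4 + ["Buy One Get One"] * 4,
    "conversion": [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0],
})

# Conversion rate per offer type
conv_rate = toy.groupby("offer")["conversion"].mean()

# Conversion uplift of each offer relative to the control group ("No Offer")
conv_uplift = conv_rate.drop("No Offer") - conv_rate["No Offer"]

# Incremental orders = conversion uplift * number of customers who got the offer
group_size = toy.groupby("offer")["conversion"].size()
order_uplift = conv_uplift * group_size.drop("No Offer")

print(conv_uplift)
print(order_uplift)
```

With this toy data the discount group converts at 50% against 25% for the control group, so its conversion uplift is 25 percentage points, or one extra order among four customers.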

## Multi-class Classification Model for Predicting the Uplift Score

```python
# everyone who received an offer is in the treatment group
df_data['campaign_group'] = 'treatment'
df_data.loc[df_data.offer == 'No Offer', 'campaign_group'] = 'control'
```

```python
df_data['target_class'] = 0  # CN
df_data.loc[(df_data.campaign_group == 'control') & (df_data.conversion > 0), 'target_class'] = 1  # CR
df_data.loc[(df_data.campaign_group == 'treatment') & (df_data.conversion == 0), 'target_class'] = 2  # TN
df_data.loc[(df_data.campaign_group == 'treatment') & (df_data.conversion > 0), 'target_class'] = 3  # TR
```

• 0 -> Control Non-Responders
• 1 -> Control Responders
• 2 -> Treatment Non-Responders
• 3 -> Treatment Responders
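Because the four labels are just the combinations of two yes/no questions (treated? converted?), the same encoding can be written as a small piece of arithmetic. This is only an alternative sketch on a made-up frame; the name `toy` and its rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data covering all four combinations
toy = pd.DataFrame({
    "offer": ["No Offer", "No Offer", "Discount", "Buy One Get One"],
    "conversion": [0, 1, 0, 1],
})

is_treatment = toy["offer"] != "No Offer"  # False -> control, True -> treatment
converted = toy["conversion"] > 0          # False -> non-responder, True -> responder

# 0 = CN, 1 = CR, 2 = TN, 3 = TR
toy["target_class"] = 2 * is_treatment.astype(int) + converted.astype(int)

print(toy)
```

The treatment flag contributes 2 and the conversion flag contributes 1, which reproduces the 0 to 3 mapping listed above.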

```python
# create the clusters
kmeans = KMeans(n_clusters=5)
kmeans.fit(df_data[['history']])
df_data['history_cluster'] = kmeans.predict(df_data[['history']])

# order the clusters
df_data = order_cluster('history_cluster', 'history', df_data, True)

# create a new dataframe for the model and drop the columns that define the label
df_model = df_data.drop(['offer', 'campaign_group', 'conversion'], axis=1)

# convert categorical columns
df_model = pd.get_dummies(df_model)
```

• CN: 32%
• CR: 2%
• TN: 58.9%
• TR: 6.9%

Uplift score = P(CN) + P(TR) - P(CR) - P(TN) = 0.32 + 0.069 - 0.02 - 0.589 = -0.22

```python
# create feature set and labels
X = df_model.drop(['target_class'], axis=1)
y = df_model.target_class

# split into train and test groups
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=56)

# fit the model and predict the class probabilities
xgb_model = xgb.XGBClassifier().fit(X_train, y_train)
class_probs = xgb_model.predict_proba(X_test)
```
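The evaluation code later in this section reads an `uplift_score` column. One way such a column could be derived from the predicted class probabilities is sketched below. It assumes the probability columns follow the label order 0 to 3 (CN, CR, TN, TR); the array `probs` and the `proba_*` column names are made up for illustration and are not model output.

```python
import numpy as np
import pandas as pd

# Hypothetical predict_proba-style output: one row per customer,
# columns assumed to be ordered CN, CR, TN, TR (labels 0..3)
probs = np.array([
    [0.32, 0.02, 0.589, 0.069],
    [0.10, 0.05, 0.25, 0.60],
    [0.25, 0.25, 0.25, 0.25],
])

scores = pd.DataFrame(probs, columns=["proba_CN", "proba_CR", "proba_TN", "proba_TR"])

# Uplift score = P(CN) + P(TR) - P(TN) - P(CR)
scores["uplift_score"] = (
    scores["proba_CN"] + scores["proba_TR"] - scores["proba_TN"] - scores["proba_CR"]
)

print(scores)
```

The first row reuses the probabilities from the worked example above and yields the same score of about -0.22. To score every customer rather than only the test split, the same calculation would be applied to probabilities predicted for the full feature set.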

## Model Evaluation

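1. High uplift score: customers whose uplift score is above the 3rd quartile
2. Low uplift score: customers whose uplift score is below the median (the 2nd quartile)

• Conversion uplift
• Revenue uplift per targeted customer, to see whether the model makes our campaigns more efficient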

```
Total Targeted Customer Count: 21307
Discount Conversion Uplift: 7.66%
Discount Order Uplift: 1631.89
Discount Revenue Uplift: $40797.35
Revenue Uplift Per Targeted Customer: $1.91
```
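The last line of this output is not printed by `calc_uplift` as listed earlier. A plausible reading, stated here as an assumption, is that it divides the revenue uplift by the number of targeted customers. The short check below uses only the figures from the output above and the average order value of 25 from `calc_uplift`.

```python
# Figures copied from the benchmark output above
targeted_customers = 21307
order_uplift = 1631.89
revenue_uplift = 40797.35
avg_order_value = 25  # same constant as in calc_uplift

# Revenue uplift is approximately incremental orders times the average order value
# (small differences come from rounding in the printed order uplift)
approx_revenue = order_uplift * avg_order_value

# Assumed definition: revenue uplift divided by the number of targeted customers
revenue_per_customer = revenue_uplift / targeted_customers

print(approx_revenue)
print(round(revenue_per_customer, 2))
```

Under that assumption the per-customer value rounds to 1.91, matching the benchmark line.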

```python
df_data_lift = df_data.copy()
uplift_q_75 = df_data_lift.uplift_score.quantile(0.75)

# keep Discount and No Offer customers whose uplift score is above the 3rd quartile
df_data_lift = df_data_lift[
    (df_data_lift.offer != 'Buy One Get One') & (df_data_lift.uplift_score > uplift_q_75)
].reset_index(drop=True)

# calculate the uplift
calc_uplift(df_data_lift)

# results:
# User Count: 5282
# Discount Conversion Uplift: 12.18%
# Discount Order Uplift: 643.57
# Discount Revenue Uplift: $16089.36
# Revenue Uplift Per Targeted Customer: $3.04
```

```python
df_data_lift = df_data.copy()
uplift_q_5 = df_data_lift.uplift_score.quantile(0.5)

# keep Discount and No Offer customers whose uplift score is below the median
df_data_lift = df_data_lift[
    (df_data_lift.offer != 'Buy One Get One') & (df_data_lift.uplift_score < uplift_q_5)
].reset_index(drop=True)

# calculate the uplift
calc_uplift(df_data_lift)

# results:
# User Count: 10745
# Discount Conversion Uplift: 5.63%
# Discount Order Uplift: 604.62
# Discount Revenue Uplift: $15115.52
# Revenue Uplift Per Targeted Customer: $1.4
```

• Run campaigns aimed at specific segments chosen by uplift score
• Try different offers for different uplift score ranges

