A/B Testing (New vs. Old Pages)
A/B Testing
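The case data are the results of **an A/B test of the old and new versions of a web page**. The goal is to determine whether the two versions differ significantly in user conversion.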
1. Inspect and clean the data

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import os

    # switch to the folder that contains the data file
    os.chdir('C:\\Users\\laurel\\Desktop\\AB_测试\\ABTest')
    df = pd.read_csv('ab_data.csv')

Use df.info() to view basic information about the data:

    df.info()

    RangeIndex: 294478 entries, 0 to 294477
    Data columns (total 5 columns):
    user_id         294478 non-null int64
    timestamp       294478 non-null object
    group           294478 non-null object
    landing_page    294478 non-null object
    converted       294478 non-null int64
    dtypes: int64(2), object(3)
    memory usage: 11.2+ MB

Use nunique() to count the distinct values in each column of df:

    df.nunique()

This gives user_id: 290584, timestamp: 294478, group: 2, landing_page: 2, converted: 2.

There are 294478 rows but only 290584 distinct user_id values, so some user IDs are repeated. List the repeated user IDs:

    df[df.user_id.duplicated(keep=False)].sort_values(by='user_id').head(15)

    user_id  timestamp                   group      landing_page  converted
    630052   2017-01-17 01:16:05.208766  treatment  new_page      0
    630052   2017-01-07 12:25:54.089486  treatment  old_page      1
    630126   2017-01-14 13:35:54.778695  treatment  old_page      0
    630126   2017-01-19 17:16:00.280440  treatment  new_page      0
    630137   2017-01-20 02:08:49.893878  control    old_page      0
    630137   2017-01-22 14:59:22.051308  control    new_page      0
    630320   2017-01-07 18:02:43.626318  control    old_page      0
    630320   2017-01-12 05:27:37.181803  treatment  old_page      0
    630471   2017-01-07 02:14:17.405726  control    new_page      0
    630471   2017-01-23 01:42:51.501851  control    old_page      0
    630780   2017-01-09 10:14:27.657854  control    old_page      0
    630780   2017-01-18 21:27:53.859949  control    new_page      0
    630785   2017-01-22 03:17:33.509161  treatment  new_page      1
    630785   2017-01-09 16:18:59.516566  treatment  old_page      0
    630805   2017-01-12 20:45:39.012189  treatment  old_page      0

The table shows rows where group does not match the page version in landing_page: the treatment group should see new_page and the control group should see old_page.

    # True where group and landing_page disagree
    mismatch = ((df['group'] == 'treatment') != (df['landing_page'] == 'new_page'))
    # total number of mismatched rows
    mismatch.sum()
    # share of mismatched rows
    mismatch.sum() / len(df)
    0.013220002852505111

In total, 3893 rows have a group that does not match landing_page, about 1.3% of the data. We can delete them:

    match_df = df[~mismatch].copy()

In the VS Code debug console, check the number of rows and the number of distinct users:

    match_df.shape[0]
    290585
    match_df.user_id.nunique()
    290584

One user ID is still duplicated. View it in VS Code:

    match_df[match_df.user_id.duplicated(keep=False)]
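          user_id  timestamp                   group      landing_page  converted
    1899  773192   2017-01-09 05:37:58.781806  treatment  new_page      0
    2893  773192   2017-01-14 02:55:59.590927  treatment  new_page      0

We only need to keep one of the two rows, so drop the duplicate:

    match_df = match_df.drop_duplicates(subset=['user_id'], keep='last')

In the VS Code debug console, check for missing values:

    match_df.isnull().sum()
    user_id         0
    timestamp       0
    group           0
    landing_page    0
    converted       0
    dtype: int64

In the VS Code debug console, check the proportion of missing values (this only matters when missing values exist, so the step can be skipped here):

    match_df.apply(lambda x: sum(x.isnull()) / len(x))
    user_id         0.0
    timestamp       0.0
    group           0.0
    landing_page    0.0
    converted       0.0
    dtype: float64

In the VS Code debug console, check the share of users who saw the new page and the old page:

    match_df[match_df.landing_page == "new_page"].shape[0] / match_df.shape[0]
    0.5000619442226688
    match_df[match_df.landing_page == "old_page"].shape[0] / match_df.shape[0]
    0.4999380557773312

2. Hypothesis testing

The hypothesis test determines whether the conversion of the new and old pages differs significantly. The claim under test is that the new page converts better than the old page, with conversion measured by the conversion rate.

Null hypothesis: the conversion rate of the old page, p_old, is greater than or equal to the conversion rate of the new page, p_new (p_old >= p_new). The alternative hypothesis is p_old < p_new.

    # number of users on the old and new versions
    n_old = match_df.query('group=="control"').shape[0]
    n_new = match_df.query('group=="treatment"').shape[0]
    # number of converted users on the old and new versions
    convert_old = match_df.query('group=="control" & converted==1').shape[0]
    convert_new = match_df.query('group=="treatment" & converted==1').shape[0]

    n_old
    145274
    n_new
    145310
    convert_old
    17489
    convert_new
    17872

Here n_new equals the 290584 cleaned rows minus n_old, which agrees with the new-page share of 0.5000619442226688 computed above.

Use Python's statsmodels package to compute the test statistic z and the p-value:

    import statsmodels.stats.proportion as sp

    # alternative='smaller' requests a left-tailed test (first proportion < second)
    z_score, p_value = sp.proportions_ztest([convert_old, convert_new], [n_old, n_new], alternative='smaller')
    print('z statistic:', z_score, ', p-value:', p_value)

    z statistic: -2.1484056695589 , p-value: 0.015840771394875417

Here the p-value is approximately 0.016, p…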
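To see where the z statistic comes from, the same left-tailed test can be reproduced by hand from the four counts. The sketch below is only an illustration, not the article's own code: the helper name pooled_z_test_smaller is hypothetical, the value n_new = 145310 is an assumption derived as 290584 cleaned rows minus n_old, and the standard error uses the pooled conversion rate, which is the statsmodels default for this test as far as I know.

```python
import math

# Counts from the walkthrough. n_new is an assumption: 290584 cleaned rows minus n_old.
n_old, n_new = 145274, 145310
convert_old, convert_new = 17489, 17872


def pooled_z_test_smaller(x1, n1, x2, n2):
    """Left-tailed two-proportion z-test (alternative: p1 < p2), pooled variance."""
    p1, p2 = x1 / n1, x2 / n2
    # pooled conversion rate under the boundary of the null hypothesis (p1 == p2)
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # left-tail p-value: standard normal CDF evaluated at z
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value


z_score, p_value = pooled_z_test_smaller(convert_old, n_old, convert_new, n_new)
print(round(z_score, 3), round(p_value, 4))  # -2.148 0.0158
```

The result agrees with the statsmodels output quoted above, which suggests that the library call and the textbook pooled formula describe the same computation for these inputs.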