A/B Testing (New vs. Old Page)

2023-08-16 06:03 | Source: compiled from the web

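The case data is **the result of an A/B test on the new and old versions of a web page**. The goal is to determine whether the two versions differ significantly in user conversion.

1. Inspect and clean the data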

import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Switch to the folder that holds the data file, then load it
os.chdir('C:\\Users\\laurel\\Desktop\\AB_测试\\ABTest')
df = pd.read_csv('ab_data.csv')

Use df.info() to view some basic information about the data

df.info()

RangeIndex: 294478 entries, 0 to 294477
Data columns (total 5 columns):
user_id         294478 non-null int64
timestamp       294478 non-null object
group           294478 non-null object
landing_page    294478 non-null object
converted       294478 non-null int64
dtypes: int64(2), object(3)
memory usage: 11.2+ MB

Use nunique() to count the distinct values in each column of df

df.nunique()

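This gives user_id: 290584, timestamp: 294478, group: 2, landing_page: 2, converted: 2. The user_id column has fewer distinct values than rows, so some users appear more than once. List the duplicated user IDs as follows: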

df[df.user_id.duplicated(keep=False)].sort_values(by = 'user_id').head(15)

user_id  timestamp                   group      landing_page  converted
630052   2017-01-17 01:16:05.208766  treatment  new_page      0
630052   2017-01-07 12:25:54.089486  treatment  old_page      1
630126   2017-01-14 13:35:54.778695  treatment  old_page      0
630126   2017-01-19 17:16:00.280440  treatment  new_page      0
630137   2017-01-20 02:08:49.893878  control    old_page      0
630137   2017-01-22 14:59:22.051308  control    new_page      0
630320   2017-01-07 18:02:43.626318  control    old_page      0
630320   2017-01-12 05:27:37.181803  treatment  old_page      0
630471   2017-01-07 02:14:17.405726  control    new_page      0
630471   2017-01-23 01:42:51.501851  control    old_page      0
630780   2017-01-09 10:14:27.657854  control    old_page      0
630780   2017-01-18 21:27:53.859949  control    new_page      0
630785   2017-01-22 03:17:33.509161  treatment  new_page      1
630785   2017-01-09 16:18:59.516566  treatment  old_page      0
630805   2017-01-12 20:45:39.012189  treatment  old_page      0

The table shows rows where the assigned group does not match the page version actually shown (landing_page): treatment users who saw old_page, and control users who saw new_page.

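# Flag rows where group and landing_page disagree
mismatch = (df['group'] == 'treatment') != (df['landing_page'] == 'new_page')

# Total number of mismatched rows
mismatch.sum()
3893

# Share of mismatched rows
mismatch.sum() / len(df)
0.013220002852505111

In total, 3,893 rows have a group that does not match landing_page, about 1.3% of the data. We can drop them.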

match_df = df[~mismatch].copy()
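To make the mismatch check concrete, here is a minimal sketch on a hypothetical six-row frame (the rows are invented for illustration and are not taken from ab_data.csv). It builds the same boolean mask as above and also uses pd.crosstab to show where the inconsistent rows fall.

```python
import pandas as pd

# Toy data: hypothetical rows, not taken from ab_data.csv
toy = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "group": ["control", "control", "treatment",
              "treatment", "control", "treatment"],
    "landing_page": ["old_page", "new_page", "new_page",
                     "old_page", "old_page", "new_page"],
})

# A row is consistent when treatment pairs with new_page
# and control pairs with old_page; anything else is a mismatch.
mismatch = (toy["group"] == "treatment") != (toy["landing_page"] == "new_page")

# Cross-tabulate the two columns: the control/new_page and
# treatment/old_page cells count the mismatched rows.
counts = pd.crosstab(toy["group"], toy["landing_page"])
print(counts)

n_mismatch = int(mismatch.sum())
clean = toy[~mismatch].copy()
print(n_mismatch, len(clean))
```

The cross-tabulation is a useful sanity check before filtering, because it shows whether the mismatches are concentrated in one group.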

In the VS Code debug console, check the row count and the number of unique users

match_df.shape[0]
290585
match_df.user_id.nunique()
290584

One user id is still duplicated. Inspect it in VS Code:

match_df[match_df.user_id.duplicated(keep=False)]

      user_id  timestamp                   group      landing_page  converted
1899  773192   2017-01-09 05:37:58.781806  treatment  new_page      0
2893  773192   2017-01-14 02:55:59.590927  treatment  new_page      0

We only need to keep one of the two rows, so drop the duplicate:

match_df = match_df.drop_duplicates(subset=['user_id'], keep='last')
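Note that keep='last' refers to row order in the DataFrame, not to the timestamp. The sketch below uses a hypothetical three-row frame (invented values) to show the difference between keep='first' and keep='last', and how sorting by timestamp first would make "last" mean "most recent" if that is what is wanted.

```python
import pandas as pd

# Toy data: hypothetical rows for illustration only
toy = pd.DataFrame({
    "user_id": [10, 11, 10],
    "timestamp": ["2017-01-09 05:37:58",
                  "2017-01-10 08:00:00",
                  "2017-01-14 02:55:59"],
    "converted": [0, 1, 0],
})

# keep='first' retains the first row seen for each user_id,
# keep='last' retains the last one (by position in the frame).
first = toy.drop_duplicates(subset=["user_id"], keep="first")
last = toy.drop_duplicates(subset=["user_id"], keep="last")

# Sorting by timestamp beforehand makes "last" the most recent visit.
latest = (toy.sort_values("timestamp")
             .drop_duplicates(subset=["user_id"], keep="last"))

print(list(first.index), list(last.index), list(latest.index))
```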

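Check for missing values in the VS Code debug console

match_df.isnull().sum()

user_id         0
timestamp       0
group           0
landing_page    0
converted       0
dtype: int64

Check the proportion of missing values per column in the VS Code debug console (this only matters when missing values exist, so the step can be skipped here)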

match_df.apply(lambda x: sum(x.isnull()) / len(x))

user_id         0.0
timestamp       0.0
group           0.0
landing_page    0.0
converted       0.0
dtype: float64

In the VS Code debug console, check what share of users saw the new page and the old page
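As a side note, the per-column missing ratio can also be computed without apply, since the mean of a boolean mask is the fraction of True values. A small sketch on a hypothetical frame (invented values) comparing the two forms:

```python
import numpy as np
import pandas as pd

# Toy data with some missing entries (hypothetical values)
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": ["x", "y", None, None],
})

# The form used in the article: count nulls per column, divide by length
via_apply = toy.apply(lambda col: sum(col.isnull()) / len(col))

# Equivalent shorthand: the mean of the boolean null mask
via_mean = toy.isnull().mean()

print(via_mean)
```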

match_df[match_df.landing_page == "new_page"].shape[0] / match_df.shape[0]
0.5000619442226688
match_df[match_df.landing_page == "old_page"].shape[0] / match_df.shape[0]
0.4999380557773312
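Both shares can also be read off in one call with value_counts(normalize=True). A minimal sketch on hypothetical data (the five rows are invented for illustration):

```python
import pandas as pd

# Toy data: hypothetical page assignments
toy = pd.DataFrame({
    "landing_page": ["new_page", "old_page", "new_page",
                     "old_page", "new_page"],
})

# normalize=True returns fractions instead of raw counts
shares = toy["landing_page"].value_counts(normalize=True)
print(shares)
```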

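2. Hypothesis testing

This stage tests whether the new and old pages differ significantly in conversion. The hypothesis to be supported is that the new page converts better than the old page, with conversion measured by the conversion rate. Null hypothesis: the old page's conversion rate p_old ≥ the new page's conversion rate p_new. Alternative hypothesis: p_old < p_new.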
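For reference, the statistic behind a pooled two-proportion z-test can be written as below. The symbols are notation introduced here, not the article's own: x_old and x_new are the numbers of converted users, n_old and n_new the group sizes, and Φ the standard normal CDF.

```latex
\hat{p}_{old} = \frac{x_{old}}{n_{old}}, \qquad
\hat{p}_{new} = \frac{x_{new}}{n_{new}}, \qquad
\hat{p} = \frac{x_{old} + x_{new}}{n_{old} + n_{new}}

z = \frac{\hat{p}_{old} - \hat{p}_{new}}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_{old}} + \frac{1}{n_{new}}\right)}},
\qquad
p\text{-value (left tail)} = \Phi(z)
```

A strongly negative z means the old page's observed rate is well below the new page's, which is evidence against the null hypothesis in a left-tailed test.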

# Number of users on the old and new versions
n_old = match_df.query('group=="control"').shape[0]
n_new = match_df.query('group=="treatment"').shape[0]

# Number of converted users on the old and new versions
convert_old = match_df.query('group=="control" & converted==1').shape[0]
convert_new = match_df.query('group=="treatment" & converted==1').shape[0]

n_old
145274
n_new
145310
convert_old
17489
convert_new
17872

Use Python's statsmodels package to compute the test statistic z and the p-value

import statsmodels.stats.proportion as sp

# alternative='smaller' requests a left-tailed test
z_score, p_value = sp.proportions_ztest([convert_old, convert_new], [n_old, n_new], alternative='smaller')
print('Test statistic z:', z_score, ', p-value:', p_value)

Test statistic z: -2.1484056695589 , p-value: 0.015840771394875417

Here the p-value is approximately 0.016, p…
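As a cross-check, the same statistic can be recomputed by hand with only the standard library. This is a sketch under stated assumptions: it uses the pooled two-proportion formula, takes the counts reported above, and takes n_new = 145310, a value inferred here from the 290584 deduplicated rows minus n_old = 145274.

```python
import math

# Counts reported in the article; n_new is inferred as 290584 - 145274
n_old, n_new = 145274, 145310
convert_old, convert_new = 17489, 17872

# Observed conversion rates per group and the pooled rate
p_old = convert_old / n_old
p_new = convert_new / n_new
p_pool = (convert_old + convert_new) / (n_old + n_new)

# Standard error of the difference under the pooled estimate
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
z = (p_old - p_new) / se

# Left-tail p-value from the standard normal CDF, written via erf
p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print('z:', z, 'p-value:', p_value)
```

The result should land close to the statsmodels output above (z near -2.148, p-value near 0.016), which suggests the library call is using the pooled variance by default.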
