Stata: How to Estimate Confidence Intervals?


2023-11-03 17:47 | Source: compiled from the web

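4. Stata Examples

We use auto, a dataset that ships with Stata, to demonstrate how to obtain each kind of confidence interval.

First, load the data and display its basic structure: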

    . sysuse auto, clear
    . des

     Observations:            74                  1978 automobile data
        Variables:            12                  13 Apr 2020 17:45
                                                  (_dta has notes)
    ---------------------------------------------------------
    Variable      Storage   Display
        name         type    format    Variable label
    ---------------------------------------------------------
    make            str18   %-18s      Make and model
    price           int     %8.0gc     Price
    mpg             int     %8.0g      Mileage (mpg)
    rep78           int     %8.0g      Repair record 1978
    headroom        float   %6.1f      Headroom (in.)
    trunk           int     %8.0g      Trunk space (cu. ft.)
    weight          int     %8.0gc     Weight (lbs.)
    length          int     %8.0g      Length (in.)
    turn            int     %8.0g      Turn circle (ft.)
    displacement    int     %8.0g      Displacement (cu. in.)
    gear_ratio      float   %6.2f      Gear ratio
    foreign         byte    %8.0g      Car origin
    ---------------------------------------------------------
    Sorted by: foreign

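Next, taking the variables price and foreign as examples, we compute 95% confidence intervals for the mean of price and for the proportion of foreign. The former is a continuous variable; the latter is a binary categorical variable, so its mean is a proportion.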

    . ci means price, level(95)

        Variable |   Obs        Mean    Std. err.    [95% conf. interval]
    -------------+--------------------------------------------------------
           price |    74    6165.257    342.8719     5481.914      6848.6

    . ci proportions foreign

                                                      Binomial exact
        Variable |   Obs  Proportion    Std. err.    [95% conf. interval]
    -------------+--------------------------------------------------------
         foreign |    74    .2972973    .0531331      .196584    .4148353
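To see where the interval for price comes from, the sketch below recomputes it by hand as mean ± t critical value × standard error, using the results that summarize leaves in r(). The scalar names (mean_p, se_p, t_crit, ci_lo, ci_hi) are our own illustrative choices, not part of the original example. The last line shows one alternative to the default exact binomial interval for a proportion.

```stata
* Minimal sketch: reproduce the 95% CI for mean price by hand.
* Scalar names below are illustrative assumptions, not from the source.
sysuse auto, clear
quietly summarize price
scalar mean_p = r(mean)
scalar se_p   = r(sd) / sqrt(r(N))          // standard error of the mean
scalar t_crit = invttail(r(N) - 1, 0.025)   // upper 2.5% point of t with N-1 df
scalar ci_lo  = mean_p - t_crit * se_p
scalar ci_hi  = mean_p + t_crit * se_p
display "95% CI for mean price: [" %9.3f scalar(ci_lo) ", " %9.3f scalar(ci_hi) "]"

* For a proportion, ci proportions defaults to the exact binomial interval;
* other interval types can be requested as options, for example Wilson:
ci proportions foreign, wilson
```

The hand-computed bounds should agree with the ci means output above up to rounding.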

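Then, with price as the dependent variable and weight, length, and foreign as independent variables, we show how to obtain 95% confidence intervals for the regression coefficients.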

    . reg price weight length foreign, level(95)

      Source |       SS        df       MS        Number of obs =      74
    ---------+----------------------------------  F(3, 70)      =   28.39
       Model |  348565467       3   116188489     Prob > F      =  0.0000
    Residual |  286499930      70  4092856.14     R-squared     =  0.5489
    ---------+----------------------------------  Adj R-squared =  0.5295
       Total |  635065396      73  8699525.97     Root MSE      =  2023.1

    --------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|    [95% conf. interval]
    ---------+----------------------------------------------------------------
      weight |      5.775      0.959    6.02    0.000       3.861       7.688
      length |    -91.371     32.828   -2.78    0.007    -156.845     -25.897
     foreign |   3573.092    639.328    5.59    0.000    2297.992    4848.191
       _cons |   4838.021   3742.010    1.29    0.200   -2625.183   12301.224
    --------------------------------------------------------------------------
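The interval bounds in the regression table can be checked the same way: coefficient ± t critical value × standard error, with the residual degrees of freedom taken from e(df_r). The sketch below does this for weight; the scalar names (t_crit, b_lo, b_hi) are illustrative assumptions. It also shows that the table can be redisplayed at another confidence level without re-estimating the model.

```stata
* Minimal sketch: reproduce the 95% CI for the weight coefficient by hand.
* Scalar names below are illustrative assumptions, not from the source.
sysuse auto, clear
quietly regress price weight length foreign
scalar t_crit = invttail(e(df_r), 0.025)    // e(df_r) = 70 residual df here
scalar b_lo   = _b[weight] - t_crit * _se[weight]
scalar b_hi   = _b[weight] + t_crit * _se[weight]
display "95% CI for weight: [" %6.3f scalar(b_lo) ", " %6.3f scalar(b_hi) "]"

* Replay the stored results at a different confidence level
regress, level(90)
```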

Finally, we show how to use the bootstrap to compute confidence intervals for the regression coefficients.

    . bootstrap, reps(100): reg price weight length foreign  // default is reps(50)
    (running regress on estimation sample)

    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100

    Linear regression                       Number of obs =        74
                                            Replications  =       100
                                            Wald chi2(3)  =     51.68
                                            Prob > chi2   =    0.0000
                                            R-squared     =    0.5489
                                            Adj R-squared =    0.5295
                                            Root MSE      = 2023.0809

    --------------------------------------------------------------------------
             |   Observed  Bootstrap                        Normal-based
       price | coefficient std. err.      z    P>|z|    [95% conf. interval]
    ---------+----------------------------------------------------------------
      weight |      5.775      1.617    3.57    0.000       2.605       8.944
      length |    -91.371     54.331   -1.68    0.093    -197.858      15.116
     foreign |   3573.092    695.808    5.14    0.000    2209.333    4936.851
       _cons |   4838.021   5871.232    0.82    0.410   -6669.382   16345.423
    --------------------------------------------------------------------------

    . * or
    . reg price weight length foreign, vce(bs, reps(100))  // default is reps(50)
    (running regress on estimation sample)

    Bootstrap replications (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100

    Linear regression                       Number of obs =        74
                                            Replications  =       100
                                            Wald chi2(3)  =     59.10
                                            Prob > chi2   =    0.0000
                                            R-squared     =    0.5489
                                            Adj R-squared =    0.5295
                                            Root MSE      = 2023.0809

    --------------------------------------------------------------------------
             |   Observed  Bootstrap                        Normal-based
       price | coefficient std. err.      z    P>|z|    [95% conf. interval]
    ---------+----------------------------------------------------------------
      weight |      5.775      1.606    3.59    0.000       2.626       8.923
      length |    -91.371     51.877   -1.76    0.078    -193.047      10.306
     foreign |   3573.092    640.629    5.58    0.000    2317.481    4828.703
       _cons |   4838.021   5365.122    0.90    0.367   -5677.426   15353.467
    --------------------------------------------------------------------------

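Comparing the two sets of results, the standard errors and confidence intervals differ somewhat. Neither run fixes a random-number seed, so part of this difference likely reflects the different random resamples drawn in each run. According to the bootstrap help file, many estimation commands accept vce(bootstrap), so we recommend using vce(bootstrap) rather than the bootstrap prefix when estimating parameters. The main reason is that the estimation command already handles clustering and other model-specific details for us, whereas the bootstrap prefix is intended mainly for non-estimation commands, such as summarize and user-written programs.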
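To make bootstrap results reproducible, a seed can be set inside vce(bootstrap). The sketch below is a hedged illustration rather than the original authors' code: the seed value 12345 and the choice of 1,000 replications are our own assumptions. It also uses estat bootstrap to report percentile-based intervals in place of the normal-based ones shown in the tables above.

```stata
* Minimal sketch: reproducible bootstrap CIs for the same regression.
* seed(12345) and reps(1000) are illustrative assumptions.
sysuse auto, clear
regress price weight length foreign, vce(bootstrap, reps(1000) seed(12345))

* Report percentile confidence intervals from the stored replications
estat bootstrap, percentile
```

With the seed fixed, rerunning the command should return the same standard errors and intervals, which makes it easier to compare the two syntaxes on equal footing.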

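5. Concluding Remarks

The basic principles of probability theory and mathematical statistics underpin the correct use of research methods and the correct interpretation of results. Statistical software such as Stata, R, and SPSS lets us obtain the numbers we want directly, without mastering much of the mathematics needed to compute them by hand. Even so, when we want to compare the strengths and weaknesses of several methods, or check whether a piece of code or a formula is correct, a solid mathematical foundation reduces the risk of going astray. The introductory probability and statistics books listed in the references are offered as further reading for interested students.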

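6. References

What exactly is the confidence interval?
He Xiaoqun. 现代统计分析方法与应用 (Modern Statistical Analysis Methods and Applications) [M]. 3rd ed. Beijing: China Renmin University Press, 2012.
Chen Xiru. 概率论与数理统计 (Probability Theory and Mathematical Statistics) [M]. Hefei: University of Science and Technology of China Press, 1992.
Sheng Zhou, Xie Shiqian, Pan Chengyi. 概率论与数理统计 (Probability Theory and Mathematical Statistics) [M]. 3rd ed. Beijing: Higher Education Press, 2006.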
