Bacterial community structure analysis: alpha diversity (Shannon index, Simpson index, Chao1, rarefaction)


2024-07-17 21:45 | Source: compiled from the web

This script runs the following steps: generate rarefied OTU tables; compute alpha diversity metrics for each rarefied OTU table; collate alpha diversity results; and generate alpha rarefaction plots.


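-i,    input BIOM file

-m,   mapping file

-o,   output directory


-p, parameters file, specifying which metrics to compute

-n, --num_steps

Number of steps (or rarefied OTU table sizes) to make between min and max counts [default: 10]

-f,  force overwrite of an existing output directory with the same name

-w,  print the commands that would be run, but do not execute them (useful for debugging)

-a,  run in parallel

-t,   phylogenetic tree file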


--min_rare_depth

The lower limit of rarefaction depths [default: 10]

-e, --max_rare_depth

The upper limit of rarefaction depths [default: median sequence/sample count]

-O, --jobs_to_start

Number of jobs to start. NOTE: you must also pass -a to run in parallel, this defines the number of jobs to be started if and only if -a is passed [default: 2]


--retain_intermediate_files

Retain intermediate files: rarefied OTU tables (rarefaction) and alpha diversity results (alpha_div). By default these will be erased [default: False]





(1) First write the parameters file:

echo "alpha_diversity:metrics shannon,PD_whole_tree,chao1,observed_species,goods_coverage,simpson" > alpha_params.txt

(2) Then run the script (it may take several hours):

alpha_rarefaction.py -i otu_table/otu_table.biom -m map.txt -o div_alpha/ -p alpha_params.txt -t rep_phylo.tre






python /usr/lib/qiime/bin//multiple_rarefactions.py -i otu_table/otu_table.biom -m 10 -x 16544 -s 1653 -o div_alpha//rarefaction/
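As a rough illustration of what this rarefaction step does, the sketch below lists the depths implied by `-m 10 -x 16544 -s 1653` and subsamples one sample's counts without replacement. The function names `rarefaction_depths` and `rarefy` and the toy counts are hypothetical, and whether the upper bound is inclusive is an assumption here; this is not QIIME's own implementation.

```python
import random
from collections import Counter


def rarefaction_depths(min_depth, max_depth, step):
    """Depths from min_depth up to max_depth (assumed inclusive), spaced by step."""
    return list(range(min_depth, max_depth + 1, step))


def rarefy(counts, depth, rng):
    """Draw `depth` sequences without replacement from a dict of OTU counts."""
    # Expand the counts into one entry per sequence, then sample from that pool.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of sequences in the sample")
    return dict(Counter(rng.sample(pool, depth)))


depths = rarefaction_depths(10, 16544, 1653)
sample = {"OTU_1": 60, "OTU_2": 30, "OTU_3": 10}  # toy counts, not real data
rarefied = rarefy(sample, 50, random.Random(0))
print(depths)
print(rarefied)
```

The step of 1653 is consistent with (16544 - 10) / 10 rounded down, i.e. the default of 10 steps between the minimum and maximum depth.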



# Alpha diversity on rarefied OTU tables command

python /usr/lib/qiime/bin//alpha_diversity.py -i div_alpha//rarefaction/ -o div_alpha//alpha_div/ --metrics shannon,PD_whole_tree,chao1,observed_species,goods_coverage,simpson -t rep_phylo.tre


sam@sam-Precision-WorkStation-T7500[mtt3] alpha_diversity.py -s                           

Known metrics are: ACE, berger_parker_d, brillouin_d, chao1, chao1_confidence, dominance, doubles, equitability, esty_ci, fisher_alpha, gini_index, goods_coverage, heip_e, kempton_taylor_q, margalef, mcintosh_d, mcintosh_e, menhinick, michaelis_menten_fit, observed_species, osd, simpson_reciprocal, robbins, shannon, simpson, simpson_e, singles, strong, PD_whole_tree




# Collate alpha command

python /usr/lib/qiime/bin//collate_alpha.py -i div_alpha//alpha_div/ -o div_alpha//alpha_div_collated/



# Rarefaction plot: All metrics command

python /usr/lib/qiime/bin//make_rarefaction_plots.py -i div_alpha//alpha_div_collated/ -m map.txt -o div_alpha//alpha_rarefaction_plots/




shannon: community diversity index





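dominance: the probability that two sequences drawn at random belong to the same species (OTU), $\sum_i n_i(n_i-1) / (N(N-1))$, where $n_i$ is the number of sequences of species $i$ and $N$ is the total number of sequences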
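A minimal sketch of the formula above, assuming the counts are per-OTU sequence counts for a single sample. The function name `dominance_finite` is hypothetical, and this finite-sample form is not necessarily what the `dominance` metric in alpha_diversity.py computes.

```python
def dominance_finite(counts):
    """Probability that two sequences drawn without replacement share an OTU."""
    total = sum(counts)
    if total < 2:
        raise ValueError("need at least two sequences")
    return sum(n * (n - 1) for n in counts) / (total * (total - 1))


even = dominance_finite([50, 50])    # two equally abundant OTUs
skewed = dominance_finite([99, 1])   # one OTU dominates
print(even, skewed)
```

The skewed sample gives a value close to 1, the even one a value close to 0.5, so higher dominance means lower diversity.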


simpson: community diversity index








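Phylogenetic alpha diversity (phylogenetic diversity, Faith 1992): addresses the conservation of evolutionary history; applied to populations, communities, biogeography, and conservation biology.

Phylogenetic beta diversity (phylobetadiversity, Webb 2002): addresses the phylogenetic distance between communities or regions and its causes.

Phylogenetic signal and phylogenetic structure: address the mechanisms of species coexistence in communities and regions.

Phylogenetic diversity (PD): the sum of the shortest evolutionary branch lengths among all species at a site, as a proportion of the total branch length over all nodes (Faith, 1992).

Community phylogenetic distance: the mean of the phylogenetic branch lengths between all pairs of species taken from community I and community II (Webb, 2002).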

PD_whole_tree: Faith's phylogenetic diversity, i.e. the sum of the branch lengths of the tree connecting all representative sequences (OTUs) found in a sample.
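To make the branch-length sum concrete, here is a small sketch on a hand-written toy tree. The tree, the node names, and the function `faith_pd` are all hypothetical, and including the path up to the root is an assumed convention; alpha_diversity.py reads a real tree file via `-t` and may differ in detail.

```python
def faith_pd(parents, observed_tips):
    """Sum branch lengths on the union of paths from observed tips to the root.

    `parents` maps each non-root node to (parent, branch_length).
    """
    visited = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk towards the root, counting each branch only once.
        while node in parents and node not in visited:
            parent, length = parents[node]
            visited.add(node)
            total += length
            node = parent
    return total


# Toy tree in Newick terms: ((A:1,B:2)X:0.5,(C:3,D:1)Y:1.5)root
toy_tree = {
    "A": ("X", 1.0), "B": ("X", 2.0),
    "C": ("Y", 3.0), "D": ("Y", 1.0),
    "X": ("root", 0.5), "Y": ("root", 1.5),
}
pd_ab = faith_pd(toy_tree, ["A", "B"])
pd_all = faith_pd(toy_tree, ["A", "B", "C", "D"])
print(pd_ab, pd_all)
```

A sample containing only A and B covers less of the tree than one containing all four tips, so its PD is lower.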


chao1: species richness index; estimates the number of OTUs in the community.
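The Chao1 estimator adds to the observed OTU count a term based on the number of singletons (OTUs seen once, F1) and doubletons (OTUs seen twice, F2). The sketch below shows both the classic form, S_obs + F1^2 / (2 F2), and the bias-corrected form, S_obs + F1 (F1 - 1) / (2 (F2 + 1)). The function name and the toy counts are hypothetical, and which variant alpha_diversity.py applies is not stated here.

```python
def chao1(counts, bias_corrected=True):
    """Chao1 richness estimate from per-OTU counts of one sample."""
    counts = [n for n in counts if n > 0]
    s_obs = len(counts)                      # observed OTUs
    f1 = sum(1 for n in counts if n == 1)    # singletons
    f2 = sum(1 for n in counts if n == 2)    # doubletons
    if bias_corrected:
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    if f2 == 0:
        raise ValueError("classic Chao1 is undefined without doubletons")
    return s_obs + f1 ** 2 / (2 * f2)


toy_counts = [10, 5, 2, 2, 1, 1, 1]  # toy data: 7 OTUs, 3 singletons, 2 doubletons
estimate_bc = chao1(toy_counts)
estimate_classic = chao1(toy_counts, bias_corrected=False)
print(estimate_bc, estimate_classic)
```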






goods_coverage: sequencing depth (coverage) index
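Good's coverage is commonly defined as 1 - F1/N, where F1 is the number of singleton OTUs and N the total number of sequences. A minimal sketch under that definition, with a hypothetical function name and toy counts:

```python
def goods_coverage(counts):
    """Good's coverage: 1 - (singleton OTUs / total sequences)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no sequences")
    singletons = sum(1 for n in counts if n == 1)
    return 1 - singletons / total


coverage = goods_coverage([10, 5, 2, 2, 1, 1, 1])  # toy data: 3 singletons, 22 sequences
print(coverage)
```

Values near 1 suggest that few OTUs were seen only once, i.e. that deeper sequencing would be expected to add few new OTUs.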




multiple_rarefactions.py documentation: http://qiime.org/scripts/multiple_rarefactions.html

alpha_diversity.py documentation: http://qiime.org/scripts/alpha_diversity.html

collate_alpha.py documentation: http://qiime.org/scripts/collate_alpha.html

make_rarefaction_plots.py documentation: http://qiime.org/scripts/make_rarefaction_plots.html




Diversity index

A diversity index is a quantitative measure that reflects how many different types (such as species) there are in a dataset, and simultaneously takes into account how evenly the basic entities (such as individuals) are distributed among those types. The value of a diversity index increases both when the number of types increases and when evenness increases. For a given number of types, the value of a diversity index is maximized when all types are equally abundant.

When diversity indices are used in ecology, the types of interest are usually species, but they can also be other categories, such as genera, families, functional types or haplotypes. The entities of interest are usually individual plants or animals, and the measure of abundance can be, for example, number of individuals, biomass or coverage. In demography, the entities of interest can be people, and the types of interest various demographic groups. In information science, the entities can be characters and the types the different letters of the alphabet.

The most commonly used diversity indices are simple transformations of the effective number of types (also known as 'true diversity'), but each diversity index can also be interpreted in its own right as a measure corresponding to some real phenomenon (but a different one for each diversity index).

Shannon index

The Shannon index has been a popular diversity index in the ecological literature, where it is also known as Shannon's diversity index, the Shannon–Wiener index, the Shannon–Weaver index and the Shannon entropy. The measure was originally proposed by Claude Shannon to quantify the entropy (uncertainty or information content) in strings of text. The idea is that the more different letters there are, and the more equal their proportional abundances in the string of interest, the more difficult it is to correctly predict which letter will be the next one in the string. The Shannon entropy quantifies the uncertainty (entropy or degree of surprise) associated with this prediction.

Simpson index

The Simpson index was introduced in 1949 by Edward H. Simpson to measure the degree of concentration when individuals are classified into types. The same index was rediscovered by Orris C. Herfindahl in 1950. The square root of the index had already been introduced in 1945 by the economist Albert O. Hirschman. As a result, the same measure is usually known as the Simpson index in ecology, and as the Herfindahl index or the Herfindahl–Hirschman index (HHI) in economics. The measure equals the probability that two entities taken at random from the dataset of interest represent the same type.
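The letter-prediction view of Shannon entropy, and its link to the effective number of types, can be sketched as follows. The function names and the example strings are illustrative choices, not taken from any library; the effective number is taken here as 2 raised to the entropy in bits.

```python
import math
from collections import Counter


def shannon_entropy(text, base=2):
    """Entropy of the letter frequencies in `text`."""
    total = len(text)
    freqs = [n / total for n in Counter(text).values()]
    return sum(-p * math.log(p, base) for p in freqs)


def effective_number(text):
    """Number of equally common letters that would give the same entropy."""
    return 2 ** shannon_entropy(text, base=2)


h_uniform = shannon_entropy("bbbbbbb")  # one letter only: nothing to predict
h_varied = shannon_entropy("abcdefg")   # seven different letters
print(h_uniform, h_varied, effective_number("abcdefg"))
```

A string made of a single repeated letter has zero entropy, while seven equally frequent letters give log2(7) bits and an effective number of 7.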

To reflect microbial diversity more intuitively, the Shannon-Wiener index and Simpson's diversity index are also needed.


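Shannon-Wiener Index

The diversity indices given by Fisher's and Preston's methods capture only one aspect, the number of species. The Shannon-Wiener index and the Simpson index also measure the heterogeneity of the community. The Shannon-Wiener index borrows from information theory, whose main object of measurement is the amount of order or disorder in a system. In communication engineering one wants to predict which letter comes next in a message, and how uncertain that prediction is. For example, in the stream b b b b b b b every symbol is the same letter, so predicting the next letter involves no uncertainty and the uncertainty content is zero. In a, b, c, d, e, f, g every letter is different, so the uncertainty content is large. Community diversity is measured in the same way: we predict which species the next collected individual belongs to, and the more diverse the community, the greater the uncertainty.

The Shannon-Wiener index is

$$H = -\sum_{i=1}^{S} P_i \log_2 P_i$$

where $H$ is the information content of the sample (bits per individual), i.e. the diversity index of the community; $S$ is the number of species; and $P_i$ is the proportion of individuals in the sample that belong to species $i$. If the sample contains $N$ individuals, $n_i$ of which belong to species $i$, then $P_i = n_i/N$.

A simple hypothetical example illustrates the meaning of the index. Take three communities A, B and C, each made up of two species, with the following numbers of individuals (the figures in parentheses are $P_i$):

              Species 1    Species 2
Community A   100 (1.0)    0 (0)
Community B   50 (0.5)     50 (0.5)
Community C   99 (0.99)    1 (0.01)

All individuals in community A belong to species 1, so there is no uncertainty and in theory $H$ should be zero:

$$H = -[(1.0 \log_2 1.0) + 0] = 0$$

In community B each species has 50 individuals, so the distribution is even:

$$H = -[0.50 \log_2 0.50 + 0.50 \log_2 0.50] = 1$$

In community C the two species have 99 and 1 individuals:

$$H = -[0.99 \log_2 0.99 + 0.01 \log_2 0.01] = 0.081$$

The values of $H$ agree with intuition: community B is more diverse than community C, and the diversity of community A is zero.

The Shannon-Wiener index has two components: (1) the number of species, and (2) the evenness (equitability) with which individuals are distributed among the species. The more evenly individuals are distributed, the larger $H$ is. If every individual belongs to a different species, the index is at its maximum; if every individual belongs to the same species, it is at its minimum.

How is evenness measured? Estimate the theoretical maximum diversity of the community, $H_{max}$, and take the ratio of the observed diversity to $H_{max}$:

$$H_{max} = -S\left(\frac{1}{S}\log_2\frac{1}{S}\right) = \log_2 S$$

where $H_{max}$ is the species diversity under maximum evenness and $S$ is the number of species in the community. With $S$ species under maximum evenness, each species makes up $1/S$ of the individuals, so $P_i = 1/S$. For example, with only two species, $H_{max} = \log_2 2 = 1$, which agrees with the calculation for community B above. The evenness index can therefore be defined as

$$E = H / H_{max}$$

where $E$ is the evenness index, $H$ is the observed diversity, and $H_{max} = \log_2 S$ is the maximum diversity.

Simpson's diversity Index

In 1949 Simpson asked: in a community of infinite size, what is the probability that two specimens sampled at random belong to the same species? In the forests of northern Canada, two randomly sampled trees are very likely to be of the same species. In a tropical rainforest, by contrast, that probability is very low. From this idea he derived a diversity index:

Simpson's diversity index = probability that two randomly sampled individuals belong to different species
= 1 - probability that two randomly sampled individuals belong to the same species

Let $P_i$ be the proportion of the community's individuals that belong to species $i$. The joint probability of drawing two individuals of species $i$ at random is then $P_i^2$. Summing this probability over all species gives Simpson's index $D$:

$$D = 1 - \sum_{i=1}^{S} P_i^2$$

where $S$ is the number of species. The minimum value of Simpson's diversity index is 0 and the maximum is

$$D_{max} = 1 - \frac{1}{S}$$

The former occurs when all individuals belong to a single species; the latter occurs when every individual belongs to a different species (more generally, when all $S$ species are equally abundant, so that $P_i = 1/S$).

For example, suppose species A and B have 99 and 1 individuals in community I, and 50 individuals each in community II. Then:

Simpson's index of community I: $D_{I} = 1 - (0.99^2 + 0.01^2) = 0.0198$
Simpson's index of community II: $D_{II} = 1 - (0.5^2 + 0.5^2) = 0.5$

Community II is more diverse than community I. The difference comes mainly from the unevenness of the species: the two communities have the same richness but differ in evenness.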
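The worked examples above can be checked with a few lines of Python. This is a sketch under the definitions given in the text (base-2 logarithms, D = 1 - sum of squared proportions); the function names are hypothetical, and returning an evenness of 0 for a community with a single species is an assumption made here to avoid dividing by log2(1) = 0.

```python
import math


def proportions(counts):
    """Proportions P_i of the species that are actually present."""
    total = sum(counts)
    return [n / total for n in counts if n > 0]


def shannon(counts):
    """H = -sum(P_i * log2(P_i)), in bits per individual."""
    return sum(-p * math.log2(p) for p in proportions(counts))


def evenness(counts):
    """E = H / Hmax with Hmax = log2(S); taken as 0 when only one species is present."""
    s = len(proportions(counts))
    return shannon(counts) / math.log2(s) if s > 1 else 0.0


def simpson(counts):
    """D = 1 - sum(P_i ** 2)."""
    return 1 - sum(p * p for p in proportions(counts))


communities = {"A": [100, 0], "B": [50, 50], "C": [99, 1]}
for name, counts in communities.items():
    print(name, round(shannon(counts), 3), round(evenness(counts), 3),
          round(simpson(counts), 4))
```

Communities B and C have the same richness (two species), so the gap between their index values comes entirely from evenness.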



