ggtree: Visualizing Phylogenetic Trees

You can’t even begin to understand biology, you can’t understand life, unless you understand what it’s all there for, how it arose - and that means evolution. — Richard Dawkins

01. Input file formats

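The three most common phylogenetic tree formats are Newick, NEXUS, and Phylip. Of these, Newick and NEXUS trees are recognized by most software. In addition, many evolutionary analysis programs, such as BEAST, MrBayes, PAML, and r8s, produce output files in formats of their own.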

(1). Newick format

    ((t2:0.04,t1:0.34):0.89,(t5:0.37,(t4:0.03,t3:0.67):0.9):0.59);

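Every Newick file ends with a semicolon (;). An internal node is represented by a matching pair of parentheses, and the nodes between the parentheses are its descendants; for example, (t2:0.04, t1:0.34) denotes the parent node of t2 and t1. Sibling nodes are separated by commas, and tips are represented by their names. A branch length (from the parent node to the child node) is the real number that follows the child node, preceded by a colon. Data associated with an internal node or branch (for example, bootstrap values) may be encoded as a node label, written as plain text or a number before the colon.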
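The rules above can be checked by parsing the example string in R. This is a minimal sketch that assumes the ape package is installed; the second tree, with the node label `95`, is a made-up illustration and not data from this article.

```r
library(ape)  # assumption: ape is installed

# The Newick string from the example above
nwk <- "((t2:0.04,t1:0.34):0.89,(t5:0.37,(t4:0.03,t3:0.67):0.9):0.59);"
tree <- read.tree(text = nwk)

tree$tip.label    # tip names
Ntip(tree)        # number of tips
Nnode(tree)       # number of internal nodes (one per pair of parentheses)
tree$edge.length  # one branch length per edge

# Hypothetical tree whose inner clade carries a support value as a node label
labelled <- read.tree(text = "((A:1,B:1)95:1,C:2);")
labelled$node.label  # node labels are kept as character strings
```

Each pair of parentheses becomes one internal node, so the five-tip example yields four internal nodes and eight branches.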

(2). NEXUS format

    #NEXUS
    [R-package APE, Wed Nov  9 11:46:32 2016]

    BEGIN TAXA;
        DIMENSIONS NTAX = 5;
        TAXLABELS
            t5
            t4
            t1
            t2
            t3
        ;
    END;
    BEGIN TREES;
        TRANSLATE
            1   t5,
            2   t4,
            3   t1,
            4   t2,
            5   t3
        ;
        TREE * UNTITLED = [&R] (1:0.89,((2:0.59,3:0.37):0.34,
        (4:0.03,5:0.67):0.9):0.04);
    END;

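A NEXUS file is organized into blocks, typically three: TAXA (taxon information), DATA (a data matrix or multiple sequence alignment), and TREES (phylogenetic trees in Newick format). The example above contains only the TAXA and TREES blocks.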
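As a rough illustration of how the blocks are consumed, the sketch below writes the same TAXA and TREES blocks to a temporary file and reads them back with `ape::read.nexus`. It assumes ape is installed; the tree is written on a single line here purely for simplicity.

```r
library(ape)  # assumption: ape is installed

# Same content as the NEXUS example, one vector element per line
nexus_lines <- c(
  "#NEXUS",
  "BEGIN TAXA;",
  "    DIMENSIONS NTAX = 5;",
  "    TAXLABELS",
  "        t5",
  "        t4",
  "        t1",
  "        t2",
  "        t3",
  "    ;",
  "END;",
  "BEGIN TREES;",
  "    TRANSLATE",
  "        1   t5,",
  "        2   t4,",
  "        3   t1,",
  "        4   t2,",
  "        5   t3",
  "    ;",
  "    TREE * UNTITLED = [&R] (1:0.89,((2:0.59,3:0.37):0.34,(4:0.03,5:0.67):0.9):0.04);",
  "END;"
)

nexus_file <- tempfile(fileext = ".nex")
writeLines(nexus_lines, nexus_file)

# read.nexus applies the TRANSLATE table, so tips come back as t1..t5
nex_tree <- read.nexus(nexus_file)
nex_tree$tip.label
```

The numbers inside the Newick string are only placeholders; the TRANSLATE table maps them back to the taxon names.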

(3). New Hampshire eXtended format

    (((ADH2:0.1[&&NHX:S=human], ADH1:0.11[&&NHX:S=human]):0.05[&&NHX:S=primates:D=Y:B=100],
    ADHY:0.1[&&NHX:S=nematode],ADHX:0.12[&&NHX:S=insect]):0.1[&&NHX:S=metazoa:D=N],
    (ADH4:0.09[&&NHX:S=yeast],ADH3:0.13[&&NHX:S=yeast],
    ADH2:0.12[&&NHX:S=yeast],ADH1:0.11[&&NHX:S=yeast]):0.1[&&NHX:S=Fungi])
    [&&NHX:D=N];

(4). Output formats of other software

BEAST

    tree TREE1 = [&R] (((11[&length=9.4]:9.38,14[&length=6.4]:6.385096430786298)
    [&length=25.7]:25.43,4[&length=9.1]:8.821663252749829)
    [&length=3.0]:3.10,(12[&length=0.6]:0.56,
    (10[&length=1.6]:1.56,(7[&length=5.2]:5.19,
    ((((2[&length=3.3]:3.26,(1[&length=1.3]:1.32,
    (6[&length=0.8]:0.83,13[&length=0.8]:0.8311577761397366)
    [&length=2.4]:2.48917886025146)
    [&length=0.9]:0.9416178372674331)
    [&length=0.4]:0.49,9[&length=1.7]:1.757288031101215)
    [&length=2.4]:2.35,8[&length=2.1]:2.1125745387283246)
    [&length=0.2]:0.23,(3[&length=3.3]:3.31,
    (15[&length=5.2]:5.27,5[&length=3.2]:3.2710481368304585)
    [&length=1.0]:1.0409443024626412)
    [&length=1.9]:2.0372962536780435)
    [&length=2.8]:2.8446835614595685)
    [&length=5.3]:5.367459711197171)
    [&length=2.0]:2.0037467863383043)
    [&length=4.3]:4.360909907798238)[&length=0.0];

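A BEAST output file carries several kinds of evolutionary inference. A molecular clock analysis, for example, usually reports rate, length, height, and posterior, together with HPD intervals and ranges that describe their uncertainty. rate is the evolutionary rate of a branch, length is the branch length, height is the node height (which BEAST measures as time back from the most recent tip, so the root has the largest height), and posterior is the Bayesian clade credibility value.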
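To see which of these annotations a given file actually contains, one option is to parse it with treeio and list the fields. The sketch below assumes treeio is installed and uses the example file `beast_mcc.tree` that ships with it, not the abbreviated tree shown above.

```r
library(treeio)  # assumption: treeio is installed

# Example BEAST tree bundled with treeio (not the tree printed in this article)
beast_file <- system.file("extdata/BEAST", "beast_mcc.tree", package = "treeio")
beast <- read.beast(beast_file)

# Names of the per-node annotations parsed from the [&...] comments
get.fields(beast)

# Printing the object summarizes the tree and its associated data
print(beast)
```

`read.beast` returns a treedata object that keeps the tree and its annotations together.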

MrBayes

    tree con_all_compat = [&U] (8[&prob=1.0]:2.94e-1[&length_mean=2.9e-1],10[&prob=1.0]:2.25e-1[&length_mean=2.2e-1],
    ((((1[&prob=1.0]:1.43e-1[&length_mean=1.4e-1],2[&prob=1.0]:1.92e-1[&length_mean=1.9e-1])
    [&prob=1.0]:1.24e-1[&length_mean=1.2e-1],9[&prob=1.0]:2.27e-1[&length_mean=2.2e-1])
    [&prob=1.0]:1.72e-1[&length_mean=1.7e-1],12[&prob=1.0]:5.11e-1[&length_mean=5.1e-1])
    [&prob=1.0]:1.76e-1[&length_mean=1.7e-1],
    (((3[&prob=1.0]:5.46e-2[&length_mean=5.4e-2],
    (6[&prob=1.0]:1.03e-2[&length_mean=1.0e-2],7[&prob=1.0]:7.13e-3[&length_mean=7.2e-3])
    [&prob=1.0]:6.93e-2[&length_mean=6.9e-2])
    [&prob=1.0]:6.03e-2[&length_mean=6.0e-2],
    (4[&prob=1.0]:6.27e-2[&length_mean=6.2e-2],5[&prob=1.0]:6.31e-2[&length_mean=6.3e-2])
    [&prob=1.0]:6.07e-2[&length_mean=6.0e-2])
    [&prob=1.0]:1.80e-1[&length_mean=1.8e-1],11[&prob=1.0]:2.37e-1[&length_mean=2.3e-1])
    [&prob=1.0]:4.05e-1[&length_mean=4.0e-1])
    [&prob=1.0]:1.16e+000[&length_mean=1.162699558201079e+000])
    [&prob=1.0][&length_mean=0];

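In the MrBayes example above, most annotations have been stripped for brevity, leaving only prob (the clade posterior probability) and length_mean (the mean branch length). A complete file also includes prob_stddev, prob_range, prob(percent), prob+-sd, length_median, and length_95%_HPD.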

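PAML

PAML (Phylogenetic Analysis by Maximum Likelihood), developed by Professor Ziheng Yang, is a software package for phylogenetic analysis of DNA or protein sequences; BaseML and CodeML are its two main programs. BaseML estimates tree topology, branch lengths, and substitution parameters under a variety of nucleotide substitution models. CodeML mainly estimates synonymous and nonsynonymous substitution rates and performs likelihood ratio tests of positive selection under codon substitution models. CodeML's mlc output file contains the tree topology together with the estimated synonymous and nonsynonymous substitution rates.

02. Reading tree files in R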

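Commonly used R packages for working with trees include ape, phylobase, and phytools; this article focuses on ggtree. Its companion package treeio can directly parse the output files of BEAST, CodeML, MrBayes, and r8s.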

Table 1.1: Parser functions defined in treeio

Parser function    Description
read.astral        parsing output of ASTRAL
read.beast         parsing output of BEAST
read.codeml        parsing output of CodeML (rst and mlc files)
read.codeml_mlc    parsing mlc file (output of CodeML)
read.fasta         parsing FASTA format sequence file
read.hyphy         parsing output of HYPHY
read.hyphy.seq     parsing ancestral sequences from HYPHY output
read.iqtree        parsing IQ-Tree newick string, with ability to parse SH-aLRT and UFBoot support values
read.jplace        parsing jplace file including output of EPA and pplacer
read.jtree         parsing jtree format
read.mega          parsing MEGA Nexus output
read.mega_tabular  parsing MEGA tabular output
read.mrbayes       parsing output of MrBayes
read.newick        parsing newick string, with ability to parse node label as support values
read.nhx           parsing NHX file including output of PHYLDOG and RevBayes
read.paml_rst      parsing rst file (output of BaseML or CodeML)
read.phylip        parsing phylip file (phylip alignment + newick string)
read.phylip.seq    parsing multiple sequence alignment from phylip file
read.phylip.tree   parsing newick string from phylip file
read.r8s           parsing output of r8s
read.raxml         parsing output of RAxML

    library(ggtree)
    file
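As one possible way to use a parser from Table 1.1 together with ggtree, the sketch below reads an NHX file and plots it. It assumes treeio and ggtree are installed and relies on the example file `ADH.nhx` bundled with treeio; it is an illustration and not a reconstruction of the original code listing.

```r
library(treeio)  # assumption: treeio is installed
library(ggtree)  # assumption: ggtree is installed

# Example NHX file bundled with treeio
nhx_file <- system.file("extdata/NHX", "ADH.nhx", package = "treeio")
nhx <- read.nhx(nhx_file)

# Draw the tree, label the tips, and show the NHX species tag (S) on each branch
p <- ggtree(nhx) +
  geom_tiplab() +
  geom_label(aes(x = branch, label = S), fill = "lightgreen")

print(p)
```

The other parsers in the table can be swapped in the same way, since `ggtree()` accepts the tree objects they return.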

