[ROSALIND] [Practice Python, Learn Bioinformatics] 31 Transitions and Transversions
If this is your first time reading this series, please start with [ROSALIND] [Practice Python, Learn Bioinformatics] 00 Preface. Thanks!

Problem: Transitions and Transversions

Given: Two DNA strings s1 and s2 of equal length (at most 1 kbp).

Return: The transition/transversion ratio R(s1,s2).
Sample Dataset

>Rosalind_0209
GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA
AGTACGGGCATCAACCCAGTT
>Rosalind_2200
TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC
GGTACGAGTGTTCCTTTGGGT

Sample Output

1.21428571429
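Biological background

Point mutations come in two types: transitions and transversions. A transition replaces a purine with another purine, or a pyrimidine with another pyrimidine, that is, an A↔G or C↔T substitution. A transversion replaces a purine with a pyrimidine or vice versa. Put simply, a transition keeps the base in the same class (purine or pyrimidine), while a transversion switches it.

Because a transversion swaps a double-ring purine for a single-ring pyrimidine (or the reverse), it is the more disruptive change and occurs less often: across the genome, transitions are about twice as frequent as transversions. In protein-coding regions the ratio can exceed 3, because a transition is less likely than a transversion to change the amino acid a codon encodes, particularly at the third codon position. For the same reason, the transition/transversion ratio can help identify protein-coding regions.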
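The purine/pyrimidine rule above can be written directly as a small function. This is an illustrative sketch, not the author's solution; the names `PURINES`, `PYRIMIDINES` and `classify_substitution` are hypothetical. A mismatch counts as a transition when both bases fall in the same class and as a transversion otherwise; identical bases and characters other than A, C, G, T return `None`.

```python
# Hypothetical helper: classify a single-base substitution by base class.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Return 'transition', 'transversion', or None (no substitution or unknown base)."""
    bases = PURINES | PYRIMIDINES
    if a == b or a not in bases or b not in bases:
        return None
    # Same class means both purines or both pyrimidines
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


for pair in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("A", "A")]:
    print(pair, classify_substitution(*pair))
```

Comparing class membership, rather than listing all twelve ordered pairs, keeps the rule to a single line.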
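Mathematical background

Dividing the number of transitions between the two sequences by the number of transversions gives the required ratio.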
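Written as a formula, with $N_{\mathrm{ti}}$ and $N_{\mathrm{tv}}$ (symbols introduced here for illustration) denoting the number of positions where $s_1$ and $s_2$ differ by a transition and by a transversion respectively. For the sample dataset, a position-by-position count gives 17 transitions and 14 transversions, which reproduces the sample output:

```latex
R(s_1, s_2) = \frac{N_{\mathrm{ti}}(s_1, s_2)}{N_{\mathrm{tv}}(s_1, s_2)}
            = \frac{17}{14} \approx 1.21428571429
```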
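Approach

The idea is simple: compare the two sequences position by position, count the transitions and the transversions, and take their ratio.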
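The approach can be sketched compactly with `zip`, which walks both strings in parallel. This is a minimal sketch, not the author's code; the function name `transition_transversion_ratio` is hypothetical, and it assumes the inputs contain only A, C, G and T, so any mismatch that is not a transition is counted as a transversion. Like the original, it raises `ZeroDivisionError` if there are no transversions.

```python
# Ordered pairs (base in s1, base in s2) that are transitions
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def transition_transversion_ratio(s1, s2):
    """Hypothetical helper: return (#transitions) / (#transversions)."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    ti = tv = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue  # identical bases: no substitution
        if (a, b) in TRANSITIONS:
            ti += 1
        else:
            tv += 1  # assumes only A/C/G/T, so any other mismatch is a transversion
    return ti / tv


# The two sample sequences, with their wrapped FASTA lines joined
s1 = ("GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA"
      "AGTACGGGCATCAACCCAGTT")
s2 = ("TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC"
      "GGTACGAGTGTTCCTTTGGGT")
print(round(transition_transversion_ratio(s1, s2), 11))  # 1.21428571429
```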
Code

```python
def readfasta(lines):
    """Parse FASTA lines; return (list of headers, list of sequences)."""
    seq = []
    index = []
    seqplast = ""  # sequence of the record currently being read
    numlines = 0
    for i in lines:
        if '>' in i:
            # Header line: save its ID and flush the previous record
            index.append(i.replace("\n", "").replace(">", ""))
            seq.append(seqplast.replace("\n", ""))
            seqplast = ""
        else:
            # Sequence line: append it to the current record
            seqplast = seqplast + i.replace("\n", "")
        numlines += 1
        if numlines == len(lines):
            # Last line reached: flush the final record
            seq.append(seqplast.replace("\n", ""))
    # Drop the empty string flushed before the first header
    seq = seq[1:]
    return index, seq


with open('rosalind_tran.txt', 'r') as f:
    lines = f.readlines()
index, seq = readfasta(lines)
s1 = seq[0]
s2 = seq[1]

# Ordered base pairs (s1[i], s2[i]) that count as each kind of substitution
transitions = {('A', 'G'), ('G', 'A'), ('C', 'T'), ('T', 'C')}
transversions = {('A', 'T'), ('A', 'C'), ('G', 'T'), ('G', 'C'),
                 ('C', 'G'), ('C', 'A'), ('T', 'G'), ('T', 'A')}

ti = 0  # number of transitions
tv = 0  # number of transversions
for i in range(len(s1)):
    pair = (s1[i], s2[i])
    if pair in transitions:
        ti += 1
    elif pair in transversions:
        tv += 1

per = ti / tv
print(round(per, 11))
```
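The `readfasta` function above counts lines by hand to know when to flush the last record. A shorter alternative is sketched below; the name `parse_fasta` is hypothetical and not part of the original code. It collects the sequence lines under each header and joins them at the end, so sequences wrapped across several lines (as in the sample dataset) are handled. It assumes headers are unique, since a repeated header would start its record over.

```python
def parse_fasta(text):
    """Hypothetical helper: parse FASTA text into an ordered {header: sequence} dict."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record
    return {h: "".join(parts) for h, parts in records.items()}


sample = """>Rosalind_0209
GCAACGCACAACGAAAACCCTTAGGGACTGGATTATTTCGTGATCGTTGTAGTTATTGGA
AGTACGGGCATCAACCCAGTT
>Rosalind_2200
TTATCTGACAAAGAAAGCCGTCAACGGCTGGATAATTTCGCGATCGTGCTGGTTACTGGC
GGTACGAGTGTTCCTTTGGGT
"""
records = parse_fasta(sample)
s1, s2 = records.values()  # dicts keep insertion order (Python 3.7+)
print(len(s1), len(s2))
```

To read the downloaded dataset instead, pass the file contents, e.g. `with open('rosalind_tran.txt') as f: records = parse_fasta(f.read())`.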