ARWEN: a program to detect tRNA genes in metazoan mitochondrial nucleotide sequences

Abstract

Motivation: Mitochondrial genomes encode their own transfer RNAs (tRNAs). These are often degenerate in sequence and structure compared to tRNAs in their bacterial ancestors. This is one of the reasons why current tRNA gene prediction programs perform poorly at identifying mitochondrial tRNA genes. As a consequence, there is a need for a new program with the specific aim of predicting these tRNAs.

Results: In this study, we present the software ARWEN that identifies tRNA genes in metazoan mitochondrial nucleotide sequences. ARWEN detects close to 100% of previously annotated genes.

Availability: An online version, software for download and test results are available at www.acgt.se/online.html

Contact:  [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

1 INTRODUCTION

The mitochondrial transfer RNA (tRNA) genes of many metazoan (including mammalian) species conform less closely to the canonical cloverleaf secondary structure, and show less homology to recognized tRNA consensus motifs, than cytosolic or prokaryotic tRNA genes, to the extent that these mitochondrial genes are referred to as ‘bizarre’ (Helm et al., 2000). Reported deviations include smaller than usual dihydrouridine (D) stems and loops; smaller than usual TΨC (T) stems and loops; changes in the number of connector bases between the acceptor (A) stem and the D-stem, and between the D-stem and the anticodon (C) stem; and elongated C-stems. Perhaps the most astounding feature of some of these tRNA genes is the complete absence of either the whole D-arm, or the whole T-arm and variable (V) loop. These are replaced by short sequences called the D-replacement loop and the TV-replacement loop, respectively. Furthermore, mitochondria often use a genetic code that differs from the universal genetic code.

For these reasons, conventional tRNA detection programs perform poorly on mitochondrial sequences. For example, the ARAGORN tRNA detection program (Laslett and Canback, 2004) detects only 3 of the 22 tRNA genes in the Homo sapiens mitochondrion, and tRNAscan-SE (Lowe and Eddy, 1997), using standard settings, detects only 1 gene.

The purpose of this study is to develop a heuristic algorithm to search in silico for metazoan tRNA genes. The software should have a detection rate close to 100%, even if this results in a number of falsely predicted tRNAs, since these can be easily removed when analyzing the genome sequence. The program should be user-friendly with a limited number of user parameter settings and produce results that are easy to interpret, and a website should be available for the user to perform online analysis. The ARWEN program fulfills all of these requirements.

2 METHODS

2.1 Search algorithm

ARWEN employs a heuristic algorithm that searches for hairpin structures with a 5–6 base-pair (bp) stem and a 6–8 base loop that could be a candidate C-arm. For every candidate C-arm, the upstream sequence is searched for possible D-arm structures (2–5 bp stem and loop of 3–17 bases) and the downstream sequence for possible T-arm structures (2–7 bp stem and loop of 3–31 bases). Both upstream and downstream sequences are assessed for base pairing interactions that could indicate the presence of an A-stem (5–8 bp). ARWEN then attempts to combine these structures into a complete tRNA gene containing at least three out of four of these structures. Unlike ARAGORN, ARWEN does not allow for the presence of introns in the C-loop.
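The first step described above, scanning for candidate C-arm hairpins, can be illustrated with a minimal Python sketch. This is not ARWEN's implementation: the function name `find_c_arm_candidates` is hypothetical, and the sketch assumes that every stem position must pair (counting G-T as a pair), whereas the actual program scores imperfect stems rather than rejecting them outright.

```python
# Watson-Crick pairs plus the G-T wobble pair, in the DNA alphabet.
# Treating G-T as a valid pair here is an assumption of this sketch.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}


def find_c_arm_candidates(seq, stem_lengths=(5, 6), loop_lengths=(6, 7, 8)):
    """Return (start, stem_len, loop_len) for every fully paired hairpin."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        for stem in stem_lengths:
            for loop in loop_lengths:
                end = start + 2 * stem + loop  # index one past the hairpin
                if end > len(seq):
                    continue
                five_prime = seq[start:start + stem]
                # Reverse the 3' strand so that index i pairs with index i.
                three_prime = seq[end - stem:end][::-1]
                if all((a, b) in PAIRS for a, b in zip(five_prime, three_prime)):
                    hits.append((start, stem, loop))
    return hits


# A 5 bp stem (GGCTA / TAGCC) around a 7-base loop.
hits = find_c_arm_candidates("GGCTACTTGCAATAGCC")
```

Each hit would then seed the upstream search for a D-arm and the downstream search for a T-arm, using the stem and loop ranges given in the text.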

Three different algorithms are used, one for each type of tRNA (D-replacement loop, TV-replacement loop and standard cloverleaf), to assign a score to a candidate sequence. Because a standard cloverleaf tRNA can also form a D- or TV-replacement loop tRNA by opening one hairpin, the initial score for the cloverleaf type is set higher to favor the formation of a full cloverleaf if possible.
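The head start given to the cloverleaf type can be sketched as follows. The numeric values and the names `INITIAL_SCORE` and `pick_structure` are hypothetical; the text states only that the cloverleaf type starts with a higher score, not by how much.

```python
# Hypothetical starting scores: only the ordering (cloverleaf highest)
# comes from the text; the magnitudes are invented for illustration.
INITIAL_SCORE = {"cloverleaf": 10.0, "D-replacement": 0.0, "TV-replacement": 0.0}


def pick_structure(structural_scores):
    """Pick the tRNA type with the highest initial-plus-structural score."""
    totals = {
        kind: INITIAL_SCORE[kind] + score
        for kind, score in structural_scores.items()
    }
    best = max(totals, key=totals.get)
    return best, totals[best]


# The D-replacement fold scores slightly better on structure alone,
# but the cloverleaf head start tips the decision.
choice = pick_structure(
    {"cloverleaf": 50.0, "D-replacement": 55.0, "TV-replacement": 40.0}
)
```

With these illustrative numbers a replacement-loop fold wins only when its structural score beats the cloverleaf score by more than the head start.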

The presence of TA in the spacer sequence between the A-stem and the D-arm, CTxxxAA in the C-arm (where xxx denotes the anticodon sequence), and GTTC homology in the T-arm (when present) each earn extra points. However, the importance of stem base-pairing interactions and tertiary structure interactions is increased compared to the ARAGORN program, and the importance of consensus sequence homology is reduced.
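The consensus bonuses could be tallied along the following lines. This is a sketch under several assumptions: the bonus sizes and the name `consensus_bonus` are hypothetical, the C-loop motif is read as CT, then the three anticodon bases, then AA, spanning a 7-base loop, and "GTTC homology" is simplified to an exact substring match.

```python
import re

# Hypothetical bonus sizes; the text does not give the actual values.
BONUS = {"spacer_TA": 1.0, "c_loop": 2.0, "t_arm_GTTC": 2.0}

# Assumed reading of the C-loop motif: CT + any 3-base anticodon + AA.
C_LOOP_PATTERN = re.compile(r"CT[ACGT]{3}AA")


def consensus_bonus(spacer, c_loop, t_arm=None):
    """Sum the bonuses for the consensus motifs found in a candidate."""
    total = 0.0
    if "TA" in spacer.upper():
        total += BONUS["spacer_TA"]
    if C_LOOP_PATTERN.fullmatch(c_loop.upper()):
        total += BONUS["c_loop"]
    # The T-arm may be absent (TV-replacement loop tRNAs).
    if t_arm is not None and "GTTC" in t_arm.upper():
        total += BONUS["t_arm_GTTC"]
    return total


# All three motifs present: spacer TA, C-loop CT-GAA-AA, T-arm with GTTC.
bonus = consensus_bonus("TA", "CTGAAAA", "GTTCGAATC")
```

Keeping these bonuses small relative to the stem scores reflects the reduced weight that the text gives to consensus homology.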

In all stems, GC bonds are considered to be the most thermodynamically stable, and are not penalized. AT bonds, GT bonds and non-bonding base pairs are penalized in order of increasing magnitude. The score for each stem is further modified by stem termination and stem opening base combinations (the last base pairs at either end of the stem and one base beyond), tandem repeats and nearest neighbor interactions within the stem, and lengths of stem and loop for the T-arm that differ from the canonical values (5 bp stem and 7 base loop). Sequences with a high (>50%) or low (


