Prediction of RNA




In this study, we evaluate iDeepS on large-scale RBP binding sites derived from CLIP-seq [30]. Figure 1 shows the flowchart of iDeepS for predicting RBP binding sites. The details of the network architecture are shown in Additional file 1: Figure S1. We evaluate the performance of iDeepS for predicting binding sites on RNAs and compare it with the state-of-the-art methods. Furthermore, we identify the binding sequence and structure motifs using CNNs integrated in iDeepS.

Fig. 1

Flowchart of the proposed iDeepS. For each experiment, iDeepS uses two CNNs (one for sequences, the other for structures predicted from the sequences by RNAshape) to learn binding sequence and structure motifs. A bidirectional LSTM then learns the long-range dependencies between the learned sequence and structure motifs. Finally, the outputs of the bidirectional LSTM are fed into a sigmoid classifier to predict the probability that a site is an RBP binding site.
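The two CNNs described above each need a numeric input. The following is a minimal sketch of one way to prepare those inputs, assuming one-hot encoding and a per-nucleotide structure annotation string. The encoding, the function names `one_hot` and `encode_site`, and the example strings are illustrative assumptions; this section does not specify how iDeepS encodes its inputs. The six structure labels are the ones listed in the Fig. 3 caption.

```python
# Hypothetical input encoding for a two-branch (sequence + structure) model.
# Assumption: one-hot encoding; not confirmed by the text.
SEQ_ALPHABET = "ACGU"
# F: dangling start, T: dangling end, I: internal loop,
# H: hairpin, M: multiloop, S: stem (labels from the Fig. 3 caption)
STRUCT_ALPHABET = "FTIHMS"


def one_hot(symbols, alphabet):
    """Return a len(symbols) x len(alphabet) matrix.

    Symbols outside the alphabet (e.g. 'N') become all-zero rows.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in symbols:
        row = [0.0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1.0
        rows.append(row)
    return rows


def encode_site(sequence, structure):
    """Encode one candidate site as (sequence matrix, structure matrix)."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    rna = sequence.upper().replace("T", "U")  # accept DNA-style input
    return one_hot(rna, SEQ_ALPHABET), one_hot(structure.upper(), STRUCT_ALPHABET)


# Made-up example site: 5 nucleotides with a per-position structure label.
seq_matrix, struct_matrix = encode_site("UGUGN", "SSHHS")
```

Each matrix would feed one CNN branch; the branch outputs would then be combined and passed to the bidirectional LSTM and the sigmoid classifier.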

Performance of iDeepS

The performance of iDeepS is compared with both sequence-based and structure-based methods as described below.

First, we compare iDeepS with the sequence-based methods DeepBind, DeeperBind and Oli across the 31 experiments. iDeepS achieves an average AUC of 0.86, slightly better than the 0.85 of DeepBind and similar to the 0.86 of DeeperBind. Oli [31] performs much worse than iDeepS, with an average AUC of 0.77 across the 31 experiments. For some proteins, Oli’s performance is close to random guessing, e.g. an AUC of 0.512 for Ago2-MNase. As shown in Table 2, iDeepS outperforms DeepBind on 25 of the 31 experiments, DeeperBind on 19 experiments, and Oli on all experiments. It is interesting to note that the methods show large performance differences across individual experiments. For iDeepS, the AUCs range from 0.59 for Ago2-MNase to 0.98 for HNRNPC. iDeepS cannot achieve high performance for Ago2 because Ago2 binding specificity is primarily mediated by miRNAs [32]: the expressed miRNAs strongly influence Ago2-RNA interactions, which results in more variable binding motifs than for RBPs that bind to RNAs directly. In addition, we compare iDeepS with DBN-kmer, which uses k-mer features and a deep belief network (DBN) to predict RBP binding sites. DBN-kmer yields a mean AUC of 0.77 (Additional file 2: Figure S2), which is much worse than the CNN-based DeepBind and iDeepS.
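For readers less familiar with the metric, the AUC values above can be read as the probability that a randomly chosen binding site receives a higher score than a randomly chosen non-binding site, which is why a value near 0.5 corresponds to random guessing. The sketch below computes AUC directly from that definition. The function name `roc_auc` and the toy labels and scores are illustrative and are not data from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Runs in O(P * N) time, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = binding site, 0 = non-binding site.
perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])          # every pair ordered correctly
partial = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1])          # 3 of 4 pairs ordered correctly
uninformative = roc_auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])    # all ties
```

With these toy inputs, `perfect` is 1.0, `partial` is 0.75, and `uninformative` is 0.5.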

Table 2 The AUC performance comparison between iDeepS and other methods on 31 experiments

Second, we compare iDeepS with the structure-profile-based GraphProt, which demonstrates better performance than RNAcontext [7]. Across the 31 experiments, GraphProt yields an average AUC of 0.82, which is worse than the 0.86 of iDeepS. As shown in Fig. 2, iDeepS achieves better AUCs than GraphProt on 30 of the 31 experiments. Our method improves the AUCs for some proteins by a large margin. For example, iDeepS yields an AUC of 0.77 for protein Ago/EIF, a relative increase of 12% over the AUC of 0.69 of GraphProt (Table 2).
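The 12% figure for Ago/EIF is a relative gain rather than a difference in AUC points. A short check with the two AUC values quoted above shows the distinction; the helper name `relative_gain` is illustrative.

```python
def relative_gain(new_auc, baseline_auc):
    """Relative improvement of new_auc over baseline_auc, as a fraction."""
    return (new_auc - baseline_auc) / baseline_auc


ideeps_auc = 0.77      # iDeepS on Ago/EIF, as quoted in the text
graphprot_auc = 0.69   # GraphProt on Ago/EIF, as quoted in the text

absolute = ideeps_auc - graphprot_auc
relative = relative_gain(ideeps_auc, graphprot_auc)

print(f"absolute gain: {absolute:.2f} AUC, relative gain: {relative:.1%}")
```

The absolute gain is 0.08 AUC, and the relative gain is about 11.6%, which rounds to the 12% stated in the text.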

Fig. 2

The AUCs of iDeepS, DeepBind, Oli and GraphProt across 31 experiments. Performance is evaluated on the same training and independent testing sets across the 31 experiments (x-axis) for iDeepS, DeepBind, DeeperBind, Oli and GraphProt. For Oli, DeepBind and DeeperBind, only sequences are used. For iDeepS and GraphProt, sequences and predicted structures are used.


In addition, iDeepS outperforms iONMF (reported average AUC of 0.85 on the same data), which uses multiple sources of data, including k-mer frequency, secondary structure, GO information and gene type [14]. The authors of iONMF also report that it surpasses GraphProt and RNAcontext. However, iDeepS performs slightly worse than our other deep-learning-based method, iDeep, which integrates multiple sources of data, including gene type and CLIP co-binding, instead of only sequences. We expect the fully sequence-based iDeepS to have a more general application scope in real-world applications.

In summary, iDeepS not only achieves better average performance than peer sequence-based methods, but also outperforms some approaches that integrate multiple sources of hand-designed features. Our results demonstrate that iDeepS benefits strongly from learning the combination of sequence and structure features for predicting RBP binding sites.

Insights in sequence-structure motifs

A major advantage of iDeepS is that it also provides biological insights into the RBPs, e.g. learned binding motifs. Compared with GraphProt, which requires a complicated postprocessing step, iDeepS easily converts the learned parameters of the convolutional filters to PWMs and allows for identification of the sequence and structure motifs.
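To make the filter-to-PWM idea concrete, here is a minimal sketch of one possible conversion: normalising each position of a filter's weights with a softmax so that it becomes a probability distribution over nucleotides. This normalisation is an assumption made for illustration; the section does not state which procedure iDeepS uses. Another common option is to align the subsequences that most strongly activate a filter and count nucleotide frequencies. The function names and the example weights are hypothetical.

```python
import math


def filter_to_pwm(weights, temperature=1.0):
    """Turn filter weights (positions x 4, order A, C, G, U) into a PWM.

    Assumption: a per-position softmax. Subtracting the row maximum
    keeps the exponentials numerically stable.
    """
    pwm = []
    for row in weights:
        top = max(row)
        exps = [math.exp((w - top) / temperature) for w in row]
        total = sum(exps)
        pwm.append([e / total for e in exps])
    return pwm


def consensus(pwm, alphabet="ACGU"):
    """Most probable symbol at each position of the PWM."""
    return "".join(
        alphabet[max(range(len(row)), key=row.__getitem__)] for row in pwm
    )


# Hypothetical weights of a width-4 sequence filter (not learned values).
example_filter = [
    [-1.0, -1.0, 0.0, 2.0],
    [-1.0, -1.0, 2.0, 0.0],
    [-1.0, 0.0, -1.0, 2.0],
    [0.0, -1.0, 2.0, -1.0],
]

pwm = filter_to_pwm(example_filter)
motif = consensus(pwm)
```

For these made-up weights the consensus is `UGUG`. The same function applies to a structure filter if the alphabet passed to `consensus` is replaced by the six structure labels and each row has six weights.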

In this study, we infer the binding motifs across 31 experiments. Of these, 19 experiments have known sequence motifs in the CISBP-RNA database or the literature. As shown in Fig. 3, iDeepS is able to discover experimentally verified sequence motifs for these 19 experiments, of which 15 are matched against CISBP-RNA at the E-value significance cutoff of 0.05 provided by TOMTOM [33]. The motifs of the remaining 4 proteins resemble the motifs reported by other studies based on visual inspection. iDeepS discovers a repeated UG dinucleotide motif for TDP-43; according to microarray analysis, these dinucleotide repeats occur in 80% of the 3’UTR region [13, 34]. For QKI, iDeepS captures a known motif, which is a crucial regulator in germline development [35], with a significant E-value of 0.00008. For PUM2, iDeepS finds an AU-rich sequence motif, which is close to the motifs identified from the top sequence read clusters [7]. These results show that the sequence motifs identified by iDeepS are consistent with verified motifs.

Fig. 3

iDeepS captures known sequence motifs and structure motifs. The predicted sequence motifs are compared against known motifs in study [48] from the CISBP-RNA database and the literature. The E-value is the expected number of false positives for the predicted motifs against known motifs using TOMTOM. The adjusted p-value is estimated for the corresponding structure motif using the enrichment analysis tool AME in the MEME Suite. The structure motifs are labelled as follows: stems (S), multiloops (M), hairpins (H), internal loops (I), dangling end (T) and dangling start (F). Note that the listed logos do not represent the full extent of the matched motifs.


iDeepS also allows the discovery of structure motifs, and it shows that RBPs generally prefer structured regions. As shown in Fig. 3, the proteins in the ELAVL family prefer binding to stem structures, which is consistent with in vivo and in vitro binding data [36]. iDeepS also discovers that hnRNPC prefers to bind U-rich hairpin structures, that PUM2 binds UA-rich stem regions, and that QKI interacts with multiloop regions, all of which agree with the findings in [13]. Of the 19 structure motifs listed in Fig. 3 that are similar to the structure motifs detected by GraphProt, 15 are significantly enriched with adjusted p-value


