Learning the RNA world by structural covariation analysis

R-scape builds the case for structural covariations in an RNA alignment by sampling simulated negative control alignments in which any structural correlation between substitutions is broken. R-scape analysis refuted several proposed structures for long noncoding RNAs (lncRNAs) that had been presented as evolutionarily conserved, such as HOTAIR, SRA RNA and Xist-RepA. In addition to shedding light on which lncRNAs have conserved structures, R-scape is also useful for improving the structural annotations of well-known structural RNAs. R-scape has already been used extensively by the Rfam database to improve the annotation of its RNA families~\citep{Kalvari18}. R-scape opens new lines of research for my lab.
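The idea of testing an observed covariation against negative controls can be illustrated with a minimal Python sketch. It is a simplified stand-in and not R-scape's procedure: the covariation statistic here is plain mutual information, and the null is produced by shuffling one column, which breaks the pairing between the two columns while keeping each column's composition. The function names and the toy columns are hypothetical.

```python
import math
import random
from collections import Counter

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    mi = 0.0
    for (a, b), n_ab in pair_counts.items():
        # p_ab * log2(p_ab / (p_a * p_b)), written in terms of raw counts
        mi += (n_ab / n) * math.log2(n_ab * n / (count_i[a] * count_j[b]))
    return mi

def empirical_p_value(col_i, col_j, n_null=1000, seed=0):
    """Compare the observed score with scores from shuffled (null) columns.

    Assumption: shuffling col_j is a crude null that ignores phylogeny;
    it is used here only to illustrate the negative-control idea.
    """
    rng = random.Random(seed)
    observed = mutual_information(col_i, col_j)
    shuffled = list(col_j)
    hits = 0
    for _ in range(n_null):
        rng.shuffle(shuffled)
        if mutual_information(col_i, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return observed, (hits + 1) / (n_null + 1)

# Toy columns in which every substitution is compensated (G-C, C-G, A-U, U-A)
col_i = list("GGGGCCCCAAAAUUUU")
col_j = list("CCCCGGGGUUUUAAAA")
score, p_value = empirical_p_value(col_i, col_j)
```

For these toy columns the pairing is perfectly compensated, so the observed score sits far out in the tail of the shuffled scores and the empirical p-value is small.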

Alignment power to inform R-scape's significant covariations

If R-scape does not see evidence for a conserved structure, that negative result does not necessarily mean that a conserved structure is absent. The alignment could have too few sequences or too little divergence to show enough covariation. This raises a direct question: I haven't found any structural covariations in this alignment, but should I have found any? To answer it, I am working on incorporating into R-scape information about the alignment's statistical power, which would allow us to distinguish a situation in which the alignment is inconclusive (not enough variability) from one in which it is conclusive (if there were structural covariations, you should have found them). For example, preliminary results on the lncRNA HOTAIR indicate that while the alignment has sufficient statistical power, no significant covariations are observed for the proposed HOTAIR secondary structure.
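One generic way to think about statistical power is by simulation: plant a truly covarying pair in alignments of a given depth and variability, and count how often a test detects it. The Python sketch below does this under loudly stated assumptions. The generative model (each sequence keeps an ancestral G-C pair or switches to a random Watson-Crick pair with probability `change_prob`), the shuffling test, and all names are hypothetical illustrations, not the method being built into R-scape.

```python
import math
import random
from collections import Counter

WC_PAIRS = ["GC", "CG", "AU", "UA"]

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    pairs = Counter(zip(col_i, col_j))
    ci, cj = Counter(col_i), Counter(col_j)
    return sum((k / n) * math.log2(k * n / (ci[a] * cj[b]))
               for (a, b), k in pairs.items())

def simulate_pair(n_seqs, change_prob, rng):
    """Hypothetical model of a conserved base pair with compensatory changes."""
    col_i, col_j = [], []
    for _ in range(n_seqs):
        pair = rng.choice(WC_PAIRS) if rng.random() < change_prob else "GC"
        col_i.append(pair[0])
        col_j.append(pair[1])
    return col_i, col_j

def is_detected(col_i, col_j, rng, n_null=200, alpha=0.05):
    """Shuffling test (an assumption, not R-scape's null model)."""
    observed = mutual_information(col_i, col_j)
    if observed <= 0.0:
        return False  # no variation at all, so nothing can covary
    shuffled = list(col_j)
    hits = 0
    for _ in range(n_null):
        rng.shuffle(shuffled)
        if mutual_information(col_i, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_null + 1) <= alpha

def estimate_power(n_seqs, change_prob, n_trials=100, seed=0):
    """Fraction of simulated true pairs that the test detects."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n_trials):
        col_i, col_j = simulate_pair(n_seqs, change_prob, rng)
        if is_detected(col_i, col_j, rng):
            detected += 1
    return detected / n_trials

shallow = estimate_power(n_seqs=6, change_prob=0.05)   # few, similar sequences
deep = estimate_power(n_seqs=50, change_prob=0.5)      # many, diverged sequences
```

In this toy setting the shallow, nearly invariant alignment has low power, so finding no covariation there is inconclusive, whereas the deep, diverged alignment has high power, so a negative result there is informative.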

Structural RNA genefinding

R-scape's methodology can be readily applied to develop a new structural RNA genefinder. By combining R-scape's high and controllable specificity (R-scape reports significant basepairs for a desired expected number of false positives) with its fast performance on large RNA alignments, we expect to overcome the limitations of current structural RNA genefinders. Working with William Gao, a Harvard undergraduate, we have started a pilot project on fungal genomes. A longer-term goal is to deploy this method on the human genome.

Probabilistic models for RNA structure prediction

Current methods for RNA structure prediction (context-free grammars) typically rely on a nested RNA secondary structure of canonical interactions (A-U, G-C and G-U). But in addition to pseudoknots, which are non-nested canonical interactions, structural RNAs include tertiary interactions that use other types of H-bonded interactions which are, in general, also non-nested. In a typical RNA structure, the majority of residues are involved in some form of interaction, currently not taken into account by most prediction methods.

Bayesian networks (BNs) are probabilistic models that can be used to generalize the pairwise dependencies of RNA secondary structure to any possible interactions (non-nested and more than pairwise). Starting with a covariation-grounded secondary structure, I will build BNs to model different features such as pseudoknot helices and noncanonical interactions. The hypothesis is that by modeling all possible interactions, prediction accuracy will increase, and with it all other structural RNA detection and recognition methods will improve.

Bringing the power of phylogeny to homology recognition

An outstanding issue is how to integrate phylogenetic methods into homology searches. Current homology detection methods (such as HMMER models), by assigning fixed values to all the parameters, implicitly assume a fixed evolutionary distance for all searches. In collaboration with Sean Eddy, I plan to convert HMMER's fixed parameterizations into time-dependent parameterized functions, such that the evolutionary distance (time) would be optimized in a search-dependent manner. The hypothesis is that such phylogenetic parameterization would result in more sensitive homology recognition and more accurate sequence alignments.
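The contrast between a fixed parameter and a time-dependent one can be made concrete with a toy Python sketch. It does not use HMMER or its parameters. As a labeled assumption, it swaps in the Jukes-Cantor nucleotide model, under which the probability that a residue is unchanged is a function of the divergence time t, and it picks t for a given pair of aligned sequences by a simple grid search. All function names and the example sequences are hypothetical.

```python
import math

def jc_match_prob(t):
    """Jukes-Cantor probability that a residue is unchanged after time t
    (t measured in expected substitutions per site)."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)

def log_likelihood(seq_a, seq_b, t):
    """Log-likelihood of an ungapped pairwise alignment at divergence time t."""
    p_same = jc_match_prob(t)
    p_diff = (1.0 - p_same) / 3.0  # each of the 3 other residues is equally likely
    ll = 0.0
    for a, b in zip(seq_a, seq_b):
        # 0.25 is the uniform background probability of the first residue
        ll += math.log(0.25 * (p_same if a == b else p_diff))
    return ll

def best_time(seq_a, seq_b, grid=None):
    """Search-dependent choice of t: the grid value maximizing the likelihood."""
    if grid is None:
        grid = [0.01 * k for k in range(1, 301)]  # t from 0.01 to 3.00
    return max(grid, key=lambda t: log_likelihood(seq_a, seq_b, t))

# Two of eight aligned positions differ in this toy example
t_close = best_time("ACGUACGU", "ACGUACGU")
t_far = best_time("ACGUACGU", "ACGUACAA")
```

A model with a single fixed match probability would score both comparisons at one implicit distance, whereas here identical sequences select the smallest t on the grid and the more diverged pair selects a larger one.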