Learning the RNA World by structural covariations

Many apparently noncoding RNA transcripts have been observed whose function is unknown. Finding that a transcript has a conserved structure is important evidence that it may function as a structural RNA. This question differs from structure prediction, because any RNA folds into some structure regardless of whether that structure has a conserved biological function. We have developed a statistical test that assesses whether an RNA alignment shows evidence of a conserved RNA structure. The program, R-scape (RNA Structural Covariations Above Phylogenetic Expectation), is computationally lightweight, which is unusual for structural RNA applications. R-scape is also available through a webserver.

As more noncoding RNA transcripts are investigated, a picture emerges in which a substantial number of transcripts appear to perform roles where neither the RNA sequence nor any RNA structure is relevant; their role depends only on the fact that they are transcribed. Others appear to encode small, previously unidentified peptides. In this context, identifying the (possibly scarce) subset of transcripts with a conserved RNA structure becomes a distinct pursuit that could unveil new RNA functions against a background of other noncoding RNA transcripts.

This laboratory develops computational probabilistic models to understand RNA structure. We also work with models of sequence evolution in order to bring phylogenetic power to the question of remote homology recognition.

Recent publications

Contact us

Department of Molecular & Cellular Biology
Northwest Building #430
52 Oxford Street
Harvard University
Cambridge MA 02138, USA

R-scape/CaCoFold

RNA Structural Covariations Above Phylogenetic Expectation/Cascade variation/covariation Constrained Folding algorithm

Current version (now runs on Apple Silicon):

R-scape v2.5.1 (Dec 2024).

TORNADO

Design and test different RNA 2D structure architectures. Any modality: SCFGs, CRFs, or thermodynamically-determined parameters.

Current version:

TORNADO v0.6.0 (May 2023).

New From The Lab

RNA3DB: A structurally-dissimilar dataset split for training and benchmarking deep learning models for RNA structure prediction. M. Szikszai, M. Magnus, S. Sanghi, S. Kadyan, N. Bouatta, E. Rivas. bioRxiv, Jan 2024.

RNA3DB clusters RNA 3D chains into distinct groups that are non-redundant with regard to both sequence and structure, providing a robust way of dividing data into training, validation, and testing sets.
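To illustrate why clustering matters for splitting, here is a minimal sketch of a cluster-level split in Python. It is not the rna3db code or its API: the function name `split_by_cluster`, the split fractions, and the toy cluster and chain identifiers are all hypothetical, and the clusters are assumed to have been computed already. The idea shown is only that whole clusters, never individual chains, are assigned to a set, so related chains cannot end up on both sides of a train/test boundary.

```python
import random
from typing import Dict, List, Sequence

SPLIT_NAMES = ("train", "valid", "test")


def split_by_cluster(
    clusters: Dict[str, List[str]],
    fractions: Sequence[float] = (0.7, 0.15, 0.15),  # assumed target sizes
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Assign whole clusters to train/valid/test (illustrative sketch only).

    `clusters` maps a cluster id to the chain ids it contains. Every chain of
    a cluster goes to the same split, so no cluster is shared between splits.
    """
    rng = random.Random(seed)
    cluster_ids = sorted(clusters)  # sort first so the shuffle is reproducible
    rng.shuffle(cluster_ids)

    total = sum(len(clusters[cid]) for cid in cluster_ids)
    splits: Dict[str, List[str]] = {name: [] for name in SPLIT_NAMES}

    for cid in cluster_ids:
        # Send the cluster to the split that is furthest below its target size.
        deficits = {
            name: frac * total - len(splits[name])
            for name, frac in zip(SPLIT_NAMES, fractions)
        }
        target = max(SPLIT_NAMES, key=lambda name: deficits[name])
        splits[target].extend(clusters[cid])

    return splits


if __name__ == "__main__":
    # Hypothetical cluster and chain identifiers, for illustration only.
    toy_clusters = {
        "cluster_1": ["chainA", "chainB", "chainC"],
        "cluster_2": ["chainD"],
        "cluster_3": ["chainE", "chainF"],
        "cluster_4": ["chainG"],
        "cluster_5": ["chainH", "chainI", "chainJ"],
    }
    for name, chains in split_by_cluster(toy_clusters).items():
        print(name, sorted(chains))
```

With few or unevenly sized clusters, the resulting set sizes only approximate the requested fractions, because a cluster is never broken up to hit a target.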

Git repository:

rna3db