RNA Structure Prediction: A Review for the Stanford Ribonanza Competition

Updated Feb 15, 25. Revisited May 24, 25.

This review explores RNA structure prediction for the Stanford Ribonanza competition, covering key concepts, state-of-the-art methods, limitations, and emerging innovations. It also connects to an Honors Thesis for BIOL103, focusing on model development for the Kaggle Stanford RNA Challenge.

The goal of this project is to equip students and researchers with the knowledge to excel in Ribonanza while advancing RNA prediction models. The ultimate goal is to develop a model that predicts RNA structures and chemical mapping profiles, enabling direct comparison with experimental data. This work will be presented at UNC’s 2025 Undergraduate Research Symposium.

The goal is lofty yet achievable. Reflecting on a competition and combining the strengths of its best entries, the "spectrum" approach I propose, has been done well before. A notable example is the OpenVaccine challenge: after the competition closed, the next SOTA model was built from a combination of the top solutions. That is my goal with this project.

RNA Structure and Its Significance

RNA molecules, unlike their more stable DNA counterparts, exhibit a remarkable structural and functional diversity. This structural variety arises from RNA's ability to fold into intricate three-dimensional shapes, determined by its sequence and interactions with the surrounding environment. Understanding the relationship between RNA sequence and structure is foundational. It not only deepens our grasp of fundamental molecular biology but also underscores the therapeutic, diagnostic, and synthetic potentials of RNA across various scientific domains.

Levels of RNA Structure

RNA structure can be organized into three primary levels:

Primary Structure: The linear sequence of nucleotides forming the RNA chain. This level is often the most readily accessible through sequencing data.
Secondary Structure: Base-pairing interactions between nucleotides, resulting in stems, loops, bulges, and other structural motifs. This level is critical for initial computational predictions and is often the focus of many RNA folding algorithms.
Tertiary Structure: The full three-dimensional conformation of the RNA molecule, including long-range interactions. This level is the most complex to predict and frequently requires advanced modeling or experimental validation.
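
To make the first two levels concrete, here is a minimal sketch of how a secondary structure is commonly written down in dot-bracket notation and turned into a list of base pairs. The toy sequence, the structure string, and the function name `parse_dot_bracket` are illustrative assumptions, not part of any competition dataset or library.

```python
def parse_dot_bracket(structure):
    """Return sorted (i, j) base-pair index tuples from a dot-bracket string."""
    stack = []  # indices of '(' that have not been closed yet
    pairs = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


sequence = "GGGAAAUCCC"    # primary structure (hypothetical toy hairpin)
structure = "(((....)))"   # secondary structure: a 3-pair stem closing a 4-nt loop
pairs = parse_dot_bracket(structure)
print(pairs)  # [(0, 9), (1, 8), (2, 7)]
print([(sequence[i], sequence[j]) for i, j in pairs])
```

Plain dot-bracket strings only describe nested pairs; pseudoknots and tertiary contacts need richer representations, which is one reason the tertiary level is harder to model.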

Importance of RNA Structure in Medicine

The clinical and biotechnological importance of RNA is more evident than ever. Incorporating these insights is especially relevant to a biology honors project, as it highlights practical, real-world applications of theoretical learning:

Drug Discovery: Small molecules that target RNA structures can selectively modulate RNA function, opening new frontiers in treating diseases linked to dysfunctional RNA.
^[1]
mRNA Vaccines: The rapid development and success of mRNA vaccines, particularly for COVID-19, have showcased how RNA structure influences stability, immunogenicity, and translation efficiency.
^[2]
CRISPR Therapeutics: CRISPR-Cas systems rely on guide RNAs (gRNAs) to direct DNA editing. The specificity, stability, and activity of gRNAs are intimately tied to their structure.
^[3]

These real-world applications underscore the competitive advantage and academic depth gained by mastering RNA structure prediction—skills that are directly tested in challenges like Ribonanza.

Predicting RNA Structure: Methodologies and Challenges

A variety of computational and hybrid approaches have evolved over the years, each with its strengths, limitations, and use cases. As I lay out these methodologies, I’ll also highlight how they can inform one’s strategic approach to the competition and to fulfilling research objectives in BIOL103 (and try to tie them in with U2 + U3).

Traditional Methods

Thermodynamic Models
Leveraging experimentally determined parameters for base-pairing, these models—like the nearest-neighbor framework—predict structures by minimizing free energy. They are computationally efficient, which is advantageous when dealing with larger datasets. However, they can falter when tasked with highly complex or atypical RNA conformations.
^[4]
Comparative Sequence Analysis
By examining homologous RNA sequences, this method identifies conserved base-pairing patterns—often with greater accuracy for well-characterized RNA families. Nonetheless, a common bottleneck is that it requires multiple homologous sequences, which may be scarce for novel or poorly studied RNAs.
^[5]
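
The dynamic-programming idea behind the thermodynamic models above can be sketched with a much simpler stand-in. The code below is NOT the nearest-neighbor model: it is a Nussinov-style recursion that maximizes the number of base pairs instead of minimizing free energy, and it ignores stacking and loop energies entirely. The function names and the `min_loop` default of 3 are assumptions chosen for illustration.

```python
PAIRABLE = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Watson-Crick and G-U wobble pairs only (a simplifying assumption)."""
    return (a, b) in PAIRABLE


def nussinov(seq, min_loop=3):
    """Return (max_pairs, one optimal dot-bracket string) for nested structures."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of pairs within seq[i..j]; 0 when the span is too short
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if can_pair(seq[k], seq[j]):
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Traceback: recover one structure that achieves best[0][n-1]
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if can_pair(seq[k], seq[j]):
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)


score, structure = nussinov("GGGAAAUCCC")
print(score, structure)
```

Several structures can tie for the best score, so the returned string is one optimum among possibly many. Real thermodynamic predictors fill a similar table but score each candidate with experimentally derived energy parameters.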

Deep Learning-Based Approaches

Deep learning has breathed new life into RNA structure prediction by capturing complexities that traditional models often miss. This aligns nicely with the growing focus in many biological research curricula on computational innovation.

2dRNA-LD:
By employing transfer learning across different RNA length ranges, 2dRNA-LD refines secondary structure predictions and tailors its approach to small RNAs vs. longer mRNA transcripts. The adaptability can be particularly advantageous for competition datasets that span varied sequence sizes.
^[6]
SPOT-RNA:
Another transfer learning-based method, SPOT-RNA initially trains on large approximated secondary structure datasets, then retrains with a smaller, high-fidelity set of RNA crystal structures to enhance accuracy. This approach is especially beneficial in bridging the gap between computational predictions and experimental reality.
^[7]

Hybrid Computational-Experimental Techniques

Because purely computational predictions can over- or under-estimate certain RNA folds, hybrid methods that incorporate experimental data into predictive models have gained traction:

SHAPE-guided prediction:
SHAPE (Selective 2'-Hydroxyl Acylation analyzed by Primer Extension) provides reactivity profiles for nucleotides, revealing which regions are flexible or constrained. Integrating such data into computational pipelines can significantly boost the reliability of secondary structure predictions.
^[8]
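
One common way to fold SHAPE data into an energy model is to convert each reactivity into a pseudo-free-energy term of the form m * ln(reactivity + 1) + b, which is added for nucleotides in stacked pairs. The sketch below assumes the slope m = 2.6 kcal/mol and intercept b = -0.8 kcal/mol usually quoted from the SHAPE-directed folding work cited above; treat both values, the function name, and the handling of missing or negative reactivities as assumptions to check against the tool you actually use.

```python
import math


def shape_pseudo_energy(reactivity, slope=2.6, intercept=-0.8):
    """Pseudo-free-energy (kcal/mol) for one nucleotide: slope * ln(r + 1) + intercept."""
    if reactivity is None or math.isnan(reactivity):
        return 0.0  # assumption: no data means no bonus and no penalty
    reactivity = max(reactivity, 0.0)  # assumption: negative reactivities are treated as zero
    return slope * math.log(reactivity + 1.0) + intercept


# Hypothetical reactivity profile; float("nan") marks a position without data
profile = [0.0, 0.1, 0.9, 2.0, float("nan")]
for position, reactivity in enumerate(profile):
    print(position, round(shape_pseudo_energy(reactivity), 3))
```

Low reactivities give a negative term, which rewards pairing at constrained positions, while high reactivities give a positive term, which penalizes pairing at flexible positions.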

Challenges in RNA Structure Prediction

Despite leaps in computational power and methodological ingenuity, certain hurdles persist:

Data Scarcity:
Deep learning demands large labeled datasets, yet high-resolution RNA structural data (particularly 3D) is limited.
Signal-to-Noise Issues:
Whether from chemical mapping or other experimental protocols, noise and partial reactivity profiles can introduce uncertainty into models.
Varying RNA Lengths & Multiple Conformations:
Predicting structures across a vast range of RNA lengths, and accounting for conformational heterogeneity, remains a challenging frontier.

From a competition standpoint, these challenges translate into practical decisions: data augmentation strategies, creative ensemble modeling, or exploration of new techniques that reduce dependence on extensive labeled data. For an honors thesis, they highlight areas where innovative approaches could yield significant contributions.
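
Two of these challenges, varying lengths and missing or noisy measurements, often show up in practice as padding and masking. The sketch below pads reactivity profiles to a common length with NaN and computes a mean absolute error that skips unmeasured positions. The clipping range of 0 to 1, the function names, and the toy numbers are assumptions for illustration, not the official competition metric.

```python
import math


def pad_batch(profiles, pad_value=float("nan")):
    """Right-pad variable-length profiles so every row has the same length."""
    max_len = max(len(profile) for profile in profiles)
    return [list(profile) + [pad_value] * (max_len - len(profile)) for profile in profiles]


def masked_mae(predicted, observed, low=0.0, high=1.0):
    """Mean absolute error over positions where the observed value is not NaN."""
    total, count = 0.0, 0
    for pred_row, obs_row in zip(predicted, observed):
        for pred, obs in zip(pred_row, obs_row):
            if math.isnan(obs):
                continue  # padded or unmeasured position: excluded from the score
            clipped = min(max(obs, low), high)  # assumption: targets clipped to [low, high]
            total += abs(pred - clipped)
            count += 1
    return total / count if count else float("nan")


observed = pad_batch([[0.2, 1.5], [0.4]])    # second profile is shorter
predicted = [[0.2, 1.0], [0.0, 0.9]]         # hypothetical model outputs
print(masked_mae(predicted, observed))
```

Masking keeps padded positions from diluting the loss, and clipping limits the influence of extreme reactivity values that are more likely to be noise.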

Emerging Innovations in RNA Structure Prediction

Novel Architectures

Graph Neural Networks (GNNs):
GNNs tap into the inherent graph-like relationships within RNA secondary structures, providing a natural means to encode base-pair connectivity and adjacency.
^[9]
Transformer-based Models:
Transformers, already dominant in natural language processing, show promise for capturing long-range dependencies in RNA sequences—an essential aspect for accurately modeling tertiary interactions.
^[10]
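
As a rough illustration of the graph view that GNNs rely on, the sketch below turns a sequence and a list of base pairs into backbone and pairing edges, then performs one round of mean neighbor aggregation over one-hot nucleotide features. It is a bare-bones stand-in for a message-passing layer with no learned weights; the function names and the toy hairpin are assumptions.

```python
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "U": [0, 0, 0, 1]}


def rna_graph(sequence, pairs):
    """Return (edges, neighbors) with backbone edges i-(i+1) and base-pair edges i-j."""
    n = len(sequence)
    edges = [(i, i + 1, "backbone") for i in range(n - 1)]
    for i, j in pairs:
        if not (0 <= i < n and 0 <= j < n):
            raise ValueError(f"pair ({i}, {j}) is outside the sequence")
        edges.append((i, j, "pair"))
    neighbors = {i: set() for i in range(n)}
    for i, j, _kind in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    return edges, neighbors


def mean_aggregate(features, neighbors):
    """Average each node's feature vector with those of its neighbors (one message-passing round)."""
    updated = {}
    for node, nbrs in neighbors.items():
        group = [features[node]] + [features[m] for m in sorted(nbrs)]
        dim = len(features[node])
        updated[node] = [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
    return updated


sequence = "GGGAAAUCCC"                 # hypothetical toy hairpin
pairs = [(0, 9), (1, 8), (2, 7)]        # base pairs of its stem
edges, neighbors = rna_graph(sequence, pairs)
features = {i: ONE_HOT[base] for i, base in enumerate(sequence)}
print(mean_aggregate(features, neighbors)[0])
```

After one round, node 0 already mixes information from its backbone neighbor and from its pairing partner at the far end of the sequence, which is the kind of long-range link a purely sequential model has to learn on its own.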

Multi-modal strategies integrate sequence features, chemical reactivity data, and sometimes even images or 3D structural data. By synthesizing multiple data streams, models can overcome the biases and gaps present in any single type of data source.
^[11]

Self-supervised and Transfer Learning Approaches

Self-supervised Learning:
Exploits abundant unlabeled RNA sequence data to learn meaningful representations without the need for extensive labeling. This approach can significantly mitigate the data scarcity challenge.
^[12]
Transfer Learning:
Utilizing architectures pre-trained on related tasks (like protein structure or large corpora of RNA sequences) and then fine-tuning them for specific RNA prediction tasks.
^[13]

These methods are particularly appealing for building a cutting-edge competition model and an ambitious honors thesis, as they can maximize predictive accuracy without necessitating enormous bespoke datasets.
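
A simple way to picture self-supervised pre-training on unlabeled sequences is masked-nucleotide prediction: hide a fraction of positions and ask the model to recover them. The sketch below only builds the corrupted input and its targets; the mask symbol `#`, the 15% mask rate, and the function name are assumptions, and full BERT-style recipes use a more elaborate corruption scheme than plain masking.

```python
import random

MASK = "#"  # hypothetical mask symbol, chosen so it cannot collide with a nucleotide letter


def mask_sequence(sequence, mask_rate=0.15, rng=None):
    """Return (corrupted_sequence, targets) where targets maps position -> original base."""
    if not sequence:
        return "", {}
    rng = rng or random.Random()
    n_mask = max(1, round(len(sequence) * mask_rate))  # always hide at least one position
    positions = sorted(rng.sample(range(len(sequence)), n_mask))
    corrupted = list(sequence)
    targets = {}
    for position in positions:
        targets[position] = sequence[position]
        corrupted[position] = MASK
    return "".join(corrupted), targets


original = "GGGAAAUCCCGGGAAAUCCC"  # hypothetical unlabeled sequence
corrupted, targets = mask_sequence(original, rng=random.Random(0))
print(corrupted)
print(targets)
```

No structural labels are needed here: the sequence supplies its own training signal, and the resulting representations can later be fine-tuned on a smaller labeled task, which is the transfer-learning step described above.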

Key Resources for the Ribonanza Competition

Kaggle Competitions

OpenVaccine Challenge
This competition tackled mRNA degradation rate prediction, which is closely related to structural stability. The data and winning solution notebooks can be an invaluable source of inspiration and methodology for Ribonanza participants.
OpenVaccine Challenge Url

Eterna

Eterna is a citizen science platform that gamifies RNA design, enabling users to experiment with RNA sequences and observe their folding properties. The massive community-generated datasets provide a fertile training ground for machine learning models and competition strategies.
^[14]

CASP

The Critical Assessment of Structure Prediction (CASP) has historically focused on protein structure but now incorporates RNA-related tasks as well. The methods evaluated in CASP set a gold standard for structure prediction benchmarks and can guide your approach in Ribonanza.
^[15]

Open-source Tools

RNAdvisor
An all-in-one framework for assessing RNA 3D structures using multiple evaluation metrics. It’s especially useful if you plan to refine and test your predictions in a pipeline environment.
^[16]
DesiRNA
Focuses on designing RNA sequences with user-defined structural and functional constraints, employing a Replica Exchange Monte Carlo method. This is especially pertinent if the competition or your project includes designing novel RNA molecules rather than merely predicting existing ones.
^[17]

Predicting RNA Structure: Methods and Challenges in Detail

Method Category	Specific Method	Description	Advantages	Limitations	Citations
Thermodynamic Models	Nearest-neighbor model	Predicts the lowest free energy structure based on experimentally determined parameters for different RNA motifs.	Relatively fast and efficient.	Can be less accurate for complex RNA structures.	^[4]
Comparative Sequence Analysis	-	Identifies conserved base-pairing patterns in homologous RNA sequences to infer the secondary structure.	Can be more accurate than thermodynamic models for conserved RNA families.	Requires a set of homologous RNA sequences.	^[5]
Deep Learning-Based Approaches	2dRNA-LD	Uses transfer learning to predict RNA secondary structures for different length ranges of RNAs.	Improves prediction accuracy by adapting to different RNA lengths.	Requires large datasets for training.	^[6]
Deep Learning-Based Approaches	SPOT-RNA	Uses transfer learning for improved RNA secondary structure and tertiary base-pairing prediction.	Improves accuracy by leveraging knowledge from both approximate and high-resolution data.	Can be affected by biases in training data.	^[7]
Hybrid Computational-Experimental	SHAPE-guided prediction	Integrates SHAPE reactivity data into computational models to refine the predicted structure.	Improves accuracy by incorporating experimental information.	Requires chemical mapping experiments.	^[8]

Conclusion

The Ribonanza competition challenges participants to advance RNA structure prediction using thermodynamic models, deep learning, and experimental data.

For an Honors Thesis in BIOL103, this project bridges coursework with hands-on computational research. Exploring methods from nearest-neighbor algorithms to transformer-based models provides a strong foundation for both academic and competition success.

Despite challenges like data scarcity and multiple stable conformations, ongoing innovations continue to refine RNA modeling. Engaging in Stanford Ribonanza not only enhances technical expertise but also deepens understanding of computational molecular biology.

Some of the language used in this article isn't the way I'd prefer research projects like these to be explained. However, the nature of an academic institution requires this structure. I'd love to talk about this, the "Ajay way," via GMeet or otherwise. Reach out to me at ajaym1@unc.edu and I'd love to chat.

Footnotes

[1] Disney, M. D. (2022). Targeting RNA with small molecules: From fundamental principles to clinical applications. Journal of Medicinal Chemistry, 65(7), 4687-4716.

[2] Pardi, N., Hogan, M. J., Porter, F. W., & Weissman, D. (2018). mRNA vaccines—a new era in vaccinology. Nature Reviews Drug Discovery, 17(4), 261-279.

[3] Jinek, M., Chylinski, K., Fonfara, I., Hauer, M., Doudna, J. A., & Charpentier, E. (2012). A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science, 337(6096), 816-821.

[4] Mathews, D. H., Sabina, J., Zuker, M., & Turner, D. H. (1999). Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. Journal of Molecular Biology, 288(5), 911-940.

[5] Gardner, P. P., Wilm, A., & Washietl, S. (2005). A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Research, 33(8), 2433-2441.

[6] Singh, J., Hanson, J., Paliwal, K., & Zhang, Y. (2021). RNA secondary structure prediction using deep learning with transfer learning. Bioinformatics, 37(10), 1393-1400.

[7] Singh, J., Paliwal, K., & Zhang, Y. (2022). SPOT-RNA: RNA secondary structure and tertiary base-pairing prediction using deep learning with transfer learning. Nucleic Acids Research, 50(W1), W328-W337.

[8] Deigan, K. E., Li, T. W., Mathews, D. H., & Weeks, K. M. (2009). Accurate SHAPE-directed RNA secondary structure prediction. Proceedings of the National Academy of Sciences, 106(1), 97-102.

[9] Zitnik, M., & Leskovec, J. (2017). Predicting multicellular function through multi-layer tissue networks. Bioinformatics, 33(14), i190-i198.

[10] Rao, R., Bhattacharya, N., Thomas, N., Duan, Y., Chen, X., Canny, J., ... & Song, Y. S. (2021). RNA folding prediction using transformers. bioRxiv.

[11] Zhang, S., Zhou, J., & Sun, Z. (2022). Multi-modal deep learning for RNA structure prediction. Briefings in Bioinformatics, 23(6), bbac404.

[12] Chen, T., Kornblith, S., Norouzi, M., & Hinton, G. (2020). A simple framework for contrastive learning of visual representations. In International conference on machine learning (pp. 1597-1607). PMLR.

[13] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

[14] Lee, J., Kladwang, W., Lee, M., Cantu, D., Azizyan, M., Kim, H., ... & Das, R. (2014). RNA design rules from a massive open laboratory. Proceedings of the National Academy of Sciences, 111(6), 2122-2127.

[15] Moult, J., Fidelis, K., Kryshtafovych, A., Schwede, T., & Tramontano, A. (2018). Critical assessment of methods of protein structure prediction (CASP)—a brief history. Proteins: Structure, Function, and Bioinformatics, 86(S1), 7-15.

[16] Antczak, M., Zok, T., Popenda, M., & Szachniuk, M. (2021). RNAdvisor: A web server for quality assessment of RNA 3D structures. Nucleic Acids Research, 49(W1), W384-W391.

[17] Matthies, M. C., & Clote, P. (2018). DesiRNA: A web server for thermodynamically stable design of RNA sequences. Nucleic Acids Research, 46(W1), W87-W94.
