MCB128: AI in Molecular Biology (Spring 2026)
(Under construction)
- Description
- Aims and objectives
- Prerequisites and background
- Course format
- Assignments and grading
- What to expect from Elena as an instructor?
- Policies
- Accommodations for students with disabilities
Lectures: Mon/Wed/Fri 10:30-11:45
Starting: Monday 26 January 2026
Location: TBD
teaching team | student hours | location |
Dr Elena Rivas | Thurs 1-3pm | Northwest #430 |
TF TBD | TBD | TBD |
Now looking for TFs; please contact Elena.
schedule
block | week | Lecture | Slides | Section | Homework | Answers |
b0: Single neuron | DNA functional classification | |||||
b1: Feed forward networks | Perceptrons / Protein 2D structure | |||||
Backpropagation | ||||||
b2: CNNs, RNNs | CNNs / DNA sequence motifs | |||||
RNNs | ||||||
b3: Transformers | Self-attention, Transformers for alignments | |||||
Protein Structure, AlphaFold | ||||||
b4: Large language models | LLMs | |||||
DNA, RNA, protein LLMs | ||||||
b5: AutoEncoders | Autoencoders / scRNA-seq | |||||
Variational AutoEncoders / CryoEM | ||||||
b6: Generative NNs | diffusion / protein design | |||||
Graph NNs / antibiotics |
Description
What are convolutional neural networks (CNNs) and how are they used to predict sequence motifs in biological sequences? What is a transformer and how is it used by AlphaFold to predict protein structures?
AI and deep-learning methods are now routinely used to address many computational questions in molecular biology, such as motif finding, homology of DNA/RNA and proteins, and structure prediction for DNA/RNA and proteins, amongst others. The objective of this course is to introduce AI concepts and methods in the context of important questions in computational biology. Each question (e.g. protein folding) will be paired with an AI method (e.g. transformers), and both will be described in depth. The goal is to describe both the fundamental algorithms and some important AI implementations.
This course will explore the major advances in deep learning, with a special emphasis on their applications to molecular biology and genomics. Starting from a single neuron (perceptron), we will progress to more complex architectures such as convolutional and recurrent neural networks, transformers, and generative neural networks. The course will cover both the general principles of these methods and specific applications in genomics. This is a computationally rigorous course for students interested in computational biology.
Aims and objectives
Students taking this course will come away with a mathematical understanding of the most successful and frequently used deep learning methods, and an understanding of their uses and best applications for important computational questions in molecular biology. Students will also acquire knowledge of essential methods in Python and PyTorch used in deep learning. After taking MCB128, students will be able to design and implement their own deep learning methods for new computational questions of interest.
Prerequisites and background
This course is primarily designed for undergraduates (mainly juniors and seniors) as well as early graduate students with interdisciplinary interests in computation, molecular biology and mathematics. Some basic knowledge of computing (Python at the AM10 level) is strongly encouraged, as is some basic knowledge of molecular biology (LS1 level), statistics (STAT110) and algebra (MA21). Having expertise in at least two of the three areas is recommended. We will build our deep-learning toolkit from scratch, mainly using Python and PyTorch. An introductory Python course (such as the FAS Informatics workshop offered in Fall 2025, https://www.rc.fas.harvard.edu/upcoming-training/) would be extremely useful.
Course format
The course is divided into six blocks (two weeks each), preceded by one foundational block 0 (one week) dedicated to the Single Artificial Neuron. Blocks are ordered more or less chronologically: from perceptrons (feed-forward neural networks), to convolutional and recurrent neural networks, transformers and beyond. Each block will describe one or more related deep-learning methods, a biological question to which they have been applied, an existing state-of-the-art approach, and a specific deep-learning implementation for that biological question.
Each block spans two weeks. Each week includes two 75-minute lectures, plus one discussion section led by the TFs. The lectures will be given by Elena Rivas, and there will be extensive notes with bibliography, as well as slides of the actual lectures, for students to review at any time.
Sections will be devoted to the specifics of coding the methods, and to describing in detail deep-learning concepts and jargon standard in the field, such as tensor, backpropagation, masking, embedding, broadcasting, distillation, regularization, loss functions, and many others. Specific homework coding questions will also be discussed in sections. Questions raised by students that would help them with the homework will also be addressed in the weekly discussion sections.
Assignments and grading
The course is divided into six blocks. There will be one homework assignment and one (very short) in-class quiz per block. The final grade will be based on all six homework assignments (80%), the quizzes (10%), and participation (10%). Participation includes attendance in class and participation in discussion sections, Student Hours, or other forums such as Slack.
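As a rough illustration of the weighting above, the following Python sketch computes a final grade. It assumes that all scores are on a 0-100 scale and that homework and quizzes are each averaged with equal weight within their category; neither detail is stated in this syllabus. The function name `final_grade` and the example scores are hypothetical.

```python
def final_grade(homework, quizzes, participation):
    """Combine scores (each on a 0-100 scale) using the syllabus weights.

    Assumption: scores within a category are averaged with equal weight.
    """
    hw_avg = sum(homework) / len(homework)
    quiz_avg = sum(quizzes) / len(quizzes)
    # Weights from the syllabus: homework 80%, quizzes 10%, participation 10%.
    return 0.80 * hw_avg + 0.10 * quiz_avg + 0.10 * participation


# Hypothetical scores for six homework assignments and six quizzes.
homework = [90, 85, 100, 95, 80, 90]
quizzes = [100, 80, 90, 100, 70, 100]
print(round(final_grade(homework, quizzes, participation=100), 1))  # 91.0
```

Because homework carries 80% of the weight, a ten-point change in the homework average moves the final grade by eight points, whereas the same change in quizzes or participation moves it by one.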
What to expect from Elena as an instructor?
I am a computational biologist specialized in genomes and RNA structure analysis, which means that I like biological sequences and all the secrets that they hide, for which I like to design and implement new algorithms. Many years ago I was a theoretical physicist, which means that I also like math, although mathematicians make fun of the "pragmatic" ways physicists use math.
I like to understand things from first principles (much easier than memorizing them), and I bring that to the classroom. I will go into mathematical details to a certain extent, although depending on your own background and interests, that may not be completely necessary to be successful doing the homework. I also think that coding can be a lot of fun, and I expect to show you that.
I think that in the very near future every biologist will need to know some (deep-learning) computation and have a good grasp of statistics. I hope this class can help you take a step in that direction.
In addition to Student Hours (held by me and the TFs), I am often available for one-on-one interactions at other times.
Policies
Absence
Absences will be reflected in the 10% of the grade reserved for participation in the course.
Late work
Late work is penalized by lowering the maximum possible grade by 10% for each day that it is late. If you can anticipate (at least one week in advance) an important reason why your homework will be turned in late, you may request an exception.
Additionally, resubmission of homework (after grading) is allowed. The maximum grade increase is 20% of the current grade.
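The following Python sketch shows one possible reading of the late-work and resubmission rules. The interpretation is an assumption, not the official calculation: the cap is taken as 100 minus 10 points per late day (never below zero), and a resubmission can raise a grade to at most 1.2 times the current grade, capped at 100, and never lowers it. The function names are hypothetical.

```python
def late_cap(days_late):
    """Maximum attainable grade (0-100) after `days_late` days.

    Assumption: 10 points are lost per day, floored at 0.
    """
    return max(0, 100 - 10 * days_late)


def apply_late_penalty(raw_score, days_late):
    """Clip a raw score (0-100) to the cap for the given lateness."""
    return min(raw_score, late_cap(days_late))


def regrade(current, resubmitted):
    """Grade after resubmission.

    Assumptions: the increase is limited to 20% of the current grade,
    the total never exceeds 100, and a resubmission never lowers a grade.
    """
    ceiling = min(100, current * 1.2)
    return max(current, min(resubmitted, ceiling))


# Hypothetical example: a homework worth 95 turned in two days late.
print(apply_late_penalty(95, 2))  # 80
```

Under this reading, a homework graded at 70 and resubmitted with work worth 95 would be raised to about 84, since 20% of 70 is 14.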
Academic integrity
You are encouraged to discuss your homework with your classmates, but the work you present has to be your own and be written in your own words. If someone helped you with a particular aspect, add a comment in your code explaining who helped you and what they contributed, and demonstrate that you understand it. This also applies to any materials taken from generative AI (GAI) tools or the Internet.
AI policy
As for using ChatGPT and other generative AI (GAI) tools to produce your homework, these are the policies:
- You can look at AI-generated materials at any time during the completion of your work, but you cannot present AI-generated material as your own. Any AI use must be appropriately acknowledged and cited. Any AI-generated code has to be annotated and modified for your particular purpose.
- AI materials should be considered a tool, often useful for learning a technique or seeing code that implements it. But be aware that these contents (unlike those in published books, whose authors are responsible for them) are not produced with the expectation that they are correct or accurate.
- Ultimately, you are responsible for all the work that you present as yours, and it has to be written in your own words. An unedited AI-generated log cannot be presented as homework. Violations of this policy will be considered academic misconduct.
Accommodations for students with disabilities
Students needing accommodations because of a disability should present their Faculty Letter from the Accessible Education Office (AEO) and speak with Elena by the end of the second week of the term for us to be able to respond in a timely manner.