MCB128: AI in Molecular Biology (Spring 2026)

(Under construction)


Lectures: Mon/Wed/Fri 10:30-11:45
Starting: Monday 26 January 2026
Location: TBD

Teaching team     Student hours   Location
Dr Elena Rivas    Thurs 1-3pm     Northwest #430
TF TBD            TBD             TBD

Now looking for TFs: please contact Elena.

Schedule

Block                       Week  Lecture                                      Slides  Section  Homework  Answers
b0: Single neuron                 DNA functional classification
b1: Feed-forward networks         Perceptrons / Protein 2D structure
                                  Backpropagation
b2: CNNs, RNNs                    CNNs / DNA sequence motifs
                                  RNNs
b3: Transformers                  Self-attention, Transformers for alignments
                                  Protein Structure, AlphaFold
b4: Large language models         LLMs
                                  DNA, RNA, protein LLMs
b5: AutoEncoders                  Autoencoders / scRNA-seq
                                  Variational AutoEncoders / CryoEM
b6: Generative NNs                Diffusion / protein design
                                  Graph NNs / antibiotics

Description

What are convolutional neural networks (CNNs), and how are they used to predict sequence motifs in biological sequences? What is a transformer, and how is it used by AlphaFold to predict protein structures?

AI/deep-learning methods are now routinely used to approach many computational questions in molecular biology, such as motif finding, homology of DNA/RNA and proteins, and structure prediction for DNA/RNA and proteins, amongst others. The objective of this course is to introduce AI concepts and methods in the context of important questions in computational biology. A given question (e.g. protein folding) will be paired with an AI method (transformers), and an in-depth description of both will be provided. The goal is to describe both the fundamental algorithms and some important AI implementations.

This course will explore the major advances in deep learning, with a special emphasis on their applications to molecular biology and genomics. Starting from a single neuron (perceptron), we will progress to more complex architectures such as convolutional and recurrent neural networks, transformers, and generative neural networks. The course will cover both the general principles of these methods as well as specific applications in genomics. This is a computationally rigorous course for students interested in computational biology.

Aims and objectives

Students taking this course will come away with a mathematical understanding of the most successful and frequently used deep-learning methods, and an understanding of their uses and best applications for important computational questions in molecular biology. Students will also acquire knowledge of essential methods in Python and PyTorch used in deep learning. After taking MCB128, students will be able to design and implement their own deep-learning methods for new computational questions of interest.

Prerequisites and background

This course is primarily designed for undergraduates (mainly juniors and seniors) as well as early graduate students with interdisciplinary interests in computation, molecular biology and mathematics. Some basic knowledge of computing (Python, AM10 level) is strongly encouraged, as well as some basic knowledge of molecular biology (LS1 level), statistics (STAT110) and algebra (MA21). Having expertise in at least two of the three areas is recommended. We will build our deep-learning toolkit from scratch, mainly using Python and PyTorch. An introductory Python course (such as the FAS Informatics workshop offered in Fall 2025, https://www.rc.fas.harvard.edu/upcoming-training/) would be extremely useful.

Course format

The course is divided into 6 blocks (two weeks each), starting with one foundational 0-block (one week) dedicated to the Single Artificial Neuron. Blocks are ordered more or less chronologically: from perceptrons (feed-forward neural networks), to convolutional and recurrent neural networks, transformers and beyond. Each block will describe one or more related deep-learning methods, a biological question to which they have been applied, an existing state-of-the-art approach, and a specific deep-learning implementation for that biological question.

Each block spans two weeks. Each week includes two 75-minute lectures, plus one discussion section led by the TFs. The lectures will be given by Elena Rivas, and there will be extensive notes with bibliography, and slides of the actual lectures, for students to review at any time.

Sections will be devoted to the specifics of coding the methods, and to describing in detail the deep-learning concepts and jargon standard in the field, such as tensors, backpropagation, masking, embedding, broadcasting, distillation, regularization, loss functions, and many others. Specific homework coding questions will also be discussed in sections. Questions brought up by students that would help them with the homework will also be addressed in the weekly discussion sections.

Assignments and grading

The course is divided into 6 blocks. There will be one homework and one (very short) in-class quiz per block. The final grade will be based on all six homework assignments (80%) and quizzes (10%) plus participation (10%). Participation includes class attendance and participation in discussion sections, Student Hours, or other forums such as Slack.
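As a rough illustration of how the 80/10/10 weights above combine, here is a minimal Python sketch. It assumes that every score is on a 0-100 scale and that all homework assignments (and all quizzes) count equally; neither assumption is stated in this syllabus, and the function name `final_grade` and the example scores are hypothetical.

```python
def final_grade(homework, quizzes, participation):
    """Combine scores (each on a 0-100 scale) using the 80/10/10 weights.

    Assumption (not stated in the syllabus): homework assignments are
    weighted equally among themselves, and so are quizzes.
    """
    hw_avg = sum(homework) / len(homework)
    quiz_avg = sum(quizzes) / len(quizzes)
    return 0.80 * hw_avg + 0.10 * quiz_avg + 0.10 * participation


# Hypothetical scores for six homework assignments and six quizzes.
example = final_grade(
    homework=[90, 85, 100, 95, 80, 90],
    quizzes=[100, 80, 90, 100, 70, 100],
    participation=100,
)
print(round(example, 2))
```

Because homework carries 80% of the weight, a 10-point change in the homework average moves the final grade by 8 points, while the same change in the quiz average moves it by only 1 point.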

What to expect from Elena as an instructor?

I am a computational biologist specialized in genomes and RNA structure analysis, which means that I like biological sequences and all the secrets that they hide, for which I like to design and implement new algorithms. Many years ago I was a theoretical physicist, which means that I also like math, although mathematicians make fun of the "pragmatic" ways physicists use math.

I like to understand things from first principles (much easier than memorizing them), and I bring that to the classroom. I will go into mathematical details to a certain extent, although depending on your own background and interests, that may not be completely necessary to be successful doing the homework. I also think that coding can be a lot of fun, and I expect to show you that.

I think that in the very near future any biologist will need to know some (deep-learning) computation, and have a good grasp of statistics. I hope this class can help you take a step in that direction.

In addition to Student Hours (held by me and the TFs), I am often available for one-on-one interactions at other times.

Policies

Absence

Absences will be reflected in the 10% of the grade reserved for participation in the course.

Late work

Late work is penalized by lowering the maximum possible grade by 10% for each day that it is late. If you can anticipate (at least one week in advance) an important reason why your homework will be turned in late, you may request an exception.

Additionally, resubmission of homework (after grading) is allowed. The maximum grade increase is 20% of the current grade.
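The arithmetic of these two rules can be sketched in Python as follows. This is only one possible reading, not an official grading script: it assumes that "10% for each day" means 10 percentage points of full marks per late day, and that "20% of the current grade" is a relative cap (a grade of 70 can rise to at most 84). All function names are hypothetical.

```python
def late_cap(days_late, full_marks=100.0):
    """Maximum attainable grade for work turned in `days_late` days late.

    Assumed reading: the cap drops by 10% of full marks per late day,
    and never goes below zero.
    """
    return max(0.0, full_marks * (10 - days_late) / 10)


def apply_late_cap(raw_score, days_late, full_marks=100.0):
    """Clip a raw score to the late-submission cap."""
    return min(raw_score, late_cap(days_late, full_marks))


def resubmission_grade(current, resubmitted, full_marks=100.0):
    """Grade after resubmission, limited to a 20% relative increase.

    Assumed reading: the new grade cannot exceed 1.2 times the current
    grade (nor full marks), and a resubmission never lowers the grade.
    """
    ceiling = min(full_marks, current * 1.20)
    return max(current, min(resubmitted, ceiling))


print(apply_late_cap(95, days_late=2))   # capped by the two-day late limit
print(resubmission_grade(70, 95))        # capped by the 20% increase limit
```

Under this reading, work that is ten or more days late can no longer earn any credit, and a resubmission helps most when the original grade was already reasonably high.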

Academic integrity

You are encouraged to discuss your homework with your classmates, but the work you present has to be your own and be written in your own words. If someone helped you with a particular aspect, add a comment in your code explaining who helped you and what their contribution was, in a way that demonstrates that you understand it. This also applies to any materials taken from GAI tools and the Internet.

AI policy

As for using ChatGPT and other generative AI (GAI) tools to produce your homework, these are the policies:

Accommodations for students with disabilities

Students needing accommodations because of a disability should present their Faculty Letter from the Accessible Education Office (AEO) and speak with Elena by the end of the second week of the term for us to be able to respond in a timely manner.