Ding Lab
Bridge the biomedical data and discovery!

Big Biomedical Data analysis

Course overview

"Biomedical Big Data" is a comprehensive summer course designed to foster expertise across disciplines in the realm of big data applied to biomedicine. This course offers a deep dive into the intricate world of biomedical big data, blending foundational topics in bioinformatics and computational biology with advanced machine learning techniques. Students will engage with high-throughput data applications, exploring how to process, analyze, model, and interpret DNA, RNA, and protein data. Practical application is a cornerstone of the curriculum, with hands-on Jupyter notebook sessions that are pivotal for mastering data-driven methodologies. Additionally, the course incorporates cutting-edge research in bioinformatics, featuring studies on single-cell genomics and innovative computational methods. This makes it an ideal platform for students keen on enhancing their theoretical insights and practical skills in modern biomedical challenges.

Course materials

Cloud Files


EXMD 521/BMDE 521: Computational methods for single-cell genomics data analytics

General information

Course #: EXMD521/BMDE521
Section #: 001
Term and year: Fall 2023
Course pre-requisite(s): MATH 203, MATH 223, MATH 323, and one course in programming (or equivalent background with permission of instructor). Additional background in Calculus and Probability is required (e.g., MATH 222 and MATH 204).
Course co-requisite(s): None
Course schedule: Wed. 8:30-11:30
Number of credits: 3

Instructor information

Name and title: Dr. Jun Ding, Assistant Professor, Department of Medicine, Associate Member, Department of Biomedical Engineering, McGill University
Email: jun.ding@mcgill.ca
Office hours: Friday, 1-3 PM
Communication plan: Students can find me in my office (Glen site EM3.2212) during office hours for questions. Students who cannot come to the office in person can reach me by Zoom at https://mcgill.zoom.us/j/3418431568 during the same hours.

TA information

Name/Email: Yumin Zheng (yumin.zheng@mail.mcgill.ca)
Office hours: Friday, 2-4 PM

Course overview

With advances in sequencing biotechnologies, the complexity of biomedical datasets has increased steadily over the past decade. Compared with conventional bulk sequencing, emerging single-cell measurements (particularly multi-omics ones) can profile biological processes comprehensively, at high resolution and large scale. These measurements offer unrivaled opportunities to understand the mechanisms underlying various biological processes, which can in turn transform the development of novel diagnostic and therapeutic regimens. However, single-cell sequencing data are higher-dimensional, larger in scale, sparser, and noisier than their bulk counterparts, which creates new computational challenges for the representation, inference, and learning of single-cell data in support of biomedical research. In recent years, numerous computational methods (especially machine learning approaches) have been developed to address these challenges, substantially promoting single-cell applications in a wide variety of studies (e.g., finding disease markers and drug candidates). This course discusses popular computational methods (particularly machine learning approaches) for single-cell data analysis, modelling, and visualization tasks that can drive novel biomedical discoveries, with hands-on real-world examples and exercises.

Instructor message regarding course delivery

  • Besides the in-person classes, the course will also be delivered via Zoom to make it more widely accessible. The Zoom link is to be determined.
  • All course contents, learning materials, and resources will be available to students via the myCourses platform.
  • Students can seek support from McGill Student Services (https://www.mcgill.ca/studentservices/) if they feel overwhelmed by their academic work or would like to further develop their time and workload management skills.

Learning outcomes

By the end of the course, students are expected to understand or master the following:

  • What single-cell genomics data are, and why they help solve complex biomedical problems
  • How to preprocess and normalize single-cell genomics data (single-cell RNA-seq, single-cell ATAC-seq, single-cell spatial transcriptomics, etc.)
  • How to analyze single-cell genomics data (reducing dimensions, denoising/imputation, clustering cells, annotating cell populations, inferring pseudotime/cellular trajectories, inferring cell-cell interactions) with existing methods
  • How to develop novel computational models to address emerging computational challenges (e.g., single-cell spatial data analysis) in single-cell genomics data analytics
  • How to integrate single-cell multi-omics data to understand cellular dynamics
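As a rough illustration of the preprocessing and normalization outcome above, the following is a minimal sketch in plain Python of library-size normalization followed by a log(1 + x) transform. The function name `normalize_log1p`, the default `target_sum` of 10,000, and the toy count matrix are assumptions made for illustration only, not course material. In practice, the Scanpy library listed in the course materials provides `sc.pp.normalize_total` and `sc.pp.log1p` for this step.

```python
import math

def normalize_log1p(counts, target_sum=10_000):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Cells with zero total counts stay all zeros (avoids division by zero).
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(value * scale) for value in cell])
    return normalized

# Hypothetical toy matrix: 3 cells (rows) x 4 genes (columns) of raw counts.
toy_counts = [
    [10, 0, 5, 5],
    [100, 0, 50, 50],
    [0, 0, 0, 0],
]
toy_normalized = normalize_log1p(toy_counts)
```

The first two toy cells have the same relative expression but a tenfold difference in sequencing depth, so they end up with identical normalized profiles. Removing that depth effect is the purpose of this step.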

Instructional methods

  • Lectures and programming exercises with analysis cases in Python (JupyterLab)
  • Students will need to install Anaconda and JupyterLab (both freely available) to participate in the course
  • All students should attend the lectures, either in person or on Zoom
  • Students who anticipate that they cannot participate in certain course components should inform the instructor at least 10 hours before the lecture

Note that instructional methods are subject to change based on public health protocols (https://www.mcgill.ca/maxbellschool/programs/resources-current-students/covid-19-protocols-and-resources).

Expectations for student participation

Students should attend all classes. In-person participation is the standard, but online participation via Zoom is accepted provided students notify the instructor by email at least 10 hours before the lecture.

Class recordings

All the lectures will be recorded, and the videos will be shared via the myCourses platform. Students should check their emails and myCourses for course updates at least once a week. In addition, students can download the myCourses Pulse mobile app to stay connected and on track.

Intellectual property considerations

I ask for everyone's cooperation in ensuring that the videos and associated materials are not reproduced or placed in the public domain. This means that each of you can use them for your own purposes, but you cannot allow others to use them by posting them online, or by giving or selling them to others who may copy them and make them available. Thank you very much for your help with this.

Required course materials

Course material, prepared by the Lecturer, will be available to the registered students via myCourses.

Course content

In this course, we will discuss computational methods (mostly machine learning models) for common single-cell data analysis tasks: (1) data preprocessing; (2) dimensionality reduction; (3) cell embedding learning and cell clustering; (4) trajectory inference; (5) gene regulatory network inference; (6) cell-cell interaction inference; (7) single-cell spatial data modelling; and (8) multi-omics data integration. These machine learning methods can be grouped under three major learning challenges: (1) Representation: how to infer good embeddings that represent the cells and support subsequent analyses; (2) Inference: how to infer the probability of a particular cellular event given the model; and (3) Learning: how to learn the model (structure and parameters) that represents the probability density of cells (e.g., in terms of gene expression). Each single-cell data analysis task can span one or more learning challenges. We will therefore organize the lectures by single-cell data analysis task and discuss each task in terms of the corresponding learning challenges.

All the single-cell machine learning methods covered in this course will be discussed in the context of "real-world" biomedical applications. For each method, we will provide real-world examples and hands-on datasets so that students can practice the methods covered in class.
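To make the Learning and Inference challenges concrete, here is a minimal sketch that assumes the counts of a single gene across cells follow a Poisson distribution. The Poisson model, the function names, and the made-up counts are illustrative assumptions, not a method prescribed by the course. Learning corresponds to estimating the model parameter from data, and inference to computing the probability of an event (here, observing zero counts) under the fitted model. The Representation challenge is not covered by this sketch.

```python
import math

def fit_poisson_rate(counts):
    """Learning: the maximum-likelihood Poisson rate is the sample mean."""
    return sum(counts) / len(counts)

def poisson_pmf(k, rate):
    """Inference: probability of observing exactly k counts given the rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

# Hypothetical counts of one gene across eight cells (made-up numbers).
gene_counts = [0, 1, 0, 2, 0, 0, 1, 0]

rate = fit_poisson_rate(gene_counts)   # learned parameter of the toy model
p_zero = poisson_pmf(0, rate)          # probability that a cell shows zero counts
```

Real single-cell models are usually richer (for example, negative binomial or zero-inflated likelihoods), but the same split between fitting parameters and querying probabilities applies.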

Course Schedule

Week 1 (Aug. 30): Brief Introduction of Single-cell Genomics Data
  Course materials:
  • https://www.nature.com/articles/nrg.2015.16
  Assignments due: Sept. 6

Week 2 (Sept. 6): Comparison of Different Single-cell Sequencing Platforms
  Course materials:
  • https://www.nature.com/articles/s41587-020-00748-9
  • https://satijalab.org/costpercell/
  • https://satijalab.org/howmanycells/
  Assignments due: Sept. 13

Week 3 (Sept. 13): Quantify Single-cell Gene Expression from Single-cell RNA-seq
  Course materials:
  • https://www.10xgenomics.com/support/single-cell-gene-expression
  • https://pubmed.ncbi.nlm.nih.gov/24385147/
  Assignments due: Sept. 20

Week 4 (Sept. 20): Standard Single-cell RNA-seq Analysis Pipeline Overview
  Course materials:
  • https://scanpy.readthedocs.io/en/stable/
  • https://www.kallistobus.tools/tutorials/scrna-seq_intro/python/scrna-seq_intro/
  • https://www.embopress.org/doi/full/10.15252/msb.20188746
  Assignments due: Sept. 27

Week 5 (Sept. 27): Classical Single-cell Dimensionality Reduction and Visualization Methods
  Course materials:
  • https://www.frontiersin.org/articles/10.3389/fgene.2021.646936/full
  • http://www.cs.toronto.edu/~hinton/absps/tsne.pdf
  • https://arxiv.org/pdf/1802.03426.pdf
  Assignments due: Oct. 4

Week 6 (Oct. 4): Single-cell Data Denoising, Imputation, and Augmentation
  Course materials:
  • https://www.nature.com/articles/s41467-018-07931-2
  • https://doi.org/10.1016/j.cell.2018.05.061
  Assignments due: Oct. 11

Week 7 (Oct. 11): No class (Fall reading break)

Week 8 (Oct. 18): Mid-term
  Mid-term Exam (2 hrs)

Week 9 (Oct. 25): Classical Methods for Single-cell Data Clustering
  Course materials:
  • https://www.nature.com/articles/s41598-019-41695-z/
  • https://cole-trapnell-lab.github.io/monocle3/docs/clustering/
  • https://scanpy-tutorials.readthedocs.io/en/latest/pbmc3k.html
  Assignments due: Nov. 1

Week 10 (Nov. 1): Deep-learning-based Methods for Single-cell Data Dimension Reduction and Clustering
  Course materials:
  • https://jaan.io/what-is-variational-autoencoder-vae-tutorial/
  • https://www.nature.com/articles/s41592-018-0229-2
  • https://pubmed.ncbi.nlm.nih.gov/36036832/
  Assignments due: Nov. 8

Week 11 (Nov. 8): Computational Methods for Cell-type Annotation of Single-cell Clusters
  Course materials:
  • http://bioconductor.org/books/3.13/OSCA.basic/cell-type-annotation.html
  • https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1795-z
  • https://www.nature.com/articles/s41467-022-29744-0
  Assignments due: Nov. 15

Week 12 (Nov. 15): Single-cell Trajectory Inference Methods
  Course materials:
  • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5848617/
  • https://www.nature.com/articles/s41586-019-0969-x
  Assignments due: Nov. 22

Week 13 (Nov. 22): Computational Models for Cell-cell Interaction Inference
  Course materials:
  • https://www.nature.com/articles/s41596-020-0292-x
  • https://www.nature.com/articles/s41467-021-21246-9
  • https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02629-7
  Assignments due: Nov. 29

Week 13 (Nov. 29): Multimodal Methods for Single-cell Data Analytics
  Course materials:
  • https://www.nature.com/articles/s41592-021-01264-7
  • https://www.sciencedirect.com/science/article/pii/S0092867421005833
  • https://www.nature.com/articles/s41467-021-22197-x
  Assignments due: Dec. 6

Week 14: Final
  Course project code and report due Dec. 14

Evaluation

Name of assignment or exam | Due date | % of final grade
Mid-term | Week 8 | 30%
Homework assignment | Weekly | 30%
Final course project | Week 14 | 40%
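
The weighting above can be checked with a short calculation. This is a minimal sketch that assumes each component is scored on a 0-100 scale and that the weekly homework is summarized as a single average score. The function name and the example scores are hypothetical, and the instructor's actual grading procedure may differ.

```python
# Weights in percent, taken from the evaluation table above.
WEIGHTS = {"midterm": 30, "homework": 30, "project": 40}

def final_grade(scores):
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

# Hypothetical scores for illustration only.
example = final_grade({"midterm": 80, "homework": 90, "project": 70})
```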