Analysis of single cell RNA-seq data
2019-10-03
1 About the course
Today it is possible to obtain genome-wide transcriptome data from single cells using high-throughput sequencing (scRNA-seq). The cellular resolution and the genome-wide scope of scRNA-seq make it possible to address issues that are intractable using other methods like bulk RNA-seq or single-cell RT-qPCR. However, scRNA-seq data poses many challenges due to the scale and complexity of scRNA-seq datasets, with novel methods often required to account for the particular characteristics of the data.
In this course we will discuss some of the questions that can be addressed using scRNA-seq as well as the available computational and statistical methods. We will cover key features of the technology platforms and fundamental principles of scRNA-seq data analysis that are transferable across technologies and analysis workflows. The number of computational tools is already vast and increasing rapidly, so we provide hands-on workflows using some of our favourite tools on carefully selected, biologically relevant example datasets.
Across two days, attendees can expect to gain an understanding of, and practical experience with, the following analysis steps: quality control, data normalisation, visualisation, clustering, trajectory (pseudotime) inference, differential expression, batch correction and data integration.
Course outline:
Day 1:
- Morning session 1: Workshop overview; introduction to scRNA-seq; pre-processing scRNA-seq data
- Morning session 2: Quality control, visualisation and exploratory data analysis
- Afternoon session 1: Normalisation, confounders and batch correction
- Afternoon session 2: Latent spaces, clustering and cell annotation
Day 2:
- Morning session 1: Trajectory inference
- Morning session 2: Differential expression; data imputation
- Afternoon session 1: Combining datasets and data integration
- Afternoon session 2: Case studies
This course has been adapted from a course taught through the University of Cambridge Bioinformatics training unit, but the material is meant for anyone interested in learning about computational analysis of scRNA-seq data and is updated roughly twice per year.
The number of computational tools is increasing rapidly and we are doing our best to keep up to date with what is available. One of the main constraints for this course is that we would like to use tools that are implemented in R and that run reasonably fast. We will also confess to being somewhat biased towards methods that have been developed either by us or by our friends and colleagues.
1.1 Web page
The HTML version of the workshop material is available at the following link:
https://biocellgen-public.svi.edu.au/mig_2019_scrnaseq-workshop/public/index.html
1.2 GitLab
The source code and materials for the course are available at the SVI Bioinformatics and Cellular Genomics Lab’s GitLab:
https://gitlab.svi.edu.au/biocellgen-public/mig_2019_scrnaseq-workshop
1.3 Video
This video was recorded during the course (2 days) in May 2019 in Cambridge, UK. This recorded version of the course differs slightly from the version in this document.
1.3.1 Day 1
1.3.2 Day 2
1.4 Docker image
The course can be reproduced without any package installation by running the course Docker image, which contains all the required packages.
Workshop Docker Repository on DockerHub
1.4.1 Run the image
Make sure Docker is installed on your system. If not, please follow these instructions. To run the course Docker image (use the latest version):
docker run -p 8888:8888 -e PASSWORD="jupyter" svibiocellgen/mig_2019_scrnaseq-workshop:v1.01
Then follow the instructions provided, e.g.:
To access the notebook, open this file in a browser:
file:///home/jovyan/.local/share/jupyter/runtime/nbserver-6-open.html
Or copy and paste one of these URLs:
http://(a9ee1aad5398 or 127.0.0.1):8888/?token=22debff49d9aae3c50e6b0b5241b47eefb1b8f883fcb7e6d
A Jupyter session will then open in a web browser (we recommend Chrome).
1.4.1.1 Windows users
On Windows, the IP address of the container can be different from 127.0.0.1 (localhost). To find the IP address, please run:
docker-machine ip default
1.4.2 Download data/other files
1.4.2.1 Download from AWS (within Docker)
Recommended if you are using Docker
In the Jupyter session, please click on New -> Terminal. In the new terminal window please run:
./poststart.sh
1.4.2.2 Manual download from AWS
If you want to download the data files from AWS outside of the Docker image, you can still use the same poststart.sh script, but you will need to install the AWS CLI on your computer.
Alternatively, you can browse and download the files in your web browser by visiting this link.
NB: Only the core datasets (i.e. not Tabula Muris) are available from AWS storage.
1.4.2.3 Manual download from SVI
Recommended if you are using your own computer
For simplicity, we have also hosted the core datasets used in the course and a subset of the Tabula Muris data on SVI websites. There are two files to download, both “tarballs”, i.e. compressed archives of multiple folders and files.
1.4.2.3.1 Core datasets
To download the core datasets, click this link (195 MB).
It is most convenient to download the tarball to the head directory for the course. We then want to unpack the tarball and move its contents to a directory called data in the head directory of the repository.
To do this at the command line:
wget https://www.svi.edu.au/MIG_2019_scRNAseq-workshop/mig-sc-workshop-2019-data.tar.gz
mkdir workshop-data
tar -xvf mig-sc-workshop-2019-data.tar.gz --directory workshop-data
mv workshop-data/mnt/mcfiles/Datasets/MIG_2019_scRNAseq-workshop/data ./
rm -r workshop-data
[This requires a little bit of faff to get all of the directory paths correct and then tidy up afterwards.]
Alternatively, if you are working on your laptop, unpack the tarball using the default method on your system (usually a double click on the *.tar.gz file will do the trick) and drag and drop the data folder to the workshop directory.
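The command-line steps above can possibly be shortened with the --strip-components option, which both GNU tar and bsdtar provide. The following is only a sketch: so that it runs without a network connection, it builds a tiny stand-in archive with the same nested path as the one in the mv command above. The names demo.txt and demo-data.tar.gz are made up for illustration, and the value 4 assumes that the real tarball stores its data folder under mnt/mcfiles/Datasets/MIG_2019_scRNAseq-workshop/.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing in the course folder is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a stand-in archive with the same nested layout as the course tarball.
# demo.txt and demo-data.tar.gz are hypothetical names used only in this sketch.
nested=mnt/mcfiles/Datasets/MIG_2019_scRNAseq-workshop/data
mkdir -p "$nested"
echo "placeholder" > "$nested/demo.txt"
tar -czf demo-data.tar.gz mnt
rm -r mnt

# Drop the 4 leading path components while extracting, so that data/
# lands directly in the current directory and no mnt/ folder is created.
tar -xzf demo-data.tar.gz --strip-components=4

ls data
```

If this works for the real archive, the mkdir, mv and rm steps are no longer needed. It is worth listing the archive contents with tar -tzf first to confirm the number of leading components.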
1.4.2.3.2 Tabula Muris
To download the Tabula Muris data, click this link (655 MB).
We then go through a similar process as described above to unpack the tarball.
wget https://www.svi.edu.au/MIG_2019_scRNAseq-workshop/Tabula_Muris.tar.gz
tar -xvf Tabula_Muris.tar.gz
mv mnt/mcfiles/Datasets/Tabula_Muris data
rm -r mnt
1.4.2.3.3 Desired results
The data folder then should contain both the core datasets and the Tabula Muris data, and have the following structure:
data
├── 10cells_barcodes.txt
├── 2000_reference.transcripts.fa
├── deng
│   └── deng-reads.rds
├── droplet_id_example_per_barcode.txt.gz
├── droplet_id_example_truth.gz
├── EXAMPLE.cram
├── pancreas
│   ├── muraro.rds
│   └── segerstolpe.rds
├── pbmc3k_filtered_gene_bc_matrices
│   └── hg19
│       ├── barcodes.tsv
│       ├── genes.tsv
│       └── matrix.mtx
├── sce
│   ├── Heart_10X.rds
│   └── Thymus_10X.rds
├── Tabula_Muris
│   ├── droplet
│   │   ├── droplet
│   │   ├── droplet_annotation.csv
│   │   └── droplet_metadata.csv
│   └── FACS_smartseq2
│       ├── FACS
│       ├── FACS_annotations.csv
│       └── FACS_metadata.csv
└── tung
    ├── annotation.txt
    ├── molecules.txt
    ├── reads.txt
    ├── TNs.txt
    └── TPs.txt
11 directories, 22 files
With the files in these locations, everything is set up to run the code as presented in the RMarkdown files in the workshop.
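To confirm that the files ended up in the right places, a small check like the one below may help. It is a sketch rather than part of the course materials: the function name check_data_layout is made up, and it tests only a subset of the files from the listing above (one or two per folder), so extend the list if you want a complete check.

```shell
#!/bin/sh
# check_data_layout DIR: report which of the expected files are missing under DIR.
# Prints one "MISSING: <path>" line per absent file and returns non-zero
# if at least one file is absent. (Hypothetical helper, not from the course.)
check_data_layout() {
    data_dir=$1
    missing=0
    # A subset of the files from the listing above; extend as needed.
    for f in \
        deng/deng-reads.rds \
        pancreas/muraro.rds \
        pancreas/segerstolpe.rds \
        pbmc3k_filtered_gene_bc_matrices/hg19/matrix.mtx \
        sce/Heart_10X.rds \
        Tabula_Muris/droplet/droplet_annotation.csv \
        Tabula_Muris/FACS_smartseq2/FACS_annotations.csv \
        tung/molecules.txt
    do
        if [ ! -e "$data_dir/$f" ]; then
            echo "MISSING: $data_dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: run from the workshop directory, where the data folder lives.
if check_data_layout data; then
    echo "data folder looks complete"
else
    echo "some files are missing; see the list above"
fi
```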
1.4.3 RStudio
Now go back to the Jupyter browser tab and change the word tree in the URL to rstudio. RStudio Server will open with all of the course files, software and the data folder available.
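As an illustration of the URL change described above, the sketch below rewrites a Jupyter address into the corresponding RStudio one. The function name to_rstudio_url is made up, and the example address assumes the default local setup (127.0.0.1 on port 8888); on Windows, substitute the IP address of the container.

```shell
#!/bin/sh
# to_rstudio_url URL: replace the /tree part of a Jupyter URL (and anything
# after it) with /rstudio. (Hypothetical helper, for illustration only.)
to_rstudio_url() {
    printf '%s\n' "$1" | sed 's#/tree.*$#/rstudio#'
}

# Example with the default local address.
to_rstudio_url "http://127.0.0.1:8888/tree"
# prints http://127.0.0.1:8888/rstudio
```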
1.5 Manual installation
If you are not using the Docker image of the course, then to be able to run all code chunks of the course you need to clone or download the course GitLab repository (section 1.2) and start an R session in the course_files folder. You will also need to install all required packages manually. We are using Bioconductor version 3.9 packages in this version of the course.
The install.R file in the workshop repository provides the necessary commands for installing all of the required packages. You can run this script from the command line with Rscript install.R or copy-and-paste the commands into an R session and run them interactively.
Alternatively, you can install only the packages listed in the chapter you are interested in.
1.6 Citation
This version of the workshop has been updated by Davis J. McCarthy, Ruqian Lyu and PuXue Qiao, based on the 2019-07-01 version of the course:
- Ruqian Lyu, PuXue Qiao, Vladimir Kiselev, Tallulah Andrews, Jennifer Westoby, Maren Büttner, Jimmy Lee, Krzysztof Polanski, Sebastian Y. Müller, Elo Madissoon, Stephane Ballereau, Maria Do Nascimento Lopes Primo, Rocio Martinez Nunez, Martin Hemberg and Davis J. McCarthy, (2019), “Analysis of single cell RNA-seq data”, https://scrnaseq-course.cog.sanger.ac.uk/website/index.html
1.7 License
All of the course material is licensed under GPL-3. Anyone is welcome to go through the material in order to learn about analysis of scRNA-seq data. If you plan to use the material for your own teaching, we would appreciate it if you told us about it, in addition to providing a suitable citation.
1.8 Prerequisites
The course is intended for those who have basic familiarity with Unix and the R statistical language.
We will also assume that you are familiar with mapping and analysing bulk RNA-seq data as well as with the commonly available computational tools.
We recommend attending the Introduction to RNA-seq and ChIP-seq data analysis course or the Analysis of high-throughput sequencing data with Bioconductor course before attending this course.
1.9 Contact
If you have any comments, questions or suggestions about the material, please contact Davis McCarthy.