1 About the course

Today it is possible to obtain genome-wide transcriptome data from single cells using high-throughput sequencing (scRNA-seq). The cellular resolution and the genome-wide scope of scRNA-seq make it possible to address issues that are intractable using other methods like bulk RNA-seq or single-cell RT-qPCR. However, scRNA-seq data poses many challenges due to the scale and complexity of scRNA-seq datasets, with novel methods often required to account for the particular characteristics of the data.

In this course we will discuss some of the questions that can be addressed using scRNA-seq as well as the available computational and statistical methods. We will cover key features of the technology platforms and fundamental principles of scRNA-seq data analysis that are transferable across technologies and analysis workflows. The number of computational tools is already vast and increasing rapidly, so we provide hands-on workflows using some of our favourite tools on carefully selected, biologically-relevant example datasets.

Across two days, attendees can expect to gain an understanding of approaches to, and practical analysis experience with, quality control, data normalisation, visualisation, clustering, trajectory (pseudotime) inference, differential expression, batch correction and data integration.

Course outline:

  • Day 1:

    • Morning session 1: Workshop overview; introduction to scRNA-seq; pre-processing scRNA-seq data
    • Morning session 2: Quality control, visualisation and exploratory data analysis
    • Afternoon session 1: Normalisation, confounders and batch correction
    • Afternoon session 2: Latent spaces, clustering and cell annotation
  • Day 2:

    • Morning session 1: Trajectory inference
    • Morning session 2: Differential expression; data imputation
    • Afternoon session 1: Combining datasets and data integration
    • Afternoon session 2: Case studies

This course has been adapted from a course taught through the University of Cambridge Bioinformatics training unit, but the material is meant for anyone interested in learning about computational analysis of scRNA-seq data and is updated roughly twice per year.

The number of computational tools is increasing rapidly and we are doing our best to keep up to date with what is available. One of the main constraints for this course is that we would like to use tools that are implemented in R and that run reasonably fast. We will also confess to being somewhat biased towards methods that have been developed either by us or by our friends and colleagues.

1.1 Web page

The html version of the workshop material is available at the following link:

https://biocellgen-public.svi.edu.au/mig_2019_scrnaseq-workshop/public/index.html

1.2 GitLab

The source code and materials for the course are available at the SVI Bioinformatics and Cellular Genomics Lab’s GitLab:

https://gitlab.svi.edu.au/biocellgen-public/mig_2019_scrnaseq-workshop

1.3 Video

This video was recorded during the course (2 days) in May 2019 in Cambridge, UK. This recorded version of the course differs slightly from the version in this document.

1.3.1 Day 1

1.3.2 Day 2

1.4 Docker image

The course can be reproduced without any package installation by running the course Docker image, which contains all the required packages.

Workshop Docker Repository on DockerHub

1.4.1 Run the image

Make sure Docker is installed on your system. If not, please follow these instructions. To run the course docker image (use the latest version):

docker run -p 8888:8888 -e PASSWORD="jupyter" svibiocellgen/mig_2019_scrnaseq-workshop:v1.01

Then follow the instructions provided, e.g.:

To access the notebook, open this file in a browser:
    file:///home/jovyan/.local/share/jupyter/runtime/nbserver-6-open.html
Or copy and paste one of these URLs:
    http://(a9ee1aad5398 or 127.0.0.1):8888/?token=22debff49d9aae3c50e6b0b5241b47eefb1b8f883fcb7e6d

A Jupyter session will open in a web browser (we recommend Chrome).

1.4.1.1 Windows users

On the Windows operating system, the IP address of the container can be different from 127.0.0.1 (localhost). To find the IP address, please run:

docker-machine ip default

1.4.2 Download data/other files

1.4.2.1 Download from AWS (within Docker)

Recommended if you are using Docker

In the Jupyter session, please click on New -> Terminal. In the new terminal window please run:

./poststart.sh

1.4.2.2 Manual download from AWS

If you want to download the data files from AWS outside of the Docker image, you can still use the same poststart.sh script, but you will need to install the AWS CLI on your computer.

Alternatively, you can browse and download the files in your web browser by visiting this link.

NB: Only the core datasets (i.e. not Tabula Muris) are available from AWS storage.

1.4.2.3 Manual download from SVI

Recommended if you are using your own computer

For simplicity, we have also hosted the core datasets used in the course and a subset of the Tabula Muris data on SVI websites. There are two files to download, both “tarballs”, i.e. compressed archives of multiple folders and files.

1.4.2.3.1 Core datasets

To download the core datasets, click this link (195Mb).

It is most convenient to download the tarball to the head directory for the course. We then want to unpack the tarball and move it to a directory called data in the head directory of the repository.

To do this at the command line:

wget https://www.svi.edu.au/MIG_2019_scRNAseq-workshop/mig-sc-workshop-2019-data.tar.gz
mkdir workshop-data
tar -xvf mig-sc-workshop-2019-data.tar.gz --directory workshop-data
mv workshop-data/mnt/mcfiles/Datasets/MIG_2019_scRNAseq-workshop/data ./
rm -r workshop-data

[This requires a little bit of faff to get all of the directory paths correct and then tidy up.]

Alternatively, if you are working on your laptop, unpack the tarball using the default method on your system (usually a double click on the *.tar.gz file will do the trick) and drag and drop the data folder to the workshop directory.
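
For the command-line route, the directory shuffling above can be shortened with tar's --strip-components option, which drops leading path components while extracting. The following is a minimal sketch rather than the course's own method: the helper name unpack_stripped is hypothetical, and the component count assumes the archive members begin with mnt/mcfiles/Datasets/MIG_2019_scRNAseq-workshop/data/, as implied by the mv command above. If the member names turn out to carry a leading ./ the count would need adjusting.

```shell
# Hypothetical helper (not part of the course repository):
#   unpack_stripped ARCHIVE N DEST
# extracts ARCHIVE into DEST and removes the first N path components
# from every member name. Members with fewer than N components are skipped.
unpack_stripped() {
    archive=$1
    strip=$2
    dest=$3
    mkdir -p "$dest" &&
        tar -xf "$archive" --strip-components="$strip" -C "$dest"
}

# Assumed usage for the core datasets: dropping the four leading components
# mnt/mcfiles/Datasets/MIG_2019_scRNAseq-workshop leaves a data folder
# in the current directory.
# unpack_stripped mig-sc-workshop-2019-data.tar.gz 4 .
```

Because nothing is extracted under mnt, there is no temporary directory to remove afterwards.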

1.4.2.3.2 Tabula Muris

To download the Tabula Muris data, click this link (655Mb).

We then go through a similar process as described above to unpack the tarball.

wget https://www.svi.edu.au/MIG_2019_scRNAseq-workshop/Tabula_Muris.tar.gz
tar -xvf Tabula_Muris.tar.gz
mv mnt/mcfiles/Datasets/Tabula_Muris data
rm -r mnt

1.4.2.3.3 Desired results

The data folder then should contain both the core datasets and the Tabula Muris data, and have the following structure:

data
├── 10cells_barcodes.txt
├── 2000_reference.transcripts.fa
├── deng
│   └── deng-reads.rds
├── droplet_id_example_per_barcode.txt.gz
├── droplet_id_example_truth.gz
├── EXAMPLE.cram
├── pancreas
│   ├── muraro.rds
│   └── segerstolpe.rds
├── pbmc3k_filtered_gene_bc_matrices
│   └── hg19
│       ├── barcodes.tsv
│       ├── genes.tsv
│       └── matrix.mtx
├── sce
│   ├── Heart_10X.rds
│   └── Thymus_10X.rds
├── Tabula_Muris
│   ├── droplet
│   │   ├── droplet
│   │   ├── droplet_annotation.csv
│   │   └── droplet_metadata.csv
│   └── FACS_smartseq2
│       ├── FACS
│       ├── FACS_annotations.csv
│       └── FACS_metadata.csv
└── tung
    ├── annotation.txt
    ├── molecules.txt
    ├── reads.txt
    ├── TNs.txt
    └── TPs.txt

11 directories, 22 files

With the files in these locations, everything is set up to run the code as presented in the RMarkdown files in the workshop.
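
One way to confirm the layout before starting is to test for each expected path. The sketch below is an illustration only: the function name check_workshop_data is hypothetical, and the path list is copied from the directory listing above (22 files plus the two nested droplet and FACS directories).

```shell
# Hypothetical helper (not part of the course repository):
#   check_workshop_data [DATA_DIR]
# prints every expected path that is absent under DATA_DIR (default: data),
# then a summary line, and returns non-zero if anything is missing.
check_workshop_data() {
    data_dir=${1:-data}
    missing=0
    for f in \
        10cells_barcodes.txt \
        2000_reference.transcripts.fa \
        deng/deng-reads.rds \
        droplet_id_example_per_barcode.txt.gz \
        droplet_id_example_truth.gz \
        EXAMPLE.cram \
        pancreas/muraro.rds \
        pancreas/segerstolpe.rds \
        pbmc3k_filtered_gene_bc_matrices/hg19/barcodes.tsv \
        pbmc3k_filtered_gene_bc_matrices/hg19/genes.tsv \
        pbmc3k_filtered_gene_bc_matrices/hg19/matrix.mtx \
        sce/Heart_10X.rds \
        sce/Thymus_10X.rds \
        Tabula_Muris/droplet/droplet \
        Tabula_Muris/droplet/droplet_annotation.csv \
        Tabula_Muris/droplet/droplet_metadata.csv \
        Tabula_Muris/FACS_smartseq2/FACS \
        Tabula_Muris/FACS_smartseq2/FACS_annotations.csv \
        Tabula_Muris/FACS_smartseq2/FACS_metadata.csv \
        tung/annotation.txt \
        tung/molecules.txt \
        tung/reads.txt \
        tung/TNs.txt \
        tung/TPs.txt
    do
        if [ ! -e "$data_dir/$f" ]; then
            echo "missing: $data_dir/$f"
            missing=$((missing + 1))
        fi
    done
    echo "$missing missing"
    [ "$missing" -eq 0 ]
}
```

Running check_workshop_data data from the head directory of the repository lists any absent paths. The check only looks at whether each path exists, not at file contents.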

1.4.3 RStudio

Now go back to the Jupyter browser tab and change the word tree in the URL to rstudio. RStudio Server will open with all of the course files, software and the data folder available.

1.5 Manual installation

If you are not using a Docker image of the course, then to be able to run all code chunks of the course you need to clone or download the course GitLab repository and start an R session in the course_files folder. You will also need to install all required packages manually. We are using Bioconductor version 3.9 packages in this version of the course.

The install.R file in the workshop repository provides the necessary commands for installing all of the required packages. You can run this script from the command line with Rscript install.R or copy-and-paste the commands into an R session and run them interactively.

Alternatively, you can just install packages listed in a chapter of interest.

1.6 Citation

This version of the workshop has been updated by Davis J. McCarthy, Ruqian Lyu and PuXue Qiao, based on the 2019-07-01 version of the course:

  • Ruqian Lyu, PuXue Qiao, Vladimir Kiselev, Tallulah Andrews, Jennifer Westoby, Maren Büttner, Jimmy Lee, Krzysztof Polanski, Sebastian Y. Müller, Elo Madissoon, Stephane Ballereau, Maria Do Nascimento Lopes Primo, Rocio Martinez Nunez, Martin Hemberg and Davis J. McCarthy, (2019), “Analysis of single cell RNA-seq data”, https://scrnaseq-course.cog.sanger.ac.uk/website/index.html

1.7 License

All of the course material is licensed under GPL-3. Anyone is welcome to go through the material in order to learn about analysis of scRNA-seq data. If you plan to use the material for your own teaching, we would appreciate it if you told us about it, in addition to providing a suitable citation.

1.8 Prerequisites

The course is intended for those who have basic familiarity with Unix and the R statistical language.

We will also assume that you are familiar with mapping and analysing bulk RNA-seq data as well as with the commonly available computational tools.

We recommend attending the Introduction to RNA-seq and ChIP-seq data analysis course or the Analysis of high-throughput sequencing data with Bioconductor course before attending this one.

1.9 Contact

If you have any comments, questions or suggestions about the material, please contact Davis McCarthy.