If you have some time on your hands during “lockdown” what better way is there to make use of it than by learning bioinformatics?
In the single cell field especially, large amounts of data are produced but bioinformaticians are scarce. Therefore, it is an important and much sought-after skill for biologists to be able take data into their own hands. Luckily, there have been a range of tools developed that allow even data analysis noobs to get to grips with their single cell data. For a lot of us the obvious and easiest answer will be to use some form of guide user interface (GUI) such as those provided by companies such as Partek (watch this webinar to learn more) that enables us to go from raw data all the way to visualization.
Although convenient, options offered for customization of analysis tools and plot appearance in GUI are somewhat limited. Combining dropSeqPipe (dSP) for pre-processing with Seurat for post-processing offers full control over data analysis and visualization. The dSP pipeline with all its tools is designed to provide a reproducible, almost automatic, workflow that goes from raw reads (FASQ files) to basic data visualization. This is the point at which a specific experimental design requires manual intervention, for instance, when generating graphs. This is usually the exciting bit and it cannot be automated as requirements are often specific to a researcher’s needs.
dSP produces output that is tailored for a quasi-standard data visualization software in the single-cell world called Seurat and Scater. Seurat and Scater are package that can be used with the programming language R (learn some basic R here) enabling QC, analysis, and exploration of single-cell RNA-seq data. To help you get started with your very own dive into single cell and single nuclei RNA-Seq data analysis we compiled a tutorial on post-processing of data with R using Seurat tools from the famous Satija lab.
What will you need?
- An installed R and R studio package
- A Seurat object from one of your scRNA-Seq or sNuc-Seq projects.
- A computer…but that probably goes without saying
Don’t have any of this? (Well hopefully you’ll have the computer…we can’t help very much with that) but otherwise don’t you worry, you can find a detailed step by step introduction below on how to install R and R studio and we have placed a Seurat object here ready for you to download and play with.
Ticking all the boxes? Great! You can go straight to step 1: Installing relevant packages.
Disclaimer: This is for absolute beginners, if you are comfortable working with R and Seurat objects, I would suggest going to the Satija lab webpage straight away.
If you have never used R, have a quick read of this introduction which familiarizes you with the most basic features of the program. There is plethora of analysis types that can be done with R and it is a very good skill to have! Intrigued? To learn more about R read this in depth guide to R by Nathaniel D. Phillips.
Of course, you could write all your code in the console, however. none of that would be saved. Best practice is to save it in a script that will allow you to access it again once a new data set comes your way. Generally speaking, an R script is just a bunch of R code in a single file.
Writing a new script
To start writing a new R script in RStudio, click File – New File – R Script.
Execution of commands
If you would like to execute one of the commands in the script, just highlight the command and press Ctrl + Enter. You will see it appearing in the Console window.
Anything starting with a # is a comment, meaning that even if executed in the command line it won’t be read by R. It is simply helpful for the user to explain the purpose of the command that is written below.
2. Global environment
This is where R stores all the objects and variables created during a session. In the same location you can also find “History”, which lists all the commands executed during a session.
This is the window in which you can type R commands, execute them and view the results (except plots). You can find some information on how to make your work with R more productive here.
4. Plots/ Help
This is the window in which R will print the plots generated and open the help tab if in the console
?function is executed.
Downloading R and R studio
Start with installing R and R-Studio on your computer. Downloads for Windows and macOS can be found in the links below, install both files and run R studio. R will provide you with the necessary software to write and execute R commands, R studio is helpful as it provides a nice graphical interface for the daily use of R.
R Studio download
Downloading a Seurat object
You can find a Seurat object here, which is some mouse lung scRNA-Seq from Nadia data for you to play with.
Installing relevant packages
When you first open R Studio it will pretty much be a blank page. Below are some packages that you will need to install to be able to use the code presented in this tutorial. This only needs to be done once after R is installed.
Note!: All code must be entered in the window labelled
Console. Copy past the code at the
> prompt and press enter, this will start the installation of the packages below. You will know that the script is completed if R displays a fresh
> prompt in the console.
install.packages('BiocManager') BiocManager::install('multtest') install.packages('Seurat') install.packages('dplyr')
Note: After installing
BiocManager::install('multtest') R will ask to
Update all/some/none? [a/s/n]: enter
n to not update other packages.
Defining the working directory
In order for R to find your Seurat object you will need to tell the program where it is saved, this location is called your working directory. This step will show you how to set this directory.
# Note you can copy the path from windows however you will have to change all \ to / setwd("insert path")
Loading the relevant libraries
This step will install required packages and load relevant libraries for data analysis and visualization.
Note!: Libraries need to be loaded every time R is started
library(Seurat) library(ggplot2) library(dplyr)
Loading the Seurat object into R global environment
Its good practice to save every data set that is uploaded into R under a specific name (variable) in the global environment in R. This will allow you to transform or visualize that data simply by calling its’ variable. This is also true for the Seurat object when it is first loaded into R.
Note!: The Seurat object file must be saved in the working directory defined above, or else R won’t be able to find it.
#This loads the Seurat object into R and saves it in a variable called ‘seuratobj’ in the global environment seuratobj <- readRDS("R_Seurat_objects_umap.rds")
Extracting the meta data from the Seurat object
A Seurat object contains a lot of information including the count data and experimental meta data. The count data is saved as a so-called matrix within the seurat object, whereas, the meta data is saved as a data frame (something like a table). Meta data stores values such as numbers of genes and UMIs and cluster numbers for each cell (barcode). Just like with the Seurat object itself we can extract and save this data frame under a variable in the global environment.
#Saves the data frame meta data in a variable called ‘meta.data’ in the global environment meta.data <- firstname.lastname@example.org
Data frames are standard data types in R and there is a lot we can do with it. To learn more on what to do with data frames, have look here
#This will show you the first 7 lines of your data frame head(meta.data)
orig.ident nCount_RNA nFeature_RNA cellNames samples barcode expected_cells read_length CT2_S249_AAAACTCACC CT2_S249 784 628 CT2_S249_AAAACTCACC CT2_S249 AAAACTCACC 600 150 CT2_S249_AAAAGGTGTG CT2_S249 1747 1114 CT2_S249_AAAAGGTGTG CT2_S249 AAAAGGTGTG 600 150 CT2_S249_AAAATCAATC CT2_S249 3906 2134 CT2_S249_AAAATCAATC CT2_S249 AAAATCAATC 600 150 CT2_S249_AAAATTCTTA CT2_S249 3344 1799 CT2_S249_AAAATTCTTA CT2_S249 AAAATTCTTA 600 150 CT2_S249_AAACAAAGGC CT2_S249 5323 2732 CT2_S249_AAACAAAGGC CT2_S249 AAACAAAGGC 600 150 CT2_S249_AAACGACCGA CT2_S249 906 658 CT2_S249_AAACGACCGA CT2_S249 AAACGACCGA 600 150 batch nCounts pct.sribo pct.lribo pct.Ribo pct.mito top50 umi.per.gene RNA_snn_res.0.8 CT2_S249_AAAACTCACC CT2 24665 0.02933673 0.01658163 0.04591837 0.005102041 0.1964286 1.248408 9 CT2_S249_AAAAGGTGTG CT2 85066 0.05266171 0.04121351 0.09387521 0.024613623 0.2329708 1.568223 5 CT2_S249_AAAATCAATC CT2 208598 0.03763441 0.04121864 0.07885305 0.012032770 0.1738351 1.830366 5 CT2_S249_AAAATTCTTA CT2 187974 0.02601675 0.03887560 0.06489234 0.018540670 0.2221890 1.858810 5 CT2_S249_AAACAAAGGC CT2 132907 0.02423445 0.02141649 0.04565095 0.009956791 0.1666354 1.948389 9 CT2_S249_AAACGACCGA CT2 61304 0.02759382 0.04856512 0.07615894 0.014348786 0.2240618 1.376900 3 seurat_clusters CT2_S249_AAAACTCACC 9 CT2_S249_AAAAGGTGTG 5 CT2_S249_AAAATCAATC 5 CT2_S249_AAAATTCTTA 5 CT2_S249_AAACAAAGGC 9 CT2_S249_AAACGACCGA 3
It is usually a good idea to play around and inspect the data, you can for example try
View(meta.data). The example below allows you to check which samples are stored in the Seurat object.
# Show samples stored in the data frame # orig.ident contains sample names unique(meta.data$orig.ident)
 CT2_S249 Levels: CT2_S249
Step4 – Data QC
Before starting to dive deeper into your data its beneficial to take some time for selection and filtration of cells based on some QC metrics. This can be easily done with Seurat looking at common QC metrics such as:
- The number of unique genes/ UMIs detected in each cell.
- The percentage mitochondrial/ ribosomal reads per cell
Read more to this topic here under “Standard pre-processing workflow”.
#Creates a violin plot for the number of UMIs ('nFeature_RNA'), the number of genes ('nCount_RNA'), % ribosomal RNA (‘pct.Ribo’) and % mitochondrial RNA (’pct.mito’) for each sample VlnPlot(object = seuratobj, features = c('nFeature_RNA','nCount_RNA','pct.Ribo','pct.mito'), group.by = 'orig.ident',pt.size = 0,ncol=2)
# FeatureScatter can be used to visualize feature-feature relationships such as number of genes ("nFeature_RNA") vs number of UMIs ("nCount_RNA") FeatureScatter(seuratobj, feature1 = "nCount_RNA", feature2 = "nFeature_RNA",group.by = 'orig.ident')
Step 5 – Visualization of marker gene expression
Finding marker genes
# Finds all markers of cluster 1 cluster1.markers <- FindMarkers(seuratobj, ident.1 = 1, min.pct = 0.25)
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=07s
# Shows top 5 markers genes of cluster 1 head(cluster1.markers, n = 5)
p_val avg_logFC pct.1 pct.2 p_val_adj Cntn1 1.918870e-07 2.3485859 0.3 0.026 0.004927659 Sftpa1 2.927622e-04 1.7821664 0.5 0.141 1.000000000 Tmsb10 3.797597e-04 -2.9132579 0.0 0.637 1.000000000 B2m 3.827411e-04 -0.9140352 0.1 0.831 1.000000000 Tpm3 5.451834e-04 -2.3310006 0.0 0.617 1.000000000
Define set of marker genes
In order to create dot plots, heat maps or feature plots a list of genes of interests (features) need to be defined.
features <- c("Scgb1a1","Cd79b","Mzb1","Plac8","Trbc2","Cd8b1","Cd4","Cd3e","Cd3d","Tmem100","Vwf","Ednrb","Pecam1","Col1a2","Inmt","Dcn","Itgam","Itgax","C1qb","Sftpd","Sftpb","Sftpc","Foxj1","Scgb3a2","Ear2","Ager","Hopx","Pdpn","Rtkn2","Igfbp2")
DoHeatmap(object = seuratobj, features = features)
DotPlot(seuratobj, features = features) + RotatedAxis()
Highlight marker gene expression in dimension reduction plot such as UMAP or tSNE.
To reduce computing time we only select a few features
#selected marker genes for cell type features <- c("Cd8b1","Trbc2","Ly6c2","Cd4") #UMAP feature plot colour coded by defined feature FeaturePlot(seuratobj, features = features,reduction = "umap")
Step 6 – Dimension reduction plots
The goal of dimension reduction plots is to visualize single cell data by placing similar cells in close proximity in a low-dimensional space. Seurat offers non-linear dimension reduction techniques such as UMAP and tSNE.
DimPlot(seuratobj, dims=c(1,2), reduction = "umap",label=TRUE)
DimPlot(seuratobj, reduction = "tsne")
Many more visualization option for your data can be found under vignettes on the Satija lab website.
This vignette is very useful if you are trying to compare two conditions.
We hope this tutorial was useful to you and that it will enable to you to take data into your own hands. Should you have any questions you can contact us under email@example.com .