Bioinformatics for T-Cell immunology
Overview
This site is a repository for the group projects of the course Bioinformatics for T-Cell immunology, held on 11-15 July 2022 at EMBL-EBI, Cambridge, UK.
The projects use publicly available, realistic T-cell data sets to perform common tasks in the analysis of single-cell RNA-seq data: integration, clustering and differential gene expression. They focus on demonstrating key methodological aspects of single-cell data analysis rather than on addressing a particular biological question.
Projects
This repository hosts three standalone projects:
- Integration of single-cell data from patients developing arthritis arAE under ICI
  - main goal: compare the batch integration of scRNA-seq data obtained with Seurat RPCA (Hao et al., 2021) versus Scanorama (Hie et al., 2019), using the cell type annotations as ‘ground-truth’.
  - learning objectives: preprocessing/filtering, normalization, finding highly variable genes (HVG), scaling, dimensionality reduction, clustering and integration of single-cell data with Seurat; integration of single-cell data with Scanorama.
  - publication: Kim et al., 2022
  - data: GEO GSE173303
  - R markdown notebook: 01_integration_arthritis_arAE_ICI.Rmd
  - vignette: 01_integration_arthritis_arAE_ICI.html
  - results (Zenodo): https://zenodo.org/record/6807707/files/GSE173303.tar.gz
  - estimated computing time: 00:27:09
  - estimated memory: 21.61 GB
- Fine-grained clustering of single-cell data of melanoma immune/stroma cells
  - main goal: compare the clustering results of scRNA-seq data obtained with the graph-based clustering (SNN graph plus Louvain algorithm) implemented in Seurat (Hao et al., 2021) versus the Iterative Clustering Projection algorithm implemented in ILoReg (Smolander et al., 2021).
  - learning objectives: clustering single-cell data with the graph-based clustering method implemented in Seurat (SNN graph plus Louvain algorithm) and with the Iterative Clustering Projection algorithm implemented in ILoReg.
  - publication: Jerby-Arnon et al., 2018
  - data: GEO GSE115978
  - R markdown notebook: 02_clustering_seurat_vs_iloreg.Rmd
  - vignette: 02_clustering_seurat_vs_iloreg.html
  - results (Zenodo): https://zenodo.org/record/6807707/files/GSE115978.tar.gz
  - estimated computing time: 01:27:26
  - estimated memory (using 10 threads): 5.61 GB
- Differential gene expression of stimulated CD4+ T single-cell data with single-cell and pseudobulk methods
  - main goal: compare the differential gene expression (DGE) results obtained with single-cell (Wilcoxon test implemented in Seurat - Hao et al., 2021) versus pseudobulk (ROTS - Suomi et al., 2017) methods against the DGE results of the ‘ground-truth’ bulk RNA-seq data.
  - learning objectives: properties of single-cell data versus bulk data; differential gene expression with single-cell versus pseudobulk methods.
  - publication: Cano-Gamez et al., 2020
  - data: www.opentargets.org
  - R markdown notebook: 03_pseudobulks_dge_rots_cd4_act.Rmd
  - vignette: 03_pseudobulks_dge_rots_cd4_act.html
  - results (Zenodo): https://zenodo.org/record/6807707/files/CanoGamez_et_al_2020.tar.gz
  - estimated computing time: 02:09:24
  - estimated memory: 11.27 GB
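The results archives listed for each project can be fetched from the command line. The following is a minimal sketch, assuming curl and tar are available. The helper names results_url and fetch_results and the default destination folder results are hypothetical, and where the notebooks expect these files is not stated here.

```shell
#!/usr/bin/env bash

# Base URL copied from the "results (Zenodo)" entries of the project lists.
ZENODO_BASE="https://zenodo.org/record/6807707/files"

# Hypothetical helper: print the download URL of one results archive.
results_url() {
  printf '%s/%s.tar.gz\n' "$ZENODO_BASE" "$1"
}

# Hypothetical helper: download one archive into a folder and unpack it there.
# The default folder name "results" is an assumption, not a course convention.
fetch_results() {
  local dataset="$1" dest="${2:-results}"
  mkdir -p "$dest"
  curl -fL -o "$dest/$dataset.tar.gz" "$(results_url "$dataset")"
  tar -xzf "$dest/$dataset.tar.gz" -C "$dest"
}

# Usage: the archive names are GSE173303, GSE115978 and CanoGamez_et_al_2020.
#   fetch_results GSE173303 results
```

The archives may be large, so it is probably sensible to download only the one for the project your group picked.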
The course material for these projects can be found in the following GitHub repository (under the folder projects): https://github.com/elolab/Bioinfo_Tcell_projects_22
Download the GitHub repository by typing in the terminal:
git clone https://github.com/elolab/Bioinfo_Tcell_projects_22.git
or by clicking the Download ZIP button and decompressing the folder.
The README.md markdown text file under the folder projects explains the directory structure. The project notebooks are under the folder reports.
The conda environment YAML file at projects/workflow/envs/tools.yaml lists the software packages, and their versions, required to reproduce the project notebooks. They can be installed with conda (or mamba) by running the following from inside the projects folder:
conda env create -f workflow/envs/tools.yaml
(you may need to add the line name: course to the beginning of the YAML file).
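The environment setup described above can be sketched as a small shell snippet. This is a minimal sketch under assumptions, not the course's official procedure. The helper name ensure_env_name is hypothetical. It also assumes that the "tag" mentioned above is the top-level name key of a conda environment file, and that course is the desired environment name.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the repository): prepend "name: <env>"
# to a conda environment YAML file unless it already has a top-level name key.
ensure_env_name() {
  local yaml_file="$1" env_name="$2"
  if ! grep -q '^name:' "$yaml_file"; then
    printf 'name: %s\n' "$env_name" | cat - "$yaml_file" > "$yaml_file.tmp" &&
      mv "$yaml_file.tmp" "$yaml_file"
  fi
}

# Usage, from inside the projects folder of the cloned repository:
#   ensure_env_name workflow/envs/tools.yaml course
#   conda env create -f workflow/envs/tools.yaml
#   conda activate course
```

The helper leaves the file untouched when a name key is already present, so it can be run more than once. The conda activate line assumes the environment was created under the name course.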
Outline
Each group picks just one of the projects.
The timeline for one project is outlined below:
- 30min for project introduction on day 2
- 1.5h for group project work on day 3
- 1.5h for group project work on day 4
- 1.5h for group project work and wrap-up on day 5
- 1h for the group presentation (all groups) on day 5
Target audience
Scientists who want to learn key concepts in the analysis of single-cell RNA-seq data, such as integration, clustering and differential gene expression.
Pre-requisites
The course projects are delivered as R markdown notebooks, which can be reproduced with basic knowledge of the R programming language. A few lines of Python are also called directly from R using reticulate. Participants may benefit from intermediate knowledge of R, to explore some analyses in more depth, and from familiarity with Seurat and SingleCellExperiment objects and functionality.
Project lead
António Sousa (ENLIGHT-TEN+ PhD student at the Medical Bioinformatics Centre, TBC, University of Turku & Åbo Akademi)
Contact: aggode@utu.fi
Disclaimer
All the data used in the project notebooks were made public elsewhere by the respective authors and have been properly referenced in each project (links are provided in each project notebook). The data and tools chosen to address the topics of each project notebook reflect only my personal experience and knowledge, and were chosen to highlight particular aspects that I consider important. The results generated and explored within each project notebook serve only to give a brief introduction to the topics addressed in each project and do not aim, at any point, to reproduce or question either the approaches taken or the main findings published with the data sets used herein.
Acknowledgements
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 955321.
This work is licensed under a Creative Commons Attribution 4.0 International License.