Integration task

This single-cell RNA-seq integration task describes an example of an integration analysis involving different biological conditions: resting (named CTRL) versus interferon stimulated (named STIM) human peripheral blood mononuclear cells (PBMCs). The data set with cells coming from both biological conditions was retrieved from the SeuratData (v.0.2.2.9001):

ifnb (resting and ifnb stimulated data sets):
- ifnb SeuratData data set (v.3.1.0): IFNB-Stimulated and Control PBMCs
- no. of cells: 13,999 cells (6,548 in CTRL and 7,451 in STIM)

The identity of the data set - CTRL or STIM - for every cell was saved in the Seurat meta.data column variable stim. The ground-truth cell identities were also provided in the column variable seurat_annotations but avoid checking them until the end of this notebook to make these analyses more interesting.

The analyses performed in this notebook rely in the Seurat R package (v.5.1.0).

Import the main packages used in this notebook: Seurat (v.5.1.0), SeuratWrappers (v.0.3.2 - integration wrappers for Seurat), dplyr (v.1.1.4 - wrangling data), patchwork (v.1.2.0 - visualization), scIntegrationMetrics (v 1.1 - compute LISI integration metrics).

## Import packages
library("dplyr") # data wrangling
library("Seurat") # scRNA-seq analysis
library("patchwork") # viz
library("SeuratWrappers") # integration wrappers
library("scIntegrationMetrics") # compute LISI integration metrics

Create output directories to save intermediate results, figures, tables and R objects.

## Output directories
res.dir <- file.path("../results", "ifnb_stimulated_task", c("plots", "tables", "objects"))
for (folder in res.dir) if (!dir.exists(folder)) dir.create(path = folder, recursive = TRUE)

(1) Import datasets

(5 min)

AIM: Import and explore the Seurat object data.

Import the ifnb stimulated Seurat R object ifnb.rds located in the folder data.

# Import data
data.dir <- "../data"
seu <- readRDS(file = file.path(data.dir, "ifnb.rds"))

Downsample dataset

You may want to down sample this data set depending on the amount of RAM memory you have. The ifnb data set has 13,999 cells.

## Downsample data set
downsample <- TRUE # replace to FALSE in case you don't want to down sample
prop.down <- 0.2 # proportion of cells to down sample per batch: 20% of the cells
if (downsample) {
  no.cells.batch <- ceiling(table(seu$stim) * 0.2) # CTRL = 1310 and STIM = 1491 
  cell.idx.batch <- split(x = colnames(seu), f = seu$stim) # split into a list the cell names per batch
  cell.idx.batch.down <- lapply(X = setNames(names(cell.idx.batch), names(cell.idx.batch)), FUN = function(x) {
    set.seed(123)
    sample(x = cell.idx.batch[[x]], size = no.cells.batch[[x]], replace = FALSE)
  }) # downsample each batch cell names 
  cell.idx.downsample <- do.call(c, cell.idx.batch.down) # join cell name labels from the two batches into one character vector
  seu <- subset(seu, cells = cell.idx.downsample)
}
gc()

##            used  (Mb) gc trigger (Mb) max used  (Mb)
## Ncells  3442951 183.9    6551764  350  6366457 340.1
## Vcells 11924832  91.0   44423802  339 45869572 350.0

cat("No. of cells downsampled for `CTRL` was:", table(seu$stim)[1], "\n")

## No. of cells downsampled for `CTRL` was: 1310

cat("No. of cells downsampled for `STIM` was:", table(seu$stim)[2], "\n")

## No. of cells downsampled for `STIM` was: 1491

Explore quickly the Seurat seu object.

## Explore Seurat object
# Print Seurat object
seu

## An object of class Seurat 
## 14053 features across 2801 samples within 1 assay 
## Active assay: RNA (14053 features, 0 variable features)
##  2 layers present: counts, data

# Structure
str(seu)

## Formal class 'Seurat' [package "SeuratObject"] with 13 slots
##   ..@ assays      :List of 1
##   .. ..$ RNA:Formal class 'Assay5' [package "SeuratObject"] with 8 slots
##   .. .. .. ..@ layers    :List of 2
##   .. .. .. .. ..$ counts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. .. .. .. .. ..@ i       : int [1:1969604] 20 27 37 64 65 83 87 131 139 175 ...
##   .. .. .. .. .. .. ..@ p       : int [1:2802] 0 877 1439 2190 2961 3541 4124 4743 5301 5833 ...
##   .. .. .. .. .. .. ..@ Dim     : int [1:2] 14053 2801
##   .. .. .. .. .. .. ..@ Dimnames:List of 2
##   .. .. .. .. .. .. .. ..$ : NULL
##   .. .. .. .. .. .. .. ..$ : NULL
##   .. .. .. .. .. .. ..@ x       : num [1:1969604] 1 1 1 1 1 2 1 1 1 1 ...
##   .. .. .. .. .. .. ..@ factors : list()
##   .. .. .. .. ..$ data  :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. .. .. .. .. ..@ i       : int [1:1969604] 20 27 37 64 65 83 87 131 139 175 ...
##   .. .. .. .. .. .. ..@ p       : int [1:2802] 0 877 1439 2190 2961 3541 4124 4743 5301 5833 ...
##   .. .. .. .. .. .. ..@ Dim     : int [1:2] 14053 2801
##   .. .. .. .. .. .. ..@ Dimnames:List of 2
##   .. .. .. .. .. .. .. ..$ : NULL
##   .. .. .. .. .. .. .. ..$ : NULL
##   .. .. .. .. .. .. ..@ x       : num [1:1969604] 1 1 1 1 1 2 1 1 1 1 ...
##   .. .. .. .. .. .. ..@ factors : list()
##   .. .. .. ..@ cells     :Formal class 'LogMap' [package "SeuratObject"] with 1 slot
##   .. .. .. .. .. ..@ .Data: logi [1:2801, 1:2] TRUE TRUE TRUE TRUE TRUE TRUE ...
##   .. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. .. .. .. ..$ : chr [1:2801] "AAACATACATTTCC.1" "AAACGCTGTTCCAT.1" "AAACGGCTGGTCAT.1" "AAACTTGACTGGAT.1" ...
##   .. .. .. .. .. .. .. ..$ : chr [1:2] "counts" "data"
##   .. .. .. .. .. ..$ dim     : int [1:2] 2801 2
##   .. .. .. .. .. ..$ dimnames:List of 2
##   .. .. .. .. .. .. ..$ : chr [1:2801] "AAACATACATTTCC.1" "AAACGCTGTTCCAT.1" "AAACGGCTGGTCAT.1" "AAACTTGACTGGAT.1" ...
##   .. .. .. .. .. .. ..$ : chr [1:2] "counts" "data"
##   .. .. .. ..@ features  :Formal class 'LogMap' [package "SeuratObject"] with 1 slot
##   .. .. .. .. .. ..@ .Data: logi [1:14053, 1:2] TRUE TRUE TRUE TRUE TRUE TRUE ...
##   .. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. .. .. .. ..$ : chr [1:14053] "AL627309.1" "RP11-206L10.2" "LINC00115" "NOC2L" ...
##   .. .. .. .. .. .. .. ..$ : chr [1:2] "counts" "data"
##   .. .. .. .. .. ..$ dim     : int [1:2] 14053 2
##   .. .. .. .. .. ..$ dimnames:List of 2
##   .. .. .. .. .. .. ..$ : chr [1:14053] "AL627309.1" "RP11-206L10.2" "LINC00115" "NOC2L" ...
##   .. .. .. .. .. .. ..$ : chr [1:2] "counts" "data"
##   .. .. .. ..@ default   : int 1
##   .. .. .. ..@ assay.orig: chr(0) 
##   .. .. .. ..@ meta.data :'data.frame':  14053 obs. of  0 variables
##   .. .. .. ..@ misc      : list()
##   .. .. .. ..@ key       : chr "rna_"
##   ..@ meta.data   :'data.frame': 2801 obs. of  5 variables:
##   .. ..$ orig.ident        : chr [1:2801] "IMMUNE_CTRL" "IMMUNE_CTRL" "IMMUNE_CTRL" "IMMUNE_CTRL" ...
##   .. ..$ nCount_RNA        : num [1:2801] 3017 1269 1484 2185 1520 ...
##   .. ..$ nFeature_RNA      : int [1:2801] 877 562 751 771 580 583 619 558 532 666 ...
##   .. ..$ stim              : chr [1:2801] "CTRL" "CTRL" "CTRL" "CTRL" ...
##   .. ..$ seurat_annotations: Factor w/ 13 levels "CD14 Mono","CD4 Naive T",..: 1 3 7 4 3 1 6 1 2 4 ...
##   ..@ active.assay: chr "RNA"
##   ..@ active.ident: Factor w/ 2 levels "IMMUNE_CTRL",..: 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..- attr(*, "names")= chr [1:2801] "AAACATACATTTCC.1" "AAACGCTGTTCCAT.1" "AAACGGCTGGTCAT.1" "AAACTTGACTGGAT.1" ...
##   ..@ graphs      : list()
##   ..@ neighbors   : list()
##   ..@ reductions  : list()
##   ..@ images      : list()
##   ..@ project.name: chr "ifnb"
##   ..@ misc        : list()
##   ..@ version     :Classes 'package_version', 'numeric_version'  hidden list of 1
##   .. ..$ : int [1:3] 5 0 2
##   ..@ commands    : list()
##   ..@ tools       : list()

# Check meta.data
head(seu@meta.data)

##                   orig.ident nCount_RNA nFeature_RNA stim seurat_annotations
## AAACATACATTTCC.1 IMMUNE_CTRL       3017          877 CTRL          CD14 Mono
## AAACGCTGTTCCAT.1 IMMUNE_CTRL       1269          562 CTRL       CD4 Memory T
## AAACGGCTGGTCAT.1 IMMUNE_CTRL       1484          751 CTRL        T activated
## AAACTTGACTGGAT.1 IMMUNE_CTRL       2185          771 CTRL          CD16 Mono
## AAACTTGAGGAAAT.1 IMMUNE_CTRL       1520          580 CTRL       CD4 Memory T
## AAAGACGAAACAGA.1 IMMUNE_CTRL       1667          583 CTRL          CD14 Mono

# Check how many cells per data set 
table(seu$stim)

## 
## CTRL STIM 
## 1310 1491

# Check no. of genes 
nrow(seu)

## [1] 14053

# Check no. of cells 
ncol(seu)

## [1] 2801

(2) Assess batch effect

Joint dimred

(7 min)

AIM: See how much the two data sets overlap each other in the low dimensional reductions.

Run the standard Seurat upstream workflow to jointly compute a PCA and UMAP for the datasets:

NormalizeData(): log1p-normalization with a scaling factor of 10K
FindVariableFeatures(): identification of 2K HVG
ScaleData(): standardization of the 2K HVG
RunPCA(): computation of a PCA with the 2K HVG standardized
RunUMAP(): computation of a UMAP using the first dims of the previously computed PCA

## Joint analysis

# Standard Seurat upstream workflow
seu <- NormalizeData(seu)
seu <- FindVariableFeatures(seu)
seu <- ScaleData(seu)
seu <- RunPCA(seu)
seu <- RunUMAP(seu, dims = 1:30, reduction = "pca", reduction.name = "umap.unintegrated")

Plot the PCA and UMAP side-by-side below.

## Plot jointly dimreds
pca.unint <- DimPlot(seu, reduction = "pca", group.by = "stim")
umap.unint <- DimPlot(seu, reduction = "umap.unintegrated", group.by = "stim")
pca.unint + umap.unint

Celltype markers

(5 min)

AIM: Check if cells from different datasets share well-known cell-specific markers.

Plot below some cell-specific PBMC cell type markers. Feel free to add other genes you might be interested in checking.

## Joint celltype markers

# List of PBMC cell markers
markers.plot <- list(
  # "pbmc" = c("CD3D", "CREM", "HSPH1", "SELL", "GIMAP5", "CACYBP", "GNLY", "NKG7", "CCL5",
  #            "CD8A", "MS4A1", "CD79A", "MIR155HG", "NME1", "FCGR3A", "VMO1", "CCL2", "S100A9", 
  #            "HLA-DQA1", "GPR183", "PPBP", "GNG11", "HBA2", "HBB", "TSPAN13", "IL3RA", "IGJ", 
  #            "PRSS57"), 
  "pbmc" = c("CD3D", "NKG7", "CD8A", "MS4A1", "CD79A", "FCGR3A")
)

# Plot
pbmc.markers.unint.plot <- FeaturePlot(seu, features = markers.plot$pbmc, split.by = "stim", 
                                       max.cutoff = 3, cols = c("grey", "red"), 
                                       reduction = "umap.unintegrated", ncol = 4, 
                                       pt.size = 0.1)

## Plot jointly celltype markers

# Print 
pbmc.markers.unint.plot

Manual cell annotation

(15 min)

AIM: Check the number of differentially expressed genes for dataset-specific clusters shared between datasets.

Split the Seurat object into a list of two Seurat objects (one per dataset) and run the standard Seurat workflow for each. After calculating the PCA, run FindNeighbors() and FindClusters() sequentially to perform graph-based clustering for each dataset, in order to determine the dataset-specific cluster markers.

## Independent sample analysis

# Split Seurat object into two batch on 'stim' label identity
seu.list <- SplitObject(object = seu, split.by = "stim")

# Standard Seurat upstream workflow
seu.list <- lapply(X = seu.list, FUN = function(x) {
  x <- NormalizeData(x)
  x <- FindVariableFeatures(x)
  x <- ScaleData(x)
  x <- RunPCA(x)
  x <- FindNeighbors(x, dims = 1:15, reduction = "pca")
  x <- FindClusters(x, resolution = 0.8, cluster.name = "unintegrated_clusters")
  x <- RunUMAP(x, dims = 1:15, reduction = "pca", reduction.name = "umap.unintegrated")
})

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 1310
## Number of edges: 48026
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8103
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 1491
## Number of edges: 57060
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8153
## Number of communities: 8
## Elapsed time: 0 seconds

Plot UMAPs for both datasets highlighting the Seurat clusters found for each.

## Plot independent sample analysis clusters
umap.ind.samp.unint <- lapply(X = seu.list, FUN = function(x) {
  DimPlot(x, reduction = "umap.unintegrated", group.by = "unintegrated_clusters", pt.size = 0.1, label = TRUE)
})
umap.ind.samp.unint$CTRL + umap.ind.samp.unint$STIM

Compute the differentially expressed genes for every cluster in every dataset and retrieve only the upregulated genes for every cluster. Then pick the top 50 upregulated genes per cluster based on log2 fold-change, among those you were statistically significant, i.e., FDR<0.05, and calculate the intersection of cluster genes between datasets.

## Independent sample analysis: DGE

# Differential gene expression analysis per cluster 
dge.markers.unint <- lapply(X = seu.list, FUN = function(x) {
  FindAllMarkers(object = x, assay = "RNA", slot = "data", 
                 logfc.threshold = 0.25, min.pct = 0.25, 
                 min.cells.feature = 10, only.pos = TRUE)
})

# Pick the top50 upregulated genes per cluster based on log2FC
top50.up.cluster <- lapply(X = dge.markers.unint, FUN = function(x) {
  x %>% 
    filter(p_val_adj<0.05) %>% 
    group_by(cluster) %>% 
    arrange(desc(avg_log2FC)) %>% 
    slice_head(n=50)  %>% 
    split(., .$cluster)
}) 

# Check intersection of top50 marker genes between clusters across batches
shared.genes <- list()
for (i in names(top50.up.cluster$CTRL)) {
  for (ii in names(top50.up.cluster$STIM)) {
    shared.genes[[paste0("CTRL", i)]][[paste0("STIM", ii)]] <- intersect(top50.up.cluster$CTRL[[i]]$gene, 
                                                                             top50.up.cluster$STIM[[ii]]$gene)
  }
}
# Table with number of genes shared between CTRL vs STIM clusters for the top50 upregulated genes per cluster
shared.genes.table <- as.data.frame(
  lapply(X = shared.genes, FUN = function(x) {
    unlist(lapply(x, length))
  })
)

Print the confusion matrix of cluster markers shared between datasets.

## Print table
knitr::kable(shared.genes.table)

	CTRL0	CTRL1	CTRL2	CTRL3	CTRL4	CTRL5	CTRL6	CTRL7	CTRL8	CTRL9
STIM0	22	0	0	0	0	0	0	3	0	10
STIM1	0	27	7	2	0	6	1	0	1	0
STIM2	0	14	30	6	0	5	3	0	0	0
STIM3	0	1	0	0	0	21	2	7	7	0
STIM4	0	3	8	30	0	2	1	2	0	0
STIM5	0	0	0	0	25	1	0	1	1	0
STIM6	0	3	2	0	0	0	21	1	0	0
STIM7	1	0	0	1	1	10	0	22	4	0

Plot the previous table as a heatmap.

## Plot independent sample analysis clusters
ComplexHeatmap::Heatmap(matrix = as.matrix(shared.genes.table), name = "Shared gene no.", 
                        cluster_columns = FALSE, cluster_rows = FALSE)

Automatic cell annotation

(10 min)

AIM: Check if datasets share cell types by predicting cell type labels for both datasets.

This exercise requires to run CellTypist. CellTypist is a python package that can be run using python, the command-line or online through their website. For convenience, run CellTypist online.

First export the Seurat R object as anndata h5ad python-compatible object with the function zellkonverter::writeH5AD() by running the R code chunk below. This will create a file named ifnb_celltypist.h5ad under the directory: results/cross_tissue_task/objects. Next, go to the CellTypist website: https://www.celltypist.org/. Put your own e-mail address. Select the model Immune_All_Low.pkl which comprises a model for annotation of immune cells. Allow majority voting. Finally, upload the file ifnb_celltypist.h5ad.

## Automatic cell annotation
file.name <- file.path(res.dir[3], "ifnb_celltypist.h5ad")
cat("Exporting Seurat object as '.h5ad' format to:", gsub("\\../", "", file.name), "\n")

## Exporting Seurat object as '.h5ad' format to: results/ifnb_stimulated_task/objects/ifnb_celltypist.h5ad

zellkonverter::writeH5AD(sce = as.SingleCellExperiment(seu), file = file.name, X_name = "logcounts")

You should receive an e-mail with a download link with the result. Download the result - predictions.tar.gz - and put the result into the directory: results/cross_tissue_task/tables. In alternative, you can substitute the url below (because it’s only valid for 7 days) by copying and pasting the link you received in your own e-mail and replace the variable FALSE by TRUE for the variable use.url.

Plot the predicted labels for both data sets.

## Plot labels from CellTypist

# Download predictions
use.url <- FALSE # if you wanna use the url, replace the url by the url you received in your e-mail and replace FALSE by TRUE
if (use.url) { # download: url only valid for 7 days
  url <- "https://celltypist.cog.sanger.ac.uk/uploads/17baab62-d10e-4de6-a4ed-b71073518329/predictions.tar.gz?AWSAccessKeyId=C068AUIY7F6SNEJUTEPA&Signature=Ss%2FNyQOcte7QS06j8tnl2s2YDVQ%3D&Expires=1720194680"
  download.file(url = url, destfile = file.path(res.dir[2], "predictions.tar.gz"))
}
# Decompress the file with predictions 
untar(tarfile = file.path(res.dir[2], "predictions.tar.gz"), exdir = res.dir[2])

# Add predictions to Seurat object
seu@meta.data[,c("predicted_labels", "over_clustering", "majority_voting")] <- read.table(file = file.path(res.dir[2],
                                                                                                           "predicted_labels.csv"),
                                                                                          header = TRUE, sep = ",", row.names = 1)

# Plot predictions
DimPlot(object = seu, reduction = "umap.unintegrated", group.by = "majority_voting", 
        split.by = "stim", pt.size = 0.1, label = TRUE)

(3) Integrate datasets

(10 min)

AIM: Compare different integration methods.

First, split the layers of data by batch before performing integration. Then, apply the standard Seurat workflow. Finally, call the function IntegrateLayers() to integrate the datasets. In this function you can specify the method you want to run by providing the integration method function.

Seurat provides three methods: CCA (CCAIntegration), RPCA (RPCAIntegration) and Harmony (HarmonyIntegration). In addition, other methods can be called by using functions from SeuratWrappers such as: FastMNN (FastMNNIntegration) or scVI (scVIIntegration) among others. Harmony (from the harmony R package), FastMNN (from the batchelor R package) and scVI (python package installed with conda) need to be installed independently from Seurat.

Run the R chunk code below to run the integration methods: CCA, RPCA, Harmony and FastMNN (you can try to run scVI if you’ve it installed in your system). Join the layers back after integration to project the integrated data onto UMAP. The UMAP highlights the stim and ground-truth seurat_annotations labels.

## Perform integration

# Split layers for integration
seu[["RNA"]] <- split(x = seu[["RNA"]], f = seu$stim)

# Standard workflow
seu <- NormalizeData(seu)
seu <- FindVariableFeatures(seu)
seu <- ScaleData(seu)
seu <- RunPCA(seu)

# Integrate layers
int.methods <- c("CCA" = "CCAIntegration", "RPCA" = "RPCAIntegration", 
                 "Harmony" = "HarmonyIntegration", "FastMNN" = "FastMNNIntegration", 
                 "scVI" = "scVIIntegration")

for (m in names(int.methods)[1:4]) {
  cat("\nRunning integration method", m, "...\n")
  int.dimred <- paste0("integrated.", m)
  umap.dimred <- paste0("umap.", m)
  # Integration
  if (m=="scVI") {
    seu <- IntegrateLayers(object = seu, method = get(eval(substitute(int.methods[m]))), 
                         orig.reduction = "pca", 
                         new.reduction = int.dimred,
                         conda_env = "~/miniconda3/envs/scvi-env", # substitute this by your installation 
                         verbose = TRUE)
  } else {
    seu <- IntegrateLayers(object = seu, method = get(eval(substitute(int.methods[m]))), 
                         orig.reduction = "pca", 
                         new.reduction = int.dimred,
                         verbose = TRUE)
  }

}

## 
## Running integration method CCA ...
## 
## Running integration method RPCA ...
## 
## Running integration method Harmony ...
## 
## Running integration method FastMNN ...

# Re-join layers after integration
seu[["RNA"]] <- JoinLayers(seu[["RNA"]])

# Run UMAP for every integration method
int.umaps.plots <- list()
for (m in names(int.methods)[1:4]) {
  cat("\nRunning UMAP for", m, "integrated result...\n")
  int.dimred <- paste0("integrated.", m)
  umap.dimred <- paste0("umap.", m)
  seu <- RunUMAP(seu, dims = 1:30, reduction = int.dimred, reduction.name = umap.dimred)
  int.umaps.plots[[m]] <-  DimPlot(object = seu, reduction = umap.dimred, group.by = c("stim", "seurat_annotations"), 
                                   combine = FALSE, label.size = 2)
}

## 
## Running UMAP for CCA integrated result...
## 
## Running UMAP for RPCA integrated result...
## 
## Running UMAP for Harmony integrated result...
## 
## Running UMAP for FastMNN integrated result...

# Save Seurat object
saveRDS(object = seu, file = file.path(res.dir[3], "seu_integrated.rds"))

(4) Assess integration

(15 min)

AIM: Assess integration qualitatively and quantitatively through dimensional reduction visualizations and LISI scores.

Qualitative viz

Plot the integrated embeddings below highlighting the stim and ground-truth seurat_annotations labels.

## Assess integration by printing the plots using the "batch" and "cell_type" (ground-truth) labels
wrap_plots(c(int.umaps.plots$CCA, int.umaps.plots$RPCA, int.umaps.plots$Harmony, int.umaps.plots$FastMNN),
           ncol = 2, byrow = TRUE)

Quantitative metrics

Run the code below to compute the i/cLISI scores for every integrated embedding with the function getIntegrationMetrics() from the package scIntegrationMetrics (read more about the meaning of these metrics here).

## Assess quantitatively integration with scIntegrationMetrics

# Calculate metrics
int.mthds.names <- paste0("integrated.", names(int.methods)[1:4])
names(int.mthds.names) <- int.mthds.names
metrics <- list()
for (m in int.mthds.names) {
  key <- gsub("integrated.", "", m)
  cat("Computing i/cLISI metrics for integration method:", gsub("integrated.", "", key), "\n")
  metrics[[key]] <- getIntegrationMetrics(seu, meta.label = "seurat_annotations", meta.batch = "stim",
                                          method.reduction = m, metrics = c("iLISI", "norm_iLISI", 
                                                                            #"CiLISI", "CiLISI_means", 
                                                                            "norm_cLISI", "norm_cLISI_means"))
}

## Computing i/cLISI metrics for integration method: CCA 
## Computing i/cLISI metrics for integration method: RPCA 
## Computing i/cLISI metrics for integration method: Harmony 
## Computing i/cLISI metrics for integration method: FastMNN

# Join metrics
metrics <- as.data.frame(do.call(cbind, metrics))

Print the result below.

# Print table
knitr::kable(metrics)

	CCA	RPCA	Harmony	FastMNN
iLISI	1.761276	1.805624	1.589445	1.329591
norm_iLISI	0.7612756	0.8056235	0.5894454	0.329591
norm_cLISI	0.9402503	0.9362085	0.9329568	0.9393513
norm_cLISI_means	0.8956541	0.8914974	0.8859415	0.907111

R packages used and respective versions

## R packages and versions used in these analyses
sessionInfo()

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scIntegrationMetrics_1.1 SeuratWrappers_0.3.2     patchwork_1.2.0         
## [4] Seurat_5.1.0             SeuratObject_5.0.2       sp_2.1-4                
## [7] dplyr_1.1.4             
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.4                  spatstat.explore_3.2-7     
##   [3] reticulate_1.38.0           R.utils_2.12.3             
##   [5] tidyselect_1.2.1            htmlwidgets_1.6.4          
##   [7] BiocParallel_1.28.3         grid_4.1.0                 
##   [9] Rtsne_0.17                  ScaledMatrix_1.2.0         
##  [11] zellkonverter_1.4.0         munsell_0.5.1              
##  [13] codetools_0.2-18            ica_1.0-3                  
##  [15] future_1.33.2               miniUI_0.1.1.1             
##  [17] batchelor_1.10.0            withr_3.0.0                
##  [19] spatstat.random_3.2-3       colorspace_2.1-0           
##  [21] progressr_0.14.0            filelock_1.0.3             
##  [23] Biobase_2.54.0              highr_0.11                 
##  [25] knitr_1.47                  rstudioapi_0.13            
##  [27] stats4_4.1.0                SingleCellExperiment_1.16.0
##  [29] ROCR_1.0-11                 tensor_1.5                 
##  [31] listenv_0.9.1               MatrixGenerics_1.6.0       
##  [33] labeling_0.4.3              harmony_1.2.0              
##  [35] GenomeInfoDbData_1.2.7      polyclip_1.10-6            
##  [37] farver_2.1.2                basilisk_1.6.0             
##  [39] parallelly_1.37.1           vctrs_0.6.5                
##  [41] generics_0.1.3              xfun_0.45                  
##  [43] R6_2.5.1                    doParallel_1.0.17          
##  [45] GenomeInfoDb_1.30.1         clue_0.3-65                
##  [47] rsvd_1.0.5                  DelayedArray_0.20.0        
##  [49] bitops_1.0-7                spatstat.utils_3.0-5       
##  [51] cachem_1.1.0                assertthat_0.2.1           
##  [53] promises_1.3.0              scales_1.3.0               
##  [55] gtable_0.3.5                beachmat_2.10.0            
##  [57] globals_0.16.3              goftest_1.2-3              
##  [59] klippy_0.0.0.9500           spam_2.10-0                
##  [61] rlang_1.1.4                 GlobalOptions_0.1.2        
##  [63] splines_4.1.0               lazyeval_0.2.2             
##  [65] spatstat.geom_3.2-9         BiocManager_1.30.23        
##  [67] yaml_2.3.8                  reshape2_1.4.4             
##  [69] abind_1.4-5                 httpuv_1.6.15              
##  [71] tools_4.1.0                 ggplot2_3.5.1              
##  [73] jquerylib_0.1.4             RColorBrewer_1.1-3         
##  [75] BiocGenerics_0.40.0         ggridges_0.5.6             
##  [77] Rcpp_1.0.12                 plyr_1.8.9                 
##  [79] sparseMatrixStats_1.6.0     zlibbioc_1.40.0            
##  [81] purrr_1.0.2                 RCurl_1.98-1.14            
##  [83] basilisk.utils_1.6.0        deldir_2.0-4               
##  [85] pbapply_1.7-2               GetoptLong_1.0.5           
##  [87] cowplot_1.1.3               S4Vectors_0.32.4           
##  [89] zoo_1.8-12                  SummarizedExperiment_1.24.0
##  [91] ggrepel_0.9.5               cluster_2.1.2              
##  [93] magrittr_2.0.3              data.table_1.15.4          
##  [95] RSpectra_0.16-1             scattermore_1.2            
##  [97] ResidualMatrix_1.4.0        circlize_0.4.16            
##  [99] lmtest_0.9-40               RANN_2.6.1                 
## [101] fitdistrplus_1.1-11         matrixStats_1.1.0          
## [103] mime_0.12                   evaluate_0.24.0            
## [105] xtable_1.8-4                RhpcBLASctl_0.23-42        
## [107] fastDummies_1.7.3           IRanges_2.28.0             
## [109] gridExtra_2.3               shape_1.4.6.1              
## [111] compiler_4.1.0              tibble_3.2.1               
## [113] KernSmooth_2.23-20          crayon_1.5.3               
## [115] R.oo_1.26.0                 htmltools_0.5.8.1          
## [117] mgcv_1.8-35                 later_1.3.2                
## [119] tidyr_1.3.1                 DBI_1.2.3                  
## [121] ComplexHeatmap_2.15.4       MASS_7.3-54                
## [123] Matrix_1.6-5                permute_0.9-7              
## [125] cli_3.6.3                   R.methodsS3_1.8.2          
## [127] parallel_4.1.0              dotCall64_1.1-1            
## [129] igraph_2.0.3                GenomicRanges_1.46.1       
## [131] pkgconfig_2.0.3             dir.expiry_1.2.0           
## [133] scuttle_1.4.0               plotly_4.10.4              
## [135] spatstat.sparse_3.1-0       foreach_1.5.2              
## [137] bslib_0.7.0                 XVector_0.34.0             
## [139] stringr_1.5.1               digest_0.6.36              
## [141] sctransform_0.4.1           RcppAnnoy_0.0.22           
## [143] vegan_2.6-6.1               spatstat.data_3.1-2        
## [145] rmarkdown_2.27              leiden_0.4.3.1             
## [147] uwot_0.2.2                  DelayedMatrixStats_1.16.0  
## [149] shiny_1.8.1.1               rjson_0.2.21               
## [151] lifecycle_1.0.4             nlme_3.1-152               
## [153] jsonlite_1.8.8              BiocNeighbors_1.12.0       
## [155] viridisLite_0.4.2           limma_3.50.3               
## [157] fansi_1.0.6                 pillar_1.9.0               
## [159] lattice_0.20-44             fastmap_1.2.0              
## [161] httr_1.4.7                  survival_3.2-11            
## [163] glue_1.7.0                  remotes_2.5.0              
## [165] png_0.1-8                   iterators_1.0.14           
## [167] presto_1.0.0                stringi_1.8.4              
## [169] sass_0.4.9                  RcppHNSW_0.6.0             
## [171] BiocSingular_1.10.0         irlba_2.3.5.1              
## [173] future.apply_1.11.2

Ifnb stimulated integration task

António Sousa (e-mail: aggode@utu.fi) - Elo lab (https://elolab.utu.fi)

03/07/2024