Integration task

This single-cell RNA-seq integration task describes an example of cross-tissue integration: human peripheral blood mononuclear cells (PBMCs) with human pancreatic islets. Both data sets were retrieved from the SeuratData R package (v.0.2.2.9001):

PBMCs:
- pbmc3k SeuratData data set (v.3.1.4): 3k human PBMCs from 10X Genomics
- no. of cells: 2,700
Pancreas:
- panc8 SeuratData data set (v.3.0.2): eight human pancreas data sets from five technologies (only the indrop1 data set was included)
- no. of cells (for indrop1): 1,937

The data set of origin of each cell (pbmc or pancreas) is stored in the batch column of the Seurat meta.data. The ground-truth cell identities are also provided in the cell_type column, but avoid checking them until the end of this notebook to make the analyses more interesting.

The analyses performed in this notebook rely on the Seurat R package (v.5.1.0).

Import the main packages used in this notebook: Seurat (v.5.1.0), SeuratWrappers (v.0.3.2 - integration wrappers for Seurat), dplyr (v.1.1.4 - data wrangling), patchwork (v.1.2.0 - visualization) and scIntegrationMetrics (v.1.1 - computation of LISI integration metrics).

## Import packages
library("dplyr") # data wrangling
library("Seurat") # scRNA-seq analysis
library("patchwork") # viz
library("SeuratWrappers") # integration wrappers
library("scIntegrationMetrics") # compute LISI integration metrics

Create output directories to save intermediate results, figures, tables and R objects.

## Output directories
res.dir <- file.path("../results", "cross_tissue_task", c("plots", "tables", "objects"))
for (folder in res.dir) if (!dir.exists(folder)) dir.create(path = folder, recursive = TRUE)

(1) Import datasets

(3 min)

AIM: Import and explore the Seurat object data.

Import the cross-tissue Seurat R object pbmc3k_panc8.rds located in the folder data.

# Import data
data.dir <- "../data"
seu <- readRDS(file = file.path(data.dir, "pbmc3k_panc8.rds"))

Quickly explore the Seurat object seu.

## Explore Seurat object
# Print Seurat object
seu

## An object of class Seurat 
## 35686 features across 4637 samples within 1 assay 
## Active assay: RNA (35686 features, 0 variable features)
##  2 layers present: data, counts

# Structure
str(seu)

## Formal class 'Seurat' [package "SeuratObject"] with 13 slots
##   ..@ assays      :List of 1
##   .. ..$ RNA:Formal class 'Assay5' [package "SeuratObject"] with 8 slots
##   .. .. .. ..@ layers    :List of 2
##   .. .. .. .. ..$ data  :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. .. .. .. .. ..@ i       : int [1:6009421] 29 73 80 148 163 184 186 227 229 230 ...
##   .. .. .. .. .. .. ..@ p       : int [1:4638] 0 779 2131 3260 4220 4741 5522 6304 7094 7626 ...
##   .. .. .. .. .. .. ..@ Dim     : int [1:2] 35686 4637
##   .. .. .. .. .. .. ..@ Dimnames:List of 2
##   .. .. .. .. .. .. .. ..$ : NULL
##   .. .. .. .. .. .. .. ..$ : NULL
##   .. .. .. .. .. .. ..@ x       : num [1:6009421] 1 1 2 1 1 1 1 41 1 1 ...
##   .. .. .. .. .. .. ..@ factors : list()
##   .. .. .. .. ..$ counts:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
##   .. .. .. .. .. .. ..@ i       : int [1:6009421] 29 73 80 148 163 184 186 227 229 230 ...
##   .. .. .. .. .. .. ..@ p       : int [1:4638] 0 779 2131 3260 4220 4741 5522 6304 7094 7626 ...
##   .. .. .. .. .. .. ..@ Dim     : int [1:2] 35686 4637
##   .. .. .. .. .. .. ..@ Dimnames:List of 2
##   .. .. .. .. .. .. .. ..$ : NULL
##   .. .. .. .. .. .. .. ..$ : NULL
##   .. .. .. .. .. .. ..@ x       : num [1:6009421] 1 1 2 1 1 1 1 41 1 1 ...
##   .. .. .. .. .. .. ..@ factors : list()
##   .. .. .. ..@ cells     :Formal class 'LogMap' [package "SeuratObject"] with 1 slot
##   .. .. .. .. .. ..@ .Data: logi [1:4637, 1:2] TRUE TRUE TRUE TRUE TRUE TRUE ...
##   .. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. .. .. .. ..$ : chr [1:4637] "pbmc_AAACATACAACCAC" "pbmc_AAACATTGAGCTAC" "pbmc_AAACATTGATCAGC" "pbmc_AAACCGTGCTTCCG" ...
##   .. .. .. .. .. .. .. ..$ : chr [1:2] "counts" "data"
##   .. .. .. .. .. ..$ dim     : int [1:2] 4637 2
##   .. .. .. .. .. ..$ dimnames:List of 2
##   .. .. .. .. .. .. ..$ : chr [1:4637] "pbmc_AAACATACAACCAC" "pbmc_AAACATTGAGCTAC" "pbmc_AAACATTGATCAGC" "pbmc_AAACCGTGCTTCCG" ...
##   .. .. .. .. .. .. ..$ : chr [1:2] "counts" "data"
##   .. .. .. ..@ features  :Formal class 'LogMap' [package "SeuratObject"] with 1 slot
##   .. .. .. .. .. ..@ .Data: logi [1:35686, 1:2] TRUE TRUE TRUE TRUE TRUE TRUE ...
##   .. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. .. .. .. ..$ : chr [1:35686] "AL627309.1" "AP006222.2" "RP11-206L10.2" "RP11-206L10.9" ...
##   .. .. .. .. .. .. .. ..$ : chr [1:2] "counts" "data"
##   .. .. .. .. .. ..$ dim     : int [1:2] 35686 2
##   .. .. .. .. .. ..$ dimnames:List of 2
##   .. .. .. .. .. .. ..$ : chr [1:35686] "AL627309.1" "AP006222.2" "RP11-206L10.2" "RP11-206L10.9" ...
##   .. .. .. .. .. .. ..$ : chr [1:2] "counts" "data"
##   .. .. .. ..@ default   : int 1
##   .. .. .. ..@ assay.orig: chr(0) 
##   .. .. .. ..@ meta.data :'data.frame':  35686 obs. of  0 variables
##   .. .. .. ..@ misc      : list()
##   .. .. .. ..@ key       : chr "rna_"
##   ..@ meta.data   :'data.frame': 4637 obs. of  5 variables:
##   .. ..$ orig.ident  : chr [1:4637] "pbmc3k" "pbmc3k" "pbmc3k" "pbmc3k" ...
##   .. ..$ nCount_RNA  : num [1:4637] 2419 4903 3147 2639 980 ...
##   .. ..$ nFeature_RNA: int [1:4637] 779 1352 1129 960 521 781 782 790 532 550 ...
##   .. ..$ cell_type   : chr [1:4637] "Memory CD4 T" "B" "Memory CD4 T" "CD14+ Mono" ...
##   .. ..$ batch       : chr [1:4637] "pbmc" "pbmc" "pbmc" "pbmc" ...
##   ..@ active.assay: chr "RNA"
##   ..@ active.ident: Factor w/ 2 levels "pbmc3k","indrop": 1 1 1 1 1 1 1 1 1 1 ...
##   .. ..- attr(*, "names")= chr [1:4637] "pbmc_AAACATACAACCAC" "pbmc_AAACATTGAGCTAC" "pbmc_AAACATTGATCAGC" "pbmc_AAACCGTGCTTCCG" ...
##   ..@ graphs      : list()
##   ..@ neighbors   : list()
##   ..@ reductions  : list()
##   ..@ images      : list()
##   ..@ project.name: chr "pbmc3k_panc8"
##   ..@ misc        : list()
##   ..@ version     :Classes 'package_version', 'numeric_version'  hidden list of 1
##   .. ..$ : int [1:3] 5 0 2
##   ..@ commands    : list()
##   ..@ tools       : list()

# Check meta.data
head(seu@meta.data)

##                     orig.ident nCount_RNA nFeature_RNA    cell_type batch
## pbmc_AAACATACAACCAC     pbmc3k       2419          779 Memory CD4 T  pbmc
## pbmc_AAACATTGAGCTAC     pbmc3k       4903         1352            B  pbmc
## pbmc_AAACATTGATCAGC     pbmc3k       3147         1129 Memory CD4 T  pbmc
## pbmc_AAACCGTGCTTCCG     pbmc3k       2639          960   CD14+ Mono  pbmc
## pbmc_AAACCGTGTATGCG     pbmc3k        980          521           NK  pbmc
## pbmc_AAACGCACTGGTAC     pbmc3k       2163          781 Memory CD4 T  pbmc

# Check how many cells per data set 
table(seu$batch)

## 
## pancreas     pbmc 
##     1937     2700

# Check no. of genes 
nrow(seu)

## [1] 35686

# Check no. of cells 
ncol(seu)

## [1] 4637

(2) Assess batch effect

Joint dimred

(7 min)

AIM: Assess how much the two data sets overlap in the low-dimensional embeddings.

Run the standard Seurat upstream workflow to compute a PCA and a UMAP for both datasets together:

- NormalizeData(): log-normalization; counts are divided by the total counts per cell, multiplied by a scale factor of 10,000 and log1p-transformed
- FindVariableFeatures(): identification of the 2,000 most highly variable genes (2K HVG)
- ScaleData(): standardization (z-scoring) of the 2K HVG
- RunPCA(): computation of a PCA on the standardized 2K HVG
- RunUMAP(): computation of a UMAP on the first 30 principal components of the PCA computed above
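To make the first and third steps concrete, here is a minimal base-R sketch of what a LogNormalize-style normalization followed by ScaleData-style standardization computes on a tiny matrix. The counts and the helper names log_normalize() and scale_genes() are hypothetical; Seurat's own implementation works on sparse matrices and, by default, scales only the variable features and clips z-scores at 10.

```r
# Hypothetical raw counts: 3 genes (rows) x 2 cells (columns)
counts <- matrix(c(10, 0, 90,
                   5, 5, 0),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2")))

# LogNormalize-like step: divide by each cell's total counts,
# multiply by the scale factor and apply the natural log1p
log_normalize <- function(counts, scale.factor = 1e4) {
  log1p(sweep(counts, 2, colSums(counts), "/") * scale.factor)
}

# ScaleData-like step: z-score every gene across cells, then clip at max.value
scale_genes <- function(x, max.value = 10) {
  z <- t(scale(t(x)))
  pmin(z, max.value)
}

norm <- log_normalize(counts)
scaled <- scale_genes(norm)
round(norm, 3)
```

For example, geneA in cell1 becomes log1p(10 / 100 * 10,000) = log1p(1,000), while zeros stay zero, which keeps the matrix sparse.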

## Joint analysis

# Standard Seurat upstream workflow
seu <- NormalizeData(seu)
seu <- FindVariableFeatures(seu)
seu <- ScaleData(seu)
seu <- RunPCA(seu)
seu <- RunUMAP(seu, dims = 1:30, reduction = "pca", reduction.name = "umap.unintegrated")

Plot the PCA and UMAP side-by-side below.

## Plot jointly dimreds
pca.unint <- DimPlot(seu, reduction = "pca", group.by = "batch")
umap.unint <- DimPlot(seu, reduction = "umap.unintegrated", group.by = "batch")
pca.unint + umap.unint

Celltype markers

(5 min)

AIM: Check if cells from different datasets share well-known cell-specific markers.

Plot some well-known PBMC and pancreatic cell type markers below. Feel free to add other genes you are interested in.
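Besides the feature plots, a per-batch summary of the fraction of cells expressing each marker makes the comparison explicit (this is also what the dot size in Seurat's DotPlot() represents). A minimal base-R sketch on a hypothetical toy matrix; pct_expressed() is an illustrative helper, not a Seurat function.

```r
# Hypothetical log-normalized expression: 2 marker genes x 6 cells
expr <- matrix(c(2.1, 0, 1.5, 0,   0,   0,
                 0,   0, 0,   3.2, 2.8, 0),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("CD3D", "REG1A"), paste0("cell", 1:6)))
batch <- c("pbmc", "pbmc", "pbmc", "pancreas", "pancreas", "pancreas")

# Fraction of cells with non-zero expression of each marker, per batch
pct_expressed <- function(expr, batch) {
  sapply(split(seq_along(batch), batch),
         function(idx) rowMeans(expr[, idx, drop = FALSE] > 0))
}

pct <- pct_expressed(expr, batch)
pct
```

A marker shared between tissues would show a high fraction in both columns; here each toy marker is detected in only one batch.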

## Joint celltype markers

# List of PBMC and (some) pancreatic cell markers
markers.plot <- list(
  # "pbmc" = c("CD3D", "CREM", "HSPH1", "SELL", "GIMAP5", "CACYBP", "GNLY", "NKG7", "CCL5",
  #            "CD8A", "MS4A1", "CD79A", "MIR155HG", "NME1", "FCGR3A", "VMO1", "CCL2", "S100A9", 
  #            "HLA-DQA1", "GPR183", "PPBP", "GNG11", "HBA2", "HBB", "TSPAN13", "IL3RA", "IGJ", 
  #            "PRSS57"), 
  "pbmc" = c("CD3D", "NKG7", "CD8A", "MS4A1", "CD79A", "FCGR3A"), 
  "pancreas" = c("REG1A", "PPY", "SST", "GHRL", "VWF", "SOX10")
)

# Plot
pbmc.markers.unint.plot <- FeaturePlot(seu, features = markers.plot$pbmc, split.by = "batch", 
                                       max.cutoff = 3, cols = c("grey", "red"), 
                                       reduction = "umap.unintegrated", ncol = 4, pt.size = 0.1)
pancreas.markers.unint.plot <- FeaturePlot(seu, features = markers.plot$pancreas, split.by = "batch", 
                                           max.cutoff = 3, cols = c("grey", "red"), 
                                           reduction = "umap.unintegrated", ncol = 4, pt.size = 0.1)

## Plot jointly celltype markers

# Print 
pbmc.markers.unint.plot

pancreas.markers.unint.plot

Manual cell annotation

(15 min)

AIM: Count the differentially expressed genes shared between the clusters found independently in each dataset.

Split the Seurat object into a list of two Seurat objects (one per dataset) and run the standard Seurat workflow on each. After computing the PCA, run FindNeighbors() and FindClusters() to perform graph-based clustering of each dataset, and use the resulting clusters to find dataset-specific cluster markers.
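FindNeighbors() builds a k-nearest-neighbour graph on the PCs and weights its edges by the overlap (Jaccard index) of neighbourhoods, i.e. a shared nearest neighbour (SNN) graph, pruning weak edges (prune.SNN = 1/15 by default); FindClusters() then optimizes modularity on this graph with the Louvain algorithm. The sketch below mimics that idea on hypothetical 2-D coordinates using igraph (assumed installed; it appears in the session info). It is not Seurat's implementation, and k = 5 is an illustrative choice (Seurat's default k.param is 20).

```r
# Hypothetical "PCA" coordinates: two well-separated groups of 10 cells each
set.seed(1)
pcs <- rbind(matrix(rnorm(20, mean = 0, sd = 0.5), ncol = 2),
             matrix(rnorm(20, mean = 10, sd = 0.5), ncol = 2))
group <- rep(c("A", "B"), each = 10)

# SNN graph + Louvain, in the spirit of FindNeighbors() + FindClusters()
snn_louvain <- function(pcs, k = 5, prune = 1/15) {
  n <- nrow(pcs)
  d <- as.matrix(dist(pcs))
  # kNN adjacency: each cell is its own first neighbour (distance 0)
  knn <- matrix(0, n, n)
  for (i in seq_len(n)) knn[i, order(d[i, ])[seq_len(k)]] <- 1
  # Jaccard index of neighbour sets: shared / (2k - shared)
  shared <- knn %*% t(knn)
  snn <- shared / (2 * k - shared)
  snn[snn < prune] <- 0  # prune weak edges
  g <- igraph::graph_from_adjacency_matrix(snn, mode = "undirected",
                                           weighted = TRUE, diag = FALSE)
  as.integer(igraph::membership(igraph::cluster_louvain(g)))
}

clusters <- snn_louvain(pcs)
table(group, clusters)
```

Because no edges connect the two toy groups, every cluster contains cells of a single group; with real data the resolution parameter of FindClusters() controls how finely such graphs are split.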

## Independent sample analysis

# Split Seurat object into two objects based on the 'batch' label
seu.list <- SplitObject(object = seu, split.by = "batch")

# Standard Seurat upstream workflow
seu.list <- lapply(X = seu.list, FUN = function(x) {
  x <- NormalizeData(x)
  x <- FindVariableFeatures(x)
  x <- ScaleData(x)
  x <- RunPCA(x)
  x <- FindNeighbors(x, dims = 1:15, reduction = "pca")
  x <- FindClusters(x, resolution = 0.8, cluster.name = "unintegrated_clusters")
  x <- RunUMAP(x, dims = 1:15, reduction = "pca", reduction.name = "umap.unintegrated")
  x # return the processed object
})

## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 2700
## Number of edges: 108625
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8180
## Number of communities: 9
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 1937
## Number of edges: 65621
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8659
## Number of communities: 12
## Elapsed time: 0 seconds

Plot UMAPs for both datasets highlighting the Seurat clusters found for each.

## Plot independent sample analysis clusters
umap.ind.samp.unint <- lapply(X = seu.list, FUN = function(x) {
  DimPlot(x, reduction = "umap.unintegrated", group.by = "unintegrated_clusters", pt.size = 0.1, label = TRUE)
})
umap.ind.samp.unint$pbmc + umap.ind.samp.unint$pancreas

Compute the differentially expressed genes for every cluster in every dataset, keeping only the upregulated genes. Then, among the statistically significant genes (adjusted p-value < 0.05; Seurat's p_val_adj is Bonferroni-corrected), pick the top 50 upregulated genes per cluster based on log2 fold-change, and compute the intersection of cluster marker genes between datasets.

## Independent sample analysis: DGE

# Differential gene expression analysis per cluster 
dge.markers.unint <- lapply(X = seu.list, FUN = function(x) {
  FindAllMarkers(object = x, assay = "RNA", slot = "data", 
                 logfc.threshold = 0.25, min.pct = 0.25, 
                 min.cells.feature = 10, only.pos = TRUE)
})

# Pick the top50 upregulated genes per cluster based on log2FC
top50.up.cluster <- lapply(X = dge.markers.unint, FUN = function(x) {
  x %>% 
    filter(p_val_adj<0.05) %>% 
    group_by(cluster) %>% 
    arrange(desc(avg_log2FC)) %>% 
    slice_head(n=50)  %>% 
    split(., .$cluster)
}) 

# Check intersection of top50 marker genes between clusters across batches
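shared.genes <- list()
for (i in names(top50.up.cluster$pbmc)) {
  # Initialize as a list so that every intersection (even of length 0 or 1) is stored as a list element
  shared.genes[[paste0("pbmc", i)]] <- list()
  for (ii in names(top50.up.cluster$pancreas)) {
    shared.genes[[paste0("pbmc", i)]][[paste0("pancreas", ii)]] <- intersect(top50.up.cluster$pbmc[[i]]$gene, 
                                                                             top50.up.cluster$pancreas[[ii]]$gene)
  }
}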
# Table with number of genes shared between pbmc vs pancreas clusters for the top50 upregulated genes per cluster
shared.genes.table <- as.data.frame(
  lapply(X = shared.genes, FUN = function(x) {
    unlist(lapply(x, length))
  })
)
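The selection and intersection steps above can also be written in base R. A minimal sketch on two hypothetical toy marker tables that reuse the FindAllMarkers() column names (gene, cluster, avg_log2FC, p_val_adj); top_n_per_cluster() is an illustrative helper, and n = 1 replaces 50 to keep the example small.

```r
# Hypothetical toy marker tables with FindAllMarkers()-like columns
markers.a <- data.frame(gene = c("G1", "G2", "G3", "G4"), cluster = c("0", "0", "1", "1"),
                        avg_log2FC = c(2.0, 1.0, 3.0, 0.5), p_val_adj = c(0.01, 0.2, 0.001, 0.04))
markers.b <- data.frame(gene = c("G1", "G3", "G5"), cluster = c("0", "1", "1"),
                        avg_log2FC = c(1.5, 2.5, 1.0), p_val_adj = c(0.01, 0.01, 0.01))

# Keep significant genes, sort by log2FC and take the first n genes per cluster
top_n_per_cluster <- function(markers, n = 50, alpha = 0.05) {
  sig <- markers[markers$p_val_adj < alpha, ]
  sig <- sig[order(sig$avg_log2FC, decreasing = TRUE), ]
  lapply(split(sig$gene, sig$cluster), head, n = n)
}

top.a <- top_n_per_cluster(markers.a, n = 1)
top.b <- top_n_per_cluster(markers.b, n = 1)

# Rows: clusters of b; columns: clusters of a; values: no. of shared genes
shared <- sapply(top.a, function(ga) sapply(top.b, function(gb) length(intersect(ga, gb))))
shared
```

Using lengths of intersections directly avoids storing the gene vectors when only the counts are needed.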

Print the table with the number of top 50 marker genes shared between every pair of pbmc and pancreas clusters.

## Print table
knitr::kable(shared.genes.table)

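|            | pbmc0 | pbmc1 | pbmc2 | pbmc3 | pbmc4 | pbmc5 | pbmc6 | pbmc7 | pbmc8 |
|------------|------:|------:|------:|------:|------:|------:|------:|------:|------:|
| pancreas0  | 0 | 0 | 0 | 0 | 1 | 1 | 2 | 0 | 0 |
| pancreas1  | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| pancreas2  | 0 | 0 | 0 | 1 | 1 | 1 | 2 | 0 | 0 |
| pancreas3  | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| pancreas4  | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| pancreas5  | 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| pancreas6  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| pancreas7  | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| pancreas8  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| pancreas9  | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 2 |
| pancreas10 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| pancreas11 | 0 | 0 | 2 | 2 | 2 | 3 | 7 | 3 | 0 |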

Plot the previous table as a heatmap.

## Plot the number of shared marker genes as a heatmap
ComplexHeatmap::Heatmap(matrix = as.matrix(shared.genes.table), name = "Shared gene no.", 
                        cluster_columns = FALSE, cluster_rows = FALSE)

Automatic cell annotation

(10 min)

AIM: Check if datasets share cell types by predicting cell type labels for both datasets.

This exercise requires running CellTypist, a Python package that can be used from Python, from the command line or online through its website. For convenience, run CellTypist online.

First, export the Seurat R object as a Python-compatible AnnData (.h5ad) file with the function zellkonverter::writeH5AD() by running the R code chunk below. This creates the file pbmc3k_panc8_celltypist.h5ad in the directory results/cross_tissue_task/objects. Next, go to the CellTypist website (https://www.celltypist.org/) and enter your e-mail address. Select the model Immune_All_Low.pkl, a model for the annotation of immune cells, and enable majority voting. Finally, upload the file pbmc3k_panc8_celltypist.h5ad.

## Automatic cell annotation
file.name <- file.path(res.dir[3], "pbmc3k_panc8_celltypist.h5ad")
cat("Exporting Seurat object as '.h5ad' format to:", gsub("\\../", "", file.name), "\n")

## Exporting Seurat object as '.h5ad' format to: results/cross_tissue_task/objects/pbmc3k_panc8_celltypist.h5ad

zellkonverter::writeH5AD(sce = as.SingleCellExperiment(seu), file = file.name, X_name = "logcounts")
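The logcounts (Seurat's data layer) are exported as the expression matrix X because CellTypist expects log1p-normalized expression scaled to 10,000 counts per cell. A quick sanity check, sketched here on a hypothetical toy matrix: undoing log1p with expm1() should give per-cell totals close to 10,000. The helper check_log1p_1e4() and its tolerance are illustrative choices.

```r
# Hypothetical raw counts (genes x cells) and their LogNormalize-like transform
counts <- matrix(c(3, 1, 6,
                   0, 2, 8), nrow = 3)
logcounts <- log1p(sweep(counts, 2, colSums(counts), "/") * 1e4)

# TRUE if undoing log1p gives ~10,000 total counts for every cell
check_log1p_1e4 <- function(x, target = 1e4, tol = 1) {
  all(abs(colSums(expm1(x)) - target) < tol)
}

check_log1p_1e4(logcounts)      # TRUE
check_log1p_1e4(log1p(counts))  # FALSE: raw counts are not scaled to 10,000
```

On the real object, the same check could presumably be applied to LayerData(seu, layer = "data") before exporting, since the Matrix package provides expm1() and colSums() for sparse matrices.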

You should receive an e-mail with a download link to the results. Download the result file, predictions.tar.gz, and put it in the directory results/cross_tissue_task/tables. Alternatively, replace the URL in the code below (it is only valid for 7 days) with the link you received by e-mail and set the variable use.url to TRUE.

Plot the predicted labels for both data sets.

## Plot labels from CellTypist

# Download predictions
use.url <- FALSE # set to TRUE to download the predictions from the URL below (replace it by the link you received by e-mail)
if (use.url) { # download: url only valid for 7 days
  url <- "https://celltypist.cog.sanger.ac.uk/uploads/9cd807d8-69de-4b7c-a827-70e51c4a8b4a/predictions.tar.gz?AWSAccessKeyId=C068AUIY7F6SNEJUTEPA&Signature=XY4lL4D%2FWnuuBhV5%2BDOCimkyxSk%3D&Expires=1720176188"
  download.file(url = url, destfile = file.path(res.dir[2], "predictions.tar.gz"))
}
# Decompress the file with predictions 
untar(tarfile = file.path(res.dir[2], "predictions.tar.gz"), exdir = res.dir[2])

# Add predictions to Seurat object (matching cells by name rather than by position)
preds <- read.csv(file = file.path(res.dir[2], "predicted_labels.csv"), row.names = 1)
pred.cols <- c("predicted_labels", "over_clustering", "majority_voting")
seu@meta.data[, pred.cols] <- preds[colnames(seu), pred.cols]

# Plot predictions
DimPlot(object = seu, reduction = "umap.unintegrated", group.by = "majority_voting", 
        split.by = "batch", pt.size = 0.1, label = TRUE)
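With majority voting enabled, CellTypist over-clusters the cells and assigns every subcluster its most frequent per-cell prediction (majority_voting column), which smooths the per-cell predicted_labels. The simplified sketch below applies that rule to hypothetical labels and subclusters; majority_vote() is an illustrative helper, and CellTypist's handling of subclusters without a clear majority (its min_prop option) is omitted.

```r
# Hypothetical per-cell predictions and over-clustering assignments
predicted  <- c("T cell", "T cell", "B cell", "T cell", "B cell", "B cell")
over.clust <- c("1", "1", "1", "2", "2", "2")

# Assign to every cell the most frequent predicted label of its subcluster
# (ties go to the alphabetically first label, as ordered by table())
majority_vote <- function(labels, clusters) {
  winner <- sapply(split(labels, clusters), function(x) names(which.max(table(x))))
  unname(winner[as.character(clusters)])
}

majority_vote(predicted, over.clust)
```

The third cell, predicted as "B cell" on its own, is relabelled "T cell" because most cells in its subcluster are T cells.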

(3) Integrate datasets

(10 min)

AIM: Compare different integration methods.

First, split the layers of data by batch before performing integration. Then, apply the standard Seurat workflow. Finally, call the function IntegrateLayers() to integrate the datasets. In this function you can specify the method you want to run by providing the integration method function.

Seurat provides three methods: CCA (CCAIntegration), RPCA (RPCAIntegration) and Harmony (HarmonyIntegration). Other methods, such as FastMNN (FastMNNIntegration) or scVI (scVIIntegration), can be called through SeuratWrappers. Harmony (from the harmony R package), FastMNN (from the batchelor R package) and scVI (a Python package installed with conda) need to be installed independently of Seurat.

Run the R code chunk below to run the integration methods CCA, RPCA, Harmony and FastMNN (you can also try scVI if it is installed on your system). After integration, join the layers back and compute a UMAP for every integrated embedding; the UMAPs highlight the batch and ground-truth cell_type labels.
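FastMNN (batchelor) corrects batches using mutual nearest neighbours (MNNs): pairs of cells from different batches that are each among the other's k nearest neighbours and are assumed to be of the same type. The sketch below only illustrates the MNN pair definition on hypothetical toy coordinates with Euclidean distance; knn_idx() and find_mnn_pairs() are illustrative helpers, not batchelor's implementation, which works on cosine-normalized PCs and then computes correction vectors.

```r
# Hypothetical toy 2-D coordinates for two batches (rows = cells)
batch1 <- rbind(c(0, 0), c(5, 5))
batch2 <- rbind(c(0.1, 0), c(5, 5.2), c(20, 20))

# Indices of the k nearest rows of `to` (Euclidean) for every row of `from`
knn_idx <- function(from, to, k) {
  do.call(rbind, lapply(seq_len(nrow(from)), function(i) {
    order(colSums((t(to) - from[i, ])^2))[seq_len(k)]
  }))
}

# Mutual nearest neighbour pairs: j is among the k NN of i in batch b,
# and i is among the k NN of j in batch a
find_mnn_pairs <- function(a, b, k = 1) {
  nn.ab <- knn_idx(a, b, k)
  nn.ba <- knn_idx(b, a, k)
  pairs <- NULL
  for (i in seq_len(nrow(a))) {
    for (j in nn.ab[i, ]) {
      if (i %in% nn.ba[j, ]) pairs <- rbind(pairs, c(a = i, b = j))
    }
  }
  pairs
}

pairs <- find_mnn_pairs(batch1, batch2, k = 1)
pairs
```

The third cell of batch2 has no counterpart in batch1 and therefore gets no MNN pair, which is the kind of situation that arises when tissues share few cell types.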

## Perform integration

# Split layers for integration
seu[["RNA"]] <- split(x = seu[["RNA"]], f = seu$batch)

# Standard workflow
seu <- NormalizeData(seu)
seu <- FindVariableFeatures(seu)
seu <- ScaleData(seu)
seu <- RunPCA(seu)

# Integrate layers
int.methods <- c("CCA" = "CCAIntegration", "RPCA" = "RPCAIntegration", 
                 "Harmony" = "HarmonyIntegration", "FastMNN" = "FastMNNIntegration", 
                 "scVI" = "scVIIntegration")

# Runs the first four methods; use names(int.methods) instead to also run scVI
for (m in names(int.methods)[1:4]) { 
  cat("\nRunning integration method", m, "...\n")
  int.dimred <- paste0("integrated.", m)
  int.fun <- get(int.methods[[m]]) # integration function, e.g. CCAIntegration
  # Integration
  if (m == "RPCA") {
    seu <- IntegrateLayers(object = seu, method = int.fun, 
                           orig.reduction = "pca", 
                           new.reduction = int.dimred,
                           k.weight = 50, # lower than the default (100), otherwise it aborts
                           verbose = TRUE)
  } else if (m == "scVI") {
    seu <- IntegrateLayers(object = seu, method = int.fun, 
                           orig.reduction = "pca", 
                           new.reduction = int.dimred,
                           conda_env = "~/miniconda3/envs/scvi-env", # substitute this by your installation 
                           verbose = TRUE)
  } else {
    seu <- IntegrateLayers(object = seu, method = int.fun, 
                           orig.reduction = "pca", 
                           new.reduction = int.dimred,
                           verbose = TRUE)
  }
}

## 
## Running integration method CCA ...
## 
## Running integration method RPCA ...
## 
## Running integration method Harmony ...
## 
## Running integration method FastMNN ...

# Re-join layers after integration
seu[["RNA"]] <- JoinLayers(seu[["RNA"]])

# Run UMAP for every integration method
int.umaps.plots <- list()
for (m in names(int.methods)[1:4]) {
  cat("\nRunning UMAP for", m, "integrated result...\n")
  int.dimred <- paste0("integrated.", m)
  umap.dimred <- paste0("umap.", m)
  seu <- RunUMAP(seu, dims = 1:30, reduction = int.dimred, reduction.name = umap.dimred)
  int.umaps.plots[[m]] <-  DimPlot(object = seu, reduction = umap.dimred, group.by = c("batch", "cell_type"), 
                                   combine = FALSE, label.size = 2, pt.size = 0.1)
}

## 
## Running UMAP for CCA integrated result...
## 
## Running UMAP for RPCA integrated result...
## 
## Running UMAP for Harmony integrated result...
## 
## Running UMAP for FastMNN integrated result...

# Save Seurat object
saveRDS(object = seu, file = file.path(res.dir[3], "seu_integrated.rds"))

(4) Assess integration

(15 min)

AIM: Assess integration qualitatively and quantitatively through dimensional reduction visualizations and LISI scores.

Qualitative viz

Plot the integrated embeddings below highlighting the batch and ground-truth cell_type labels.

## Assess integration by printing the plots using the "batch" and "cell_type" (ground-truth) labels
wrap_plots(c(int.umaps.plots$CCA, int.umaps.plots$RPCA, int.umaps.plots$Harmony, int.umaps.plots$FastMNN),
           ncol = 2, byrow = TRUE)

Quantitative metrics

Run the code below to compute the iLISI and cLISI scores for every integrated embedding with the function getIntegrationMetrics() from the scIntegrationMetrics package (see the package documentation for the meaning of these metrics).
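LISI (local inverse Simpson's index) measures how many distinct labels are effectively present in each cell's neighbourhood: iLISI uses the batch labels (higher means better mixing, up to the number of batches) and cLISI uses the cell type labels (close to 1 means neighbourhoods are pure). The simplified sketch below computes the inverse Simpson's index of a single neighbourhood with uniform weights, whereas LISI weights neighbours with a perplexity-calibrated Gaussian kernel; inverse_simpson() is an illustrative helper.

```r
# Inverse Simpson's index of the labels found in one neighbourhood (uniform weights)
inverse_simpson <- function(labels) {
  p <- table(labels) / length(labels)
  1 / sum(p^2)
}

# iLISI-style: batch labels of the neighbours of one cell
mixed <- inverse_simpson(c("pbmc", "pancreas", "pbmc", "pancreas"))  # 2: well mixed
pure  <- inverse_simpson(c("pbmc", "pbmc", "pbmc", "pbmc"))          # 1: one batch only
c(mixed = mixed, pure = pure)
```

Averaging such values over all cells gives a single score per embedding.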

## Assess quantitatively integration with scIntegrationMetrics

# Calculate metrics
int.mthds.names <- paste0("integrated.", names(int.methods)[1:4])
names(int.mthds.names) <- int.mthds.names
metrics <- list()
for (m in int.mthds.names) {
  key <- sub("integrated.", "", m, fixed = TRUE)
  cat("Computing i/cLISI metrics for integration method:", key, "\n")
  metrics[[key]] <- getIntegrationMetrics(seu, meta.label = "cell_type", meta.batch = "batch",
                                          method.reduction = m, metrics = c("iLISI", "norm_iLISI", 
                                                                            #"CiLISI", "CiLISI_means", 
                                                                            "norm_cLISI", "norm_cLISI_means"))
}

## Computing i/cLISI metrics for integration method: CCA 
## Computing i/cLISI metrics for integration method: RPCA 
## Computing i/cLISI metrics for integration method: Harmony 
## Computing i/cLISI metrics for integration method: FastMNN

# Join metrics
metrics <- as.data.frame(do.call(cbind, metrics))

Print the result below.

# Print table
knitr::kable(metrics)

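|                  | CCA       | RPCA       | Harmony     | FastMNN    |
|------------------|----------:|-----------:|------------:|-----------:|
| iLISI            | 1.100594  | 1.02563    | 1.004797    | 1.035199   |
| norm_iLISI       | 0.1005943 | 0.02562982 | 0.004796592 | 0.03519914 |
| norm_cLISI       | 0.9452894 | 0.9424586  | 0.9591123   | 0.9403612  |
| norm_cLISI_means | 0.917104  | 0.9022     | 0.932591    | 0.9084225  |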
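With two batches, the norm_iLISI row of the table is consistent with rescaling iLISI from its range [1, number of batches] to [0, 1], i.e. (iLISI - 1) / (B - 1) with B = 2. The sketch below reproduces that row from the printed iLISI values (up to rounding); it is a check of the reported numbers, not scIntegrationMetrics' code. Values close to 0 indicate that cell neighbourhoods contain cells from essentially one data set.

```r
# iLISI values copied from the table above (two batches: pbmc and pancreas)
ilisi <- c(CCA = 1.100594, RPCA = 1.02563, Harmony = 1.004797, FastMNN = 1.035199)
n.batches <- 2

# Rescale from [1, n.batches] to [0, 1]
norm.ilisi <- (ilisi - 1) / (n.batches - 1)
round(norm.ilisi, 4)
```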

R packages used and respective versions

## R packages and versions used in these analyses
sessionInfo()

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] scIntegrationMetrics_1.1 SeuratWrappers_0.3.2     patchwork_1.2.0         
## [4] Seurat_5.1.0             SeuratObject_5.0.2       sp_2.1-4                
## [7] dplyr_1.1.4             
## 
## loaded via a namespace (and not attached):
##   [1] utf8_1.2.4                  spatstat.explore_3.2-7     
##   [3] reticulate_1.38.0           R.utils_2.12.3             
##   [5] tidyselect_1.2.1            htmlwidgets_1.6.4          
##   [7] BiocParallel_1.28.3         grid_4.1.0                 
##   [9] Rtsne_0.17                  ScaledMatrix_1.2.0         
##  [11] zellkonverter_1.4.0         munsell_0.5.1              
##  [13] codetools_0.2-18            ica_1.0-3                  
##  [15] future_1.33.2               miniUI_0.1.1.1             
##  [17] batchelor_1.10.0            withr_3.0.0                
##  [19] spatstat.random_3.2-3       colorspace_2.1-0           
##  [21] progressr_0.14.0            filelock_1.0.3             
##  [23] Biobase_2.54.0              highr_0.11                 
##  [25] knitr_1.47                  rstudioapi_0.13            
##  [27] stats4_4.1.0                SingleCellExperiment_1.16.0
##  [29] ROCR_1.0-11                 tensor_1.5                 
##  [31] listenv_0.9.1               MatrixGenerics_1.6.0       
##  [33] labeling_0.4.3              harmony_1.2.0              
##  [35] GenomeInfoDbData_1.2.7      polyclip_1.10-6            
##  [37] farver_2.1.2                basilisk_1.6.0             
##  [39] parallelly_1.37.1           vctrs_0.6.5                
##  [41] generics_0.1.3              xfun_0.45                  
##  [43] R6_2.5.1                    doParallel_1.0.17          
##  [45] GenomeInfoDb_1.30.1         clue_0.3-65                
##  [47] rsvd_1.0.5                  DelayedArray_0.20.0        
##  [49] bitops_1.0-7                spatstat.utils_3.0-5       
##  [51] cachem_1.1.0                assertthat_0.2.1           
##  [53] promises_1.3.0              scales_1.3.0               
##  [55] gtable_0.3.5                beachmat_2.10.0            
##  [57] globals_0.16.3              goftest_1.2-3              
##  [59] klippy_0.0.0.9500           spam_2.10-0                
##  [61] rlang_1.1.4                 GlobalOptions_0.1.2        
##  [63] splines_4.1.0               lazyeval_0.2.2             
##  [65] spatstat.geom_3.2-9         BiocManager_1.30.23        
##  [67] yaml_2.3.8                  reshape2_1.4.4             
##  [69] abind_1.4-5                 httpuv_1.6.15              
##  [71] tools_4.1.0                 ggplot2_3.5.1              
##  [73] jquerylib_0.1.4             RColorBrewer_1.1-3         
##  [75] BiocGenerics_0.40.0         ggridges_0.5.6             
##  [77] Rcpp_1.0.12                 plyr_1.8.9                 
##  [79] sparseMatrixStats_1.6.0     zlibbioc_1.40.0            
##  [81] purrr_1.0.2                 RCurl_1.98-1.14            
##  [83] basilisk.utils_1.6.0        deldir_2.0-4               
##  [85] pbapply_1.7-2               GetoptLong_1.0.5           
##  [87] cowplot_1.1.3               S4Vectors_0.32.4           
##  [89] zoo_1.8-12                  SummarizedExperiment_1.24.0
##  [91] ggrepel_0.9.5               cluster_2.1.2              
##  [93] magrittr_2.0.3              data.table_1.15.4          
##  [95] RSpectra_0.16-1             scattermore_1.2            
##  [97] ResidualMatrix_1.4.0        circlize_0.4.16            
##  [99] lmtest_0.9-40               RANN_2.6.1                 
## [101] fitdistrplus_1.1-11         matrixStats_1.1.0          
## [103] mime_0.12                   evaluate_0.24.0            
## [105] xtable_1.8-4                RhpcBLASctl_0.23-42        
## [107] fastDummies_1.7.3           IRanges_2.28.0             
## [109] gridExtra_2.3               shape_1.4.6.1              
## [111] compiler_4.1.0              tibble_3.2.1               
## [113] KernSmooth_2.23-20          crayon_1.5.3               
## [115] R.oo_1.26.0                 htmltools_0.5.8.1          
## [117] mgcv_1.8-35                 later_1.3.2                
## [119] tidyr_1.3.1                 DBI_1.2.3                  
## [121] ComplexHeatmap_2.15.4       MASS_7.3-54                
## [123] Matrix_1.6-5                permute_0.9-7              
## [125] cli_3.6.3                   R.methodsS3_1.8.2          
## [127] parallel_4.1.0              dotCall64_1.1-1            
## [129] igraph_2.0.3                GenomicRanges_1.46.1       
## [131] pkgconfig_2.0.3             dir.expiry_1.2.0           
## [133] scuttle_1.4.0               plotly_4.10.4              
## [135] spatstat.sparse_3.1-0       foreach_1.5.2              
## [137] bslib_0.7.0                 XVector_0.34.0             
## [139] stringr_1.5.1               digest_0.6.36              
## [141] sctransform_0.4.1           RcppAnnoy_0.0.22           
## [143] vegan_2.6-6.1               spatstat.data_3.1-2        
## [145] rmarkdown_2.27              leiden_0.4.3.1             
## [147] uwot_0.2.2                  DelayedMatrixStats_1.16.0  
## [149] shiny_1.8.1.1               rjson_0.2.21               
## [151] lifecycle_1.0.4             nlme_3.1-152               
## [153] jsonlite_1.8.8              BiocNeighbors_1.12.0       
## [155] viridisLite_0.4.2           limma_3.50.3               
## [157] fansi_1.0.6                 pillar_1.9.0               
## [159] lattice_0.20-44             fastmap_1.2.0              
## [161] httr_1.4.7                  survival_3.2-11            
## [163] glue_1.7.0                  remotes_2.5.0              
## [165] png_0.1-8                   iterators_1.0.14           
## [167] presto_1.0.0                stringi_1.8.4              
## [169] sass_0.4.9                  RcppHNSW_0.6.0             
## [171] BiocSingular_1.10.0         irlba_2.3.5.1              
## [173] future.apply_1.11.2

Cross-tissue integration task

António Sousa (e-mail: aggode@utu.fi) - Elo lab (https://elolab.utu.fi)

03/07/2024