Cell states

Dataset

The single-cell RNA sequencing dataset of peripheral blood mononuclear cells (PBMCs) from healthy versus COVID-19 infected donors published by Wilk et al., (2020) will be used to demonstrated how cell cluster probability can be used to identify perturbed cell states and associated differential gene expression programs. In addition to the original cell type annotations, the object provided contains the cell type annotations given by Välikangas et al., 2022.

The original dataset has been downsampled to 21K cells (1.5K cells per sample; from 44,721 cells) to increase the speed and reduce the computational resources required to run this vignette. To facilitate the reproduction of this vignette, the data is distributed through Zenodo as a SingleCellExperiment object, the object (class) required by most functions in Coralysis (see Chapter 4 The SingleCellExperiment class - OSCA manual)

# Install packages 
if (! "scater" %in% installed.packages()) pak::pkg_install("scater")
if (! "ComplexHeatmap" %in% installed.packages()) pak::pkg_install("ComplexHeatmap")

# Import packages
library("dplyr")
library("scater")
library("ggplot2")
library("Coralysis")
library("SingleCellExperiment")

# Import data 
data.url <- "https://zenodo.org/records/14845751/files/covid_Wilk_et_al.rds?download=1"
sce <- readRDS(file = url(data.url))

# Downsample every sample to 1.5K cells: 21K cells in total
cells.by.donor <- split(x = colnames(sce), f = sce$Donor_full)
ncells <- 1.5e3
set.seed(123)
down.cells <- lapply(X = cells.by.donor, FUN = function(x) {
    sample(x = x, size = ncells, replace = FALSE)
})
down.cells <- unlist(down.cells)
sce <- sce[,down.cells]

Normalization

Coralysis requires log-normalised data as input. The dataset above has been previously normalised with the basic log-normalisation method in Seurat. Below is a simple custom function that can be used to perform Seurat (see NormalizeData).

## Normalize the data
# log1p normalization
SeuratNormalisation <- function(object) {
  # 'SeuratNormalisation()' function applies the basic Seurat normalization to
  #a SingleCellExperiment object with a 'counts' assay. Normalized data
  #is saved in the 'logcounts' assay.
  logcounts(object) <- apply(counts(object), 2, function(x) {
    log1p(x/sum(x)*10000)
  }) # log1p normalization w/ 10K scaling factor
  logcounts(object) <- as(logcounts(object), "sparseMatrix")
  return(object)
}
sce <- SeuratNormalisation(object = sce)

In alternative, scran normalization can be performed, which is particularly beneficial if rare cell types exist (see the following vignette).

## scran normalisation
ScranNormalisation <- function(object) {
    norm_clusters <- scran::quickCluster(object)
    object <- scran::computeSumFactors(object, clusters = norm_clusters)
    object <- scater::logNormCounts(object)
    return(object)
}

# Split object by batch: 'Donor_full'
batch.levels <- levels(sce$Donor_full)
names(batch.levels) <- batch.levels
sce.list <- lapply(batch.levels, function(x) sce[,sce$Donor_full == x])

# Apply normalisation
set.seed(123)
sce.list <- lapply(sce.list, ScranNormalisation)

# Join 
sce <- do.call(cbind, sce.list)

HVG selection

Highly variable genes (HVG) can be selected using the R package scran. The variable Donor_full from colData is used as batch label. The SingleCellExperiment object allows alternative experiments to the main experiment. This is important to keep a backup of all genes in the same SingleCellExperiment object before selecting HVGs (see SingleCellExperiment vignette).

# Feature selection with 'scran' package
nhvg <- 2000
batch.label <- "Donor_full"
sce[[batch.label]] <- factor(sce[[batch.label]])
m.hvg <- scran::modelGeneVar(sce, block=sce[[batch.label]])
hvg.ordered <- order(m.hvg[["bio"]], decreasing=TRUE)
top.hvg <- row.names(sce)[hvg.ordered[1:nhvg]]
rowData(sce) <- cbind("highly_variable" = row.names(sce) %in% top.hvg, m.hvg)

# Subset object 
sce <- sce[top.hvg,]

DimRed: pre-integration

A PCA and UMAP representation of dataset before performing integration can be obtained with Coralysis. Several parameters can be provided to the functions in order to adjust the analyses to the dataset (see ?Coralysis::RunPCA() and ?Coralysis::RunUMAP).

# Dimensional reduction - unintegrated
set.seed(123)
sce <- RunPCA(object = sce, assay.name = "logcounts", dimred.name = "unintPCA")
set.seed(123)
sce <- RunUMAP(object = sce, dims = 1:30, dimred.type = "unintPCA", 
               umap.method = "uwot", dimred.name = "unintUMAP", 
               n_neighbors = 15, min_dist = 0.3)

# Plotting 
batch.label <- "Donor_full"
original.cell.label <- "cell_type"
cell.label <- "predicted_celltype_l2"
vars2plot <- c(batch.label, original.cell.label, cell.label)
names(vars2plot) <- vars2plot
unint.plts <- lapply(X = vars2plot, FUN = function(x) {
    PlotDimRed(object = sce, color.by = x, dimred = "unintUMAP", point.size = 0.2, 
               legend.nrow = 10, point.stroke = 0.1)
}) # plot each variable
all.unint.plts <- cowplot::plot_grid(plotlist = unint.plts, ncol = 3, align = "vh") # join plots together
all.unint.plts

It is clear the donor effect, particularly for some cell types, such as CD14 monocytes and CD8 memory T cells.

Multi-level integration

Multi-level integration can be performed with Coralysis by running the function RunParallelDivisiveICP(). Only the batch.label parameter is required. The batch.label corresponds to a variable in colData. In this case Donor_full. The RunParallelDivisiveICP() function can be run in parallel by providing the number of threads to reduce the run time (it takes around 25 min. with ICP batch size 1K cells and 4 threads). Consult the full documentation of this function with ?RunParallelDivisiveICP.

# Perform multi-level integration
set.seed(123)
sce <- RunParallelDivisiveICP(object = sce, batch.label = "Donor_full", 
                              icp.batch.size = 1000, threads = 4)

ICP clusters

The cell cluster probability for a given icp run through its divisive rounds can be plotted with the function PlotClusterTree(). A colData variable can be provided to see its composition per cluster across divisive rounds for the respective icp run. See examples below for icp run number 16.

## Plot cluster tree for icp run 16
# Probability
prob.cluster.tree <- PlotClusterTree(object = sce, icp.run = 16)

# batch label distribution
batch.cluster.tree <- PlotClusterTree(object = sce, icp.run = 16, color.by = "Donor_full")

# original cell type label distribution
orig.cell.cluster.tree <- PlotClusterTree(object = sce, icp.run = 16, color.by = "cell_type")

# Azimuth/predicted cell type label distribution
cell.cluster.tree <- PlotClusterTree(object = sce, icp.run = 16, color.by = "predicted_celltype_l2")

# Join all plots with 'cowplot'
all.cluster.tree <- cowplot::plot_grid(cowplot::plot_grid(prob.cluster.tree, batch.cluster.tree, 
                                                          ncol = 2, rel_widths = c(0.5, 0.5), align = "v"), 
                                       cowplot::plot_grid(orig.cell.cluster.tree, cell.cluster.tree, 
                                                          ncol = 2, rel_widths = c(0.5, 0.5), align = "vh"),
                                       ncol = 1)
all.cluster.tree

Some cell types formed unique clusters, but quite few remained mixed. Multi-level integration resolution might get better by running RunParallelDivisiveICP() with 32 cluster, i.e., k = 32, instead 16.

DimRed: after integration

Coralysis returns a list of cell cluster probabilities saved at metadata(sce)$coralysis$joint.probability of length equal to the number of icp runs, i.e., L (by default L=50), times the number of icp rounds, i.e., log2(k) (by default k=16, thus 4 rounds of divisive icp). The cell cluster probability can be concatenated to compute a PCA in order to obtain an integrated embedded. By default only the cell cluster probability corresponding to the last icp round is used. The Coralysis integrated embedding can be used downstream for non-linear dimensional reduction, t-SNE or UMAP, and clustering.

# Dimensional reduction - unintegrated
set.seed(123)
sce <- RunPCA(object = sce, assay.name = "joint.probability", dimred.name = "intPCA")

# UMAP
set.seed(123)
sce <- RunUMAP(object = sce, dimred.type = "intPCA", 
               umap.method = "uwot", dimred.name = "intUMAP", 
               dims = 1:30, n_neighbors = 15, min_dist = 0.3)

# Plotting 
int.plts <- lapply(X = vars2plot, FUN = function(x) {
    PlotDimRed(object = sce, color.by = x, dimred = "intUMAP", point.size = 0.2, legend.nrow = 10, point.stroke = 0.1)
}) # plot each variable
all.int.plts <- cowplot::plot_grid(plotlist = int.plts, ncol = 3, align = "vh") # join plots together
all.int.plts

Coralysis efficiently integrated the donor PBMC samples, identified small populations (e.g., neutrophils, Treg, MAIT, dnT, NK CD56bright), and differentiation trajectories (e.g., B naive to memory).

Graph-based clustering

Graph-based clustering can be obtained by running the scran function clusterCells() using the Coralysis integrated embedding. In alternative, the joint cluster probabilities can be used for clustering. The advantages of using the integrated embedding instead joint probabilities are: computational efficiency; and, noise robustness. Using the joint probabilities instead PCA embedding takes considerably more time.

### Graph-based clustering with scran
## Coralysis integrated PCA embedding 
reducedDim(sce, "cltPCA") <- reducedDim(sce, "intPCA")[,1:30]
set.seed(1024)
sce$emb_clusters <- scran::clusterCells(sce, use.dimred = "cltPCA", 
                                        BLUSPARAM = bluster::SNNGraphParam(k = 15, cluster.fun = "louvain"))

## Coralysis joint cluster probabilities
# retrieve ICP cell cluster probability tables for every icp run, but only for the last divisive round
probs <- GetCellClusterProbability(object = sce, icp.round = 4) 
# dim(probs) # 21K cells x 800 clusters (16 clusters x 50 icp runs = 800 clusters) 
probs <- t(probs)
colnames(probs) <- colnames(sce)
prob.sce <- SingleCellExperiment(assays = list("probability" = probs))
set.seed(1024)
sce$prob_clusters <- scran::clusterCells(prob.sce, 
                                         assay.type = "probability",
                                         BLUSPARAM = bluster::SNNGraphParam(k = 15, cluster.fun = "louvain"))

# Plotting
vars2plot2 <- c(cell.label, "emb_clusters", "prob_clusters")
names(vars2plot2) <- vars2plot2
clts.plts <- lapply(X = vars2plot2, FUN = function(x) {
    PlotDimRed(object = sce, dimred = "intUMAP", color.by = x, point.size = 0.2, legend.nrow = 10, point.stroke = 0.1)
}) # plot each variable
all.clts.plts <- cowplot::plot_grid(plotlist = clts.plts, ncol = 3, align = "vh") # join plots together
all.clts.plts

Cell state identification

The cell cluster probability aggregated by mean or median across the icp runs can be obtained with the function SummariseCellClusterProbability() and plotted to infer transient and steady cell states.

# Summarise cell cluster probability
sce <- SummariseCellClusterProbability(object = sce, icp.round = 4) # save result in 'colData'
# colData(sce) # check the colData

# Plot cell cluster probabilities - mean
# possible options: "mean_probs", "median_probs", "scaled_median_probs" 
PlotExpression(object = sce, color.by = "scaled_mean_probs", dimred = "intUMAP", color.scale = "viridis", 
               point.size = 0.2, point.stroke = 0.1, legend.title = "Mean prob.\n(min-max)")

For instance, B cell states (naive, intermediate, memory) are not well supported by Coralysis cell cluster probability, probably because it would require to run Coralysis at higher resolution, i.e., k=32 instead k=16.

Gene coefficients

Gene coefficients can be obtained for a given cell label through majority voting with the function MajorityVotingFeatures(). The majority voting is performed by searching for the most representative cluster for a given cell label across all possible clusters (i.e., across all icp runs and rounds). The most representative cluster corresponds to the cluster with the highest majority voting score. This corresponds to the geometric mean between the proportion of cells from the given label intersected with a cluster and the proportion of cells from that same cluster that intersected with the cells from the given label. The higher the score the better. A cluster that scores 1 indicates that all its cells correspond to the assigned label, and vice versa; i.e., all the cells with the assigned label belong to this cluster. For example, MajorityVotingFeatures() by providing the colData variable "emb_clusters"(i.e., label="emb_clusters").

# Get label gene coefficients by majority voting
clts.gene.coeff <- MajorityVotingFeatures(object = sce, label = "emb_clusters")

The function MajorityVotingFeatures() returns a list with two elements:

feature_coeff: a list of data frames comprising the gene coefficients and respective weights.
summary: a data frame summarizing which ICP cluster best represents each cell label, along with the respective majority voting score.

# Print cluster gene coefficients summary
clts.gene.coeff$summary

##    label icp_run icp_round cluster     score
## 1      1      22         4       8 0.8157863
## 2      2      24         4      16 0.9421878
## 3      3      21         4      11 0.5959206
## 4      4      21         4      14 0.6799433
## 5      5      29         4       7 0.6721712
## 6      6      48         4      11 0.7086803
## 7      7      36         4      14 0.4917519
## 8      8      16         4       2 0.8996775
## 9      9      22         4       7 0.5562097
## 10    10      47         4      16 0.6487200
## 11    11      19         4       4 0.9326988
## 12    12      24         4       9 0.5858352
## 13    13      21         4       2 0.7723912
## 14    14      41         4       6 0.8999670
## 15    15      37         4       3 0.8040199
## 16    16      29         4      10 0.6894886
## 17    17      48         4      12 0.9532055
## 18    18      46         4       5 0.7898323
## 19    19      18         4      10 0.6683786
## 20    20      20         4       7 0.9176726
## 21    21      31         4       1 0.8014923

Unique marker: PRSS23

One of the strengths of Coralyis consist of finding unique cluster markers as it was the case for cluster 2.

# Print top 30 positive gene coefficients for cluster 2: NK + CD8 TEM 
head(clts.gene.coeff$feature_coeff$`2`[order(clts.gene.coeff$feature_coeff$`2`[,2], decreasing = TRUE),], 30)

##     feature   coeff_clt16
## 43   PRSS23  4.3504965756
## 18     GZMB  0.1637480659
## 19     GZMH  0.1308493859
## 79   ZBTB44  0.1109175583
## 40    PATL2  0.0983686833
## 72    TBX21  0.0954275398
## 7     CD247  0.0753691598
## 15     GNLY  0.0730897748
## 61    RUNX3  0.0686593222
## 39    PARP8  0.0654860566
## 42     PRF1  0.0504557585
## 8      CDK6  0.0427174844
## 27    KLRD1  0.0402690820
## 17     GZMA  0.0375472215
## 46      PXN  0.0360506239
## 14     GNG2  0.0223021975
## 25   IPCEF1  0.0156802538
## 45   PTGER4  0.0087374973
## 6      CCL5  0.0082154040
## 47    RBM27  0.0031892948
## 80     ZEB2  0.0018210680
## 69  ST3GAL1 -0.0005403014
## 12   GIMAP4 -0.0008745088
## 68  SMARCA2 -0.0013411496
## 21 HSP90AB1 -0.0020678336
## 75   TMSB4X -0.0027057304
## 51    RPL34 -0.0032341377
## 57     RPS3 -0.0033493263
## 4     BIRC6 -0.0040249087
## 2      ACTB -0.0047227878

# Plot the expression of PRSS23
PlotExpression(sce, color.by = "PRSS23", dimred = "intUMAP", point.size = 0.5, point.stroke = 0.5)

Cluster markers

CD14 Mono: 1 vs 5

Coralysis identified three main clusters of CD14 monocytes: 1 (2,028 cells), 5 (1,651), and 9 (935) (considering the Louvain clustering obtained with the integrated embedding). The clusters 1 was compared against 5 below.

# Cluster gene coefficients: 1 versus 8
clt1 <- clts.gene.coeff$feature_coeff$`1`
clt1 <- clt1[order(clt1[,2], decreasing = TRUE),]
clt5 <- clts.gene.coeff$feature_coeff$`5`
clt5 <- clt5[order(clt5[,2], decreasing = TRUE),]

# Merge cluster coefficients
clt1vs5 <- merge(clt1, clt5, all = TRUE)
clt1vs5[is.na(clt1vs5)] <- 0
clt1vs5.parsed <- clt1vs5 %>% 
    mutate("coeff_variation" = abs(coeff_clt7 - coeff_clt8)) %>% 
    arrange(desc(coeff_variation)) %>% 
    filter(coeff_clt7!=0 & coeff_clt8!=0) %>% 
    filter((coeff_clt7 * coeff_clt8)<0)
top5.clt1 <- clt1vs5.parsed  %>% 
    arrange(desc(coeff_clt8)) %>%
    pull(feature) %>% head(5)
top5.clt5 <- clt1vs5.parsed %>% 
    arrange(desc(coeff_clt7)) %>% 
    pull(feature) %>% head(5)

# Top 5 positive coefficients in each cluster
top5.genes <- c(top5.clt1, top5.clt5)
names(top5.genes) <- top5.genes

# Plot
top5.gexp.mono.clts.plts <- lapply(X = top5.genes, FUN = function(x) {
    PlotExpression(object = sce, color.by = x, dimred = "intUMAP", 
                   scale.values = TRUE, point.size = 0.25, point.stroke = 0.25)
})
all.top5.gexp.mono.clts.plts <- cowplot::plot_grid(plotlist = top5.gexp.mono.clts.plts, ncol = 5, align = "vh")
all.top5.gexp.mono.clts.plts

The expression level of CYP1B1 in cluster 1 is slightly higher in COVID-19 infected donor cells than healthy donors, particularly, for ventilated versus healthy donors.

scater::plotExpression(object = sce[,sce$emb_clusters=="1"], features = "CYP1B1", 
                       color_by = "Ventilated", x = "Ventilated")

The previous plots suggest that clusters 1 and 5 are phenotypically different. In addition, low expression of HLA-DR (HLA-DRA, HLA-DRB1) genes has been associated with severe inflammatory conditions like COVID-19 (see plots below) as well as high expression of S100A9. These results suggest that cluster 1 corresponds to strongly activated or possibly dysregulated monocytes, which are more common in severe COVID-19 than in cluster 5. Indeed, 40% the cells in cluster 1 are from COVID-19 infected donors that required ventilation, whereas this percentage was around to 25% for cluster 5.

genes2plot <- c("HLA-DRA", "HLA-DRB1")
gexp.hla.plots <- lapply(genes2plot, function(x) {
    PlotExpression(object = sce, color.by = x, dimred = "intUMAP",  
                   scale.values = TRUE, point.size = 0.25, point.stroke = 0.25)
})
cowplot::plot_grid(plotlist = gexp.hla.plots, ncol = 2, align = "vh")

Perturbed cell states

CD16 Monocytes

The cell cluster probability can be used to search for altered cell states, e.g., associated with COVID-19. The distribution of cell cluster probability per cluster (integrated embedding) per ventilated group is shown below after running the function CellClusterProbabilityDistribution(). Cluster 14, roughly corresponding to CD16 monocytes, shows a difference between healthy and COVID associated cells (non-ventilated and ventilated donor cells).

## Disease associated cell state 
# Search for diseased affected cell states
prob.dist <- CellClusterProbabilityDistribution(object = sce, 
                                                label = "emb_clusters", 
                                                group = "Ventilated", 
                                                probability = "scaled_mean_probs")
prob.dist

Cell cluster probability bins per cluster label can be obtained by running BinCellClusterProbability(), which returns a SingleCellExperiment object with logcounts average per label per cell probability bin. The number of probability bins can be provided to the parameter bins. The cell frequency per cell bin per group of interest can be obtained with the function TabulateCellBinsByGroup(). The heatmap below shows the cell frequency across ventilated groups per bins for cluster 14 (corresponding to CD16 monocytes). The cell frequency represented below corresponds to the cell proportions per group (healthy, non-ventilated and ventilated) scaled by bins, i.e., columns. Cell states with higher probabilities tend to be more “healthier” than states with lower probabilities.

# cell states SCE object 
cellstate.sce <- BinCellClusterProbability(object = sce, label = "emb_clusters", icp.round = 4, bins = 20) 

# Tabulate cell bins by group 
cellbins.tables <- TabulateCellBinsByGroup(object = cellstate.sce, group = "Ventilated", relative = TRUE, margin = 1)

# Heatmap of cell bins distribution by group for cluster 20 - CD16 Monocytes
cluster <- "14"
col.fun <- circlize::colorRamp2(c(-1.5, 0, 1.5), viridis::inferno(3))
heat.plot <- ComplexHeatmap::Heatmap(matrix = scale(cellbins.tables[[cluster]]), 
                                     name = "Col. Z-score", 
                                     cluster_rows = FALSE, 
                                     cluster_columns = FALSE, 
                                     row_names_side = "left", 
                                     col = col.fun, 
                                     column_title = "Distribution of cells per cell cluster probability bin", 
                                     heatmap_legend_param = list(legend_direction = "horizontal", 
                                                                 title_position = "topcenter"))
ComplexHeatmap::draw(heat.plot, heatmap_legend_side = "bottom", annotation_legend_side = "bottom")

The correlation between cell cluster probability bins from cluster 18 and averaged gene expression can be obtained with the function CellBinsFeatureCorrelation(). Among the several positively and negatively correlated genes, the expression of C1QB seems to be enriched in diseased cells, with a negative Pearson coefficient close to 0.7. This gene was selected as an example due to its expression being relatively limited to cluster 14.

# Pearson correlated features with cluster 24
cor.features.clt14 <- CellBinsFeatureCorrelation(object = cellstate.sce, labels = cluster)
cor.features.clt14 <- cor.features.clt14 %>%
    filter(!is.na(`14`)) %>% 
    arrange(`14`) # C1QB
exp.C1QB.umap <- PlotExpression(sce, color.by = "C1QB", dimred = "intUMAP", 
                                scale.values = TRUE, point.size = 0.75, point.stroke = 0.25)
exp.C1QB.violin <- scater::plotExpression(sce[,sce$emb_clusters == cluster], features = "C1QB",
                                          x = "Ventilated", color_by = "Ventilated")
exp.C1QB.umap

exp.C1QB.violin

R session

# R session
sessionInfo()

## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] Coralysis_1.0.0             scater_1.34.0              
##  [3] ggplot2_3.5.1               scuttle_1.16.0             
##  [5] SingleCellExperiment_1.28.1 SummarizedExperiment_1.36.0
##  [7] Biobase_2.66.0              GenomicRanges_1.58.0       
##  [9] GenomeInfoDb_1.42.3         IRanges_2.40.1             
## [11] S4Vectors_0.44.0            BiocGenerics_0.52.0        
## [13] MatrixGenerics_1.18.1       matrixStats_1.5.0          
## [15] dplyr_1.1.4                
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3       shape_1.4.6.1            jsonlite_1.8.9          
##   [4] magrittr_2.0.3           modeltools_0.2-23        ggbeeswarm_0.7.2        
##   [7] farver_2.1.2             rmarkdown_2.29           GlobalOptions_0.1.2     
##  [10] fs_1.6.5                 zlibbioc_1.52.0          ragg_1.3.3              
##  [13] vctrs_0.6.5              Cairo_1.6-2              htmltools_0.5.8.1       
##  [16] S4Arrays_1.6.0           BiocNeighbors_2.0.1      SparseArray_1.6.1       
##  [19] sass_0.4.9               bslib_0.9.0              desc_1.4.3              
##  [22] plyr_1.8.9               cachem_1.1.0             igraph_2.1.4            
##  [25] lifecycle_1.0.4          iterators_1.0.14         pkgconfig_2.0.3         
##  [28] rsvd_1.0.5               Matrix_1.7-1             R6_2.6.0                
##  [31] fastmap_1.2.0            clue_0.3-66              GenomeInfoDbData_1.2.13 
##  [34] aricode_1.0.3            digest_0.6.37            colorspace_2.1-1        
##  [37] dqrng_0.4.1              RSpectra_0.16-2          irlba_2.3.5.1           
##  [40] textshaping_1.0.0        beachmat_2.22.0          labeling_0.4.3          
##  [43] httr_1.4.7               polyclip_1.10-7          abind_1.4-8             
##  [46] compiler_4.4.2           rngtools_1.5.2           doParallel_1.0.17       
##  [49] withr_3.0.2              BiocParallel_1.40.0      viridis_0.6.5           
##  [52] ggforce_0.4.2            LiblineaR_2.10-24        MASS_7.3-61             
##  [55] DelayedArray_0.32.0      rjson_0.2.23             bluster_1.16.0          
##  [58] tools_4.4.2              vipor_0.4.7              beeswarm_0.4.0          
##  [61] scatterpie_0.2.4         flexclust_1.4-2          glue_1.8.0              
##  [64] grid_4.4.2               cluster_2.1.6            reshape2_1.4.4          
##  [67] generics_0.1.3           snow_0.4-4               gtable_0.3.6            
##  [70] class_7.3-22             tidyr_1.3.1              BiocSingular_1.22.0     
##  [73] ScaledMatrix_1.14.0      metapod_1.14.0           XVector_0.46.0          
##  [76] ggrepel_0.9.6            RANN_2.6.2               foreach_1.5.2           
##  [79] pillar_1.10.1            stringr_1.5.1            yulab.utils_0.2.0       
##  [82] limma_3.62.2             circlize_0.4.16          tweenr_2.0.3            
##  [85] lattice_0.22-6           SparseM_1.84-2           tidyselect_1.2.1        
##  [88] ComplexHeatmap_2.22.0    locfit_1.5-9.11          knitr_1.49              
##  [91] gridExtra_2.3            edgeR_4.4.2              xfun_0.50               
##  [94] statmod_1.5.0            pheatmap_1.0.12          stringi_1.8.4           
##  [97] UCSC.utils_1.2.0         ggfun_0.1.8              yaml_2.3.10             
## [100] evaluate_1.0.3           codetools_0.2-20         tibble_3.2.1            
## [103] cli_3.6.4                systemfonts_1.2.1        munsell_0.5.1           
## [106] jquerylib_0.1.4          Rcpp_1.0.14              doSNOW_1.0.20           
## [109] png_0.1-8                ggrastr_1.0.2            parallel_4.4.2          
## [112] pkgdown_2.1.1            doRNG_1.8.6.1            scran_1.34.0            
## [115] sparseMatrixStats_1.18.0 viridisLite_0.4.2        scales_1.3.0            
## [118] purrr_1.0.4              crayon_1.5.3             GetoptLong_1.0.5        
## [121] rlang_1.1.5              cowplot_1.1.3

References

Amezquita R, Lun A, Becht E, Carey V, Carpp L, Geistlinger L, Marini F, Rue-Albrecht K, Risso D, Soneson C, Waldron L, Pages H, Smith M, Huber W, Morgan M, Gottardo R, Hicks S (2020). “Orchestrating single-cell analysis with Bioconductor.” Nature Methods, 17, 137-145. https://www.nature.com/articles/s41592-019-0654-x.

Gu, Z. (2022) “Complex heatmap visualization.” iMeta. 1(3):e43. doi: 10.1002/imt2.43.

Lun ATL, McCarthy DJ, Marioni JC (2016). “A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor.” F1000Res., 5, 2122. doi:10.12688/f1000research.9501.2.

McCarthy DJ, Campbell KR, Lun ATL, Willis QF (2017). “Scater: pre-processing, quality control, normalisation and visualisation of single-cell RNA-seq data in R.” Bioinformatics, 33, 1179-1186. doi:10.1093/bioinformatics/btw777.

Sousa A, Smolander J, Junttila S, Elo L (2025). “Coralysis enables sensitive identification of imbalanced cell types and states in single-cell data via multi-level integration.” bioRxiv. doi:10.1101/2025.02.07.637023

Wickham H (2016). “ggplot2: Elegant Graphics for Data Analysis.” Springer-Verlag New York.

Compiled: 14/02/2025