Reference-mapping task

This single-cell RNA-seq task is an example of reference-mapping human peripheral blood mononuclear cells (PBMCs) from COVID-19 patients against a previously annotated human PBMC data set. The COVID-19 data set was downloaded as a Seurat object from cziscience (see the R script 01_create_datasets.R), and the previously annotated reference PBMC data set was retrieved from SeuratData (v.0.2.2.9001):

reference (ref): 5,000 genes x 36,433 cells
query (query): 17,374 genes x 14,783 cells

The analyses performed in this notebook rely on Seurat (v.5.1.0) and Azimuth (v.0.5.0).

Import the main packages used in this notebook: Seurat (v.5.1.0 - scRNA-seq analysis), dplyr (v.1.1.4 - data wrangling), patchwork (v.1.2.0 - visualization), ComplexHeatmap (v.2.15.4 - heatmaps), Azimuth (v.0.5.0 - reference-mapping).

## Import packages
library("dplyr") # data wrangling
library("Seurat") # scRNA-seq analysis
library("Azimuth") # reference-mapping
library("patchwork") # viz
library("ComplexHeatmap") # heatmap

Create output directories to save intermediate results, figures, tables and R objects.

## Output directories
res.dir <- file.path("../results", "covid_refmap_task", c("plots", "tables", "objects"))
for (folder in res.dir) if (!dir.exists(folder)) dir.create(path = folder, recursive = TRUE)
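The folders above are later addressed by position (for example res.dir[3] for the objects folder). A minimal sketch of a name-based alternative is shown below; the variable res.dir.named and the temporary root folder are assumptions made for illustration and are not part of the original notebook.

```r
# Hypothetical root folder; the notebook itself writes to "../results"
root <- file.path(tempdir(), "results")

# Name each output folder so it can be looked up by name instead of by position
sub.dirs <- c("plots", "tables", "objects")
res.dir.named <- setNames(file.path(root, "covid_refmap_task", sub.dirs), sub.dirs)

# Create the folders that do not exist yet
for (folder in res.dir.named) {
  if (!dir.exists(folder)) dir.create(path = folder, recursive = TRUE)
}

# Path of the folder meant for R objects
res.dir.named[["objects"]]
```

Looking folders up by name keeps later calls readable and unaffected by any reordering of the sub-folders.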

(1) Import datasets

(5 min)

AIM: Import and explore the Seurat object data.

Import the human PBMCs from COVID-19 patients (query) and the healthy human PBMCs (ref) as Seurat objects, stored at data/covid.rds and data/pbmcref.rds respectively.

## Import query and reference Seurat objects
data.dir <- "../data"

# Reference
ref <- readRDS(file = file.path(data.dir, "pbmcref.rds"))

# Query
query <- readRDS(file = file.path(data.dir, "covid.rds"))
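Because both objects are read from relative paths, a missing file only surfaces as a generic readRDS() error. The sketch below wraps the import in a small check; the helper read_rds_checked() is hypothetical (it is not part of Seurat or of the original notebook), and a temporary file stands in for covid.rds.

```r
# Hypothetical helper: stop with a clear message if the file is missing
read_rds_checked <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, call. = FALSE)
  }
  readRDS(file = path)
}

# Toy demonstration: a temporary .rds file stands in for 'covid.rds'
tmp <- tempfile(fileext = ".rds")
saveRDS(object = list(a = 1), file = tmp)
obj <- read_rds_checked(tmp)
```

In the notebook this would be called as read_rds_checked(file.path(data.dir, "covid.rds")), assuming the same folder layout.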

Quickly explore the query and ref Seurat objects.

## Explore Seurat objects
# Print Seurat object
query

## An object of class Seurat 
## 17374 features across 14783 samples within 1 assay 
## Active assay: RNA (17374 features, 0 variable features)
##  2 layers present: counts, data
##  3 dimensional reductions calculated: pca, tsne, umap

ref

## An object of class Seurat 
## 5228 features across 36433 samples within 2 assays 
## Active assay: refAssay (5000 features, 0 variable features)
##  1 layer present: data
##  1 other assay present: ADT
##  2 dimensional reductions calculated: refUMAP, refDR

# Structure
#str(query)
#str(ref)

# Check meta.data
head(query@meta.data)

##                           orig.ident nCount_RNA nFeature_RNA disease stage
## Guo-AAACCTGAGAGCTTCT-2 SeuratProject       4332         1407        severe
## Guo-AAACCTGAGAGGTTGC-7 SeuratProject       2220          903     remission
## Guo-AAACCTGAGATACACA-3 SeuratProject       2493          877     remission
## Guo-AAACCTGAGCGATTCT-1 SeuratProject       2159          933        severe
## Guo-AAACCTGAGTGAAGAG-3 SeuratProject       1067          467     remission
## Guo-AAACCTGAGTTGAGTA-2 SeuratProject       1905          744        severe
##                                          treatment timepoint
## Guo-AAACCTGAGAGCTTCT-2 400 mg Tocilizumab at day 1     day 1
## Guo-AAACCTGAGAGGTTGC-7 400 mg Tocilizumab at day 1     day 7
## Guo-AAACCTGAGATACACA-3 400 mg Tocilizumab at day 1     day 5
## Guo-AAACCTGAGCGATTCT-1 400 mg Tocilizumab at day 1     day 1
## Guo-AAACCTGAGTGAAGAG-3 400 mg Tocilizumab at day 1     day 5
## Guo-AAACCTGAGTTGAGTA-2 400 mg Tocilizumab at day 1     day 1
##                                                Dataset           sample
## Guo-AAACCTGAGAGCTTCT-2 Guo et al._Nature Communication Guo_P1-day1-rep2
## Guo-AAACCTGAGAGGTTGC-7 Guo et al._Nature Communication      Guo_P2-day7
## Guo-AAACCTGAGATACACA-3 Guo et al._Nature Communication Guo_P1-day5-rep1
## Guo-AAACCTGAGCGATTCT-1 Guo et al._Nature Communication Guo_P1-day1-rep1
## Guo-AAACCTGAGTGAAGAG-3 Guo et al._Nature Communication Guo_P1-day5-rep1
## Guo-AAACCTGAGTTGAGTA-2 Guo et al._Nature Communication Guo_P1-day1-rep2
##                               disease_original                 disease_general
## Guo-AAACCTGAGAGCTTCT-2         COVID-19 Severe COVID-19 Severe/Late stage/Vent
## Guo-AAACCTGAGAGGTTGC-7 COVID-19 Mild/Remission              COVID-19 Remission
## Guo-AAACCTGAGATACACA-3 COVID-19 Mild/Remission              COVID-19 Remission
## Guo-AAACCTGAGCGATTCT-1         COVID-19 Severe COVID-19 Severe/Late stage/Vent
## Guo-AAACCTGAGTGAAGAG-3 COVID-19 Mild/Remission              COVID-19 Remission
## Guo-AAACCTGAGTTGAGTA-2         COVID-19 Severe COVID-19 Severe/Late stage/Vent
##                        COVID-19 Condition       Lineage     Cell.group
## Guo-AAACCTGAGAGCTTCT-2             severe       Myeloid CD14+ Monocyte
## Guo-AAACCTGAGAGGTTGC-7          remission Lymphoid_T/NK    CD8+ T cell
## Guo-AAACCTGAGATACACA-3          remission Lymphoid_T/NK    CD4+ T cell
## Guo-AAACCTGAGCGATTCT-1             severe Lymphoid_T/NK    CD8+ T cell
## Guo-AAACCTGAGTGAAGAG-3          remission Lymphoid_T/NK    CD4+ T cell
## Guo-AAACCTGAGTTGAGTA-2             severe    Lymphoid_B         B cell
##                        Cell.class_reannotated nFeaturess_RNA nCounts_RNA
## Guo-AAACCTGAGAGCTTCT-2     Classical Monocyte           1410   12006.946
## Guo-AAACCTGAGAGGTTGC-7               CD8+ Tem            904    8439.710
## Guo-AAACCTGAGATACACA-3               CD4+ Tcm            877    8103.279
## Guo-AAACCTGAGCGATTCT-1               CD8+ Tem            933    8748.600
## Guo-AAACCTGAGTGAAGAG-3           CD4+ T naive            468    4836.542
## Guo-AAACCTGAGTTGAGTA-2                B naive            744    7164.143
##                        percent_mito tissue_original tissue_ontology_term_id
## Guo-AAACCTGAGAGCTTCT-2   0.01152303            PBMC          UBERON:0000178
## Guo-AAACCTGAGAGGTTGC-7   0.01349096            PBMC          UBERON:0000178
## Guo-AAACCTGAGATACACA-3   0.01543431            PBMC          UBERON:0000178
## Guo-AAACCTGAGCGATTCT-1   0.01182754            PBMC          UBERON:0000178
## Guo-AAACCTGAGTGAAGAG-3   0.02864259            PBMC          UBERON:0000178
## Guo-AAACCTGAGTTGAGTA-2   0.01400229            PBMC          UBERON:0000178
##                        disease_ontology_term_id donor_id
## Guo-AAACCTGAGAGCTTCT-2            MONDO:0100096       P1
## Guo-AAACCTGAGAGGTTGC-7            MONDO:0100096       P2
## Guo-AAACCTGAGATACACA-3            MONDO:0100096       P1
## Guo-AAACCTGAGCGATTCT-1            MONDO:0100096       P1
## Guo-AAACCTGAGTGAAGAG-3            MONDO:0100096       P1
## Guo-AAACCTGAGTTGAGTA-2            MONDO:0100096       P1
##                        development_stage_ontology_term_id
## Guo-AAACCTGAGAGCTTCT-2                     HsapDv:0000133
## Guo-AAACCTGAGAGGTTGC-7                     HsapDv:0000172
## Guo-AAACCTGAGATACACA-3                     HsapDv:0000133
## Guo-AAACCTGAGCGATTCT-1                     HsapDv:0000133
## Guo-AAACCTGAGTGAAGAG-3                     HsapDv:0000133
## Guo-AAACCTGAGTTGAGTA-2                     HsapDv:0000133
##                        assay_ontology_term_id cell_type_ontology_term_id
## Guo-AAACCTGAGAGCTTCT-2            EFO:0009899                 CL:0000860
## Guo-AAACCTGAGAGGTTGC-7            EFO:0009899                 CL:0000913
## Guo-AAACCTGAGATACACA-3            EFO:0009899                 CL:0000904
## Guo-AAACCTGAGCGATTCT-1            EFO:0009899                 CL:0000913
## Guo-AAACCTGAGTGAAGAG-3            EFO:0009899                 CL:0000895
## Guo-AAACCTGAGTTGAGTA-2            EFO:0009899                 CL:0000788
##                        self_reported_ethnicity_ontology_term_id
## Guo-AAACCTGAGAGCTTCT-2                                  unknown
## Guo-AAACCTGAGAGGTTGC-7                                  unknown
## Guo-AAACCTGAGATACACA-3                                  unknown
## Guo-AAACCTGAGCGATTCT-1                                  unknown
## Guo-AAACCTGAGTGAAGAG-3                                  unknown
## Guo-AAACCTGAGTTGAGTA-2                                  unknown
##                        sex_ontology_term_id is_primary_data
## Guo-AAACCTGAGAGCTTCT-2         PATO:0000384           FALSE
## Guo-AAACCTGAGAGGTTGC-7         PATO:0000384           FALSE
## Guo-AAACCTGAGATACACA-3         PATO:0000384           FALSE
## Guo-AAACCTGAGCGATTCT-1         PATO:0000384           FALSE
## Guo-AAACCTGAGTGAAGAG-3         PATO:0000384           FALSE
## Guo-AAACCTGAGTTGAGTA-2         PATO:0000384           FALSE
##                        organism_ontology_term_id suspension_type tissue_type
## Guo-AAACCTGAGAGCTTCT-2            NCBITaxon:9606            cell      tissue
## Guo-AAACCTGAGAGGTTGC-7            NCBITaxon:9606            cell      tissue
## Guo-AAACCTGAGATACACA-3            NCBITaxon:9606            cell      tissue
## Guo-AAACCTGAGCGATTCT-1            NCBITaxon:9606            cell      tissue
## Guo-AAACCTGAGTGAAGAG-3            NCBITaxon:9606            cell      tissue
## Guo-AAACCTGAGTTGAGTA-2            NCBITaxon:9606            cell      tissue
##                                                                   cell_type
## Guo-AAACCTGAGAGCTTCT-2                                   classical monocyte
## Guo-AAACCTGAGAGGTTGC-7      effector memory CD8-positive, alpha-beta T cell
## Guo-AAACCTGAGATACACA-3       central memory CD4-positive, alpha-beta T cell
## Guo-AAACCTGAGCGATTCT-1      effector memory CD8-positive, alpha-beta T cell
## Guo-AAACCTGAGTGAAGAG-3 naive thymus-derived CD4-positive, alpha-beta T cell
## Guo-AAACCTGAGTTGAGTA-2                                         naive B cell
##                            assay  disease     organism  sex tissue
## Guo-AAACCTGAGAGCTTCT-2 10x 3' v2 COVID-19 Homo sapiens male  blood
## Guo-AAACCTGAGAGGTTGC-7 10x 3' v2 COVID-19 Homo sapiens male  blood
## Guo-AAACCTGAGATACACA-3 10x 3' v2 COVID-19 Homo sapiens male  blood
## Guo-AAACCTGAGCGATTCT-1 10x 3' v2 COVID-19 Homo sapiens male  blood
## Guo-AAACCTGAGTGAAGAG-3 10x 3' v2 COVID-19 Homo sapiens male  blood
## Guo-AAACCTGAGTTGAGTA-2 10x 3' v2 COVID-19 Homo sapiens male  blood
##                        self_reported_ethnicity       development_stage
## Guo-AAACCTGAGAGCTTCT-2                 unknown 39-year-old human stage
## Guo-AAACCTGAGAGGTTGC-7                 unknown 78-year-old human stage
## Guo-AAACCTGAGATACACA-3                 unknown 39-year-old human stage
## Guo-AAACCTGAGCGATTCT-1                 unknown 39-year-old human stage
## Guo-AAACCTGAGTGAAGAG-3                 unknown 39-year-old human stage
## Guo-AAACCTGAGTTGAGTA-2                 unknown 39-year-old human stage
##                        observation_joinid
## Guo-AAACCTGAGAGCTTCT-2         2P)e%zgsv_
## Guo-AAACCTGAGAGGTTGC-7         Lv&N1yD6*0
## Guo-AAACCTGAGATACACA-3         DZ>`^5OH2o
## Guo-AAACCTGAGCGATTCT-1         J4$QmqEgvX
## Guo-AAACCTGAGTGAAGAG-3         Y&7u#&E`-T
## Guo-AAACCTGAGTTGAGTA-2         XQ2XgsY|}S

head(ref@meta.data)

##                     celltype.l1    celltype.l2           celltype.l3 ori.index
## L1_AAACGAATCCTCACCA     other T            gdT                 gdT_3        27
## L1_AAACGCTAGAGCATTA       CD8 T        CD8 TEM             CD8 TEM_2        30
## L1_AAACGCTCAACGATCT       CD8 T        CD8 TCM             CD8 TCM_1        35
## L1_AAACGCTGTGCTCGTG     other T            dnT                 dnT_2        40
## L1_AAACGCTTCTTGGTCC           B B intermediate B intermediate lambda        42
## L1_AAAGAACCAAGCGGAT       CD4 T        CD4 TCM             CD4 TCM_3        46
##                     nCount_refAssay nFeature_refAssay
## L1_AAACGAATCCTCACCA               0                 0
## L1_AAACGCTAGAGCATTA               0                 0
## L1_AAACGCTCAACGATCT               0                 0
## L1_AAACGCTGTGCTCGTG               0                 0
## L1_AAACGCTTCTTGGTCC               0                 0
## L1_AAAGAACCAAGCGGAT               0                 0

# Check how many cells of each cell type are in the reference
table(ref$celltype.l1)

## 
##       B   CD4 T   CD8 T      DC    Mono      NK   other other T 
##    2698   17646    5858     922    4564    2051    1078    1616

table(ref$celltype.l2)

## 
##              ASDC    B intermediate          B memory           B naive 
##                16               899               597               903 
##         CD14 Mono         CD16 Mono           CD4 CTL         CD4 Naive 
##              3673               891               296              1403 
## CD4 Proliferating           CD4 TCM           CD4 TEM         CD8 Naive 
##                57             14889               703              1148 
## CD8 Proliferating           CD8 TCM           CD8 TEM              cDC1 
##                31              2883              1796               150 
##              cDC2               dnT             Eryth               gdT 
##               472               181                81               835 
##              HSPC               ILC              MAIT                NK 
##               300               131               600              1546 
##  NK Proliferating     NK_CD56bright               pDC       Plasmablast 
##               214               291               284               299 
##          Platelet              Treg 
##               566               298

# Check no. of genes 
nrow(query)

## [1] 17374

nrow(ref)

## [1] 5000

# Check no. of cells 
ncol(query)

## [1] 14783

ncol(ref)

## [1] 36433
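Since the reference holds 5,000 genes and the query 17,374, it can be useful to check how many reference genes are also present in the query before mapping. The sketch below uses short, made-up gene vectors in place of rownames(query) and rownames(ref); the variable names are assumptions for illustration.

```r
# Toy gene symbols standing in for rownames(query) and rownames(ref)
query.genes <- c("CD3E", "CD4", "CD8A", "MS4A1", "LYZ", "NKG7")
ref.genes <- c("CD3E", "CD8A", "LYZ", "GNLY")

# Genes shared between query and reference
shared.genes <- intersect(query.genes, ref.genes)
n.shared <- length(shared.genes)

# Percentage of reference genes that are also found in the query
pct.ref.covered <- 100 * n.shared / length(ref.genes)
```

With the real objects, the same three lines would be applied to rownames(query) and rownames(ref).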

(2) DimRed viz

(10 min)

AIM: Visualize the data in low-dimensional space, highlighting the categorical variables of interest.

Check the metadata of the query and reference objects and choose the most interesting categorical variables to highlight on the UMAP.

## Dimensional reduction - visualization

## Reference 
colnames(ref@meta.data)

## [1] "celltype.l1"       "celltype.l2"       "celltype.l3"      
## [4] "ori.index"         "nCount_refAssay"   "nFeature_refAssay"

ref.vars <- c("celltype.l1", "celltype.l2")
ref.umap.plts <- lapply(X = ref.vars, function(x) {
  DimPlot(object = ref, reduction = "refUMAP", group.by = x, pt.size = 0.1, label = TRUE)
})

## Query
# Rename the 4th metadata column ("disease stage") to a name without spaces
colnames(query@meta.data)[4] <- "disease_stage"
colnames(query@meta.data)

##  [1] "orig.ident"                              
##  [2] "nCount_RNA"                              
##  [3] "nFeature_RNA"                            
##  [4] "disease_stage"                           
##  [5] "treatment"                               
##  [6] "timepoint"                               
##  [7] "Dataset"                                 
##  [8] "sample"                                  
##  [9] "disease_original"                        
## [10] "disease_general"                         
## [11] "COVID-19 Condition"                      
## [12] "Lineage"                                 
## [13] "Cell.group"                              
## [14] "Cell.class_reannotated"                  
## [15] "nFeaturess_RNA"                          
## [16] "nCounts_RNA"                             
## [17] "percent_mito"                            
## [18] "tissue_original"                         
## [19] "tissue_ontology_term_id"                 
## [20] "disease_ontology_term_id"                
## [21] "donor_id"                                
## [22] "development_stage_ontology_term_id"      
## [23] "assay_ontology_term_id"                  
## [24] "cell_type_ontology_term_id"              
## [25] "self_reported_ethnicity_ontology_term_id"
## [26] "sex_ontology_term_id"                    
## [27] "is_primary_data"                         
## [28] "organism_ontology_term_id"               
## [29] "suspension_type"                         
## [30] "tissue_type"                             
## [31] "cell_type"                               
## [32] "assay"                                   
## [33] "disease"                                 
## [34] "organism"                                
## [35] "sex"                                     
## [36] "tissue"                                  
## [37] "self_reported_ethnicity"                 
## [38] "development_stage"                       
## [39] "observation_joinid"

query.vars <- c("disease_stage", "donor_id", "timepoint", "Cell.group", "Cell.class_reannotated")
query.umap.plts <- lapply(X = query.vars, function(x) {
  DimPlot(object = query, reduction = "umap", group.by = x, pt.size = 0.1, label = TRUE)
})
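With 39 metadata columns in the query, one way to shortlist variables for plotting is to count the distinct values per categorical column and keep those with a manageable number of levels. The sketch below is only an illustration: the toy data frame stands in for query@meta.data and the cutoff max.levels is arbitrary.

```r
# Toy metadata standing in for query@meta.data
meta <- data.frame(
  donor_id = c("P1", "P2", "P1", "P1"),
  timepoint = c("day 1", "day 7", "day 5", "day 1"),
  organism = rep("Homo sapiens", 4),
  nCount_RNA = c(4332, 2220, 2493, 2159),
  stringsAsFactors = FALSE
)

# Identify non-numeric (categorical) columns
is.categorical <- vapply(meta, function(x) is.character(x) || is.factor(x), logical(1))

# Number of distinct values per categorical column
n.levels <- vapply(meta[is.categorical], function(x) length(unique(x)), integer(1))

# Keep columns with at least 2 and at most 'max.levels' distinct values
max.levels <- 30 # arbitrary cutoff, for illustration only
candidate.vars <- names(n.levels)[n.levels >= 2 & n.levels <= max.levels]
```

Columns with a single value (here organism) carry no information on a UMAP, while columns with very many values are hard to read with label = TRUE.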

Plot the categorical variables celltype.l1 and celltype.l2 for the reference below.

## Plot dimensional reductions for reference
(ref.umap.plts[[1]] + NoLegend()) + (ref.umap.plts[[2]] + NoLegend())

Plot the categorical variables disease_stage, donor_id, timepoint, Cell.group, Cell.class_reannotated for the query below.

## Plot dimensional reductions for query
(query.umap.plts[[1]]) + (query.umap.plts[[2]]) + (query.umap.plts[[3]])

## Plot dimensional reductions for query
(query.umap.plts[[4]] + NoLegend()) + (query.umap.plts[[5]] + NoLegend())

(3) Reference-mapping

(10 min)

AIM: Annotate and project the query against the reference data set.

Perform reference-mapping below by identifying anchors between the COVID-19 (query) and healthy (reference) PBMC data sets with FindTransferAnchors(), and then transferring the labels of interest (celltype.l1, celltype.l2) from the previously annotated healthy data set to the query COVID-19 data set with MapQuery(). Alternatively, you can run the same procedure with the Azimuth function RunAzimuth().

## Reference-mapping

# Set to TRUE to run the mapping with 'Azimuth' instead
run.azimuth <- FALSE

if (run.azimuth) {
  query <- Azimuth::RunAzimuth(query, reference = "pbmcref")
} else {
  # Find anchors
  anchors <- FindTransferAnchors(
    reference = ref,
    query = query, 
    features = rownames(Loadings(ref[["refDR"]])),
    reference.reduction = "refDR",
    normalization.method = "SCT",
    dims = 1:50
  )
  # Transfer labels
  query <- MapQuery(
    anchorset = anchors,
    query = query,
    reference = ref,
    refdata = list(
      celltype.l1 = "celltype.l1",
      celltype.l2 = "celltype.l2"
    ),
    reduction.model = "refUMAP"
  )
  ## EXCEPTION: extra step required by the Azimuth reference 'pbmcref' to project the query onto the reference UMAP
  query <- Azimuth:::NNTransform(object = query, meta.data = ref[[]]) # from 'RunAzimuth()'
  query[["ref.umap"]] <- RunUMAP(object = query[["query_ref.nn"]], reduction.model = ref[["refUMAP"]], 
                                  reduction.key = "UMAP_", verbose = TRUE)
}

# Save query R object
saveRDS(object = query, file = file.path(res.dir[3], "query_refmap.rds"))
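After mapping, each transferred label should be stored in the query metadata together with a prediction score (for example predicted.celltype.l1 and predicted.celltype.l1.score). A minimal sketch of how low-confidence predictions could be flagged is given below; the toy data frame stands in for query@meta.data, its values are made up, and the cutoff of 0.5 is an arbitrary assumption rather than a recommended threshold.

```r
# Toy metadata standing in for query@meta.data after mapping (made-up values)
meta <- data.frame(
  predicted.celltype.l1 = c("CD4 T", "CD4 T", "Mono", "B", "Mono"),
  predicted.celltype.l1.score = c(0.95, 0.40, 0.88, 0.99, 0.30),
  stringsAsFactors = FALSE
)

# Flag cells whose prediction score falls below the cutoff
score.cutoff <- 0.5 # arbitrary, for illustration only
meta$low.confidence <- meta$predicted.celltype.l1.score < score.cutoff

# Fraction of low-confidence cells per predicted label
low.conf.by.label <- tapply(meta$low.confidence, meta$predicted.celltype.l1, mean)
```

Labels with a high fraction of low-confidence cells are candidates for closer inspection in the assessment step.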

(4) Assess reference-mapping

(15 min)

AIM: Assess the accuracy of the reference-mapping task.

Run the R chunk below to compare the cell type labels predicted for the COVID-19 data set (transferred from the healthy human PBMC reference) with the ground-truth cell labels.

## Assessment of reference-mapping accuracy

# Plot UMAPs
refmap.plts <- list()
refmap.plts[[1]] <- DimPlot(ref, reduction = "refUMAP", group.by =  "celltype.l1", 
                            label = TRUE, pt.size = 0.1, alpha = 0.1) + NoLegend()
refmap.plts[[2]] <- DimPlot(ref, reduction = "refUMAP", group.by =  "celltype.l2", 
                            label = TRUE, pt.size = 0.1, alpha = 0.1) + NoLegend()
refmap.plts[[3]] <- DimPlot(query, reduction = "ref.umap", group.by =  "Cell.group", 
                            label = TRUE, pt.size = 0.1, alpha = 0.1) + NoLegend()
refmap.plts[[4]] <- DimPlot(query, reduction = "ref.umap", group.by =  "Cell.class_reannotated", 
                            label = TRUE, pt.size = 0.1, alpha = 0.1) +
  NoLegend()
refmap.plts[[5]] <- DimPlot(query, reduction = "ref.umap", group.by = "predicted.celltype.l1", 
                            label = TRUE, pt.size = 0.1, alpha = 0.1) +
  NoLegend()
refmap.plts[[6]] <- DimPlot(query, reduction = "ref.umap", group.by = "predicted.celltype.l2", 
                            label = TRUE, pt.size = 0.1, alpha = 0.1) +
  NoLegend()
refmap.plts[[7]] <- DimPlot(query, reduction = "ref.umap", group.by =  query.vars[1], 
                            label = TRUE, pt.size = 0.1, alpha = 0.1)
refmap.plts[[8]] <- DimPlot(query, reduction = "ref.umap", group.by = query.vars[2], 
                            label = TRUE, pt.size = 0.1, alpha = 0.1)
refmap.plts[[9]] <- DimPlot(query, reduction = "ref.umap", group.by = query.vars[3], 
                            label = TRUE, pt.size = 0.1, alpha = 0.1)

# Confusion matrices
celltype1xgroup <- table(query$predicted.celltype.l1, query$Cell.group)
celltype1xgroup <- celltype1xgroup %>% as.matrix.data.frame(.) %>% 
  `colnames<-`(colnames(celltype1xgroup)) %>% `row.names<-`(row.names(celltype1xgroup))
celltype1xclass <- table(query$predicted.celltype.l1, query$Cell.class_reannotated)
celltype1xclass <- celltype1xclass %>% as.matrix.data.frame(.) %>% 
  `colnames<-`(colnames(celltype1xclass)) %>% `row.names<-`(row.names(celltype1xclass))
celltype2xgroup <- table(query$predicted.celltype.l2, query$Cell.group)
celltype2xgroup <- celltype2xgroup %>% as.matrix.data.frame(.) %>% 
  `colnames<-`(colnames(celltype2xgroup)) %>% `row.names<-`(row.names(celltype2xgroup))
celltype2xclass <- table(query$predicted.celltype.l2, query$Cell.class_reannotated)
celltype2xclass <- celltype2xclass %>% as.matrix.data.frame(.) %>% 
  `colnames<-`(colnames(celltype2xclass)) %>% `row.names<-`(row.names(celltype2xclass))
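Each confusion matrix above is a cross-tabulation of predicted versus ground-truth labels, converted to a plain matrix with its row and column names restored. The sketch below shows one possible base R alternative on made-up labels: unclass() drops the table class while keeping the dimnames, and prop.table() gives the row percentages that the heatmaps display. The vectors predicted and truth are toy stand-ins for the query metadata columns.

```r
# Toy labels standing in for query$predicted.celltype.l1 and query$Cell.group
predicted <- c("CD4 T", "CD4 T", "CD8 T", "Mono", "Mono", "Mono")
truth <- c("CD4+ T cell", "CD8+ T cell", "CD8+ T cell",
           "CD14+ Monocyte", "CD14+ Monocyte", "CD14+ Monocyte")

# Cross-tabulate: rows = predicted labels, columns = ground-truth labels
conf.tbl <- table(predicted, truth)

# Plain integer matrix; row and column names are kept
conf.mat <- unclass(conf.tbl)

# Percentage of cells per predicted label (each row sums to 100)
row.pct <- prop.table(conf.tbl, margin = 1) * 100
```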

Plot below the heatmaps of the confusion matrices between the predicted cell type labels (celltype.l1, celltype.l2) and the ground-truth cell type labels (Cell.group, Cell.class_reannotated).

tbls <- list("celltype1xCell.group" = celltype1xgroup, "celltype1xclass_reannotated" = celltype1xclass, 
             "celltype2xCell.group" = celltype2xgroup, "celltype2xclass_reannotated" = celltype2xclass)
heat.list <- lapply(X = setNames(names(tbls), names(tbls)), FUN = function(comp) {
  Heatmap(matrix = t(apply(tbls[[comp]], 1, function(x) x/sum(x)*100)), name = "% of cells", 
          cluster_rows = FALSE, cluster_columns = FALSE, row_names_side = "left",
          show_column_names = TRUE, show_row_names = TRUE,
          col = circlize::colorRamp2(c(0, 50, 100), c("white", "red1", "red4")), 
          column_names_side = "top", column_names_rot = 45, 
          layer_fun = function(j, i, x, y, width, height, fill, slice_r, slice_c) {
            v = pindex(tbls[[comp]], i, j)
            grid.text(sprintf("%.0f", v), x, y, gp = gpar(fontsize = 10))
            if(slice_r != slice_c) {
              grid.rect(gp = gpar(lwd = 2, fill = "transparent"))
            }
          }, 
          column_title = gsub("x", " vs ", comp), 
          rect_gp = gpar(col = "white", lwd = 2)
  )
})

# Print below
heat.list$celltype1xCell.group + heat.list$celltype1xclass_reannotated

heat.list$celltype2xCell.group + heat.list$celltype2xclass_reannotated
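The heatmaps can also be summarised numerically. Because the predicted and ground-truth annotations use different label vocabularies, a direct accuracy is not defined without first harmonising the names; a simpler sketch is to report, for each predicted label, the ground-truth label that receives most of its cells and the corresponding row percentage. The matrix below is made up for illustration and stands in for one of the confusion matrices in tbls.

```r
# Made-up confusion matrix: rows = predicted labels, columns = ground-truth labels
conf.mat <- matrix(
  c(90,  5,  5,
    10, 80, 10,
     0,  2, 98),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CD4 T", "CD8 T", "Mono"),
                  c("CD4+ T cell", "CD8+ T cell", "CD14+ Monocyte"))
)

# Row percentages (each row divided by its own total)
row.pct <- conf.mat / rowSums(conf.mat) * 100

# For each predicted label: best-matching ground-truth label and its percentage
top.match <- data.frame(
  predicted = rownames(row.pct),
  top.truth = colnames(row.pct)[max.col(row.pct, ties.method = "first")],
  pct = apply(row.pct, 1, max),
  row.names = NULL,
  stringsAsFactors = FALSE
)
```

A low pct value points to a predicted label whose cells are spread over several ground-truth labels.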

Project the data below onto the reference UMAP, highlighting the predicted (celltype.l1, celltype.l2) and ground-truth (Cell.group, Cell.class_reannotated) cell type labels.

refmap.plts[[1]] + refmap.plts[[2]]

refmap.plts[[3]] + refmap.plts[[4]]

refmap.plts[[5]] + refmap.plts[[6]]

Plot the categorical variables disease_stage, donor_id and timepoint for the query projected onto the reference UMAP below.

## Plot dimensional reductions for the projected query
refmap.plts[[7]] + refmap.plts[[8]] + refmap.plts[[9]]

R packages used and respective versions

## R packages and versions used in these analyses
sessionInfo()

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] ComplexHeatmap_2.15.4 patchwork_1.2.0       Azimuth_0.5.0        
## [4] shinyBS_0.61.1        Seurat_5.1.0          SeuratObject_5.0.2   
## [7] sp_2.1-4              dplyr_1.1.4          
## 
## loaded via a namespace (and not attached):
##   [1] rappdirs_0.3.3                    rtracklayer_1.54.0               
##   [3] scattermore_1.2                   R.methodsS3_1.8.2                
##   [5] tidyr_1.3.1                       JASPAR2020_0.99.10               
##   [7] ggplot2_3.5.1                     bit64_4.0.5                      
##   [9] knitr_1.47                        irlba_2.3.5.1                    
##  [11] DelayedArray_0.20.0               R.utils_2.12.3                   
##  [13] data.table_1.15.4                 doParallel_1.0.17                
##  [15] KEGGREST_1.34.0                   TFBSTools_1.32.0                 
##  [17] RCurl_1.98-1.14                   AnnotationFilter_1.18.0          
##  [19] generics_0.1.3                    BiocGenerics_0.40.0              
##  [21] GenomicFeatures_1.46.5            cowplot_1.1.3                    
##  [23] RSQLite_2.3.7                     RANN_2.6.1                       
##  [25] future_1.33.2                     bit_4.0.5                        
##  [27] tzdb_0.4.0                        spatstat.data_3.1-2              
##  [29] xml2_1.3.6                        httpuv_1.6.15                    
##  [31] SummarizedExperiment_1.24.0       assertthat_0.2.1                 
##  [33] DirichletMultinomial_1.36.0       gargle_1.5.2                     
##  [35] xfun_0.45                         hms_1.1.3                        
##  [37] jquerylib_0.1.4                   evaluate_0.24.0                  
##  [39] promises_1.3.0                    fansi_1.0.6                      
##  [41] restfulr_0.0.15                   progress_1.2.3                   
##  [43] caTools_1.18.2                    dbplyr_2.5.0                     
##  [45] igraph_2.0.3                      DBI_1.2.3                        
##  [47] htmlwidgets_1.6.4                 spatstat.geom_3.2-9              
##  [49] googledrive_2.1.1                 stats4_4.1.0                     
##  [51] purrr_1.0.2                       RSpectra_0.16-1                  
##  [53] annotate_1.72.0                   biomaRt_2.50.3                   
##  [55] deldir_2.0-4                      MatrixGenerics_1.6.0             
##  [57] vctrs_0.6.5                       Biobase_2.54.0                   
##  [59] SeuratDisk_0.0.0.9021             ensembldb_2.18.4                 
##  [61] ROCR_1.0-11                       abind_1.4-5                      
##  [63] cachem_1.1.0                      withr_3.0.0                      
##  [65] BSgenome.Hsapiens.UCSC.hg38_1.4.4 BSgenome_1.62.0                  
##  [67] progressr_0.14.0                  presto_1.0.0                     
##  [69] sctransform_0.4.1                 GenomicAlignments_1.30.0         
##  [71] prettyunits_1.2.0                 goftest_1.2-3                    
##  [73] cluster_2.1.2                     dotCall64_1.1-1                  
##  [75] lazyeval_0.2.2                    seqLogo_1.60.0                   
##  [77] crayon_1.5.3                      hdf5r_1.3.10                     
##  [79] spatstat.explore_3.2-7            labeling_0.4.3                   
##  [81] pkgconfig_2.0.3                   GenomeInfoDb_1.30.1              
##  [83] nlme_3.1-152                      ProtGenerics_1.26.0              
##  [85] rlang_1.1.4                       globals_0.16.3                   
##  [87] lifecycle_1.0.4                   miniUI_0.1.1.1                   
##  [89] filelock_1.0.3                    fastDummies_1.7.3                
##  [91] klippy_0.0.0.9500                 BiocFileCache_2.2.1              
##  [93] SeuratData_0.2.2.9001             cellranger_1.1.0                 
##  [95] polyclip_1.10-6                   RcppHNSW_0.6.0                   
##  [97] matrixStats_1.1.0                 lmtest_0.9-40                    
##  [99] Matrix_1.6-5                      Rhdf5lib_1.16.0                  
## [101] zoo_1.8-12                        GlobalOptions_0.1.2              
## [103] ggridges_0.5.6                    googlesheets4_1.1.1              
## [105] png_0.1-8                         viridisLite_0.4.2                
## [107] rjson_0.2.21                      shinydashboard_0.7.2             
## [109] bitops_1.0-7                      R.oo_1.26.0                      
## [111] rhdf5filters_1.6.0                KernSmooth_2.23-20               
## [113] spam_2.10-0                       Biostrings_2.62.0                
## [115] blob_1.2.4                        shape_1.4.6.1                    
## [117] stringr_1.5.1                     parallelly_1.37.1                
## [119] spatstat.random_3.2-3             readr_2.1.5                      
## [121] S4Vectors_0.32.4                  CNEr_1.30.0                      
## [123] scales_1.3.0                      memoise_2.0.1                    
## [125] magrittr_2.0.3                    plyr_1.8.9                       
## [127] ica_1.0-3                         zlibbioc_1.40.0                  
## [129] compiler_4.1.0                    BiocIO_1.4.0                     
## [131] RColorBrewer_1.1-3                clue_0.3-65                      
## [133] fitdistrplus_1.1-11               Rsamtools_2.10.0                 
## [135] cli_3.6.3                         XVector_0.34.0                   
## [137] listenv_0.9.1                     pbapply_1.7-2                    
## [139] MASS_7.3-54                       tidyselect_1.2.1                 
## [141] stringi_1.8.4                     highr_0.11                       
## [143] yaml_2.3.8                        ggrepel_0.9.5                    
## [145] sass_0.4.9                        fastmatch_1.1-4                  
## [147] EnsDb.Hsapiens.v86_2.99.0         tools_4.1.0                      
## [149] future.apply_1.11.2               parallel_4.1.0                   
## [151] circlize_0.4.16                   rstudioapi_0.13                  
## [153] TFMPvalue_0.0.9                   foreach_1.5.2                    
## [155] gridExtra_2.3                     farver_2.1.2                     
## [157] Rtsne_0.17                        digest_0.6.36                    
## [159] shiny_1.8.1.1                     pracma_2.4.4                     
## [161] Rcpp_1.0.12                       GenomicRanges_1.46.1             
## [163] later_1.3.2                       RcppAnnoy_0.0.22                 
## [165] httr_1.4.7                        AnnotationDbi_1.56.2             
## [167] colorspace_2.1-0                  XML_3.99-0.17                    
## [169] fs_1.6.4                          tensor_1.5                       
## [171] reticulate_1.38.0                 IRanges_2.28.0                   
## [173] splines_4.1.0                     uwot_0.2.2                       
## [175] RcppRoll_0.3.0                    spatstat.utils_3.0-5             
## [177] plotly_4.10.4                     xtable_1.8-4                     
## [179] jsonlite_1.8.8                    poweRlaw_0.80.0                  
## [181] R6_2.5.1                          pillar_1.9.0                     
## [183] htmltools_0.5.8.1                 mime_0.12                        
## [185] glue_1.7.0                        fastmap_1.2.0                    
## [187] DT_0.33                           BiocParallel_1.28.3              
## [189] codetools_0.2-18                  Signac_1.13.0                    
## [191] utf8_1.2.4                        lattice_0.20-44                  
## [193] bslib_0.7.0                       spatstat.sparse_3.1-0            
## [195] tibble_3.2.1                      curl_5.2.1                       
## [197] leiden_0.4.3.1                    gtools_3.9.5                     
## [199] shinyjs_2.1.0                     GO.db_3.14.0                     
## [201] survival_3.2-11                   rmarkdown_2.27                   
## [203] munsell_0.5.1                     GetoptLong_1.0.5                 
## [205] rhdf5_2.38.1                      GenomeInfoDbData_1.2.7           
## [207] iterators_1.0.14                  reshape2_1.4.4                   
## [209] gtable_0.3.5

COVID reference-mapping task

António Sousa (e-mail: aggode@utu.fi) - Elo lab (https://elolab.utu.fi)

03/07/2024