Creating reference profiles for InSituType

A short guide to creating reference profiles, and to merging profiles from different sources.
cell typing
Author
Affiliations

Patrick Danaher

NanoString, a Bruker Company

Github: patrickjdanaher

Published

August 21, 2024

Modified

August 21, 2024

Background

The InSituType cell typing algorithm relies on a “reference matrix” to perform supervised or semi-supervised cell typing. A reference matrix gives the expected gene expression profile of each cell type in a tissue; these are usually derived from previous scRNA-seq or spatial transcriptomics experiments.

Deriving reference profiles:

To create reference profiles from a previous dataset, you can use InSituType::getRNAprofiles. When using this function, keep the following in mind:

  1. Ensure that your dataset is linear-scale (no log-transformations)
  2. Raw data is preferred, though normalized will work just fine
  3. If you input data is scRNA-seq, which has essentially no background, just enter the negative control argument as neg = rep(0, length(clust)) (where clust is the vector you pass to the clust argument).

Creating hybrid reference profiles:

Fairly often, it’s convenient to create a hybrid reference matrix from two studies. In a typical example, you may want to cell type a solid tissue with autoimmune disease. There is likely a good scRNA-seq dataset available for the healthy cell types in your tissue, but this dataset probably has poor coverage of immune cells, as immune cells are rare in non-inflamed tissues, and most single cell datasets only sample some tens of thousands of cells.

Example

Here’s an example workflow for merging the HCA colon cell profiles (Kinchen et al 2018, original data here) with the InSituType immune cell profiles. The key steps are:

  • Aligning by shared genes
  • Rescaling (InSituType doesn’t care about the scaling of the columns in a reference matrix, but it’s convenient for descriptive analyses if they’re all comparably scaled.)
  • Removing redundant cell types
# get immune cell profiles included in the InSituType library:
library(InSituType)
data("ioprofiles")
head(ioprofiles)

# load HCA profiles for colon:
load(url("https://github.com/Nanostring-Biostats/CellProfileLibrary/raw/master/Human/Adult/Colon_HCA.RData"))
colonprofiles <- as.matrix(profile_matrix)
head(colonprofiles)

# align genes:
sharedgenes <- intersect(rownames(ioprofiles), rownames(colonprofiles))
ioprofiles <- ioprofiles[sharedgenes, ]
colonprofiles <- colonprofiles[sharedgenes, ]

# put on appproximately the same scale:
ioprofiles <- ioprofiles / quantile(ioprofiles, 0.99) * 1000
colonprofiles <- colonprofiles / quantile(colonprofiles, 0.99) * 1000

# omit immune cells from colon profiles:
colnames(ioprofiles)
colnames(colonprofiles)
omit_from_colon <- c("plasma.cell")
omit_from_io <- c("fibroblast", "endothelial")

# merge:
ref <- cbind(ioprofiles[, setdiff(colnames(ioprofiles), omit_from_io)], 
             colonprofiles[, setdiff(colnames(colonprofiles), omit_from_colon)])
head(ref)

Once you obtain this merged reference profile, you can use it for cell typing with InSituType as you would any other reference matrix.

For deep-dives on other cell typing topics, see the InSituType FAQS.