# get immune cell profiles included in the InSituType library:
library(InSituType)
data("ioprofiles")
head(ioprofiles)
# load HCA profiles for colon:
load(url("https://github.com/Nanostring-Biostats/CellProfileLibrary/raw/master/Human/Adult/Colon_HCA.RData"))
colonprofiles <- as.matrix(profile_matrix)
head(colonprofiles)
# align genes:
sharedgenes <- intersect(rownames(ioprofiles), rownames(colonprofiles))
ioprofiles <- ioprofiles[sharedgenes, ]
colonprofiles <- colonprofiles[sharedgenes, ]
# put on approximately the same scale:
ioprofiles <- ioprofiles / quantile(ioprofiles, 0.99) * 1000
colonprofiles <- colonprofiles / quantile(colonprofiles, 0.99) * 1000
# omit immune cells from colon profiles:
colnames(ioprofiles)
colnames(colonprofiles)
omit_from_colon <- c("plasma.cell")
omit_from_io <- c("fibroblast", "endothelial")
# merge:
ref <- cbind(ioprofiles[, setdiff(colnames(ioprofiles), omit_from_io)],
             colonprofiles[, setdiff(colnames(colonprofiles), omit_from_colon)])
head(ref)
Background
The InSituType cell typing algorithm relies on a “reference matrix” to perform supervised or semi-supervised cell typing. A reference matrix gives the expected gene expression profile of each cell type in a tissue; these are usually derived from previous scRNA-seq or spatial transcriptomics experiments.
Deriving reference profiles:
To create reference profiles from a previous dataset, you can use InSituType::getRNAprofiles. When using this function, keep the following in mind:
- Ensure that your dataset is linear-scale (no log-transformations)
- Raw data is preferred, though normalized data will also work
- If your input data is scRNA-seq, which has essentially no background, just enter the negative control argument as neg = rep(0, length(clust)) (where clust is the vector you pass to the clust argument).
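As a rough sketch of the points above, the following builds a toy scRNA-seq-style count matrix, checks that it looks linear-scale, and sets the background vector to zero. The simulated data, the variable names (counts, clust, neg) and the base-R per-cluster mean are illustrative assumptions, not InSituType's implementation. The guarded call at the end assumes getRNAprofiles takes a cells-by-genes matrix as its first argument, so check the function's help page before relying on it.

```r
set.seed(1)

# Toy scRNA-seq-style data (hypothetical): 60 cells x 5 genes of raw counts.
n_cells <- 60
genes <- paste0("gene", 1:5)
counts <- matrix(rpois(n_cells * length(genes), lambda = 5),
                 nrow = n_cells,
                 dimnames = list(paste0("cell", seq_len(n_cells)), genes))

# Cluster label for each cell (hypothetical cell type names).
clust <- rep(c("T.cell", "B.cell", "fibroblast"), each = n_cells / 3)

# Linear-scale check: raw counts are non-negative. Log-transformed data
# would need to be back-transformed first (e.g. expm1() for log1p values).
stopifnot(all(counts >= 0))

# scRNA-seq has essentially no background, so use zero for every cell.
neg <- rep(0, length(clust))

# Simplified stand-in for a reference profile: mean expression per cluster,
# minus mean background, floored at zero. This is NOT InSituType's code.
simple_profiles <- sapply(unique(clust), function(cl) {
  in_cl <- clust == cl
  pmax(colMeans(counts[in_cl, , drop = FALSE]) - mean(neg[in_cl]), 0)
})
dim(simple_profiles)  # genes x cell types

# With InSituType installed, the analogous call would look like this
# (first argument assumed to be the cells x genes matrix):
if (requireNamespace("InSituType", quietly = TRUE)) {
  profiles <- InSituType::getRNAprofiles(counts, neg = neg, clust = clust)
}
```

Either way, the result should be a genes-by-cell-types matrix, which is the shape the merging example expects.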
Creating hybrid reference profiles:
Fairly often, it’s convenient to create a hybrid reference matrix from two studies. In a typical example, you may want to cell type a solid tissue with autoimmune disease. There is likely a good scRNA-seq dataset available for the healthy cell types in your tissue, but this dataset probably has poor coverage of immune cells, as immune cells are rare in non-inflamed tissues, and most single cell datasets only sample some tens of thousands of cells.
Example
Here’s an example workflow for merging the HCA colon cell profiles (Kinchen et al 2018, original data here) with the InSituType immune cell profiles. The key steps are:
- Aligning by shared genes
- Rescaling (InSituType doesn’t care about the scaling of the columns in a reference matrix, but it’s convenient for descriptive analyses if they’re all comparably scaled.)
- Removing redundant cell types
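The three steps above can be sketched as one small helper and tried on toy matrices. Note that merge_profiles, its arguments and the toy values are hypothetical names and data introduced here for illustration; they are not part of InSituType. The rescaling mirrors the 99th-percentile-to-1000 convention used in the colon example.

```r
# Hypothetical helper: merge two genes x cell-types reference matrices.
merge_profiles <- function(a, b, omit_from_a = character(0),
                           omit_from_b = character(0), q = 0.99) {
  # 1. Align by shared genes.
  shared <- intersect(rownames(a), rownames(b))
  a <- a[shared, , drop = FALSE]
  b <- b[shared, , drop = FALSE]
  # 2. Rescale so the q-th quantile of each matrix equals 1000.
  a <- a / quantile(a, q, names = FALSE) * 1000
  b <- b / quantile(b, q, names = FALSE) * 1000
  # 3. Drop redundant cell types, then bind the columns.
  cbind(a[, setdiff(colnames(a), omit_from_a), drop = FALSE],
        b[, setdiff(colnames(b), omit_from_b), drop = FALSE])
}

# Toy inputs (made-up values) with partly overlapping genes and cell types.
immune <- matrix(c(10, 0, 2,  1, 8, 3), nrow = 3,
                 dimnames = list(c("CD3E", "MS4A1", "ACTB"),
                                 c("T.cell", "B.cell")))
tissue <- matrix(c(5, 20, 1,  40, 2, 6), nrow = 3,
                 dimnames = list(c("ACTB", "COL1A1", "MS4A1"),
                                 c("fibroblast", "B.cell")))

# Keep the immune B.cell profile and drop the tissue dataset's copy.
ref_toy <- merge_profiles(immune, tissue, omit_from_b = "B.cell")
ref_toy

# Sanity check: every cell type should appear exactly once.
stopifnot(!anyDuplicated(colnames(ref_toy)))
```

Checking for duplicated column names after the merge is a cheap way to catch a cell type that was present in both sources but left out of the omit lists.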
Once you obtain this merged reference profile, you can use it for cell typing with InSituType as you would any other reference matrix.
For deep dives on other cell typing topics, see the InSituType FAQs.