
Modular wrapper to flag cell segmentation errors

Usage

runSegErrorEvaluation(
  score_GeneMatrix,
  transcript_df,
  cellID_coln = "UMI_cellID",
  transID_coln = "UMI_transID",
  transGene_coln = "target",
  spatLocs_colns = c("x", "y", "z"),
  flagModel_TransNum_cutoff = 50
)

Arguments

score_GeneMatrix

the gene x cell-type matrix of log-like scores for each gene in each cell type

transcript_df

the data.frame of transcripts, with columns for transcript_ID, cell_ID, score, and spatial coordinates

cellID_coln

the column name of cell_ID in transcript_df

transID_coln

the column name of transcript_ID in transcript_df

transGene_coln

the column name of target or gene name in transcript_df

spatLocs_colns

column names for 1st, 2nd and optional 3rd dimension of spatial coordinates in transcript_df

flagModel_TransNum_cutoff

the minimum number of transcripts a cell must contain to be included in the spatial modeling that identifies wrongly segmented cells (default = 50); cells with fewer transcripts are skipped
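The effect of flagModel_TransNum_cutoff can be pictured with a minimal base-R sketch. It is not the package's internal code: the toy data frame and the variable names `cutoff`, `cells_to_model` and `cells_skipped` are made up for illustration, and treating a cell with exactly 50 transcripts as kept is an assumption read from the "fewer transcripts" wording of the warning in Examples.

```r
# Toy transcript table; column names mirror the defaults shown in Usage.
toy_df <- data.frame(
  UMI_transID = seq_len(130),
  UMI_cellID  = rep(c("c1", "c2", "c3"), times = c(60, 50, 20)),
  stringsAsFactors = FALSE
)

cutoff <- 50  # hypothetical stand-in for flagModel_TransNum_cutoff

# Count transcripts per cell.
n_per_cell <- table(toy_df$UMI_cellID)

# Assumption: cells at or above the cutoff are modeled, the rest are skipped.
keep <- as.vector(n_per_cell) >= cutoff
cells_to_model <- names(n_per_cell)[keep]
cells_skipped  <- names(n_per_cell)[!keep]

print(cells_to_model)
print(cells_skipped)
```

Here c1 (60 transcripts) and c2 (50 transcripts) would go on to spatial modeling, while c3 (20 transcripts) would be skipped.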

Value

a list of two elements:

  1. modStats_ToFlagCells, a data.frame containing evaluation model statistics in columns for each cell's potential to have segmentation error

  2. transcript_df, the transcript data.frame with 2 additional columns: tLLR_maxCellType for the cell type with the maximum transcript score under the current segmentation, and score_tLLR_maxCellType for the corresponding score of each transcript
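One way to read the two added columns is sketched below in base R on made-up data. This is an interpretation, not the package's implementation: it assumes each cell is assigned the cell type whose summed transcript scores are highest, and that every transcript in the cell is then scored under that cell type. The score values and the names `score_mat`, `per_trans`, `cell_totals` and `best_type` are hypothetical.

```r
# Toy gene x cell-type score matrix (values are invented for illustration).
score_mat <- matrix(
  c( 1.0, -2.0,
    -1.5,  0.5,
     0.2,  0.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("typeX", "typeY"))
)

toy_df <- data.frame(
  UMI_transID = 1:5,
  UMI_cellID  = c("c1", "c1", "c1", "c2", "c2"),
  target      = c("geneA", "geneA", "geneC", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Score of every transcript under every cell type (transcripts x cell types).
per_trans <- score_mat[toy_df$target, , drop = FALSE]

# Assumption: sum scores within each cell and take the best-scoring cell type.
cell_totals <- rowsum(per_trans, group = toy_df$UMI_cellID)
best_type <- colnames(cell_totals)[max.col(cell_totals, ties.method = "first")]
names(best_type) <- rownames(cell_totals)

# Attach the two columns named in the Value section.
toy_df$tLLR_maxCellType <- unname(best_type[toy_df$UMI_cellID])
type_idx <- match(toy_df$tLLR_maxCellType, colnames(score_mat))
toy_df$score_tLLR_maxCellType <- per_trans[cbind(seq_len(nrow(toy_df)), type_idx)]

print(toy_df)
```

In this toy case, cell c1 totals 2.2 under typeX versus -3.9 under typeY, so its transcripts are labeled typeX. Cell c2 totals -1.3 versus 0.6, so its transcripts are labeled typeY.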

Examples

data("mini_transcriptDF")
data("example_CellGeneExpr")
data("example_refProfiles")
score_GeneMatrix <- scoreGenesInRef(
  genes = intersect(colnames(example_CellGeneExpr), rownames(example_refProfiles)), 
  ref_profiles = pmax(example_refProfiles, 1e-5))

res <- runSegErrorEvaluation(
  score_GeneMatrix = score_GeneMatrix, 
  transcript_df = mini_transcriptDF, 
  cellID_coln = 'UMI_cellID', 
  transID_coln = 'UMI_transID',
  transGene_coln = 'target',
  spatLocs_colns = c('x','y','z'),
  # cutoff of transcript number to do spatial modeling
  flagModel_TransNum_cutoff = 50) 
#> Found 960 common genes among transcript_df and score_GeneMatrix. 
#> Found 1375 cells and assigned cell type based on the provided `refProfiles` cluster profiles.
#> Run linear regreassion in 3 Dimension.
#> Warning: Below model_cutoff = 50, skip 37 cells with fewer transcripts. Move forward with remaining 1338 cells.
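The log above mentions a linear regression in 3 dimensions. As a rough illustration of what such a per-cell test could look like, the base-R sketch below fits transcript score against the x, y, z coordinates of a single simulated cell and compares that fit with an intercept-only model. The model formula, the F-test via `anova()`, the 0.05 threshold and all simulated values are assumptions made for this sketch; the package may use different model terms and a different test.

```r
set.seed(42)

# Simulate one cell with 80 transcripts at random 3D positions.
n <- 80
cell_df <- data.frame(x = runif(n), y = runif(n), z = runif(n))

# Simulated scores drop sharply on one side of the cell, mimicking a segment
# that has absorbed transcripts from a neighbor of a different cell type.
cell_df$score <- ifelse(cell_df$x > 0.5, -3, 0) + rnorm(n, sd = 0.5)

# Null model: score does not depend on position.
fit_null <- lm(score ~ 1, data = cell_df)

# Spatial model: score varies linearly with the three coordinates.
fit_spatial <- lm(score ~ x + y + z, data = cell_df)

# F-test on the nested models; a small p-value suggests the scores are
# spatially structured, which would make the cell a candidate for flagging.
cmp <- anova(fit_null, fit_spatial)
p_value <- cmp[["Pr(>F)"]][2]

print(p_value < 0.05)
```

A cell whose transcript scores show no spatial trend would tend to give a large p-value under the same comparison, so it would not be flagged by this kind of test.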