runSegRefinement • FastReseg

Modular wrapper to evaluate transcript groups in their neighborhood, decide on resegmentation operations, and execute them

Usage

runSegRefinement(
  score_GeneMatrix,
  chosen_cells = NULL,
  reseg_transcript_df,
  reseg_cellID_coln = "tmp_cellID",
  reseg_celltype_coln = "group_maxCellType",
  transID_coln = "transcript_id",
  transGene_coln = "target",
  transSpatLocs_coln = c("x", "y", "z"),
  score_baseline = NULL,
  lowerCutoff_transNum = NULL,
  higherCutoff_transNum = NULL,
  neighbor_distance_xy = NULL,
  distance_cutoff = 2.7,
  spatialMergeCheck_method = c("leidenCut", "geometryDiff"),
  cutoff_spatialMerge = 0.5,
  leiden_config = list(objective_function = c("CPM", "modularity"),
    resolution_parameter = 1, beta = 0.01, n_iterations = 200),
  config_spatNW_transcript = NULL,
  return_intermediates = TRUE,
  return_perCellData = TRUE,
  includeAllRefGenes = FALSE,
  seed_segRefine = NULL
)
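The signature lists `spatialMergeCheck_method = c("leidenCut", "geometryDiff")`, while the argument description gives "leidenCut" as the default. A vector-valued default like this is commonly resolved to its first element with `match.arg()`. The sketch below illustrates that idiom only; `pick_method` is a hypothetical helper, and whether FastReseg resolves the argument this way internally is an assumption.

```r
# Hypothetical helper, not part of FastReseg: shows how a vector-valued
# default is commonly resolved to a single choice with match.arg().
pick_method <- function(spatialMergeCheck_method = c("leidenCut", "geometryDiff")) {
  # With the argument left at its default, match.arg() returns the first choice;
  # an explicit value must match one of the listed choices.
  match.arg(spatialMergeCheck_method)
}

default_method  <- pick_method()                # argument left at its default
explicit_method <- pick_method("geometryDiff")  # explicit choice
```

Passing a value outside the listed choices makes `match.arg()` raise an error, which is one reason this idiom is popular for method-selection arguments.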

Arguments

score_GeneMatrix: a gene x cell-type matrix of log-like scores for each gene in each cell type
chosen_cells: the cell_ID of the chosen cells that need to be evaluated for re-segmentation
reseg_transcript_df: a data.frame with transcript_id, target/geneName, x, y, and cell_id for all transcript groups, plus the cell type with the maximum transcript score for each transcript group
reseg_cellID_coln: the column name of cell_ID for all transcript groups in transcript_df
reseg_celltype_coln: the column name of cell_type for all transcript groups in transcript_df
transID_coln: the column name of transcript_ID in transcript_df
transGene_coln: the column name of target or gene name in transcript_df
transSpatLocs_coln: the column names of the 1st, 2nd, and optional 3rd spatial dimensions of each transcript in transcript_df
score_baseline: a named vector of score baselines for all cell types listed in neighborhood_df; a per-cell transcript score higher than the baseline is required to call a cell type with high enough confidence
lowerCutoff_transNum: a named vector of transcript number cutoffs, one per cell type; a transcript number higher than the cutoff is required to keep the query cell as it is
higherCutoff_transNum: a named vector of transcript number cutoffs, one per cell type; a transcript number lower than the cutoff is required to keep the query cell as it is when there is a neighbor cell of consistent cell type
neighbor_distance_xy: maximum cell-to-cell distance in x, y between the center of a query cell and the centers of neighbor cells in direct contact, in the same unit as the input spatial coordinates. Default = NULL, which uses 2 times the average 2D cell diameter.
distance_cutoff: maximum molecule-to-molecule distance within a connected transcript group, in the same unit as the input spatial coordinates (default = 2.7 micron). If set to NULL, the pipeline first randomly chooses no more than 2500 cells from up to 10 randomly picked ROIs, with a search radius of 5 times neighbor_distance_xy, and then calculates the minimal molecular distance between the picked cells. The pipeline then uses 5 times the 90% quantile of the minimal molecular distance as distance_cutoff. This calculation is slow and is not recommended for a large transcript data.frame.
spatialMergeCheck_method: use either the "leidenCut" (in 2D or 3D) or the "geometryDiff" (in 2D only) method to determine whether a cell pair merging event is allowed in space (default = "leidenCut")
cutoff_spatialMerge: spatial constraint on a valid merging event between two source transcript groups; default = 0.5 for a 50% cutoff; set to 0 to skip the spatial constraint evaluation for merging. For spatialMergeCheck_method = "leidenCut", this is the minimal percentage of transcripts that share membership between the query cell and neighbor cells in the leiden clustering results for a valid merging event. For spatialMergeCheck_method = "geometryDiff", this is the maximum percentage of white space change upon merging of the query cell and the neighbor cell for a valid merging event.
leiden_config: (leidenCut) a list of configuration values to pass to reticulate and the igraph::cluster_leiden function, including objective_function, resolution_parameter, beta, n_iterations.
config_spatNW_transcript: (leidenCut) configuration list used to create the spatial network at the transcript level; see the manual for createSpatialDelaunayNW_from_spatLocs for more details; set to NULL to use the default config
return_intermediates: flag to return intermediate outputs, including the neighborhoodDF_ToReseg data.frame for neighborhood evaluation and the reseg_actions list of resegmentation actions
return_perCellData: flag to return the gene x cell count matrix and the per cell DF with updated mean spatial coordinates and new cell type
includeAllRefGenes: flag to include all genes in score_GeneMatrix in the returned updated_perCellExprs, with missing genes set to 0 (default = FALSE)
seed_segRefine: seed for the transcript error correction step; default = NULL to skip setting the seed
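As a rough illustration of the argument descriptions above, the sketch below builds the per-cell-type named vectors and reproduces only the final arithmetic of the `distance_cutoff = NULL` heuristic (5 times the 90% quantile of the minimal molecular distances). All cell type names, gene names, and numbers are made up, and `min_molecular_dist` stands in for distances that the pipeline would compute itself from sampled cells; the sampling of ROIs and cells is not reproduced here.

```r
# Hypothetical cell types and genes, for illustration only.
cell_types <- c("typeA", "typeB", "typeC")
genes <- c("Gene1", "Gene2")

# Toy gene x cell-type score matrix (values are arbitrary).
score_GeneMatrix <- matrix(
  c(-1.2, -3.4, -2.1, -0.8, -2.9, -1.5),
  nrow = length(genes),
  dimnames = list(genes, cell_types)
)

# Named vectors: one entry per cell type, named by cell type.
score_baseline        <- setNames(c(-50, -60, -55), cell_types)
lowerCutoff_transNum  <- setNames(c(20, 30, 25), cell_types)
higherCutoff_transNum <- setNames(c(200, 300, 250), cell_types)

# Sanity check: every cell type in the score matrix has a baseline and cutoffs.
covered <- all(colnames(score_GeneMatrix) %in% names(score_baseline)) &&
  all(colnames(score_GeneMatrix) %in% names(lowerCutoff_transNum)) &&
  all(colnames(score_GeneMatrix) %in% names(higherCutoff_transNum))

# Final step of the distance_cutoff = NULL heuristic as described above:
# 5 times the 90% quantile of the minimal molecular distances (toy values).
min_molecular_dist <- c(0.2, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.4)
distance_cutoff <- 5 * unname(quantile(min_molecular_dist, probs = 0.9))
```

Supplying `distance_cutoff` explicitly, as in the default of 2.7, avoids the slow estimation step altogether.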

Value

a list

updated_transDF: the updated transcript_df with updated_cellID and updated_celltype columns based on reseg_full_converter
neighborhoodDF_ToReseg: a data.frame for the neighborhood environment of low-score transcript groups, output of the get_neighborhood_content function, returned when return_intermediates = TRUE
reseg_actions: a list of 4 elements describing how the resegmentation would be performed on the original transcript_df by the group assignment of transcripts listed in groupDF_ToFlagTrans, output of the decide_ReSegment_Operations function, returned when return_intermediates = TRUE
updated_perCellDT: a per cell data.table with mean spatial coordinates, new cell type, and resegmentation action after resegmentation, returned when return_perCellData = TRUE
updated_perCellExprs: a gene x cell count sparse matrix for the updated transcript data.frame after resegmentation, returned when return_perCellData = TRUE
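Because some elements are only returned under certain flags, downstream code may want to check for them before use. The sketch below works on a mock list that mimics the documented element names; it does not call runSegRefinement, and the data, gene names, and the dense base-R matrix (in place of a sparse matrix) are assumptions for illustration. The last step shows by hand what the description of includeAllRefGenes amounts to: adding zero-count rows for reference genes that are missing from the count matrix.

```r
# Mock result for illustration only; a real one comes from runSegRefinement().
ref_genes <- c("GeneA", "GeneB", "GeneC")  # rownames of a hypothetical score_GeneMatrix
res <- list(
  updated_transDF = data.frame(
    transcript_id    = 1:4,
    updated_cellID   = c("cell1", "cell1", "cell2", "cell2"),
    updated_celltype = c("typeA", "typeA", "typeB", "typeB")
  ),
  updated_perCellExprs = matrix(
    c(3, 0, 1, 2), nrow = 2,
    dimnames = list(c("GeneA", "GeneC"), c("cell1", "cell2"))
  )
)

# Elements tied to return_intermediates are checked before use.
has_intermediates <- !is.null(res[["neighborhoodDF_ToReseg"]]) &&
  !is.null(res[["reseg_actions"]])

# Transcripts per updated cell.
trans_per_cell <- table(res$updated_transDF$updated_cellID)

# Manual equivalent of the includeAllRefGenes idea: pad missing genes with zeros.
exprs <- res$updated_perCellExprs
missing_genes <- setdiff(ref_genes, rownames(exprs))
padding <- matrix(0, nrow = length(missing_genes), ncol = ncol(exprs),
                  dimnames = list(missing_genes, colnames(exprs)))
padded_exprs <- rbind(exprs, padding)[ref_genes, , drop = FALSE]
```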