
Modular wrapper that evaluates transcript groups within their neighborhood, decides which resegmentation operations to apply, and executes them.

Usage

runSegRefinement(
  score_GeneMatrix,
  chosen_cells = NULL,
  reseg_transcript_df,
  reseg_cellID_coln = "tmp_cellID",
  reseg_celltype_coln = "group_maxCellType",
  transID_coln = "transcript_id",
  transGene_coln = "target",
  transSpatLocs_coln = c("x", "y", "z"),
  score_baseline = NULL,
  lowerCutoff_transNum = NULL,
  higherCutoff_transNum = NULL,
  neighbor_distance_xy = NULL,
  distance_cutoff = 2.7,
  spatialMergeCheck_method = c("leidenCut", "geometryDiff"),
  cutoff_spatialMerge = 0.5,
  leiden_config = list(objective_function = c("CPM", "modularity"), resolution_parameter
    = 1, beta = 0.01, n_iterations = 200),
  config_spatNW_transcript = NULL,
  return_intermediates = TRUE,
  return_perCellData = TRUE,
  includeAllRefGenes = FALSE,
  seed_segRefine = NULL
)

Arguments

score_GeneMatrix

a gene x cell-type matrix of log-like scores for each gene in each cell type

chosen_cells

the cell_IDs of the cells to be evaluated for re-segmentation

reseg_transcript_df

a data.frame with transcript_id, target/geneName, x, y and cell_id for all transcript groups, plus the cell type with the maximum transcript score for each transcript group

reseg_cellID_coln

the column name of cell_ID for all transcript groups in transcript_df

reseg_celltype_coln

the column name of cell_type for all transcript groups in transcript_df

transID_coln

the column name of transcript_ID in transcript_df

transGene_coln

the column name of target or gene name in transcript_df

transSpatLocs_coln

the column names of the 1st, 2nd and optional 3rd spatial dimensions of each transcript in transcript_df
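The column-name arguments above all point into the same transcript table. The following is a minimal sketch of a table laid out with the default column names from the Usage block; every value is made up, and the names `toy_transcript_df` and `coln_args` are hypothetical helpers for illustration, not part of the package.

```r
# Toy transcript table using the default column names from the Usage block.
# All values are fabricated for illustration only.
set.seed(1)
n <- 12
toy_transcript_df <- data.frame(
  transcript_id     = paste0("t", seq_len(n)),
  target            = sample(c("GeneA", "GeneB", "GeneC"), n, replace = TRUE),
  x                 = runif(n, 0, 20),
  y                 = runif(n, 0, 20),
  z                 = sample(0:2, n, replace = TRUE),
  tmp_cellID        = rep(c("c1", "c2", "c3"), each = 4),
  group_maxCellType = rep(c("typeA", "typeB", "typeA"), each = 4),
  stringsAsFactors  = FALSE
)

# Column-name arguments as they would be passed to runSegRefinement().
coln_args <- list(
  reseg_cellID_coln   = "tmp_cellID",
  reseg_celltype_coln = "group_maxCellType",
  transID_coln        = "transcript_id",
  transGene_coln      = "target",
  transSpatLocs_coln  = c("x", "y", "z")
)

# Check that every referenced column is present before calling the wrapper.
required_cols <- unlist(coln_args, use.names = FALSE)
missing_cols  <- setdiff(required_cols, colnames(toy_transcript_df))
if (length(missing_cols) > 0) {
  stop("missing columns: ", paste(missing_cols, collapse = ", "))
}
```

A check like this fails early with a readable message when a column has been renamed upstream. For 2D data, `transSpatLocs_coln` would presumably be shortened to `c("x", "y")`.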

score_baseline

a named vector of score baselines for all cell types listed in neighborhood_df; a per-cell transcript score above the baseline is required to call a cell type with high enough confidence

lowerCutoff_transNum

a named vector of transcript number cutoffs, one per cell type; a query cell must have more transcripts than this cutoff to be kept as it is

higherCutoff_transNum

a named vector of transcript number cutoffs, one per cell type; when a neighbor cell of consistent cell type exists, a query cell must have fewer transcripts than this cutoff to be kept as it is
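The three cutoff arguments above are named vectors keyed by cell type. The sketch below shows one way such vectors could be laid out; the cell-type labels and all numbers are invented, and the consistency checks (names matching the columns of the score matrix, lower cutoff below higher cutoff) are assumptions about sensible input, not documented requirements.

```r
# Hypothetical cell types; in practice these would presumably match the
# column names of score_GeneMatrix.
cell_types <- c("typeA", "typeB")

# Toy gene x cell-type score matrix (values are made up).
score_GeneMatrix <- matrix(
  c(-1.2, -4.0,
    -3.5, -0.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB"), cell_types)
)

# One entry per cell type, named by cell type.
score_baseline        <- c(typeA = -2.5, typeB = -3.0)
lowerCutoff_transNum  <- c(typeA = 20,   typeB = 35)
higherCutoff_transNum <- c(typeA = 150,  typeB = 220)

# Assumed sanity checks: same cell types everywhere, lower < higher.
stopifnot(
  setequal(names(score_baseline), colnames(score_GeneMatrix)),
  setequal(names(lowerCutoff_transNum), cell_types),
  setequal(names(higherCutoff_transNum), cell_types),
  all(lowerCutoff_transNum[cell_types] < higherCutoff_transNum[cell_types])
)
```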

neighbor_distance_xy

maximum cell-to-cell distance in x, y between the center of a query cell and the centers of neighbor cells in direct contact, in the same unit as the input spatial coordinates. Default = NULL, which uses 2 times the average 2D cell diameter.

distance_cutoff

maximum molecule-to-molecule distance within a connected transcript group, in the same unit as the input spatial coordinates (default = 2.7 micron). If set to NULL, the pipeline first randomly chooses no more than 2500 cells from up to 10 randomly picked ROIs, with a search radius of 5 times neighbor_distance_xy, and then calculates the minimal molecular distance between the picked cells. The pipeline then uses 5 times the 90% quantile of the minimal molecular distance as distance_cutoff. This calculation is slow and is not recommended for a large transcript data.frame.
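The final step of the NULL-default rule for distance_cutoff (5 times the 90% quantile) can be sketched with base R. The input vector `min_mol_dist` is hypothetical: it stands in for the minimal molecular distances the pipeline would compute from the sampled cells, and how the package derives those values is not shown here. R's default quantile type is used, which may differ from the package's choice.

```r
# Hypothetical minimal molecular distances (same unit as the coordinates),
# standing in for what the pipeline would measure on the sampled cells.
min_mol_dist <- c(0.2, 0.3, 0.4, 0.5, 0.6, 0.5, 0.4, 0.3, 0.7, 0.6)

# 5 times the 90% quantile, as described for distance_cutoff = NULL.
q90 <- unname(quantile(min_mol_dist, probs = 0.9))
distance_cutoff_est <- 5 * q90

print(distance_cutoff_est)
```

Passing an explicit numeric value for distance_cutoff avoids this estimation step altogether, which is what the description recommends for large inputs.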

spatialMergeCheck_method

use either the "leidenCut" method (2D or 3D) or the "geometryDiff" method (2D only) to determine whether a cell pair merging event is allowed in space (default = "leidenCut")

cutoff_spatialMerge

spatial constraint on a valid merging event between two source transcript groups; default = 0.5 for a 50% cutoff, set to 0 to skip the spatial constraint evaluation for merging. For spatialMergeCheck_method = "leidenCut", this is the minimal percentage of transcripts that must share cluster membership between the query cell and the neighbor cell in the Leiden clustering results for a merging event to be valid. For spatialMergeCheck_method = "geometryDiff", this is the maximum percentage change in white space allowed upon merging the query cell and the neighbor cell for a merging event to be valid.
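To make the "leidenCut" reading of cutoff_spatialMerge concrete, here is a rough sketch under one possible interpretation: the share of query-cell transcripts that land in a Leiden cluster also containing neighbor-cell transcripts. This interpretation, the variable names, and the membership values are all assumptions for illustration; the package's actual computation may differ.

```r
# Hypothetical Leiden cluster id for each transcript of a query/neighbor pair.
membership  <- c(1, 1, 1, 2, 2, 2, 2, 2)
source_cell <- c(rep("query", 4), rep("neighbor", 4))

# Clusters that contain at least one neighbor-cell transcript.
clusters_with_neighbor <- unique(membership[source_cell == "neighbor"])

# Assumed definition: fraction of query transcripts sharing a cluster
# with the neighbor cell.
shared_frac <- mean(membership[source_cell == "query"] %in% clusters_with_neighbor)

cutoff_spatialMerge <- 0.5
merge_allowed <- shared_frac >= cutoff_spatialMerge
```

Under this reading, a query cell whose transcripts mostly cluster apart from the neighbor's transcripts falls below the cutoff and the merge is rejected, while cutoff_spatialMerge = 0 accepts every pair.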

leiden_config

(leidenCut) a list of configuration options to pass to reticulate and the igraph::cluster_leiden function, including objective_function, resolution_parameter, beta and n_iterations.
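A custom leiden_config can be built by starting from the defaults shown in the Usage block and overriding individual fields. The sketch below uses `utils::modifyList` for this; the chosen override values are arbitrary, and passing a complete four-element list is a cautious assumption, since the page does not say whether partial lists are accepted.

```r
# Defaults copied from the Usage block.
default_leiden <- list(
  objective_function   = c("CPM", "modularity"),
  resolution_parameter = 1,
  beta                 = 0.01,
  n_iterations         = 200
)

# Override selected fields while keeping the remaining defaults.
leiden_config <- utils::modifyList(
  default_leiden,
  list(objective_function = "modularity", resolution_parameter = 0.5)
)

# Confirm that all four documented fields are still present.
expected_names <- c("objective_function", "resolution_parameter",
                    "beta", "n_iterations")
stopifnot(setequal(names(leiden_config), expected_names))
```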

config_spatNW_transcript

(leidenCut) configuration list for creating the transcript-level spatial network; see the manual for createSpatialDelaunayNW_from_spatLocs for details. Set to NULL to use the default config.

return_intermediates

flag to return intermediate outputs, including the neighborhoodDF_ToReseg data.frame for neighborhood evaluation and the reseg_actions list of resegmentation actions

return_perCellData

flag to return the gene x cell count matrix and the per-cell DF with updated mean spatial coordinates and new cell types

includeAllRefGenes

flag to include all genes in score_GeneMatrix in the returned updated_perCellExprs, with missing genes given a value of 0 (default = FALSE)

seed_segRefine

seed for the transcript error correction step; default = NULL to skip setting the seed

Value

a list

updated_transDF

the updated transcript_df with updated_cellID and updated_celltype columns based on reseg_full_converter

neighborhoodDF_ToReseg

a data.frame describing the neighborhood environment of low-score transcript groups, output of the get_neighborhood_content function; returned when return_intermediates = TRUE

reseg_actions

a list of 4 elements describing how the resegmentation would be performed on the original transcript_df by the group assignment of transcripts listed in groupDF_ToFlagTrans, output of the decide_ReSegment_Operations function; returned when return_intermediates = TRUE

updated_perCellDT

a per-cell data.table with mean spatial coordinates, new cell type and resegmentation action after resegmentation; returned when return_perCellData = TRUE

updated_perCellExprs

a gene x cell count sparse matrix for the updated transcript data.frame after resegmentation; returned when return_perCellData = TRUE