
Supporting function for fastReseg_perFOV_full_process. It checks and prepares the inputs for the full resegmentation pipeline on the provided transcript_df, calculates the distance cutoffs, and sets values for config_spatNW_transcript and leiden_config if they are not provided.

Usage

checkAndPrepInputs_perFOV(
  all_celltypes,
  all_genes,
  transcript_df,
  transID_coln = "UMI_transID",
  transGene_coln = "target",
  cellID_coln = "UMI_cellID",
  spatLocs_colns = c("x", "y", "z"),
  extracellular_cellID = NULL,
  flagModel_TransNum_cutoff = 50,
  molecular_distance_cutoff = 2.7,
  cellular_distance_cutoff = NULL,
  score_baseline = NULL,
  lowerCutoff_transNum = NULL,
  higherCutoff_transNum = NULL,
  groupTranscripts_method = c("dbscan", "delaunay"),
  spatialMergeCheck_method = c("leidenCut", "geometryDiff"),
  cutoff_spatialMerge = 0.5,
  leiden_config = NULL,
  config_spatNW_transcript = NULL
)

Arguments

all_celltypes

vector of all cell types considered in the analysis, as listed in the columns of score_GeneMatrix in the parent function

all_genes

vector of all genes considered in the analysis, as listed in the rows of score_GeneMatrix in the parent function

transcript_df

a data.frame with one row per transcript and columns for transcript_id, target or gene name, original cell_id, and spatial coordinates.
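As a rough illustration of the expected layout, the sketch below builds a toy transcript_df that uses the default column names from the Usage section. The gene names, cell IDs, and coordinates are made up for the example, and the column check is only an assumption about the kind of validation this function performs, not its actual code.

```r
# Toy transcript table with the default column names
# (UMI_transID, target, UMI_cellID, x, y, z). All values are hypothetical.
transcript_df <- data.frame(
  UMI_transID = paste0("t", 1:6),
  target      = c("GeneA", "GeneB", "GeneA", "GeneC", "GeneB", "GeneA"),
  UMI_cellID  = c("c1", "c1", "c1", "c2", "c2", "c2"),
  x           = c(1.0, 1.5, 2.0, 10.0, 10.5, 11.0),
  y           = c(1.0, 2.0, 1.5, 10.0, 11.0, 10.5),
  z           = c(0, 0, 1, 0, 1, 1),
  stringsAsFactors = FALSE
)

# Illustrative check: confirm that every column named by the
# *_coln arguments is present before running the pipeline.
required_cols <- c("UMI_transID", "target", "UMI_cellID", "x", "y", "z")
missing_cols  <- setdiff(required_cols, colnames(transcript_df))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```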

transID_coln

the column name of transcript_ID in transcript_df

transGene_coln

the column name of target or gene name in transcript_df

cellID_coln

the column name of cell_ID in transcript_df

spatLocs_colns

column names for 1st, 2nd and optional 3rd dimension of spatial coordinates in transcript_df

extracellular_cellID

a vector of cell_ID values for extracellular transcripts, which are removed from the resegmentation pipeline (default = NULL)
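A minimal sketch of this kind of filtering in base R is shown below. The cell ID "0" used for extracellular transcripts is a hypothetical value, and the subsetting is an illustration of the described behavior rather than the package's own implementation.

```r
# Toy data: transcripts assigned to cell "0" are treated as extracellular
# (the "0" label is an assumption made for this example).
transcript_df <- data.frame(
  UMI_transID = paste0("t", 1:5),
  UMI_cellID  = c("c1", "c1", "0", "c2", "0"),
  stringsAsFactors = FALSE
)
extracellular_cellID <- "0"

# Keep only transcripts whose cell ID is not in extracellular_cellID.
# With extracellular_cellID = NULL, %in% returns FALSE for every row,
# so nothing is removed.
is_extracellular <- transcript_df$UMI_cellID %in% extracellular_cellID
intracellular_df <- transcript_df[!is_extracellular, , drop = FALSE]
```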

flagModel_TransNum_cutoff

the transcript-number cutoff for performing spatial modeling to identify wrongly segmented cells (default = 50)

molecular_distance_cutoff

maximum molecule-to-molecule distance within a connected transcript group, in the same unit as the input spatial coordinates (default = 2.7 micron). If set to NULL, the pipeline first randomly chooses no more than 2500 cells from up to 10 randomly picked ROIs, each with a search radius of 5 times cellular_distance_cutoff, and then calculates the minimal molecular distance between the picked cells. It then uses 5 times the 90% quantile of the minimal molecular distance as molecular_distance_cutoff. This calculation is slow and is not recommended for a large transcript data.frame.
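The arithmetic behind the NULL case can be sketched in base R as follows. This is a simplified reading of the description, not the package's code: it skips the ROI sampling, compares every pair of toy cells, and takes "minimal molecular distance" to mean the smallest transcript-to-transcript distance between two cells. The helper min_dist_between and the simulated coordinates are hypothetical.

```r
set.seed(42)

# Three toy cells with 20 transcripts each, scattered around assumed centers.
centers <- rbind(c(0, 0), c(6, 0), c(0, 6))
toy <- do.call(rbind, lapply(seq_len(nrow(centers)), function(i) {
  data.frame(
    cell = paste0("c", i),
    x    = rnorm(20, mean = centers[i, 1], sd = 1),
    y    = rnorm(20, mean = centers[i, 2], sd = 1)
  )
}))

# Hypothetical helper: smallest transcript-to-transcript distance
# between the transcripts of cell a and those of cell b.
min_dist_between <- function(a, b) {
  dx <- outer(a$x, b$x, "-")
  dy <- outer(a$y, b$y, "-")
  min(sqrt(dx^2 + dy^2))
}

cells      <- split(toy, toy$cell)
cell_pairs <- combn(names(cells), 2, simplify = FALSE)
min_dists  <- vapply(
  cell_pairs,
  function(p) min_dist_between(cells[[p[1]]], cells[[p[2]]]),
  numeric(1)
)

# 5 times the 90% quantile of the minimal molecular distances.
molecular_distance_cutoff <- 5 * unname(quantile(min_dists, probs = 0.9))
```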

cellular_distance_cutoff

maximum cell-to-cell distance in x, y between the center of a query cell and the centers of neighbor cells in direct contact, in the same unit as the input spatial coordinates. Default = NULL, which uses 2 times the average 2D cell diameter.
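The default rule can be illustrated with a small base R sketch. How the package measures a cell's 2D diameter is not stated here, so the example assumes the diameter is the mean of the cell's extent in x and y; that choice, the helper extent, and the toy coordinates are assumptions made for illustration.

```r
# Two toy cells described by the transcripts at their corners.
toy <- data.frame(
  cell = rep(c("c1", "c2"), each = 4),
  x    = c(0, 10, 0, 10, 20, 28, 20, 28),
  y    = c(0, 0, 10, 10, 5, 5, 13, 13)
)

# Hypothetical helper: span of a coordinate vector.
extent <- function(v) diff(range(v))

# Assumed diameter per cell: mean of the x and y extents.
diam_per_cell <- vapply(
  split(toy, toy$cell),
  function(d) mean(c(extent(d$x), extent(d$y))),
  numeric(1)
)

# 2 times the average 2D cell diameter.
cellular_distance_cutoff <- 2 * mean(diam_per_cell)
cellular_distance_cutoff  # 18 for this toy data (diameters 10 and 8)
```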

score_baseline

a named vector of score baselines, one for each cell type listed in score_GeneMatrix; a cell's per-cell transcript score must be higher than the baseline for its cell type to be called with high enough confidence

lowerCutoff_transNum

a named vector of transcript number cutoffs, one for each cell type; a query cell must have more transcripts than the cutoff to be kept as it is

higherCutoff_transNum

a named vector of transcript number cutoffs, one for each cell type; when there is a neighbor cell of consistent cell type, a query cell must have fewer transcripts than the cutoff to be kept as it is.

groupTranscripts_method

use either the "dbscan" or the "delaunay" method to group transcripts in space (default = "dbscan")

spatialMergeCheck_method

use either "leidenCut" (in 2D or 3D) or "geometryDiff" (in 2D only) method to determine whether a cell pair merging event is allowed in space (default = "leidenCut")

cutoff_spatialMerge

spatial constraint on a valid merging event between two source transcript groups; default = 0.5 for a 50% cutoff, set to 0 to skip the spatial constraint evaluation for merging. For spatialMergeCheck_method = "leidenCut", this is the minimal percentage of transcripts that share membership between the query cell and neighbor cells in the leiden clustering results for a valid merging event. For spatialMergeCheck_method = "geometryDiff", this is the maximum percentage of white space change upon merging of the query cell and neighbor cell for a valid merging event.

leiden_config

(leidenCut) a list of configuration settings to pass to reticulate and the igraph::cluster_leiden function, including objective_function, resolution_parameter, beta, n_iterations.
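A leiden_config list might look like the sketch below. The four element names come from the description above; the values are illustrative placeholders, not the defaults this function sets when leiden_config is NULL.

```r
# Illustrative leiden_config; the values are assumptions chosen for the
# example and are not the package defaults.
leiden_config <- list(
  objective_function   = "CPM",  # igraph::cluster_leiden accepts "CPM" or "modularity"
  resolution_parameter = 1,
  beta                 = 0.01,
  n_iterations         = 200
)

# Quick sanity check that all expected entries are present.
expected_names <- c("objective_function", "resolution_parameter",
                    "beta", "n_iterations")
all_present <- all(expected_names %in% names(leiden_config))
```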

config_spatNW_transcript

configuration list to create spatial network at transcript level, see manual for createSpatialDelaunayNW_from_spatLocs for more details, set to NULL to use default config (default = NULL)

Value

a list

transcript_df

transcript data.frame ready for the downstream full resegmentation pipeline

cellular_distance_cutoff

maximum cell-to-cell distance in x, y between the center of query cells to the center of neighbor cells with direct contact, same unit as input spatial coordinate.

molecular_distance_cutoff

maximum molecule-to-molecule distance within connected transcript group, same unit as input spatial coordinate

config_spatNW_transcript

configuration list to create spatial network at transcript level

leiden_config

configuration list to do leiden clustering on spatial network at transcript level for merge event evaluation