Supporting function for fastReseg_perFOV_full_process: checks and prepares inputs for the full resegmentation pipeline on the provided transcript_df, calculates distance cutoffs, and sets values for config_spatNW_transcript and leiden_config if they are not provided.
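To make the "checks and prepares inputs" step concrete, here is a minimal R sketch of the kind of validation such a helper might perform. The function name prep_transcripts_sketch is hypothetical, and the specific rules (2 or 3 coordinate columns, dropping extracellular cell IDs, keeping only genes in all_genes) are assumptions drawn from the argument descriptions below, not FastReseg's actual code.

```r
# Hypothetical helper (not part of FastReseg): illustrates assumed input checks
# and preparation steps for a transcript data.frame.
prep_transcripts_sketch <- function(transcript_df,
                                    all_genes,
                                    transID_coln = "UMI_transID",
                                    transGene_coln = "target",
                                    cellID_coln = "UMI_cellID",
                                    spatLocs_colns = c("x", "y", "z"),
                                    extracellular_cellID = NULL) {
  # spatial coordinates must be 2D or 3D
  if (!length(spatLocs_colns) %in% c(2, 3)) {
    stop("spatLocs_colns must name 2 or 3 coordinate columns")
  }
  # every named column must exist in transcript_df
  needed <- c(transID_coln, transGene_coln, cellID_coln, spatLocs_colns)
  missing_colns <- setdiff(needed, colnames(transcript_df))
  if (length(missing_colns) > 0) {
    stop("missing columns in transcript_df: ",
         paste(missing_colns, collapse = ", "))
  }
  # assumed: drop transcripts assigned to extracellular cell IDs
  if (!is.null(extracellular_cellID)) {
    keep <- !(transcript_df[[cellID_coln]] %in% extracellular_cellID)
    transcript_df <- transcript_df[keep, , drop = FALSE]
  }
  # assumed: keep only genes that are present in the scoring matrix
  in_panel <- transcript_df[[transGene_coln]] %in% all_genes
  transcript_df[in_panel, , drop = FALSE]
}
```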
Usage
checkAndPrepInputs_perFOV(
all_celltypes,
all_genes,
transcript_df,
transID_coln = "UMI_transID",
transGene_coln = "target",
cellID_coln = "UMI_cellID",
spatLocs_colns = c("x", "y", "z"),
extracellular_cellID = NULL,
flagModel_TransNum_cutoff = 50,
molecular_distance_cutoff = 2.7,
cellular_distance_cutoff = NULL,
score_baseline = NULL,
lowerCutoff_transNum = NULL,
higherCutoff_transNum = NULL,
groupTranscripts_method = c("dbscan", "delaunay"),
spatialMergeCheck_method = c("leidenCut", "geometryDiff"),
cutoff_spatialMerge = 0.5,
leiden_config = NULL,
config_spatNW_transcript = NULL
)
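For orientation, the sketch below builds a tiny transcript_df whose column names match the defaults in the Usage block (UMI_transID, target, UMI_cellID, x, y, z). All gene names, cell IDs and coordinates are made up for illustration; real input would come from a spatial transcriptomics export.

```r
# Toy transcript table using the default column names from the Usage block.
# Every value here is invented for illustration only.
transcript_df <- data.frame(
  UMI_transID = paste0("tx_", 1:6),                      # unique transcript ID
  target      = c("GeneA", "GeneB", "GeneA",
                  "GeneC", "GeneB", "GeneC"),            # gene name per transcript
  UMI_cellID  = c("cell_1", "cell_1", "cell_1",
                  "cell_2", "cell_2", "cell_2"),         # original cell assignment
  x = c(0.0, 0.8, 1.5, 10.0, 10.6, 11.2),                # 1st spatial dimension
  y = c(0.0, 0.5, 0.2, 5.0, 5.4, 4.9),                   # 2nd spatial dimension
  z = c(0, 0, 1, 0, 1, 1),                               # optional 3rd dimension
  stringsAsFactors = FALSE
)
str(transcript_df)
```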
Arguments
- all_celltypes
vector of all cell types considered in the analysis, as listed in the columns of score_GeneMatrix in the parent function
- all_genes
vector of all genes considered in the analysis, as listed in the rows of score_GeneMatrix in the parent function
- transcript_df
the data.frame of transcripts, with columns for transcript ID, target or gene name, original cell ID, and spatial coordinates.
- transID_coln
the column name of transcript_ID in transcript_df
- transGene_coln
the column name of target or gene name in transcript_df
- cellID_coln
the column name of cell_ID in transcript_df
- spatLocs_colns
column names for the 1st, 2nd and optional 3rd dimension of spatial coordinates in transcript_df
- extracellular_cellID
a vector of cell IDs assigned to extracellular transcripts; these transcripts are removed from the resegmentation pipeline (default = NULL)
- flagModel_TransNum_cutoff
the transcript number cutoff for performing spatial modeling to identify wrongly segmented cells (default = 50)
- molecular_distance_cutoff
maximum molecule-to-molecule distance within a connected transcript group, in the same unit as the input spatial coordinates (default = 2.7 micron). If set to NULL, the pipeline first randomly chooses no more than 2500 cells from up to 10 randomly picked ROIs, using a search radius of 5 times cellular_distance_cutoff, and then calculates the minimal molecular distance between the picked cells. The pipeline then uses 5 times the 90% quantile of the minimal molecular distance as molecular_distance_cutoff. This calculation is slow and is not recommended for large transcript data.frames.
- cellular_distance_cutoff
maximum cell-to-cell distance in x, y between the center of a query cell and the center of a neighbor cell in direct contact, in the same unit as the input spatial coordinates. Default = NULL to use 2 times the average 2D cell diameter.
- score_baseline
a named vector of score baselines for each cell type listed in score_GeneMatrix; a per-cell transcript score higher than the baseline is required to call a cell type with high enough confidence
- lowerCutoff_transNum
a named vector of transcript number cutoffs for each cell type; a transcript number higher than the cutoff is required to keep the query cell as it is
- higherCutoff_transNum
a named vector of transcript number cutoffs for each cell type; a transcript number lower than the cutoff is required to keep the query cell as it is when there is a neighbor cell of consistent cell type.
- groupTranscripts_method
use either the "dbscan" or the "delaunay" method to group transcripts in space (default = "dbscan")
- spatialMergeCheck_method
use either the "leidenCut" (2D or 3D) or the "geometryDiff" (2D only) method to determine whether merging a cell pair is allowed in space (default = "leidenCut")
- cutoff_spatialMerge
spatial constraint on a valid merging event between two source transcript groups (default = 0.5 for a 50% cutoff); set to 0 to skip spatial constraint evaluation for merging. For spatialMergeCheck_method = "leidenCut", this is the minimal percentage of transcripts with shared membership between the query cell and neighbor cells in the Leiden clustering results for a valid merging event. For spatialMergeCheck_method = "geometryDiff", this is the maximum percentage of white space change upon merging the query cell and neighbor cell for a valid merging event.
- leiden_config
(leidenCut) a configuration list to pass to reticulate and the igraph::cluster_leiden function, including objective_function, resolution_parameter, beta, n_iterations.
- config_spatNW_transcript
configuration list to create the spatial network at transcript level; see the manual for createSpatialDelaunayNW_from_spatLocs for more details. Set to NULL to use the default config (default = NULL)
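When molecular_distance_cutoff is NULL, the estimate is described as 5 times the 90% quantile of a minimal molecular distance over a sample of cells. The R sketch below assumes that this minimum is each transcript's nearest-neighbor distance to another transcript of the same cell, and it omits the ROI-based sampling; the function name estimate_molecular_cutoff is hypothetical.

```r
# Hypothetical sketch: 5 x the 90% quantile of per-transcript nearest-neighbor
# distances, ASSUMING the minimum is taken within each cell.
estimate_molecular_cutoff <- function(transcript_df,
                                      cellID_coln = "UMI_cellID",
                                      spatLocs_colns = c("x", "y"),
                                      max_cells = 2500,
                                      seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  cells <- unique(transcript_df[[cellID_coln]])
  # sample at most max_cells cells (ROI-based picking is omitted here)
  if (length(cells) > max_cells) cells <- sample(cells, max_cells)
  min_dist <- unlist(lapply(cells, function(cell) {
    rows <- transcript_df[[cellID_coln]] == cell
    xyz <- as.matrix(transcript_df[rows, spatLocs_colns, drop = FALSE])
    if (nrow(xyz) < 2) return(numeric(0))  # no pairs in a 1-transcript cell
    d <- as.matrix(dist(xyz))
    diag(d) <- Inf                         # ignore self-distances
    apply(d, 1, min)                       # nearest-neighbor distance per transcript
  }))
  5 * unname(quantile(min_dist, probs = 0.9))
}
```

Building a full dist() matrix per cell scales quadratically with the number of transcripts in that cell, one reason an estimate of this kind is costly on large data.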
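The NULL default of cellular_distance_cutoff is described as 2 times the average 2D cell diameter. How the diameter is measured is not stated here; the R sketch below assumes, as a rough proxy, the mean of each cell's x-extent and y-extent of its transcripts. estimate_cellular_cutoff is a hypothetical name.

```r
# Hypothetical sketch: 2 x average 2D cell diameter, where the diameter of a
# cell is ASSUMED to be the mean of its transcripts' x-range and y-range.
estimate_cellular_cutoff <- function(transcript_df,
                                     cellID_coln = "UMI_cellID",
                                     xy_colns = c("x", "y")) {
  # split the 2D coordinates into one data.frame per cell
  xy_by_cell <- split(transcript_df[, xy_colns, drop = FALSE],
                      transcript_df[[cellID_coln]])
  diam <- vapply(xy_by_cell, function(xy) {
    mean(c(diff(range(xy[[1]])), diff(range(xy[[2]]))))
  }, numeric(1))
  2 * mean(diam)
}
```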
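Read together, score_baseline, lowerCutoff_transNum and higherCutoff_transNum describe necessary conditions for keeping a query cell unchanged. The hypothetical keep_query_cell() below combines them in R, assuming that meeting all applicable conditions is also sufficient; the actual pipeline may apply these checks at different stages.

```r
# Hypothetical decision rule built from the three per-cell-type cutoffs.
# ASSUMPTION: passing all applicable checks means the cell is kept as is.
keep_query_cell <- function(celltype, score, transNum, has_consistent_neighbor,
                            score_baseline, lowerCutoff_transNum,
                            higherCutoff_transNum) {
  confident  <- score > score_baseline[[celltype]]           # confident call
  big_enough <- transNum > lowerCutoff_transNum[[celltype]]  # enough transcripts
  if (!(confident && big_enough)) return(FALSE)
  # with a neighbor of the same cell type, the cell must also stay small enough
  if (has_consistent_neighbor) {
    return(transNum < higherCutoff_transNum[[celltype]])
  }
  TRUE
}
```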
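The defaults of groupTranscripts_method and spatialMergeCheck_method are written as character vectors in the Usage block. The conventional base-R idiom for such arguments is match.arg(), which returns the first element when the caller leaves the default and raises an error on an unrecognized value. The sketch below assumes this idiom; pick_methods is a hypothetical wrapper.

```r
# Hypothetical wrapper showing how vector defaults resolve via match.arg().
pick_methods <- function(groupTranscripts_method = c("dbscan", "delaunay"),
                         spatialMergeCheck_method = c("leidenCut", "geometryDiff")) {
  groupTranscripts_method  <- match.arg(groupTranscripts_method)   # first element by default
  spatialMergeCheck_method <- match.arg(spatialMergeCheck_method)
  c(group = groupTranscripts_method, merge = spatialMergeCheck_method)
}

pick_methods()                                      # returns c(group = "dbscan", merge = "leidenCut")
pick_methods(groupTranscripts_method = "delaunay")  # returns c(group = "delaunay", merge = "leidenCut")
```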
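As an illustration of distance-based transcript grouping, the R sketch below links any two transcripts within molecular_distance_cutoff of each other and returns the connected groups, using single-linkage hierarchical clustering from base R. This coincides with DBSCAN only under the assumption minPts = 1, where every point is a core point and clusters are the connected components of the eps-neighborhood graph. It is not the "delaunay" option, and group_transcripts_sketch is a hypothetical name.

```r
# Hypothetical sketch: connected groups of transcripts whose chained
# pairwise distances stay within the cutoff (single-linkage cut).
group_transcripts_sketch <- function(xyz, molecular_distance_cutoff = 2.7) {
  xyz <- as.matrix(xyz)
  if (nrow(xyz) < 2) return(rep(1L, nrow(xyz)))   # trivial case: 0 or 1 transcript
  tree <- hclust(dist(xyz), method = "single")
  # cutting the single-linkage tree at height h keeps merges with height <= h
  cutree(tree, h = molecular_distance_cutoff)
}
```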
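For spatialMergeCheck_method = "leidenCut", cutoff_spatialMerge is a minimal share of transcripts with shared Leiden membership. The exact definition of "shared membership" is not spelled out here; the hypothetical R sketch below assumes it is the fraction of query-cell transcripts whose Leiden cluster also contains at least one neighbor-cell transcript, and it takes the cluster labels as given rather than running igraph::cluster_leiden.

```r
# Hypothetical: share of query-cell transcripts whose Leiden cluster label
# also occurs among the neighbor cell's transcripts.
leiden_shared_fraction <- function(query_clusters, neighbor_clusters) {
  mean(query_clusters %in% neighbor_clusters)
}

# Hypothetical merge check; a cutoff of 0 skips the spatial constraint.
merge_allowed_leiden <- function(query_clusters, neighbor_clusters,
                                 cutoff_spatialMerge = 0.5) {
  if (cutoff_spatialMerge == 0) return(TRUE)
  leiden_shared_fraction(query_clusters, neighbor_clusters) >= cutoff_spatialMerge
}
```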
Value
a list
- transcript_df
transcript data.frame ready for the downstream full resegmentation pipeline
- cellular_distance_cutoff
maximum cell-to-cell distance in x, y between the center of a query cell and the center of a neighbor cell in direct contact, in the same unit as the input spatial coordinates.
- molecular_distance_cutoff
maximum molecule-to-molecule distance within a connected transcript group, in the same unit as the input spatial coordinates
- config_spatNW_transcript
configuration list to create spatial network at transcript level
- leiden_config
configuration list to do leiden clustering on spatial network at transcript level for merge event evaluation