Modular wrapper to evaluate transcript groups in their neighborhood, decide resegmentation operations, and execute them
Usage
runSegRefinement(
score_GeneMatrix,
chosen_cells = NULL,
reseg_transcript_df,
reseg_cellID_coln = "tmp_cellID",
reseg_celltype_coln = "group_maxCellType",
transID_coln = "transcript_id",
transGene_coln = "target",
transSpatLocs_coln = c("x", "y", "z"),
score_baseline = NULL,
lowerCutoff_transNum = NULL,
higherCutoff_transNum = NULL,
neighbor_distance_xy = NULL,
distance_cutoff = 2.7,
spatialMergeCheck_method = c("leidenCut", "geometryDiff"),
cutoff_spatialMerge = 0.5,
leiden_config = list(objective_function = c("CPM", "modularity"), resolution_parameter = 1, beta = 0.01, n_iterations = 200),
config_spatNW_transcript = NULL,
return_intermediates = TRUE,
return_perCellData = TRUE,
includeAllRefGenes = FALSE,
seed_segRefine = NULL
)
Arguments
- score_GeneMatrix
the gene x cell-type matrix of log-like scores of each gene in each cell type
- chosen_cells
the cell_IDs of the chosen cells to be evaluated for resegmentation
- reseg_transcript_df
the data.frame with transcript_id, target/geneName, x, y, and cell_id for all transcript groups, plus the cell type with the maximum transcript score for each transcript group
- reseg_cellID_coln
the column name of cell_ID for all transcript groups in transcript_df
- reseg_celltype_coln
the column name of cell_type for all transcript groups in transcript_df
- transID_coln
the column name of transcript_ID in transcript_df
- transGene_coln
the column name of target or gene name in transcript_df
- transSpatLocs_coln
the column names of the 1st, 2nd, and optional 3rd spatial dimensions of each transcript in transcript_df
- score_baseline
a named vector of score baselines for all cell types listed in neighborhood_df; a per-cell transcript score higher than the baseline is required to call a cell type with high enough confidence
- lowerCutoff_transNum
a named vector of transcript-number cutoffs for each cell type; a query cell is kept as it is only if its transcript number is higher than the cutoff
- higherCutoff_transNum
a named vector of transcript-number cutoffs for each cell type; when there is a neighbor cell of consistent cell type, a query cell is kept as it is only if its transcript number is lower than the cutoff
- neighbor_distance_xy
maximum cell-to-cell distance in x, y between the center of a query cell and the center of a neighbor cell in direct contact, in the same unit as the input spatial coordinates. Default = NULL, which uses 2 times the average 2D cell diameter.
- distance_cutoff
maximum molecule-to-molecule distance within a connected transcript group, in the same unit as the input spatial coordinates (default = 2.7 micron). If set to NULL, the pipeline first randomly chooses no more than 2500 cells from up to 10 randomly picked ROIs, using a search radius of 5 times neighbor_distance_xy, and then calculates the minimal molecular distance between the picked cells. It then uses 5 times the 90% quantile of the minimal molecular distance as distance_cutoff. This calculation is slow and is not recommended for large transcript data.frames.
- spatialMergeCheck_method
use either "leidenCut" (in 2D or 3D) or "geometryDiff" (in 2D only) method to determine whether a cell pair merging event is allowed in space (default = "leidenCut")
- cutoff_spatialMerge
spatial constraint on a valid merging event between two source transcript groups; default = 0.5 for a 50% cutoff, set to 0 to skip spatial constraint evaluation for merging. For spatialMergeCheck_method = "leidenCut", this is the minimal percentage of transcripts sharing membership between the query cell and neighbor cells in the leiden clustering results for a valid merging event. For spatialMergeCheck_method = "geometryDiff", this is the maximum percentage of white-space change upon merging the query cell and neighbor cell for a valid merging event.
- leiden_config
(leidenCut) a list of configurations to pass to reticulate and the igraph::cluster_leiden function, including objective_function, resolution_parameter, beta, and n_iterations.
- config_spatNW_transcript
(leidenCut) configuration list to create the spatial network at transcript level; see the manual for createSpatialDelaunayNW_from_spatLocs for more details. Set to NULL to use the default config.
- return_intermediates
flag to return intermediate outputs, including the neighborhoodDF_ToReseg data.frame for neighborhood evaluation and the reseg_actions list of resegmentation actions
- return_perCellData
flag to return gene x cell count matrix and per cell DF with updated mean spatial coordinates and new cell type
- includeAllRefGenes
flag to include all genes in score_GeneMatrix in the returned updated_perCellExprs, with missing genes set to 0 (default = FALSE)
- seed_segRefine
seed for the transcript error correction step; default = NULL to skip setting a seed
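score_baseline, lowerCutoff_transNum, and higherCutoff_transNum are named vectors keyed by cell type, so each query cell's threshold is looked up by its cell-type name. The sketch below, in base R, evaluates each threshold separately by reading the argument descriptions literally. The helper `threshold_checks` and the columns `cell_type`, `score`, `trans_num`, and `has_sameType_neighbor` are hypothetical; how the package combines these checks into an actual resegmentation action is not shown here and may differ.

```r
# Hypothetical helper (not part of the package API): evaluates each documented
# threshold separately for every query cell.
threshold_checks <- function(cell_df, score_baseline,
                             lowerCutoff_transNum, higherCutoff_transNum) {
  ct <- as.character(cell_df$cell_type)  # index named vectors by name, not factor code
  # score_baseline: score must exceed the baseline of the cell's own type
  confident_type <- cell_df$score > score_baseline[ct]
  # lowerCutoff_transNum: transcript number must be above the cutoff
  above_lower <- cell_df$trans_num > lowerCutoff_transNum[ct]
  # higherCutoff_transNum: only applies when a same-type neighbor exists
  below_higher_or_no_neighbor <- (!cell_df$has_sameType_neighbor) |
    (cell_df$trans_num < higherCutoff_transNum[ct])
  data.frame(cell_ID = cell_df$cell_ID,
             confident_type = unname(confident_type),
             above_lower = unname(above_lower),
             below_higher_or_no_neighbor = unname(below_higher_or_no_neighbor))
}

# Toy input with made-up values, for illustration only
cells <- data.frame(cell_ID = c("c1", "c2", "c3"),
                    cell_type = c("tumor", "tumor", "fibroblast"),
                    score = c(-1, -5, -2),
                    trans_num = c(120, 10, 300),
                    has_sameType_neighbor = c(FALSE, TRUE, TRUE))
checks <- threshold_checks(cells,
                           score_baseline = c(tumor = -3, fibroblast = -3),
                           lowerCutoff_transNum = c(tumor = 20, fibroblast = 20),
                           higherCutoff_transNum = c(tumor = 200, fibroblast = 250))
print(checks)
```

Converting `cell_type` with `as.character()` matters: indexing a named vector with a factor would use the integer codes instead of the names.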
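When distance_cutoff = NULL, the cutoff is derived as 5 times the 90% quantile of the minimal molecular distance. The following base-R sketch shows only that final arithmetic on already-sampled cells; the cell/ROI sampling step is omitted. Interpreting "minimal molecular distance" as each molecule's nearest-neighbor distance within its own cell is an assumption, and `estimate_distance_cutoff` is a hypothetical name. The brute-force `dist()` call is quadratic in the number of molecules per cell, which illustrates why this kind of calculation is slow on large data.

```r
# Hypothetical sketch of the distance_cutoff = NULL heuristic (assumption:
# "minimal molecular distance" = nearest-neighbor distance within each cell).
estimate_distance_cutoff <- function(trans_df, cellID_coln = "cell_ID",
                                     spatLocs_coln = c("x", "y", "z")) {
  per_cell <- split(trans_df[, spatLocs_coln, drop = FALSE],
                    trans_df[[cellID_coln]])
  nn_dist <- unlist(lapply(per_cell, function(locs) {
    if (nrow(locs) < 2) return(numeric(0))  # a single molecule has no neighbor
    d <- as.matrix(dist(locs))              # all pairwise distances, O(n^2)
    diag(d) <- Inf                          # ignore distance of a molecule to itself
    apply(d, 1, min)                        # nearest-neighbor distance per molecule
  }), use.names = FALSE)
  # 5 times the 90% quantile, as described for distance_cutoff
  5 * unname(quantile(nn_dist, probs = 0.9))
}

# Toy 2D example: cell "a" is a unit square, cell "b" a square of side 2
toy_trans <- data.frame(cell_ID = rep(c("a", "b"), each = 4),
                        x = c(0, 1, 0, 1, 10, 12, 10, 12),
                        y = c(0, 0, 1, 1, 0, 0, 2, 2))
estimate_distance_cutoff(toy_trans, spatLocs_coln = c("x", "y"))  # 10
```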
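For spatialMergeCheck_method = "leidenCut", cutoff_spatialMerge is a minimal fraction of transcripts sharing Leiden cluster membership between the query cell and a neighbor cell. A base-R sketch of one way to compute such a fraction from precomputed cluster labels follows. It assumes the labels already exist (for example, the `membership` of an igraph::cluster_leiden result) and defines "shared" as belonging to a cluster that contains transcripts from both cells. That definition and the helper names are hypothetical and may not match the package's internal rule.

```r
# Hypothetical sketch: fraction of pooled transcripts (query + neighbor) that
# fall in Leiden clusters containing transcripts from both source cells.
shared_membership_fraction <- function(cluster, source_cell) {
  # number of distinct source cells represented in each cluster
  n_sources <- tapply(source_cell, cluster, function(s) length(unique(s)))
  shared_clusters <- names(n_sources)[n_sources > 1]
  mean(as.character(cluster) %in% shared_clusters)
}

# A merge is allowed when the shared fraction reaches cutoff_spatialMerge
allow_merge_leiden <- function(cluster, source_cell, cutoff_spatialMerge = 0.5) {
  shared_membership_fraction(cluster, source_cell) >= cutoff_spatialMerge
}

# Toy labels: clusters 1 and 2 mix query ("q") and neighbor ("n") transcripts
cl  <- c(1, 1, 1, 2, 2, 1, 3, 3)
src <- c("q", "q", "q", "q", "n", "n", "n", "n")
shared_membership_fraction(cl, src)  # 0.75
allow_merge_leiden(cl, src, cutoff_spatialMerge = 0.5)  # TRUE
```

Setting cutoff_spatialMerge = 0 makes every pair pass, consistent with the documented use of 0 to skip the spatial constraint.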
Value
a list
- updated_transDF
the updated transcript_df with updated_cellID and updated_celltype columns based on reseg_full_converter
- neighborhoodDF_ToReseg
a data.frame describing the neighborhood environment of low-score transcript groups, output of the get_neighborhood_content function; returned when return_intermediates = TRUE
- reseg_actions
a list of 4 elements describing how the resegmentation would be performed on the original transcript_df according to the group assignment of transcripts listed in groupDF_ToFlagTrans; output of the decide_ReSegment_Operations function, returned when return_intermediates = TRUE
- updated_perCellDT
a per-cell data.table with mean spatial coordinates, new cell type, and resegmentation action after resegmentation; returned when return_perCellData = TRUE
- updated_perCellExprs
a gene x cell count sparse matrix for the updated transcript data.frame after resegmentation; returned when return_perCellData = TRUE
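The per-cell outputs are summaries of the updated transcript table: counts of each gene per updated cell, and mean spatial coordinates per updated cell. The base-R sketch below illustrates that relationship, including the includeAllRefGenes behavior of padding genes missing from the data with zero rows. It is an illustration only: `summarize_updated_cells` is a hypothetical helper, it returns a dense matrix and a data.frame rather than the package's sparse matrix and data.table, and it does not compute the new cell type or resegmentation action.

```r
# Hypothetical sketch: derive per-cell summaries from an updated transcript table.
summarize_updated_cells <- function(trans_df, ref_genes = NULL,
                                    cellID_coln = "updated_cellID",
                                    gene_coln = "target",
                                    spatLocs_coln = c("x", "y")) {
  # gene x cell count matrix (dense here for simplicity)
  tab <- table(trans_df[[gene_coln]], trans_df[[cellID_coln]])
  counts <- matrix(as.vector(tab), nrow = nrow(tab),
                   dimnames = list(rownames(tab), colnames(tab)))
  # includeAllRefGenes-style padding: add reference genes absent from the data as 0
  if (!is.null(ref_genes)) {
    missing_genes <- setdiff(ref_genes, rownames(counts))
    if (length(missing_genes) > 0) {
      pad <- matrix(0L, nrow = length(missing_genes), ncol = ncol(counts),
                    dimnames = list(missing_genes, colnames(counts)))
      counts <- rbind(counts, pad)
    }
  }
  # mean spatial coordinates per updated cell
  per_cell <- aggregate(trans_df[, spatLocs_coln, drop = FALSE],
                        by = list(cell_ID = trans_df[[cellID_coln]]),
                        FUN = mean)
  list(perCellExprs = counts, perCellDF = per_cell)
}

toy <- data.frame(updated_cellID = c("c1", "c1", "c2"),
                  target = c("GeneA", "GeneB", "GeneA"),
                  x = c(0, 2, 5), y = c(0, 2, 5))
out <- summarize_updated_cells(toy, ref_genes = c("GeneA", "GeneB", "GeneC"))
out$perCellExprs
out$perCellDF
```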