Evaluate neighborhood information against score and transcript number cutoffs to decide the resegmentation operations. Use either Leiden clustering or geometry statistics to determine whether a merge event is allowed.
Usage
decide_ReSegment_Operations(
neighborhood_df,
selfcellID_coln = "CellId",
transNum_coln = "transcript_num",
selfCellType_coln = "self_celltype",
selfScore_coln = "score_under_self",
neighborcellID_coln = "neighbor_CellId",
neighborCellType_coln = "neighbor_celltype",
neighborScore_coln = "score_under_neighbor",
score_baseline = NULL,
lowerCutoff_transNum = NULL,
higherCutoff_transNum = NULL,
transcript_df,
cellID_coln = "CellId",
transID_coln = "transcript_id",
transSpatLocs_coln = c("x", "y", "z"),
spatialMergeCheck_method = c("leidenCut", "geometryDiff"),
cutoff_spatialMerge = 0.5,
leiden_config = list(objective_function = c("CPM", "modularity"), resolution_parameter = 1, beta = 0.01, n_iterations = 200),
config_spatNW_transcript = NULL
)
Arguments
- neighborhood_df
the data.frame containing neighborhood information for each query cell, expected to be the output of the get_neighborhood_content function.
- selfcellID_coln
the column name of the query cell's cell_ID in neighborhood_df
- transNum_coln
the column name of the query cell's transcript number in neighborhood_df
- selfCellType_coln
the column name of the query cell's cell_type in neighborhood_df
- selfScore_coln
the column name of the average transcript score under the query cell in neighborhood_df
- neighborcellID_coln
the column name of the neighbor cell's cell_ID in neighborhood_df
- neighborCellType_coln
the column name of the neighbor cell's cell_type in neighborhood_df
- neighborScore_coln
the column name of the average transcript score under the neighbor cell in neighborhood_df
- score_baseline
a named vector of score baselines for all cell types listed in neighborhood_df; a per-cell transcript score higher than the baseline is required to call a cell type with high enough confidence
- lowerCutoff_transNum
a named vector of transcript number cutoffs for each cell type; a query cell must have more transcripts than the cutoff to be kept as it is
- higherCutoff_transNum
a named vector of transcript number cutoffs for each cell type, used when the query cell has a neighbor cell of consistent cell type: a query cell with fewer transcripts than the cutoff is merged into that neighbor cell (see Details).
- transcript_df
the data.frame with transcript_id, target/geneName, x, y and cell_id
- cellID_coln
the column name of cell_ID in transcript_df
- transID_coln
the column name of transcript_ID in transcript_df
- transSpatLocs_coln
the column names of the 1st, 2nd, and optional 3rd spatial dimensions of each transcript in transcript_df
- spatialMergeCheck_method
use either "leidenCut" (in 2D or 3D) or "geometryDiff" (in 2D only) method to determine whether a cell pair merging event is allowed in space (default = "leidenCut")
- cutoff_spatialMerge
spatial constraint on a valid merging event between two source transcript groups; default = 0.5 for a 50% cutoff; set to 0 to skip spatial constraint evaluation for merging. For spatialMergeCheck_method = "leidenCut", this is the minimal percentage of transcripts with shared membership between the query cell and the neighbor cell in the Leiden clustering results for a valid merging event. For spatialMergeCheck_method = "geometryDiff", this is the maximal percentage of white space change upon merging of the query cell and the neighbor cell for a valid merging event.
- leiden_config
(leidenCut) a list of configurations to pass to reticulate and the igraph::cluster_leiden function, including objective_function, resolution_parameter, beta, n_iterations.
- config_spatNW_transcript
(leidenCut) configuration list to create the spatial network at the transcript level; see the manual for createSpatialDelaunayNW_from_spatLocs for more details; set to NULL to use the default config.
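The three rules in Details combine the score and transcript number cutoffs per query cell. The base-R sketch below illustrates that logic with a hypothetical helper, classify_query_cells(); it is not the package's internal code. It assumes the default column names from Usage, that a neighbor counts as "consistent" when score_under_neighbor exceeds score_baseline for neighbor_celltype, and that higherCutoff_transNum and lowerCutoff_transNum are looked up by the neighbor's and the query's cell type, respectively.

```r
# Hypothetical sketch of the merge / keep / discard rules; not FastReseg's code.
classify_query_cells <- function(neighborhood_df, score_baseline,
                                 lowerCutoff_transNum, higherCutoff_transNum) {
  self_type <- as.character(neighborhood_df[["self_celltype"]])
  nb_type   <- as.character(neighborhood_df[["neighbor_celltype"]])
  n_trans   <- neighborhood_df[["transcript_num"]]

  # Assumption: a neighbor is "consistent" when the query transcripts score
  # above the baseline of the neighbor's cell type; NA (no neighbor) -> FALSE.
  consistent <- (neighborhood_df[["score_under_neighbor"]] >
                   score_baseline[nb_type]) %in% TRUE
  # High self score: above the baseline of the query's own cell type.
  confident_self <- (neighborhood_df[["score_under_self"]] >
                       score_baseline[self_type]) %in% TRUE

  # Rule 1: merge when consistent and below higherCutoff_transNum
  # (assumption: cutoff looked up by the neighbor's cell type).
  to_merge <- consistent & (n_trans < higherCutoff_transNum[nb_type]) %in% TRUE
  # Rule 2: keep when not merged, high self score, and above lowerCutoff_transNum.
  # Details states this for queries without a consistent neighbor; applying it
  # to every non-merged query is an assumption of this sketch.
  to_keep <- !to_merge & confident_self &
    (n_trans > lowerCutoff_transNum[self_type]) %in% TRUE

  # Rule 3: everything else is discarded.
  ifelse(to_merge, "merge", ifelse(to_keep, "keep", "discard"))
}
```

The `%in% TRUE` idiom turns missing comparisons (for example, a query cell with no neighbor) into FALSE, so such cells fall through to the keep or discard rules instead of producing NA.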
Value
a list with the following elements:
- cells_to_discard, a vector of cell IDs that should be discarded during resegmentation
- cells_to_update, a named vector of cell IDs, where the cell ID in each name is replaced with the cell ID in the corresponding value
- cells_to_keep, a vector of cell IDs that should be kept as they are
- reseg_full_converter, a single named vector that maps original cell IDs to updated cell IDs, with NA assigned to cells_to_discard
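To illustrate how reseg_full_converter could be applied to transcript_df, here is a minimal sketch with a hypothetical helper, apply_reseg_converter(). It assumes the converter's names are original cell IDs and its values the updated IDs (NA for discarded cells); cell IDs that do not appear in the converter are left unchanged, which is also an assumption.

```r
# Hypothetical helper: relabel transcripts with updated cell IDs and drop
# transcripts whose cell was discarded (converter value NA).
apply_reseg_converter <- function(transcript_df, reseg_full_converter,
                                  cellID_coln = "CellId") {
  old_ids <- as.character(transcript_df[[cellID_coln]])
  new_ids <- old_ids
  # only IDs present in the converter are updated; others stay as they are
  hit <- old_ids %in% names(reseg_full_converter)
  new_ids[hit] <- reseg_full_converter[old_ids[hit]]
  out <- transcript_df
  out[[cellID_coln]] <- new_ids
  out[!is.na(out[[cellID_coln]]), , drop = FALSE]
}
```

For example, with a converter `c(c1 = "c1", c2 = "c1", c3 = NA)`, transcripts of c2 are relabeled as c1 and transcripts of c3 are removed.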
Details
Evaluate neighborhood information against score and transcript number cutoffs to decide the resegmentation operations as follows:
- merge the query cell into the neighbor cell if the neighbor has a consistent cell type and the query cell has fewer transcripts than the average transcript number cutoff, higherCutoff_transNum;
- keep the query cell as a new cell ID if there is no neighbor of consistent cell type, but the query cell has a high self score and more transcripts than the minimal transcript number, lowerCutoff_transNum;
- discard the remaining query cells, which have no neighbor of consistent cell type, fewer transcripts than lowerCutoff_transNum, and/or a low self score.
The function uses network component analysis to resolve any conflict due to merging multiple query cells into one. When cutoff_spatialMerge > 0, the function applies an additional spatial constraint on a valid merging event of a query cell into a neighbor cell.
In case of spatialMergeCheck_method = "leidenCut", the function builds a spatial network at the transcript level, performs Leiden clustering on the spatial network, and then decides whether the merge should be allowed based on the observed shared Leiden membership of the two source transcript groups of a putative merging event; the provided cutoff_spatialMerge gives the minimal value of shared Leiden membership for a valid merging event.
In case of spatialMergeCheck_method = "geometryDiff", the function first calculates the white space, i.e. the area difference between the convex and concave hulls, for the query cell, the neighbor cell, and the corresponding merged cell. It then calculates the white space difference between the merged cell and the two separate cells, and normalizes that value with respect to the concave area of the query and neighbor cells, respectively. Lastly, it allows a merge when the normalized white space differences upon merging for both the query and neighbor cells are smaller than the provided cutoff_spatialMerge.
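The network component step can be pictured as follows: treat each accepted merge (query cell, neighbor cell) as an edge, find the connected components, and relabel every cell in a component with one representative ID. This sketch uses a base-R union-find instead of an igraph call, and the hypothetical helper resolve_merge_conflicts() picks the alphabetically smallest root as the representative; both choices are assumptions, not necessarily what the package does.

```r
# Hypothetical sketch: resolve chained merges with union-find (connected components).
resolve_merge_conflicts <- function(query_ids, neighbor_ids) {
  nodes <- unique(c(query_ids, neighbor_ids))
  parent <- setNames(nodes, nodes)   # each cell starts as its own root
  find_root <- function(id) {
    while (parent[[id]] != id) id <- parent[[id]]
    id
  }
  for (i in seq_along(query_ids)) {
    a <- find_root(query_ids[i])
    b <- find_root(neighbor_ids[i])
    if (a != b) {
      # hypothetical rule: the alphabetically smaller root becomes the representative
      if (a < b) parent[[b]] <- a else parent[[a]] <- b
    }
  }
  # named vector: original cell ID -> representative ID of its component
  vapply(nodes, find_root, character(1))
}
```

With merges q1 -> n1, q2 -> q1, and q3 -> n2, the chain q2 -> q1 -> n1 collapses into a single component, so q1, q2, and n1 all map to n1, while q3 and n2 map to n2.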
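The geometryDiff rule reduces to a few lines of arithmetic once hull areas are known. In this sketch, convex hull areas come from grDevices::chull() plus the shoelace formula, while concave hull areas are assumed to be computed elsewhere and passed in; convex_area() and geometry_merge_allowed() are hypothetical helpers, not package functions.

```r
# Hypothetical helper: convex hull area of 2D points via chull() + shoelace formula.
convex_area <- function(x, y) {
  idx <- grDevices::chull(x, y)
  px <- x[idx]
  py <- y[idx]
  0.5 * abs(sum(px * c(py[-1], py[1]) - c(px[-1], px[1]) * py))
}

# Hypothetical check; convex and concave are named numeric vectors with
# entries "query", "neighbor", and "merged" (concave areas supplied externally).
geometry_merge_allowed <- function(convex, concave, cutoff = 0.5) {
  # white space = convex hull area - concave hull area
  white <- function(k) convex[[k]] - concave[[k]]
  # change in white space caused by merging the two cells
  delta <- white("merged") - (white("query") + white("neighbor"))
  # normalize by each source cell's concave area; both must stay below the cutoff
  (delta / concave[["query"]] < cutoff) && (delta / concave[["neighbor"]] < cutoff)
}

geometry_merge_allowed(convex = c(query = 10, neighbor = 10, merged = 24),
                       concave = c(query = 9, neighbor = 8, merged = 18))
# TRUE: delta = 6 - (1 + 2) = 3, and 3/9 and 3/8 are both below 0.5
```

A negative delta (the merged cell has less white space than the two cells combined) always passes, which matches the idea that a good merge should not add large empty gaps.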