Choose appropriate cellular and molecular distance cutoffs from the input transcript data.frame for downstream resegmentation. The cellular distance cutoff is the search radius for direct neighbor cells, while the molecular distance cutoff is the maximum distance between two neighboring transcripts from the same source cell.
Usage
choose_distance_cutoff(
  transcript_df,
  transID_coln = "UMI_transID",
  cellID_coln = "UMI_cellID",
  spatLocs_colns = c("x", "y", "z"),
  extracellular_cellID = NULL,
  run_molecularDist = TRUE,
  sampleSize_nROI = 10,
  sampleSize_cellNum = 2500,
  seed = 123
)
Arguments
- transcript_df
a data.frame with one row per transcript
- transID_coln
the column name of transcript_ID in transcript_df
- cellID_coln
the column name of cell_ID in transcript_df
- spatLocs_colns
column names for the 1st, 2nd, and optional 3rd dimensions of spatial coordinates in transcript_df
- extracellular_cellID
a vector of cell_ID values for extracellular transcripts, which are removed from the resegmentation pipeline (default = NULL)
- run_molecularDist
flag to run molecular distance cutoff estimation (default = TRUE)
- sampleSize_nROI
number of ROIs randomly picked from the data for molecular distance cutoff estimation
- sampleSize_cellNum
maximum number of cells from the picked ROIs for molecular distance cutoff estimation
- seed
a random seed for sub-sampling cells from the whole dataset for molecular distance cutoff estimation
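To picture how sampleSize_nROI, sampleSize_cellNum and seed might interact, here is a minimal sketch in base R. The helper sample_cells_from_rois is hypothetical and not part of the package; it assumes circular ROIs centered on randomly chosen cell centers, with the radius passed in by the caller (the Details section states a radius of 5 times cellular_distance_cutoff). The package's own sampling may differ.

```r
# HYPOTHETICAL helper, not the package's implementation.
# Picks n_roi random cells as ROI centers, collects all cells whose centers
# fall within search_radius of any ROI center, then caps the total at max_cells.
sample_cells_from_rois <- function(cell_centers, search_radius,
                                   n_roi = 10, max_cells = 2500, seed = 123) {
  # cell_centers: data.frame with columns cell_ID, x, y (one row per cell)
  set.seed(seed)  # note: this also resets the global RNG state
  roi_idx <- sample(nrow(cell_centers), size = min(n_roi, nrow(cell_centers)))
  picked <- character(0)
  for (i in roi_idx) {
    dx <- cell_centers$x - cell_centers$x[i]
    dy <- cell_centers$y - cell_centers$y[i]
    in_roi <- sqrt(dx^2 + dy^2) <= search_radius
    picked <- union(picked, as.character(cell_centers$cell_ID[in_roi]))
  }
  # randomly keep at most max_cells of the collected cells
  if (length(picked) > max_cells) {
    picked <- sample(picked, max_cells)
  }
  picked
}

# toy grid of 100 cells spaced 10 units apart
cells <- data.frame(cell_ID = paste0("c", 1:100),
                    x = rep(1:10, times = 10) * 10,
                    y = rep(1:10, each = 10) * 10)
picked <- sample_cells_from_rois(cells, search_radius = 25,
                                 n_roi = 3, max_cells = 20)
length(picked)
```

Because the seed is fixed inside the helper, repeated calls with the same arguments return the same cells, which mirrors the purpose of the seed argument.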
Value
a list
- cellular_distance_cutoff
maximum cell-to-cell distance in x, y from the center of a query cell to the centers of neighbor cells in direct contact, in the same unit as the input spatial coordinates.
- perCell_coordDT
a data.table with one cell per row, and the spatial XY coordinates of each cell's centroid and the dimensions of its bounding box in columns
- molecular_distance_cutoff
maximum molecule-to-molecule distance within a connected transcript group, in the same unit as the input spatial coordinates; returned only if run_molecularDist = TRUE
- distance_profile
a named vector giving the quantile profile, in 10% steps, of the minimal molecular distance between transcripts belonging to different cells; returned only if run_molecularDist = TRUE
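As an illustration of how a per-cell coordinate table and the cellular cutoff relate, the sketch below builds a perCell_coordDT-like table in base R (a data.frame rather than a data.table) and takes 2 times the mean 2D cell diameter. The function names and column names (CenterX, CenterY, Width, Height) are hypothetical, and two choices are assumptions: the centroid is the mean transcript position, and a cell's 2D diameter is the mean of its bounding-box width and height.

```r
# HYPOTHETICAL sketch, not the package's implementation.
per_cell_coords <- function(transcript_df, cellID_coln = "UMI_cellID",
                            x_coln = "x", y_coln = "y") {
  # split the x/y columns by cell
  by_cell <- split(transcript_df[, c(x_coln, y_coln)],
                   transcript_df[[cellID_coln]])
  rows <- lapply(names(by_cell), function(id) {
    xy <- by_cell[[id]]
    data.frame(cell_ID = id,
               CenterX = mean(xy[[x_coln]]),          # ASSUMPTION: centroid = mean position
               CenterY = mean(xy[[y_coln]]),
               Width   = diff(range(xy[[x_coln]])),   # bounding-box extent in x
               Height  = diff(range(xy[[y_coln]])))   # bounding-box extent in y
  })
  do.call(rbind, rows)
}

cellular_cutoff_sketch <- function(coord_df) {
  # ASSUMPTION: 2D cell diameter = mean of bounding-box width and height
  mean_diameter <- mean((coord_df$Width + coord_df$Height) / 2)
  2 * mean_diameter
}

toy <- data.frame(UMI_cellID = c("a", "a", "a", "b", "b"),
                  x = c(0, 10, 4, 20, 28),
                  y = c(0, 6, 2, 0, 12))
coords <- per_cell_coords(toy)
cellular_cutoff_sketch(coords)  # diameters 8 and 10, mean 9, so 2 * 9 = 18
```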
Details
cellular_distance_cutoff is defined as the maximum cell-to-cell distance in x, y from the center of a query cell to the centers of neighbor cells in direct contact. The function calculates the average 2D cell diameter from the input data.frame and uses 2 times the mean cell diameter as cellular_distance_cutoff.

molecular_distance_cutoff is defined as the maximum molecule-to-molecule distance within connected transcript groups belonging to the same source cell. The function first randomly picks sampleSize_nROI ROIs, each with a search radius of 5 times cellular_distance_cutoff, and randomly chooses up to sampleSize_cellNum cells from them; it then calculates the minimal molecular distance between the picked cells. Finally, it uses 5 times the 90% quantile of the minimal molecular distance as molecular_distance_cutoff. This calculation is slow and is not recommended for large transcript data.frames.
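The molecular step can be sketched with a brute-force distance matrix. This is a hypothetical illustration, not the package's algorithm: molecular_cutoff_sketch is an invented name, and it assumes that a cell's "minimal molecular distance" is the smallest Euclidean distance from any of its transcripts to any transcript of a different cell. It works for 2D or 3D coordinates, but builds a full pairwise matrix, so it is only suitable for tiny inputs.

```r
# HYPOTHETICAL sketch, not the package's implementation.
molecular_cutoff_sketch <- function(coords, cell_ids) {
  # coords: numeric matrix, one row per transcript, 2 or 3 columns
  # cell_ids: source cell of each transcript (needs at least 2 distinct cells)
  d <- as.matrix(dist(coords))              # all pairwise Euclidean distances
  same_cell <- outer(cell_ids, cell_ids, "==")
  d[same_cell] <- Inf                       # ignore pairs within the same cell
  per_transcript_min <- apply(d, 1, min)    # nearest transcript from another cell
  per_cell_min <- tapply(per_transcript_min, cell_ids, min)
  profile <- quantile(per_cell_min, probs = seq(0, 1, by = 0.1))
  q90 <- unname(quantile(per_cell_min, probs = 0.9))
  list(molecular_distance_cutoff = 5 * q90,
       distance_profile = profile)
}

# three cells along a line: a at 0-1, b at 3-4, c at 10-11
xy <- cbind(c(0, 1, 3, 4, 10, 11), 0)
ids <- c("a", "a", "b", "b", "c", "c")
res <- molecular_cutoff_sketch(xy, ids)
res$molecular_distance_cutoff  # per-cell minima 2, 2, 6; 90% quantile 5.2; 5 * 5.2 = 26
```

The 90% quantile here uses R's default quantile type; whether the package uses the same interpolation is not stated on this page.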
Examples
data(mini_transcriptDF)
# cell_ID for extracellular transcripts
extracellular_cellID <- mini_transcriptDF[which(mini_transcriptDF$CellId == 0), 'cell_ID']
distCutoffs <- choose_distance_cutoff(mini_transcriptDF,
                                      extracellular_cellID = extracellular_cellID)
#> Use 2 times of average 2D cell diameter as cellular_distance_cutoff = 24.2375 for searching of neighbor cells.
#> Identified 3D coordinates with variance.
#> Distribution of minimal molecular distance between 1375 cells: 0, 0.08, 0.14, 0.21, 0.3, 0.4, 0.51, 0.63, 0.78, 0.86, 3.92, at quantile = 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%.
#> Use 5 times of 90% quantile of minimal 3D molecular distance between picked cells as `molecular_distance_cutoff` = 4.2790 for defining direct neighbor cells.
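As a quick consistency check on the printed log, the two multipliers described in Details can be undone by hand. The numbers below are copied from the example output; the 90% quantile recovered this way is inferred from the printed cutoff, so it only matches the logged profile value after rounding.

```r
# values copied from the example log above
cellular_cutoff  <- 24.2375
molecular_cutoff <- 4.2790

mean_diameter <- cellular_cutoff / 2   # 12.11875, the average 2D cell diameter
q90 <- molecular_cutoff / 5            # about 0.8558, the 90% quantile before rounding
round(q90, 2)                          # 0.86, matching the logged 90% entry
```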