Choose appropriate cellular and molecular distance cutoffs from the input transcript data.frame for downstream resegmentation. The cellular distance cutoff is the search radius for direct neighbor cells; the molecular distance cutoff is the maximum distance between two neighboring transcripts from the same source cell.

Usage

choose_distance_cutoff(
  transcript_df,
  transID_coln = "UMI_transID",
  cellID_coln = "UMI_cellID",
  spatLocs_colns = c("x", "y", "z"),
  extracellular_cellID = NULL,
  run_molecularDist = TRUE,
  sampleSize_nROI = 10,
  sampleSize_cellNum = 2500,
  seed = 123
)

Arguments

transcript_df

the transcript-level data.frame

transID_coln

the column name of transcript_ID in transcript_df

cellID_coln

the column name of cell_ID in transcript_df

spatLocs_colns

column names for 1st, 2nd and optional 3rd dimension of spatial coordinates in transcript_df

extracellular_cellID

a vector of cell_ID for extracellular transcripts, which are removed from the resegmentation pipeline (default = NULL)

run_molecularDist

flag to run molecular distance cutoff estimation (default = TRUE)

sampleSize_nROI

number of ROIs randomly picked from data for molecular distance cutoff estimation

sampleSize_cellNum

maximum number of cells from the picked ROIs for molecular distance cutoff estimation

seed

a random seed for sub-sampling cells from the whole dataset for molecular distance cutoff estimation

Value

a list with the following elements

cellular_distance_cutoff

maximum cell-to-cell distance in x, y from the center of a query cell to the centers of its neighbor cells in direct contact, in the same unit as the input spatial coordinates

perCell_coordDT

a data.table with one row per cell; columns hold the spatial XY coordinates of the cell centroid and the dimensions of the cell's bounding box

molecular_distance_cutoff

maximum molecule-to-molecule distance within a connected transcript group, in the same unit as the input spatial coordinates; returned only if run_molecularDist = TRUE

distance_profile

a named vector giving the quantile profile of the minimal molecular distance between transcripts belonging to different cells, at a step size of 10% quantile; returned only if run_molecularDist = TRUE
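As a rough illustration of what a quantile profile at 10% steps looks like, the base R sketch below builds one from a made-up vector of distances. The vector `min_dist` and its values are hypothetical and are not produced by the package; only `stats::quantile()` is used.

```r
# Hypothetical minimal molecular distances, in the unit of the input coordinates.
set.seed(123)
min_dist <- rexp(200, rate = 2)

# Quantile profile from 0% to 100% in steps of 10%;
# quantile() names each entry by its percentage.
distance_profile <- quantile(min_dist, probs = seq(0, 1, by = 0.1))
print(round(distance_profile, 2))
```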

Details

cellular_distance_cutoff is defined as the maximum cell-to-cell distance in x, y from the center of a query cell to the centers of its neighbor cells in direct contact. The function calculates the average 2D cell diameter from the input data.frame and uses 2 times the mean cell diameter as cellular_distance_cutoff.

molecular_distance_cutoff is defined as the maximum molecule-to-molecule distance within connected transcript groups belonging to the same source cell. The function first randomly picks sampleSize_nROI ROIs, each with a search radius of 5 times cellular_distance_cutoff, then randomly chooses up to sampleSize_cellNum cells from those ROIs and calculates the minimal molecular distance between the picked cells. It then uses 5 times the 90% quantile of the minimal molecular distance as molecular_distance_cutoff. This calculation is slow and is not recommended for a large transcript data.frame.
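The two rules described here (2 times the mean cell diameter, and 5 times the 90% quantile of the minimal molecular distance) can be sketched in base R on toy data. This is not the package's implementation. In particular, taking a cell's diameter as the mean of its bounding-box width and height is an assumption made for illustration, and `toy_df` and `min_dist` are hypothetical inputs.

```r
# Toy transcript table with two cells; names and values are illustrative only.
toy_df <- data.frame(
  cell_ID = c("c1", "c1", "c1", "c2", "c2", "c2"),
  x = c(0, 10, 5, 20, 32, 25),
  y = c(0, 12, 6, 0, 14, 7)
)

# Per-cell bounding-box extent along x and y.
width  <- tapply(toy_df$x, toy_df$cell_ID, function(v) diff(range(v)))
height <- tapply(toy_df$y, toy_df$cell_ID, function(v) diff(range(v)))

# ASSUMPTION: approximate each cell's 2D diameter by the mean of width and height.
cell_diameter <- (width + height) / 2

# Rule 1: cellular cutoff is 2 times the mean cell diameter.
cellular_distance_cutoff <- 2 * mean(cell_diameter)

# Hypothetical minimal molecular distances between picked cells.
min_dist <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1)

# Rule 2: molecular cutoff is 5 times the 90% quantile of those distances.
molecular_distance_cutoff <- 5 * unname(quantile(min_dist, probs = 0.9))

print(c(cellular = cellular_distance_cutoff,
        molecular = molecular_distance_cutoff))
```

Because both cutoffs scale with the input coordinates, they come out in whatever unit the spatial columns use.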

Examples

data(mini_transcriptDF)
# cell_ID for extracellular transcripts
extracellular_cellID <- mini_transcriptDF[which(mini_transcriptDF$CellId == 0), 'cell_ID']
distCutoffs <- choose_distance_cutoff(mini_transcriptDF,
                                      extracellular_cellID = extracellular_cellID)
#> Use 2 times of average 2D cell diameter as cellular_distance_cutoff = 24.2375 for searching of neighbor cells.
#> Identified 3D coordinates with variance. 
#> Distribution of minimal molecular distance between 1375 cells: 0, 0.08, 0.14, 0.21, 0.3, 0.4, 0.51, 0.63, 0.78, 0.86, 3.92, at quantile = 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%.
#> Use 5 times of 90% quantile of minimal 3D molecular distance between picked cells as `molecular_distance_cutoff` = 4.2790 for defining direct neighbor cells.
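The printed messages can be checked against the rules stated in Details with a little arithmetic. The sketch below uses only numbers copied from the output above and does not call the package; the variable names are illustrative.

```r
# Values copied from the messages printed above.
cellular_distance_cutoff  <- 24.2375
molecular_distance_cutoff <- 4.2790

# The cellular cutoff is 2 times the mean cell diameter, so:
mean_cell_diameter <- cellular_distance_cutoff / 2

# The molecular cutoff is 5 times the 90% quantile, so:
q90_min_dist <- molecular_distance_cutoff / 5

# The printed profile is rounded to 2 decimals, which is why
# 5 times the displayed 90% entry differs slightly from the reported cutoff.
print(c(mean_cell_diameter = mean_cell_diameter,
        q90_min_dist = q90_min_dist,
        q90_rounded = round(q90_min_dist, 2)))
```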