checkTransFileInputsAndLoadFirst • FastReseg

Check the input format of the transcript data.frame or of the per-FOV transcript file list, and load the first FOV.

Usage

checkTransFileInputsAndLoadFirst(
  transcript_df = NULL,
  transDF_fileInfo = NULL,
  filepath_coln = "file_path",
  prefix_colns = c("slide", "fov"),
  fovOffset_colns = c("stage_X", "stage_Y"),
  pixel_size = 0.18,
  zstep_size = 0.8,
  transID_coln = NULL,
  transGene_coln = "target",
  cellID_coln = "CellId",
  spatLocs_colns = c("x", "y", "z"),
  extracellular_cellID = NULL
)
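As a minimal sketch of the file-list input implied by the defaults above, the snippet below builds a transDF_fileInfo table in base R. The file paths, offsets, and the extracellular_cellID value of 0 are hypothetical placeholders, not values taken from this page, and the final call is left commented out because it needs the FastReseg package and real files on disk.

```r
# Hypothetical file table: one row per per-FOV transcript file.
# Paths and stage offsets are made up for illustration only.
transDF_fileInfo <- data.frame(
  file_path = c("path/to/slide1_fov1_transcripts.csv",
                "path/to/slide1_fov2_transcripts.csv"),
  slide = c(1, 1),
  fov = c(1, 2),
  stage_X = c(0, 1000),  # FOV offset in the 1st dimension, in microns
  stage_Y = c(0, 0),     # FOV offset in the 2nd dimension, in microns
  stringsAsFactors = FALSE
)

# Column names that the default arguments refer to
required_colns <- c("file_path", "slide", "fov", "stage_X", "stage_Y")
missing_colns <- setdiff(required_colns, colnames(transDF_fileInfo))
stopifnot(length(missing_colns) == 0)

# Not run: requires FastReseg and files that actually exist.
# res <- checkTransFileInputsAndLoadFirst(
#   transDF_fileInfo = transDF_fileInfo,
#   extracellular_cellID = 0  # assumed ID for extracellular transcripts
# )
```

Checking the column names before the call is only a convenience here; the function itself performs the input checks described on this page.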

Arguments

transcript_df: the data.frame of transcript-level information with unique CellId; set to NULL if the data are read from transDF_fileInfo
transDF_fileInfo: a data.frame with one row per per-FOV transcript data.frame file, within which the coordinates and CellId are unique; columns include the file path of each per-FOV transcript file and annotation columns such as slide and fov, which are used as prefixes when creating unique cell_ID across the entire data set; when NULL, the provided transcript_df is used directly
filepath_coln: the name of the column in transDF_fileInfo that holds the file path of each per-FOV transcript data.frame
prefix_colns: the names of the annotation columns in transDF_fileInfo to be added to the CellId as a prefix when creating unique cell_ID for the entire data set; set to NULL to use the original transID_coln or cellID_coln
fovOffset_colns: the names of the columns in transDF_fileInfo that hold the coordinate offsets in the 1st and 2nd dimension for each per-FOV transcript data.frame, in microns. Note that some assays, such as SMI, have the XY axes swapped between the stage and each FOV, in which case fovOffset_colns should be c("stage_Y", "stage_X").
pixel_size: the size of an image pixel in micrometers, for the 1st and 2nd dimension of spatLocs_colns of each transcript_df
zstep_size: the size of a z-step in micrometers, for the optional 3rd dimension of spatLocs_colns of each transcript_df
transID_coln: the column name of transcript_ID in transcript_df; default = NULL, which uses the row index of each transcript in each transcript_df; when prefix_colns != NULL, a unique transcript_id is generated from prefix_colns and transID_coln in each transcript_df
transGene_coln: the column name of target or gene name in transcript_df
cellID_coln: the column name of cell_ID in transcript_df; when prefix_colns != NULL, a unique cell_ID is generated from prefix_colns and cellID_coln in each transcript_df
spatLocs_colns: column names for 1st, 2nd and optional 3rd dimension of spatial coordinates in transcript_df
extracellular_cellID: a vector of cell_ID values for extracellular transcripts, which are removed from the resegmentation pipeline (default = NULL)
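To make the roles of pixel_size, zstep_size, and fovOffset_colns concrete, here is a rough base R sketch of one way pixel coordinates could be converted to microns and shifted by a per-FOV offset. The helper to_micron and the toy values are hypothetical, and the exact arithmetic FastReseg applies internally is an assumption here, not something this page states.

```r
# Toy per-FOV transcript table with coordinates in pixels (x, y) and z-steps (z)
transcript_df <- data.frame(
  x = c(100, 200),
  y = c(50, 150),
  z = c(0, 2),
  target = c("GeneA", "GeneB"),
  CellId = c(1, 2)
)

pixel_size <- 0.18   # microns per pixel (default in the usage above)
zstep_size <- 0.8    # microns per z-step (default in the usage above)
fov_offset <- c(stage_X = 1000, stage_Y = 500)  # made-up offsets, in microns

# Hypothetical helper: scale to microns, then add the FOV offset in XY.
# This is an assumed conversion, not FastReseg's confirmed implementation.
to_micron <- function(df, spatLocs_colns, pixel_size, zstep_size, fov_offset) {
  out <- df
  out[[spatLocs_colns[1]]] <- df[[spatLocs_colns[1]]] * pixel_size + fov_offset[[1]]
  out[[spatLocs_colns[2]]] <- df[[spatLocs_colns[2]]] * pixel_size + fov_offset[[2]]
  if (length(spatLocs_colns) > 2) {
    # The 3rd dimension is optional and has no stage offset in this sketch
    out[[spatLocs_colns[3]]] <- df[[spatLocs_colns[3]]] * zstep_size
  }
  out
}

global_df <- to_micron(transcript_df, c("x", "y", "z"),
                       pixel_size, zstep_size, fov_offset)
```

For an assay with swapped stage axes, the same sketch would simply receive the offsets in the order c(stage_Y, stage_X).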

Value

a list containing the transcript data.frame for downstream processing and the extracellular transcript data.frame:

intraC: a data.frame of intracellular transcripts, with UMI_transID and UMI_cellID as the column names for the unique transcript_id and cell_id, and target as the column name for the target gene name
extraC: a data.frame of extracellular transcripts, with the same structure as the intraC data.frame in the returned list
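The shape of the returned list can be mimicked in base R as follows. This is only an illustrative sketch: the underscore-separated ID format, the toy data, and the use of 0 as the extracellular CellId are assumptions, and the real function may build UMI_cellID and UMI_transID differently.

```r
# Toy transcript table for one FOV; CellId 0 is assumed to mark
# extracellular transcripts in this example.
transcript_df <- data.frame(
  target = c("GeneA", "GeneB", "GeneC"),
  CellId = c(1, 0, 2),
  slide = 1,
  fov = 3
)

prefix_colns <- c("slide", "fov")
extracellular_cellID <- 0

# With transID_coln = NULL, the row index stands in for the transcript ID
trans_id <- seq_len(nrow(transcript_df))

# Assumed ID format: prefix columns and the original ID joined by "_"
prefix <- do.call(paste, c(transcript_df[prefix_colns], sep = "_"))
transcript_df$UMI_cellID <- paste(prefix, transcript_df$CellId, sep = "_")
transcript_df$UMI_transID <- paste(prefix, trans_id, sep = "_")

# Split into intracellular and extracellular parts, same columns in both
is_extra <- transcript_df$CellId %in% extracellular_cellID
res <- list(
  intraC = transcript_df[!is_extra, , drop = FALSE],
  extraC = transcript_df[is_extra, , drop = FALSE]
)
```

Downstream code would then read res$intraC for resegmentation and keep res$extraC aside, which matches the two list elements described above.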