Check the input formats of either a transcript data.frame or a file list of per-FOV transcript data, and load the first FOV.

Usage

checkTransFileInputsAndLoadFirst(
  transcript_df = NULL,
  transDF_fileInfo = NULL,
  filepath_coln = "file_path",
  prefix_colns = c("slide", "fov"),
  fovOffset_colns = c("stage_X", "stage_Y"),
  pixel_size = 0.18,
  zstep_size = 0.8,
  transID_coln = NULL,
  transGene_coln = "target",
  cellID_coln = "CellId",
  spatLocs_colns = c("x", "y", "z"),
  extracellular_cellID = NULL
)

Arguments

transcript_df

the data.frame of transcript-level information with unique CellId; set to NULL to read the data from the files listed in transDF_fileInfo

transDF_fileInfo

a data.frame with one row per file, where each file holds the transcript data.frame of a single FOV and the coordinates and CellId are unique within that file. Columns include the file path of each per-FOV transcript data.frame file and annotation columns, such as slide and fov, that are used as a prefix when creating a unique cell_ID across the entire data set. When NULL, the provided transcript_df is used directly.
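
As a minimal sketch of what such a file list could look like, the snippet below builds a transDF_fileInfo data.frame using the default column names from the Usage section. The file paths and offset values are hypothetical placeholders, and the column check is only an illustration, not the validation this function performs internally.

```r
# Hypothetical file paths and stage offsets, for illustration only
transDF_fileInfo <- data.frame(
  file_path = c("slide1_fov1_transcripts.csv", "slide1_fov2_transcripts.csv"),
  slide = c(1, 1),
  fov = c(1, 2),
  stage_X = c(0, 500),  # FOV offset in microns, 1st dimension
  stage_Y = c(0, 0),    # FOV offset in microns, 2nd dimension
  stringsAsFactors = FALSE
)

# Illustrative check that the columns named by the default arguments exist
required_colns <- c("file_path", "slide", "fov", "stage_X", "stage_Y")
missing_colns <- setdiff(required_colns, colnames(transDF_fileInfo))
stopifnot(length(missing_colns) == 0)
```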

filepath_coln

the name of the column in transDF_fileInfo that holds the file path of each per-FOV transcript data.frame file

prefix_colns

the names of the annotation columns in transDF_fileInfo that are added to the CellId as a prefix when creating a unique cell_ID for the entire data set; set to NULL to use the original transID_coln or cellID_coln

fovOffset_colns

the column names of the coordinate offsets in the 1st and 2nd dimension for each per-FOV transcript data.frame in transDF_fileInfo, in microns. Note that some assays, such as SMI, have the XY axes swapped between the stage and each FOV, in which case fovOffset_colns should be c("stage_Y", "stage_X").

pixel_size

the size of one image pixel in micrometers, for the pixel coordinates listed in the 1st and 2nd dimension of spatLocs_colns of each transcript_df

zstep_size

the size of one z-step in micrometers, for the optional 3rd dimension of spatLocs_colns of each transcript_df
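
To illustrate how fovOffset_colns, pixel_size and zstep_size relate to each other, here is a rough sketch of one plausible conversion from per-FOV coordinates to global coordinates in microns. The formula (scale first, then add the FOV offset) and all data values are assumptions made for illustration; the exact arithmetic used by this function is not stated on this page.

```r
# One hypothetical FOV: local coordinates in pixels (x, y) and z-steps (z)
transcript_df <- data.frame(
  x = c(100, 2000),
  y = c(50, 1000),
  z = c(0, 5)
)
fov_offset <- c(stage_X = 500, stage_Y = 0)  # microns, hypothetical values
pixel_size <- 0.18  # microns per pixel (documented default)
zstep_size <- 0.8   # microns per z-step (documented default)

# Assumed conversion: scale local coordinates to microns, then add the FOV offset
global_df <- data.frame(
  x = transcript_df$x * pixel_size + fov_offset[["stage_X"]],
  y = transcript_df$y * pixel_size + fov_offset[["stage_Y"]],
  z = transcript_df$z * zstep_size
)
```

For an assay with swapped stage axes, the same sketch would read the offsets in the order c("stage_Y", "stage_X") instead.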

transID_coln

the column name of transcript_ID in transcript_df; the default, NULL, uses the row index of each transcript in its transcript_df. When prefix_colns != NULL, a unique transcript_id is generated from prefix_colns and transID_coln in each transcript_df.

transGene_coln

the column name of target or gene name in transcript_df

cellID_coln

the column name of cell_ID in transcript_df; when prefix_colns != NULL, a unique cell_ID is generated from prefix_colns and cellID_coln in each transcript_df
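
The prefixing described for transID_coln and cellID_coln can be sketched as follows. This is only an illustration: the "_" separator and the resulting ID format are assumptions, and the IDs actually produced by this function may look different. The output column names UMI_transID and UMI_cellID follow the Value section.

```r
# Hypothetical per-FOV transcript data with the default column names
transcript_df <- data.frame(
  target = c("GeneA", "GeneB", "GeneA"),
  CellId = c(1, 1, 2)
)
fov_info <- data.frame(slide = 1, fov = 2)  # one row of transDF_fileInfo
prefix_colns <- c("slide", "fov")

# Build a prefix such as "1_2" from the annotation columns (separator is assumed)
prefix <- paste(unlist(fov_info[1, prefix_colns]), collapse = "_")

# transID_coln = NULL: use the row index as the per-FOV transcript ID
transcript_df$UMI_transID <- paste(prefix, seq_len(nrow(transcript_df)), sep = "_")
transcript_df$UMI_cellID <- paste(prefix, transcript_df$CellId, sep = "_")
```

Because the prefix differs for every slide and FOV combination, IDs that repeat across files stay distinct once the per-FOV tables are combined.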

spatLocs_colns

column names for 1st, 2nd and optional 3rd dimension of spatial coordinates in transcript_df

extracellular_cellID

a vector of cell_ID values for extracellular transcripts, which are removed from the resegmentation pipeline (default = NULL)

Value

a list containing the transcript data.frame for downstream processing and the extracellular transcript data.frame

intraC

a data.frame of intracellular transcripts, with UMI_transID and UMI_cellID as the column names for the unique transcript_id and cell_id, and target as the column name for the target gene name

extraC

a data.frame of extracellular transcripts, with the same structure as the intraC data.frame in the returned list
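
The shape of the returned list can be illustrated with a small sketch that splits a transcript data.frame by extracellular_cellID. The data, the choice of 0 as the extracellular cell ID, and the matching on the CellId column are all hypothetical; this is not the function's own implementation.

```r
# Hypothetical transcript table that already carries unique IDs
transcript_df <- data.frame(
  UMI_transID = c("t1", "t2", "t3", "t4"),
  UMI_cellID = c("1_1_0", "1_1_1", "1_1_2", "1_1_0"),
  target = c("GeneA", "GeneB", "GeneA", "GeneC"),
  CellId = c(0, 1, 2, 0),
  stringsAsFactors = FALSE
)
extracellular_cellID <- c(0)  # assumption: CellId 0 marks extracellular transcripts

# Flag transcripts whose original cell ID is listed as extracellular
is_extra <- transcript_df$CellId %in% extracellular_cellID

# Same structure for both elements, as described for intraC and extraC
result <- list(
  intraC = transcript_df[!is_extra, , drop = FALSE],
  extraC = transcript_df[is_extra, , drop = FALSE]
)
```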