How to Use This Book – A Guide to Whole Transcriptome CosMx® SMI Analysis

Who This Book Is For 🔬

This guide is for any scientist or bioinformatician interested in analyzing CosMx® Spatial Molecular Imager (SMI) data. It is designed to be accessible for newcomers while also providing a robust, reproducible workflow for experienced users.

While our primary focus will be on a single-slide, subcellular Whole Transcriptome (WTX) dataset, the principles and code are broadly applicable to other RNA panels (like the 1K or 6K panels). Larger, multi-slide experiments have additional considerations that will be a topic for another day. While this book demonstrates a complete analysis from start to finish, its chapters are also designed to be modular. This structure allows you to either follow the entire workflow linearly or to treat it as a “choose your own adventure” guide—jumping directly to the sections most relevant to your specific questions and data.

Navigating This Guide 🗺️

This content is structured as a Quarto Book, with several features to help you find information:

Left Sidebar: Provides navigation between the main chapters of the book.
Right Sidebar: Shows a table of contents for the sections within the current chapter.
</> Code Button: Located in the top-right corner of many pages, this button shows or hides the source code behind a given section: the code used to generate the outputs, the programmatic formatting of tabbed plots, Observable JS code blocks for interactive plots that are not otherwise shown directly, and so on.

The Analytical Philosophy 🧠

There is no single “correct” way to analyze spatial data, just as there is no single, linear path along which individual cells interact. The most powerful analysis is always driven by the specific biological questions and hypotheses of your study, and even then there are multiple statistical approaches that can work. This book does not present a rigid, one-size-fits-all pipeline. Instead, it provides a comprehensive and adaptable foundation. The goal is to equip you with the tools and intuition needed to confidently explore your own data and tailor the analysis to your unique research goals.

Alternative Approach

There are many ways to analyze CosMx SMI data. While this book shows a few such solutions, other methods may be worth considering. I’ll use Alternative Approach boxes like this one to suggest additional ways to tackle a given problem. If two approaches are helpful to compare side by side, I’ll show both in the main text.

Deep Dives

Readers who are interested in learning additional details about a particular topic, such as how certain function parameters affect a result, can learn more in boxes with the gear icon. These sections can be expanded or collapsed by clicking on the arrow at the top right of the box.

If an interactive visualization is needed to help us understand a particular topic, I may make use of serverless WebAssembly techniques such as those found in the fantastic quarto-live¹ Quarto extension. In a nutshell, the quarto-live extension provides instructions to your browser on how to run R or Python content. This integrates well with Observable JS and Quarto. For more information on quarto-live, see their online documentation (https://r-wasm.github.io/quarto-live/).

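df <- data.frame(x = rnorm(10), y = rnorm(10))
p <- ggplot(data=df, aes(x=x, y=y)) +
  geom_point() +
  theme_bw()
p

Feel free to adjust the code and run it.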

Technical Approach and Reproducibility 🛠️

This guide uniquely leverages the combined strengths of both R and Python, as both languages offer exceptional, complementary ecosystems for spatial analysis. I primarily use Python’s AnnData² and scanpy³ packages for their performance and scalability, while relying on R’s tidyverse⁴ for specific statistical methods, data wrangling, and visualization. RStudio’s integrated support for both languages via reticulate⁵ makes this hybrid approach surprisingly seamless. To ensure a fully reproducible analysis environment, the project repository for this book includes:

renv.lock: A file to restore the exact R package library using the renv package.
requirements.txt: A file to recreate the Python environment using pip. Instructions for setting up the complete, combined environment can be found in the project’s README file on GitHub.
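As a rough illustration of what a pinned Python environment buys you, the sketch below compares the versions installed in the current environment against the exact pins in a requirements file. It is not part of the book's repository: the function names `parse_pins` and `find_mismatches` are hypothetical, and the parser assumes simple `name==version` lines, ignoring everything else (options, unpinned entries, extras).

```python
from __future__ import annotations

from importlib import metadata
from pathlib import Path


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Return {package: version} for lines of the form 'name==version'."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop trailing comments and environment markers, then whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # skip blank lines, pip options, and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Map each package whose installed version differs from its pin
    to (pinned, installed); installed is None if the package is missing."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__":
    req = Path("requirements.txt")  # assumed to sit in the working directory
    if req.exists():
        report = find_mismatches(parse_pins(req.read_text()))
        for name, (pinned, installed) in report.items():
            print(f"{name}: pinned {pinned}, installed {installed}")
```

An empty report suggests the active environment matches the pins. The README's setup instructions remain the authoritative way to build the environment.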

Since this book was written in the Quarto book format, individual chapters are executed “in isolation” from one another. This means it’s necessary to write data to disk in order for a subsequent chapter to be able to use it. Certain computations can also take hours, so it’s not ideal to re-compute them each time I “build” (render) this book. My technical approach here, and indeed in most of my daily analyses, is to run code blocks interactively and then evaluate only the code blocks that are absolutely necessary for rendering the document, such as those that include and format a pre-computed image. More information about this analysis system can be found in the Introduction.
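The write-to-disk pattern described above can be sketched as a small "load if present, otherwise compute and save" helper. This is an assumption-laden illustration rather than the code used in the book: `load_or_compute` is a hypothetical name, and it uses `pickle` only to stay dependency-free. For AnnData objects the analogous calls would be `adata.write_h5ad(...)` and `anndata.read_h5ad(...)`.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_compute(path: Path, compute: Callable[[], Any], force: bool = False) -> Any:
    """Return the result stored at `path`; run `compute` and save its
    result only when the file is missing or `force` is True."""
    if path.exists() and not force:
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)  # create output folder if needed
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


if __name__ == "__main__":
    def expensive_step() -> dict:
        # Stand-in for a computation that might take hours.
        return {"status": "done"}

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "results" / "step.pkl"
        first = load_or_compute(cache, expensive_step)   # computes and writes
        second = load_or_compute(cache, expensive_step)  # reads from disk
        print(first == second)
```

A later chapter can then call the same helper with the same path and pick up the saved result without repeating the computation.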

Throughout this book I work with a particular dataset to show an end-to-end analysis. Some asides are not part of that analysis; they illustrate an idea or provide supporting material. I’ll mark code blocks that are not part of the main analysis with a “Python - supporting” tag, like so:

Python - supporting

# Code used for supporting material

If you are interested in adapting this environment to your own workflow, please see Setting up Your Own Analysis Environment.

I should note that CosMx SMI data analysis can be quite resource-intensive. The analysis within this book was run on an Amazon EC2 g6.16xlarge instance (https://aws.amazon.com/ec2/instance-types/g6/) equipped with one NVIDIA L4 GPU (24 GB of GPU memory) and 64 vCPUs (256 GB RAM). The operating system was Ubuntu 24.04.1 LTS.

Author: Evelyn Metzger (ORCID: 0000-0002-4074-9003)