How to Use This Book

Author: Evelyn Metzger

Affiliation: Bruker Spatial Biology

GitHub: eveilyeverafter

Who This Book Is For 🔬

This guide is for any scientist or bioinformatician interested in analyzing CosMx® Spatial Molecular Imager (SMI) data. It is designed to be accessible for newcomers while also providing a robust, reproducible workflow for experienced users.

Our primary focus is a single-slide, subcellular Whole Transcriptome (WTX) dataset, but the principles and code are broadly applicable to other RNA panels (such as the 1K or 6K panels). Larger, multi-slide experiments have additional considerations that will be a topic for another day. Although this book demonstrates a complete analysis from start to finish, its chapters are designed to be modular. You can either follow the entire workflow linearly or treat it as a “choose your own adventure” guide, jumping directly to the sections most relevant to your specific questions and data.


The Analytical Philosophy 🧠

There is no single “correct” way to analyze spatial data, just as there is no single, linear path along which individual cells interact. The most powerful analysis is always driven by the specific biological questions and hypotheses of your study, and even then, several statistical approaches can work. This book does not present a rigid, one-size-fits-all pipeline. Instead, it provides a comprehensive and adaptable foundation. The goal is to equip you with the tools and intuition needed to confidently explore your own data and tailor the analysis to your unique research goals.

Alternative Approach

There are many ways to analyze CosMx SMI data. This book shows a few of them, but other methods may be worth considering. I’ll use Alternative Approach boxes like this one to suggest additional ways to tackle a given problem. When two approaches are helpful to compare side by side, I’ll show both in the main text.

Deep Dives

Readers who are interested in additional details about a particular topic, such as how certain function parameters affect a result, can learn more in boxes with the gear icon. These sections can be expanded or collapsed by clicking the arrow at the top right of the box.

If an interactive visualization is needed to help us understand a particular topic, I may make use of serverless WebAssembly techniques such as those found in the fantastic quarto-live [1] Quarto extension. In a nutshell, the quarto-live extension provides instructions to your browser on how to run R or Python content. This integrates well with Observable JS and Quarto. For more information on quarto-live, see its online documentation.

  1. Feel free to adjust the code and run it.

Technical Approach and Reproducibility 🛠️

This guide draws on the combined strengths of both R and Python, as the two languages offer exceptional, complementary ecosystems for spatial analysis. I primarily use Python’s AnnData [2] and scanpy [3] packages for their performance and scalability, while relying on R’s tidyverse [4] for specific statistical methods, data wrangling, and visualization. RStudio’s integrated support for both languages via reticulate [5] makes this hybrid approach surprisingly seamless. To ensure a fully reproducible analysis environment, the project repository for this book includes:

  • renv.lock: A file to restore the exact R package library using the renv package.
  • requirements.txt: A file to recreate the Python environment using pip.

Instructions for setting up the complete, combined environment can be found in the project’s README file on GitHub.
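As a quick sanity check after restoring the Python side, the sketch below compares pinned versions against what is actually installed. It assumes the requirements file uses exact `name==version` pins, which may not match the real file; the helper names `parse_pins` and `check_pins` and the example pins are hypothetical and only for illustration.

```python
# Python - supporting
# Sketch: compare "name==version" pins with the installed package versions.
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for simple 'name==version' lines."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0]          # drop trailing comments
        line = line.split(";", 1)[0].strip() # drop environment markers
        if not line or "==" not in line:
            continue  # skip blanks and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def check_pins(pins):
    """Return {package: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Hypothetical pins, for illustration only (not the book's real versions).
example = """
scanpy==0.0.0
anndata==0.0.0
"""
for name, (pinned, installed) in check_pins(parse_pins(example)).items():
    print(f"{name}: pinned {pinned}, installed {installed}")
```

In practice you would pass the contents of the repository's requirements.txt instead of the `example` string. An empty result from `check_pins` means every exact pin matches the current environment.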

Since this book was written in the Quarto book format, individual chapters are executed “in isolation” from one another. This means it’s necessary to write data to disk for a subsequent chapter to be able to use it. Certain computations can also take hours, so it’s not ideal to re-compute them unnecessarily each time I “build” (render) this book. My technical approach here – and indeed in most of my daily analyses – is to run code blocks interactively and then evaluate only the code blocks that are absolutely necessary for rendering the document, such as including and formatting a pre-computed image. More information about this analysis system can be found in the Introduction.
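The write-to-disk idea described above can be sketched as a small "load if present, otherwise compute and save" helper. This is a minimal illustration using only the standard library, not the system used in this book; the function name `load_or_compute`, the file name, and the stand-in computation are all hypothetical.

```python
# Python - supporting
# Sketch: cache an expensive result on disk so later chapters can reuse it.
import pickle
import tempfile
from pathlib import Path


def load_or_compute(path, compute, force=False):
    """Load a cached result from `path`, or compute it and save it."""
    path = Path(path)
    if path.exists() and not force:
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted run
    # does not leave a half-written cache file behind.
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("wb") as fh:
        pickle.dump(result, fh)
    tmp.replace(path)
    return result


def slow_step():
    """Stand-in for a computation that might take hours."""
    return {"n_cells": 3, "labels": ["a", "b", "c"]}


with tempfile.TemporaryDirectory() as workdir:
    cache_file = Path(workdir) / "results" / "slow_step.pkl"
    first = load_or_compute(cache_file, slow_step)   # computes and saves
    second = load_or_compute(cache_file, slow_step)  # loads from disk
    print(first == second)
```

For real CosMx data, a format suited to the object (for example AnnData's own on-disk format) would likely be a better choice than pickle; pickle is used here only to keep the sketch self-contained.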

Throughout this book I work with a particular dataset to show an end-to-end analysis. Some asides are not part of the analysis but instead illustrate an idea or provide supporting material. For code blocks that are not part of the main analysis, I’ll mark them with a “Python - supporting” tag like so:

Python - supporting
# Code used for supporting material

If you are interested in adapting this environment to your own workflow, please see Setting up Your Own Analysis Environment for more information.

I should note that CosMx SMI data analysis can be quite resource demanding. The analysis in this book was run on an Amazon g6.16xlarge EC2 instance equipped with one NVIDIA L4 GPU (24 GB of GPU memory), 64 vCPUs, and 256 GB of RAM. The operating system was Ubuntu 24.04.1 LTS.