scanpy


Note: This notebook is based largely on the scanpy tutorial Preprocessing and clustering 3k PBMCs. Portions have been removed or edited to fit a class period of about 30 minutes.

scanpy is a Python-based toolkit for single-cell analysis. Some applications of scanpy include:

clustering of single-cell data
trajectory inference (reconstruction of cell pathways)
differential expression testing (testing differences in gene expression between different cell populations)

Learning objectives

Gain familiarity with single-cell data
Experiment with the adata object
Perform dimensionality reduction and clustering analysis of scRNA-seq data

Let’s first discuss what single-cell data is and what it looks like.

single-cell data

There are several types of single-cell data:

scDNA-seq (genomic single-cell)
scRNA-seq (transcriptomic single-cell)
scBS-seq (single-cell bisulfite sequencing)
…

Each modality measures a different layer of a cell’s biology (for example, its genome, its transcriptome, or its DNA methylation). In this tutorial, we will be looking at scRNA-seq data from 2700 peripheral blood mononuclear cells (PBMCs) from a healthy donor.

What are the benefits of single-cell sequencing over bulk sequencing?

Bulk sequencing measures the combined signal of all the cells in a sample, whereas single-cell sequencing profiles each cell individually. This adds granularity to data from samples that contain more than one type of cell. For instance, the transcriptomic profiles of individual cells in a heterogeneous tumor can reveal insights into tumorigenesis. Researchers may also be able to investigate novel biological activity as stem cells abnormally mature into cancerous cells.
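To make the "granularity" point concrete, here is a toy calculation with made-up numbers. It assumes, as a simplification, that a bulk measurement behaves like the average expression over all cells in the sample. A gene that is high in a minority population and absent elsewhere then looks moderately expressed everywhere in bulk, while per-cell values reveal the two groups.

```python
import numpy as np

# Hypothetical expression of one marker gene in a mixed sample:
# 30 cells of type A express it highly, 70 cells of type B do not express it.
type_a = np.full(30, 20.0)  # counts per cell, type A
type_b = np.full(70, 0.0)   # counts per cell, type B
cells = np.concatenate([type_a, type_b])

# Simplified bulk view: one number averaged over every cell in the sample.
bulk_estimate = cells.mean()

# Single-cell view: per-cell values are kept, so the subpopulation is visible.
fraction_expressing = (cells > 0).mean()

print(f"bulk average: {bulk_estimate:.1f}")
print(f"fraction of cells expressing the gene: {fraction_expressing:.2f}")
```

The bulk average of 6 suggests uniform, moderate expression, whereas only 30% of the cells actually express the gene.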

How is single-cell data created?

There are various methods, depending on the application and the type of single-cell data. The image below (Chan Zuckerberg Initiative) summarizes the general workflow. In short, single cells are isolated from tissue, and their genetic material is amplified and then sequenced.

Now that we know how single-cell data is generated, let’s talk about how single-cell data is represented in scanpy.

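# install these packages first
# install the anndata library
!pip install anndata
# install the scanpy library
!pip install scanpy
# install leidenalg, which sc.tl.leiden uses for Leiden clustering
!pip install leidenalg
# numba (a scanpy dependency) requires NumPy 2.2 or lower
!pip install "numpy<2.3"
# afterwards, restart the kernel/runtime so that the newly installed NumPy is the one imported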

Collecting anndata
  Downloading anndata-0.11.4-py3-none-any.whl.metadata (9.3 kB)
Collecting array-api-compat!=1.5,>1.4 (from anndata)
  Downloading array_api_compat-1.12.0-py3-none-any.whl.metadata (2.5 kB)
Collecting h5py>=3.7 (from anndata)
  Downloading h5py-3.14.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.7 kB)
Collecting natsort (from anndata)
  Downloading natsort-8.4.0-py3-none-any.whl.metadata (21 kB)
Requirement already satisfied: numpy>=1.23 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from anndata) (2.3.0)
Requirement already satisfied: packaging>=24.2 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from anndata) (25.0)
Requirement already satisfied: pandas!=2.1.0rc0,!=2.1.2,>=1.4 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from anndata) (2.3.0)
Collecting scipy>1.8 (from anndata)
  Downloading scipy-1.15.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Requirement already satisfied: python-dateutil>=2.8.2 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from pandas!=2.1.0rc0,!=2.1.2,>=1.4->anndata) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from pandas!=2.1.0rc0,!=2.1.2,>=1.4->anndata) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from pandas!=2.1.0rc0,!=2.1.2,>=1.4->anndata) (2025.2)
Requirement already satisfied: six>=1.5 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from python-dateutil>=2.8.2->pandas!=2.1.0rc0,!=2.1.2,>=1.4->anndata) (1.17.0)
Downloading anndata-0.11.4-py3-none-any.whl (144 kB)
Downloading array_api_compat-1.12.0-py3-none-any.whl (58 kB)
Downloading h5py-3.14.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.9 MB)
Downloading scipy-1.15.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (37.3 MB)
Downloading natsort-8.4.0-py3-none-any.whl (38 kB)
Installing collected packages: scipy, natsort, h5py, array-api-compat, anndata
Successfully installed anndata-0.11.4 array-api-compat-1.12.0 h5py-3.14.0 natsort-8.4.0 scipy-1.15.3
Collecting scanpy
  Downloading scanpy-1.11.2-py3-none-any.whl.metadata (9.1 kB)
Requirement already satisfied: anndata>=0.8 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from scanpy) (0.11.4)
Requirement already satisfied: h5py>=3.7.0 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from scanpy) (3.14.0)
Collecting joblib (from scanpy)
  Downloading joblib-1.5.1-py3-none-any.whl.metadata (5.6 kB)
Collecting legacy-api-wrap>=1.4.1 (from scanpy)
  Downloading legacy_api_wrap-1.4.1-py3-none-any.whl.metadata (2.1 kB)
Requirement already satisfied: matplotlib>=3.7.5 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from scanpy) (3.10.3)
Requirement already satisfied: natsort in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from scanpy) (8.4.0)
Collecting networkx>=2.7.1 (from scanpy)
  Downloading networkx-3.5-py3-none-any.whl.metadata (6.3 kB)
Collecting numba>=0.57.1 (from scanpy)
  Downloading numba-0.61.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.8 kB)
Requirement already satisfied: numpy>=1.24.1 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from scanpy) (2.3.0)
Requirement already satisfied: packaging>=21.3 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from scanpy) (25.0)
Requirement already satisfied: pandas>=1.5.3 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from scanpy) (2.3.0)
Collecting patsy!=1.0.0 (from scanpy)
  Downloading patsy-1.0.1-py2.py3-none-any.whl.metadata (3.3 kB)
Collecting pynndescent>=0.5.13 (from scanpy)
  Downloading pynndescent-0.5.13-py3-none-any.whl.metadata (6.8 kB)
Collecting scikit-learn>=1.1.3 (from scanpy)
  Downloading scikit_learn-1.7.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (17 kB)
Requirement already satisfied: scipy>=1.8.1 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from scanpy) (1.15.3)
Collecting seaborn>=0.13.2 (from scanpy)
  Downloading seaborn-0.13.2-py3-none-any.whl.metadata (5.4 kB)
Collecting session-info2 (from scanpy)
  Downloading session_info2-0.1.2-py3-none-any.whl.metadata (2.5 kB)
Collecting statsmodels>=0.14.4 (from scanpy)
  Downloading statsmodels-0.14.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.2 kB)
Collecting tqdm (from scanpy)
  Downloading tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Collecting umap-learn>=0.5.6 (from scanpy)
  Downloading umap_learn-0.5.7-py3-none-any.whl.metadata (21 kB)
Requirement already satisfied: array-api-compat!=1.5,>1.4 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from anndata>=0.8->scanpy) (1.12.0)
Requirement already satisfied: contourpy>=1.0.1 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from matplotlib>=3.7.5->scanpy) (1.3.2)
Requirement already satisfied: cycler>=0.10 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from matplotlib>=3.7.5->scanpy) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from matplotlib>=3.7.5->scanpy) (4.58.2)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from matplotlib>=3.7.5->scanpy) (1.4.8)
Requirement already satisfied: pillow>=8 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from matplotlib>=3.7.5->scanpy) (11.2.1)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from matplotlib>=3.7.5->scanpy) (3.2.3)
Requirement already satisfied: python-dateutil>=2.7 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from matplotlib>=3.7.5->scanpy) (2.9.0.post0)
Collecting llvmlite<0.45,>=0.44.0dev0 (from numba>=0.57.1->scanpy)
  Downloading llvmlite-0.44.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.0 kB)
Collecting numpy>=1.24.1 (from scanpy)
  Downloading numpy-2.2.6-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (62 kB)
Requirement already satisfied: pytz>=2020.1 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from pandas>=1.5.3->scanpy) (2025.2)
Requirement already satisfied: tzdata>=2022.7 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from pandas>=1.5.3->scanpy) (2025.2)
Requirement already satisfied: six>=1.5 in /opt/hostedtoolcache/Python/3.13.4/x64/lib/python3.13/site-packages (from python-dateutil>=2.7->matplotlib>=3.7.5->scanpy) (1.17.0)
Collecting threadpoolctl>=3.1.0 (from scikit-learn>=1.1.3->scanpy)
  Downloading threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Downloading scanpy-1.11.2-py3-none-any.whl (2.1 MB)
Downloading legacy_api_wrap-1.4.1-py3-none-any.whl (10.0 kB)
Downloading networkx-3.5-py3-none-any.whl (2.0 MB)
Downloading numba-0.61.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.9 MB)
Downloading llvmlite-0.44.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.4 MB)
Downloading numpy-2.2.6-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.5 MB)
Downloading patsy-1.0.1-py2.py3-none-any.whl (232 kB)
Downloading pynndescent-0.5.13-py3-none-any.whl (56 kB)
Downloading joblib-1.5.1-py3-none-any.whl (307 kB)
Downloading scikit_learn-1.7.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.5 MB)
Downloading seaborn-0.13.2-py3-none-any.whl (294 kB)
Downloading statsmodels-0.14.4-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.7 MB)
Downloading threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Downloading umap_learn-0.5.7-py3-none-any.whl (88 kB)
Downloading session_info2-0.1.2-py3-none-any.whl (14 kB)
Downloading tqdm-4.67.1-py3-none-any.whl (78 kB)
Installing collected packages: tqdm, threadpoolctl, session-info2, numpy, networkx, llvmlite, legacy-api-wrap, joblib, patsy, numba, statsmodels, scikit-learn, seaborn, pynndescent, umap-learn, scanpy
  Attempting uninstall: numpy
    Found existing installation: numpy 2.3.0
    Uninstalling numpy-2.3.0:
      Successfully uninstalled numpy-2.3.0
Successfully installed joblib-1.5.1 legacy-api-wrap-1.4.1 llvmlite-0.44.0 networkx-3.5 numba-0.61.2 numpy-2.2.6 patsy-1.0.1 pynndescent-0.5.13 scanpy-1.11.2 scikit-learn-1.7.0 seaborn-0.13.2 session-info2-0.1.2 statsmodels-0.14.4 threadpoolctl-3.6.0 tqdm-4.67.1 umap-learn-0.5.7
Collecting leidenalg
  Downloading leidenalg-0.10.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (10 kB)
Collecting igraph<0.12,>=0.10.0 (from leidenalg)
  Downloading igraph-0.11.9-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.4 kB)
Collecting texttable>=1.6.2 (from igraph<0.12,>=0.10.0->leidenalg)
  Downloading texttable-1.7.0-py2.py3-none-any.whl.metadata (9.8 kB)
Downloading leidenalg-0.10.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
Downloading igraph-0.11.9-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.4 MB)
Downloading texttable-1.7.0-py2.py3-none-any.whl (10 kB)
Installing collected packages: texttable, igraph, leidenalg
Successfully installed igraph-0.11.9 leidenalg-0.10.2 texttable-1.7.0

# import necessary libraries
import numpy as np
import pandas as pd
import anndata as ad
from scipy.sparse import csr_matrix
import scanpy as sc

Note: if this cell fails with `ImportError: Numba needs NumPy 2.2 or less. Got NumPy 2.3.`, the NumPy loaded in the running kernel is too new for numba (a scanpy dependency). Install a NumPy version below 2.3 (for example `pip install "numpy<2.3"`), restart the kernel/runtime, and rerun this cell. Otherwise every later cell that uses `sc` fails with `NameError: name 'sc' is not defined`.

## AnnData

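The anndata Python package provides the AnnData object, which is essentially an annotated data matrix: a matrix of observations (cells) by variables (genes), plus tables of annotations for each observation and each variable.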

The following (taken from Getting started with anndata) demonstrates some of the features of the anndata object.

# first, create a matrix of 100 (cells) x 2000 (genes)
counts = csr_matrix(np.random.poisson(1, size=(100, 2000)), dtype=np.float32)
adata = ad.AnnData(counts)
# create index names...observation names and variable names
adata.obs_names = [f"Cell_{i:d}" for i in range(adata.n_obs)]
adata.var_names = [f"Gene_{i:d}" for i in range(adata.n_vars)]
print(adata.obs_names[:10])

Index(['Cell_0', 'Cell_1', 'Cell_2', 'Cell_3', 'Cell_4', 'Cell_5', 'Cell_6',
       'Cell_7', 'Cell_8', 'Cell_9'],
      dtype='object')

After running the code above, the adata object stores the counts in its data matrix attribute X (rows are cells, columns are genes), which looks like the below:

You can also inspect X in Python by converting the AnnData object into a pandas DataFrame with to_df:

# outputs a dataframe version of the X matrix
adata.to_df()

	Gene_0	Gene_1	Gene_2	Gene_3	Gene_4	Gene_5	Gene_6	Gene_7	Gene_8	Gene_9	...	Gene_1990	Gene_1991	Gene_1992	Gene_1993	Gene_1994	Gene_1995	Gene_1996	Gene_1997	Gene_1998	Gene_1999
Cell_0	0.0	2.0	0.0	2.0	2.0	0.0	0.0	1.0	1.0	1.0	...	1.0	2.0	2.0	0.0	1.0	2.0	3.0	1.0	1.0	1.0
Cell_1	0.0	1.0	1.0	1.0	0.0	1.0	3.0	0.0	1.0	0.0	...	0.0	1.0	2.0	2.0	1.0	1.0	1.0	1.0	2.0	0.0
Cell_2	0.0	1.0	0.0	2.0	1.0	2.0	2.0	0.0	3.0	1.0	...	1.0	2.0	1.0	0.0	1.0	2.0	2.0	1.0	0.0	2.0
Cell_3	0.0	2.0	2.0	0.0	2.0	1.0	2.0	0.0	1.0	2.0	...	0.0	2.0	2.0	2.0	2.0	2.0	1.0	2.0	0.0	2.0
Cell_4	0.0	0.0	0.0	1.0	1.0	1.0	1.0	2.0	2.0	0.0	...	0.0	2.0	2.0	1.0	1.0	0.0	1.0	0.0	1.0	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
Cell_95	0.0	1.0	0.0	2.0	1.0	0.0	2.0	1.0	0.0	0.0	...	1.0	0.0	2.0	1.0	1.0	1.0	0.0	0.0	0.0	0.0
Cell_96	3.0	1.0	0.0	3.0	1.0	1.0	0.0	0.0	2.0	0.0	...	0.0	2.0	1.0	1.0	1.0	2.0	0.0	0.0	1.0	0.0
Cell_97	0.0	0.0	1.0	0.0	5.0	0.0	1.0	2.0	2.0	1.0	...	1.0	0.0	1.0	1.0	1.0	2.0	0.0	1.0	1.0	2.0
Cell_98	0.0	1.0	2.0	1.0	2.0	2.0	1.0	0.0	1.0	1.0	...	1.0	2.0	0.0	2.0	0.0	1.0	0.0	0.0	2.0	1.0
Cell_99	1.0	2.0	1.0	2.0	1.0	0.0	1.0	0.0	2.0	1.0	...	1.0	2.0	2.0	0.0	3.0	0.0	1.0	2.0	0.0	3.0

100 rows × 2000 columns
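Conceptually, to_df pairs the matrix with the cell names as the row index and the gene names as the column labels. The sketch below reproduces that idea by hand on a tiny hypothetical sparse matrix using pandas and SciPy; it is an illustration of the result, not anndata's own implementation.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# A small stand-in for adata.X: 3 cells x 4 genes stored sparsely (only nonzeros kept).
X = csr_matrix(np.array([[0, 2, 0, 1],
                         [1, 0, 0, 0],
                         [0, 0, 3, 0]], dtype=np.float32))
obs_names = [f"Cell_{i}" for i in range(X.shape[0])]  # row labels (cells)
var_names = [f"Gene_{j}" for j in range(X.shape[1])]  # column labels (genes)

# Dense, labeled table: cells as rows, genes as columns.
df = pd.DataFrame(X.toarray(), index=obs_names, columns=var_names)
print(df)
print("stored nonzeros:", X.nnz, "of", X.shape[0] * X.shape[1], "entries")
```

Converting to a dense DataFrame is convenient for small data, but it materializes every zero, which is why AnnData keeps X sparse.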

Now, let’s add some annotations (metadata) at the observation level, such as the cell type of each observation.

# random cell assignment
ct = np.random.choice(["B", "T", "Monocyte"], size=(adata.n_obs,))
adata.obs["cell_type"] = pd.Categorical(ct)  # Categoricals are preferred for efficiency
adata.obs

	cell_type
Cell_0	Monocyte
Cell_1	T
Cell_2	B
Cell_3	T
Cell_4	B
...	...
Cell_95	B
Cell_96	Monocyte
Cell_97	B
Cell_98	B
Cell_99	B

100 rows × 1 columns
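The comment above says categoricals are preferred for efficiency. A quick way to see why, sketched with pandas on 10,000 randomly drawn labels (an assumption for illustration): an object column stores one Python string per cell, while a categorical stores small integer codes plus a single copy of each category.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
labels = rng.choice(["B", "T", "Monocyte"], size=10_000)

as_object = pd.Series(labels, dtype=object)      # one Python string object per cell
as_category = pd.Series(pd.Categorical(labels))  # integer codes + 3 unique categories

obj_bytes = as_object.memory_usage(deep=True)
cat_bytes = as_category.memory_usage(deep=True)
print(f"object: {obj_bytes:,} bytes, categorical: {cat_bytes:,} bytes")
print(as_category.cat.categories.tolist(), as_category.cat.codes[:5].tolist())
```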

For multi-dimensional metadata (for example, a 2-D UMAP embedding for each cell or a 5-D feature vector for each gene), we can use the obsm and varm attributes, as shown below.

# add multi-dimensional metadata to observations (obsm) and variables (varm)
adata.obsm["X_umap"] = np.random.normal(0, 1, size=(adata.n_obs, 2))
adata.varm["gene_stuff"] = np.random.normal(0, 1, size=(adata.n_vars, 5))
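The key rule behind obsm and varm is alignment: every obsm array has one row per cell and every varm array has one row per gene, so when you subset cells (for example `adata[mask]`), AnnData subsets the obsm rows together with X. The plain-NumPy sketch below mimics that bookkeeping by hand on hypothetical random data; it does not use anndata itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 6, 4
X = rng.poisson(1, size=(n_obs, n_vars))        # cells x genes
X_umap = rng.normal(0, 1, size=(n_obs, 2))       # one 2-D row per cell (like obsm)
gene_stuff = rng.normal(0, 1, size=(n_vars, 5))  # one 5-D row per gene (like varm)

# obsm arrays share the cell axis with X; varm arrays share the gene axis.
assert X_umap.shape[0] == X.shape[0]
assert gene_stuff.shape[0] == X.shape[1]

# Subsetting cells must apply the same mask to X rows and obsm rows;
# the per-gene array is unaffected because no genes were removed.
keep = np.array([True, False, True, True, False, True])
X_sub, umap_sub = X[keep], X_umap[keep]
print(X_sub.shape, umap_sub.shape, gene_stuff.shape)
```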

The above is a very brief overview of the anndata object. For more information, see Getting started with anndata.

Let’s move on to some of the functions of scanpy.

# scanpy demo

First, let’s download and read in the demo 3k PBMC scRNA-seq data from 10x Genomics (a company that provides single-cell data generation and analysis services).
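For context on what `sc.read_10x_mtx` reads: the extracted folder of this older Cell Ranger output holds a sparse count matrix in Matrix Market format (`matrix.mtx`, stored genes × cells) plus `genes.tsv` (gene IDs and gene symbols) and `barcodes.tsv` (one barcode per cell). The sketch below builds a tiny hypothetical folder with that layout using SciPy and reads it back, transposing to the cells × genes orientation that AnnData uses. It only illustrates the file layout; it is not scanpy's reader, and all IDs, symbols, and barcodes are made up.

```python
import os
import tempfile

import numpy as np
from scipy.io import mmread, mmwrite
from scipy.sparse import csr_matrix

# Hypothetical contents of a 10x-style folder.
genes = [("ENSG0001", "GENE_A"), ("ENSG0002", "GENE_B"), ("ENSG0003", "MT-X")]
barcodes = ["AAAC-1", "AAAG-1"]
counts_genes_by_cells = csr_matrix(np.array([[0, 3], [1, 0], [2, 5]]))

with tempfile.TemporaryDirectory() as folder:
    mmwrite(os.path.join(folder, "matrix.mtx"), counts_genes_by_cells)
    with open(os.path.join(folder, "genes.tsv"), "w") as fh:
        fh.writelines(f"{gid}\t{sym}\n" for gid, sym in genes)
    with open(os.path.join(folder, "barcodes.tsv"), "w") as fh:
        fh.writelines(f"{bc}\n" for bc in barcodes)

    # Read the matrix and transpose it so rows are cells and columns are genes.
    X = csr_matrix(mmread(os.path.join(folder, "matrix.mtx"))).T.tocsr()
    # Second column of genes.tsv = gene symbols (what var_names='gene_symbols' selects).
    with open(os.path.join(folder, "genes.tsv")) as fh:
        var_names = [line.rstrip("\n").split("\t")[1] for line in fh]
    with open(os.path.join(folder, "barcodes.tsv")) as fh:
        obs_names = [line.strip() for line in fh]

print(X.shape, obs_names, var_names)  # (2, 3) ['AAAC-1', 'AAAG-1'] ['GENE_A', 'GENE_B', 'MT-X']
```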

# fetch the scanpy demo data
!mkdir -p data
!wget http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz -O data/pbmc3k_filtered_gene_bc_matrices.tar.gz
!cd data; tar -xzf pbmc3k_filtered_gene_bc_matrices.tar.gz

# read in the data
# tip: place the cursor after the opening parenthesis and press ctrl/cmd+shift+space to bring up the docstring
adata = sc.read_10x_mtx(
    'data/filtered_gene_bc_matrices/hg19/',  # the directory with the `.mtx` file
    var_names='gene_symbols',                # use gene symbols for the variable names (variables-axis index)
    cache=True)                              # write a cache file for faster subsequent reading

--2025-06-11 17:26:06--  http://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz
Resolving cf.10xgenomics.com (cf.10xgenomics.com)... 104.18.0.173, 104.18.1.173, 2606:4700::6812:ad, ...
Connecting to cf.10xgenomics.com (cf.10xgenomics.com)|104.18.0.173|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz [following]
--2025-06-11 17:26:06--  https://cf.10xgenomics.com/samples/cell-exp/1.1.0/pbmc3k/pbmc3k_filtered_gene_bc_matrices.tar.gz
Connecting to cf.10xgenomics.com (cf.10xgenomics.com)|104.18.0.173|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7621991 (7.3M) [application/x-tar]
Saving to: ‘data/pbmc3k_filtered_gene_bc_matrices.tar.gz’


2025-06-11 17:26:06 (143 MB/s) - ‘data/pbmc3k_filtered_gene_bc_matrices.tar.gz’ saved [7621991/7621991]


scanpy - explore and filter data

Let’s first look at the 20 most highly expressed genes in our dataset:
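As described in the scanpy documentation, `highest_expr_genes` computes, for each gene, the fraction of each cell's counts assigned to that gene, and plots the genes with the highest mean fraction across cells. The NumPy sketch below performs that ranking on a small hypothetical count matrix; the plotting step is omitted.

```python
import numpy as np

# Hypothetical cells x genes count matrix.
X = np.array([[50, 30, 20, 0],
              [10, 80, 10, 0],
              [40, 40, 10, 10]], dtype=float)
gene_names = np.array(["GENE_A", "GENE_B", "GENE_C", "GENE_D"])

# Fraction of each cell's total counts that goes to each gene.
fractions = X / X.sum(axis=1, keepdims=True)

# Rank genes by their mean fraction across cells and keep the top n.
n_top = 2
mean_fraction = fractions.mean(axis=0)
top_idx = np.argsort(mean_fraction)[::-1][:n_top]
for name, frac in zip(gene_names[top_idx], mean_fraction[top_idx]):
    print(f"{name}: {frac:.1%} of counts on average")
```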

sc.pl.highest_expr_genes(adata, n_top=20)
# return the number of observations
print(adata.n_obs)
# return the number of variables
print(adata.n_vars)


Now, let’s filter out cells that express too few genes (fewer than 200) and genes that are detected in too few cells (fewer than 3).
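The two filters boil down to counting nonzero entries along each axis. Below is a NumPy sketch on a hypothetical toy matrix with scaled-down thresholds; it assumes, as I understand scanpy's behavior, that a cell or gene passes when its count is at least the threshold, and that genes are counted only over the cells that survived the first filter (since the cells are filtered first).

```python
import numpy as np

# Hypothetical cells x genes counts.
X = np.array([[1, 0, 3, 0, 2],
              [0, 0, 0, 0, 1],
              [4, 0, 1, 0, 5],
              [2, 0, 0, 1, 1]])

min_genes = 2  # toy value; the tutorial uses 200 on real data
min_cells = 2  # toy value; the tutorial uses 3

# Like filter_cells: keep cells expressing at least min_genes genes.
genes_per_cell = (X > 0).sum(axis=1)
cell_mask = genes_per_cell >= min_genes
X = X[cell_mask]

# Like filter_genes: keep genes detected in at least min_cells remaining cells.
cells_per_gene = (X > 0).sum(axis=0)
gene_mask = cells_per_gene >= min_cells
X = X[:, gene_mask]

print(cell_mask, gene_mask, X.shape)
```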

sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
# return the number of observations
print(adata.n_obs)
# return the number of variables
print(adata.n_vars)
# looks like no cells were removed and 19,024 genes were removed
# optional (not run here): set very low expression values to zero
# adata.X[adata.X < 0.3] = 0


For brevity, the steps below filter out poor-quality cells (cells with an unusually high number of detected genes, or with a high proportion of counts from mitochondrial genes), normalize and log-transform the counts, and keep only the highly variable genes. See Preprocessing and clustering 3k PBMCs for more details. In the next section, we will compute principal components and cluster the data to see whether we can group cells with similar expression profiles.
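Here is a NumPy sketch of what the QC metrics and the first normalization steps compute, on a hypothetical 3-cell matrix where the gene named "MT-CO1" stands in for a mitochondrial gene. The metric names are borrowed from `calculate_qc_metrics`, the thresholds are scaled to the toy data, and this is an illustration rather than scanpy's implementation: `normalize_total(target_sum=1e4)` rescales every cell to 10,000 total counts, and `log1p` takes the natural log of 1 + x.

```python
import numpy as np

# Hypothetical counts: 3 cells x 4 genes.
var_names = np.array(["GENE_A", "GENE_B", "GENE_C", "MT-CO1"])
X = np.array([[10, 20, 60, 10],
              [ 5,  0, 15, 80],
              [30, 30,  0, 40]], dtype=float)

# Flag mitochondrial genes by their "MT-" prefix, as the tutorial does.
is_mt = np.char.startswith(var_names, "MT-")

# Per-cell QC metrics.
total_counts = X.sum(axis=1)
n_genes_by_counts = (X > 0).sum(axis=1)
pct_counts_mt = 100 * X[:, is_mt].sum(axis=1) / total_counts

# Toy thresholds; the tutorial uses < 2500 genes and < 5% mitochondrial counts.
keep = (n_genes_by_counts < 5) & (pct_counts_mt < 50)
X = X[keep]

# Rescale each remaining cell to 10,000 total counts, then log-transform.
X_norm = X / X.sum(axis=1, keepdims=True) * 1e4
X_log = np.log1p(X_norm)

print(pct_counts_mt, keep)  # [10. 80. 40.] [ True False  True]
print(X_norm.sum(axis=1))
```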

# filter out poor quality cells
adata.var['mt'] = adata.var_names.str.startswith('MT-')  # annotate the group of mitochondrial genes as 'mt'
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
adata = adata[adata.obs.n_genes_by_counts < 2500, :]  # drop cells with unusually many detected genes
adata = adata[adata.obs.pct_counts_mt < 5, :].copy()  # drop cells with >= 5% mitochondrial counts

# normalization: scale each cell to 10,000 total counts, then log-transform
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
adata.raw = adata  # keep the normalized, log-transformed values of all genes

# keep only the highly variable genes
adata = adata[:, adata.var.highly_variable].copy()
# regress out the effects of total counts and mitochondrial percentage, then scale each gene
sc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt'])
sc.pp.scale(adata, max_value=10)  # clip scaled values above 10


scanpy - PCA and UMAP clustering

PCA stands for principal component analysis. Principal components (PCs) are new axes, ordered by how much of the variation in your data each one captures. They are often used to reduce the dimensionality of a dataset and as inputs to machine learning or regression models. See A Step-By-Step Introduction to PCA for a more detailed overview. Let’s calculate the PCs and plot the cells on the first two PCs, coloring each cell by its CST3 expression.
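A minimal NumPy sketch of PCA: center each gene, take the singular value decomposition (SVD) of the centered matrix, and read off the cell coordinates, gene loadings, and fraction of variance per PC. The data are hypothetical, with most of the variation placed deliberately along one direction. scanpy stores the corresponding results in `adata.obsm["X_pca"]`, `adata.varm["PCs"]`, and `adata.uns["pca"]["variance_ratio"]`; with `svd_solver='arpack'` it computes only the leading components, whereas this sketch computes all of them.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 "cells" x 5 "genes", with most variation along one direction.
latent = rng.normal(0, 3, size=(200, 1))
X = latent @ np.array([[1.0, 1.0, 0.0, 0.0, 0.0]]) + rng.normal(0, 0.5, size=(200, 5))

# Center each gene (column), then take the SVD of the centered matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

pcs = U * S          # cell coordinates on the PCs
loadings = Vt.T      # gene weights for each PC
variance = S**2 / (X.shape[0] - 1)
variance_ratio = variance / variance.sum()

print(np.round(variance_ratio, 3))  # PC1 should capture most of the variance
```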

# compute the PCs (we will inspect them to decide how many to use for the neighborhood graph)
sc.tl.pca(adata, svd_solver='arpack')
# pl = plotting module; plot the cells on the first two principal components
sc.pl.pca(adata, color='CST3')


In the figure above, each dot is a cell plotted against the first two PCs, and its color indicates its CST3 expression level. Based on these two PCs and CST3 expression alone, there appear to be three or four distinct groups of cells.

Now, let’s create an elbow plot, which shows the fraction of variance captured by each PC. This helps us decide how many PCs to use for clustering: PCs past the point where the curve flattens (the "elbow") add little additional information.
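Reading the elbow is a judgment call. One common, more mechanical alternative is to keep enough PCs to reach a chosen cumulative fraction of the variance. The sketch below applies that heuristic to made-up variance ratios with an assumed 70% threshold; it is not what this tutorial or scanpy does automatically.

```python
import numpy as np

# Hypothetical variance ratios for the first 10 PCs (made-up numbers).
variance_ratio = np.array([0.30, 0.18, 0.10, 0.06, 0.04, 0.03, 0.025, 0.02, 0.018, 0.016])

cumulative = np.cumsum(variance_ratio)
threshold = 0.7  # assumption: keep PCs until 70% of the total variance is explained
n_pcs = int(np.searchsorted(cumulative, threshold) + 1)

print(np.round(cumulative, 3))
print("PCs needed:", n_pcs)
```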

# note that this is a logarithmic scale of variance ratio
sc.pl.pca_variance_ratio(adata, log=True)


To cluster the cells, we first compute a neighborhood graph from the PCs. In this graph, nodes represent cells and edges connect each cell to its most similar cells; edges with greater weight indicate cells that are more similar to each other. To visualize the graph, we then embed it in two dimensions with UMAP (Uniform Manifold Approximation and Projection).
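Below is a simplified NumPy sketch of a k-nearest-neighbor graph: compute Euclidean distances between cells in PC space, connect each cell to its k closest cells, and symmetrize. The points are hypothetical. Note that `sc.pp.neighbors` by default produces weighted connectivities (using the UMAP method) and stores them, together with the distances, in `adata.obsp`; this unweighted 0/1 version only illustrates the graph structure.

```python
import numpy as np


def knn_graph(points: np.ndarray, k: int) -> np.ndarray:
    """Return a symmetric 0/1 adjacency matrix linking each point to its k nearest neighbors."""
    # Pairwise Euclidean distances between all cells.
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)             # a cell is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest cells
    adj = np.zeros(dist.shape, dtype=int)
    rows = np.repeat(np.arange(len(points)), k)
    adj[rows, nearest.ravel()] = 1
    return np.maximum(adj, adj.T)              # keep an edge if either cell chose the other


# Two well-separated groups of cells in a hypothetical 2-D "PC space".
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
A = knn_graph(pts, k=2)
print(A)
```

With k=2, each cell links only to the other two cells in its group, so the graph splits into two triangles; a clustering algorithm would recover those two groups.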

UMAP is another dimensionality reduction technique, based on the idea that high-dimensional data often lie on or near a lower-dimensional manifold. We won’t go into much detail regarding UMAP here, but the links below are helpful for learning more:

# pp = preprocessing; compute the neighborhood graph using the first 40 PCs
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)

# PAGA needs an existing clustering (it uses 'leiden' by default), so cluster first
sc.tl.leiden(adata)
# tl = tools; PAGA (partition-based graph abstraction) maps "coarse-grained connectivity
# structures of complex manifolds" and helps remedy disconnected clusters
sc.tl.paga(adata)
sc.pl.paga(adata, plot=False)  # compute the coarse-grained layout without drawing it

# embed the neighborhood graph with UMAP, initialized from the PAGA layout
sc.tl.umap(adata, init_pos='paga')
sc.pl.umap(adata, color=['CST3', 'NKG7', 'PPBP'])


Now we can cluster the data using the Leiden graph-clustering method, which detects communities of densely connected nodes (cells) in the neighborhood graph, and color the UMAP by cluster.
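Leiden searches for a partition of the graph that scores well on a quality function. Modularity is the classic example; scanpy's default quality function is, to my understanding, a resolution-weighted variant that the `resolution` parameter of `sc.tl.leiden` adjusts. The sketch below only scores partitions with Newman-Girvan modularity on a hypothetical six-node graph; it does not implement Leiden's search.

```python
import numpy as np


def modularity(adj: np.ndarray, labels: np.ndarray) -> float:
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    degrees = adj.sum(axis=1)
    two_m = adj.sum()                          # every edge is counted twice
    same = labels[:, None] == labels[None, :]  # 1 where two nodes share a community
    expected = np.outer(degrees, degrees) / two_m
    return float(((adj - expected) * same).sum() / two_m)


# Two triangles (nodes 0-2 and 3-5) joined by a single edge between nodes 2 and 3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1

print(modularity(A, np.array([0, 0, 0, 1, 1, 1])))  # ≈ 0.357 (5/14): the two triangles
print(modularity(A, np.zeros(6, dtype=int)))         # ≈ 0: everything in one community
```

Splitting along the natural triangles scores higher than lumping all nodes together, which is the kind of partition a community-detection method like Leiden favors.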

Again, we won’t go into much detail regarding this method, but the following are helpful:

sc.tl.leiden(adata)
sc.pl.umap(adata, color=['leiden', 'CST3', 'NKG7'])


In-class exercises

In-class exercise 1: In the AnnData section, instead of creating a csr_matrix, can we create a pandas DataFrame to look at the data more easily?

Answer:

pd.DataFrame(np.random.poisson(1, size=(100, 2000)))

	0	1	2	3	4	5	6	7	8	9	...	1990	1991	1992	1993	1994	1995	1996	1997	1998	1999
0	0	1	0	0	0	1	0	1	0	1	...	2	3	1	0	0	1	4	1	0	0
1	2	1	0	2	1	0	2	0	0	0	...	0	0	1	1	1	0	0	1	1	2
2	2	0	2	3	1	0	1	0	2	3	...	0	0	2	0	1	1	0	1	0	1
3	1	0	1	1	0	2	0	0	1	0	...	0	0	0	0	2	4	2	0	1	1
4	1	1	3	2	2	1	1	1	0	0	...	0	2	1	1	1	1	0	1	1	0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
95	1	2	3	1	0	0	0	1	0	0	...	1	0	1	4	0	1	1	3	4	0
96	0	2	2	0	1	0	0	0	0	0	...	0	1	0	0	3	0	1	5	0	1
97	1	0	1	1	1	1	2	1	1	1	...	1	1	0	1	3	0	1	0	2	0
98	3	2	0	2	4	1	2	0	0	0	...	0	0	2	0	0	1	2	2	1	0
99	2	2	1	1	2	2	2	0	2	0	...	0	2	0	2	0	1	2	0	0	0

100 rows × 2000 columns
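A variation on the answer, sketched under the assumption that you want both readable labels and sparse storage: pandas can wrap a SciPy sparse matrix directly with `pd.DataFrame.sparse.from_spmatrix`, so the table gets cell and gene names without densifying. The random Poisson counts mirror the AnnData example above; real scRNA-seq matrices are usually far sparser than this toy one.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
counts = csr_matrix(rng.poisson(1, size=(100, 2000)).astype(np.float32))

# Labeled cells x genes DataFrame backed by sparse columns.
df = pd.DataFrame.sparse.from_spmatrix(
    counts,
    index=[f"Cell_{i}" for i in range(counts.shape[0])],
    columns=[f"Gene_{j}" for j in range(counts.shape[1])],
)
print(df.shape, round(df.sparse.density, 3))
```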

In-class exercise 2: Find the UMAP coordinates of cell 5 in the adata object (row index 4, because Python indexing starts at 0).

Answer:

adata.obsm["X_umap"][4]

array([ 0.03475369, -0.30595566])
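Positional indexing is easy to get off by one. Because `adata.obs_names` is a pandas Index, a safer pattern is to look the row up by name, for example `adata.obsm["X_umap"][adata.obs_names.get_loc("Cell_4")]`. The sketch below shows the same idea with plain pandas and NumPy, using random numbers as a hypothetical stand-in for the UMAP coordinates.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
obs_names = pd.Index([f"Cell_{i}" for i in range(100)])
X_umap = rng.normal(0, 1, size=(len(obs_names), 2))  # stand-in for adata.obsm["X_umap"]

# "Cell 5" counted from 1 is row 4, i.e. the cell named Cell_4.
by_position = X_umap[4]
# Looking the row up by name gives the same result without counting positions.
by_name = X_umap[obs_names.get_loc("Cell_4")]
print(by_position, np.array_equal(by_position, by_name))
```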