Bacterial Integrative Single-Cell Optimal Transport
A computational framework for the integrative analysis of bacterial single-cell datasets, combining flow cytometry and single-cell RNA sequencing through Optimal Transport methods.
Key Features:
- Automated gating: High-dimensional gating without manual intervention
- GMM-OT alignment: Gaussian Mixture Model-based Optimal Transport for robust population matching
- Cross-modal imputation: k-nearest neighbor strategies for imputing gene expression onto flow cytometry data
- Temporal tracking: Track bacterial population dynamics over time (e.g., sporulation)
- Scalable: Designed for high-throughput cytometric platforms
- AnnData/MuData: Standard single-cell data structures for interoperability
Installation:
git clone https://github.com/bio-datascience/biscot.git
cd biscot
pip install -e .
Requirements:
- Python >= 3.12
- Key dependencies: POT (optimal transport), scikit-learn, mudata, pytometry, torch
Unimodal example (temporal flow cytometry):
from biscot import UniModalData, GMMConfig
# Create data container
data = UniModalData(
channels=["FSC", "SSC", "DAPI"],
analysis_mode="temporal",
reference_timepoint="t2"
)
# Add flow cytometry datasets
data.add_dataset("t0", adata_t0)
data.add_dataset("t1", adata_t1)
data.add_dataset("t2", adata_t2)
# Run analysis with automatic component selection
data.analyze(gmm_config=GMMConfig(n_components="bic", k_range=(5, 15)))
# Compute transport between timepoints
data.compute_transport("t0", "t1")
data.compute_transport("t1", "t2")
Multimodal example (flow + RNA-seq):
from biscot import MultiModalData, GMMConfig, OTConfig
# Create multimodal container
data = MultiModalData()
# Add modalities
data.add_modality("flow", flow_adata, modality_type="flow")
data.add_modality("rna", rna_adata, modality_type="rna")
# Fit GMMs
data.analyze(gmm_config=GMMConfig(n_components=10))
# Align modalities via PCA + Procrustes
data.align_modalities("flow", "rna")
# Compute optimal transport
data.compute_transport("flow", "rna", OTConfig(method="bary"))
# Impute gene expression onto flow cells
data.impute_features(
source_modality="rna",
target_modality="flow",
features=["groEL", "ftsZ", "spoIVA"]
)
For simpler workflows, use the unified API:
from biscot import analyze_biscot, GMMConfig
# One-line temporal analysis
results = analyze_biscot(
data={"t0": adata0, "t1": adata1, "t2": adata2},
mode="unimodal",
analysis_type="temporal",
gmm_config=GMMConfig(n_components="bic"),
plot_results=True,
output_dir="output/"
)
# Access results
similarity_matrix = results.similarity_matrix
results.plot_summary()
results.export("output/")
Flow cytometry input:
- Format: FCS v2.0 or v3.0
- Preprocessing: compensated, singlet-gated and live-cell-gated data recommended
- Scale: raw intensities, NOT log-transformed (biscot applies the log transform during preprocessing)
- Size: at least 10,000 events per file recommended
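A quick pre-flight check can catch the two most common input problems before loading. The sketch below is a hypothetical helper (check_flow_matrix is not part of biscot) that works on a plain events x channels NumPy array; the 10,000-event minimum and the "max value < 10 suggests log-transformed data" heuristic are taken from this README's recommendations and troubleshooting notes, not from biscot's code.

```python
import numpy as np


def check_flow_matrix(X, min_events=10_000, log_max_threshold=10.0):
    """Return a list of issues for a raw flow cytometry matrix (events x channels).

    Hypothetical helper: thresholds follow this README's recommendations
    and are heuristics, not checks performed by biscot itself.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError("expected a 2D events x channels matrix")
    issues = []
    if X.shape[0] < min_events:
        issues.append(f"only {X.shape[0]} events; >= {min_events} recommended")
    if np.nanmax(X) < log_max_threshold:
        issues.append("max value < 10; data may already be log-transformed")
    return issues


# Usage with synthetic data standing in for raw FCS intensities
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=8.0, sigma=1.5, size=(20_000, 3))
print(check_flow_matrix(raw))                  # no issues expected
print(check_flow_matrix(np.log10(raw[:500])))  # too few events, looks log-transformed
```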
from biscot import load_fcs, PreprocessingConfig
# Load single FCS file
adata = load_fcs(
"sample.fcs",
preprocessing_config=PreprocessingConfig(
channels_to_select=["FSC", "SSC", "DAPI"],
apply_log_transform=True
)
)
# Load batch of FCS files
from biscot import load_fcs_batch
adatas, metadata = load_fcs_batch(
file_paths={"t0": "t0.fcs", "t1": "t1.fcs"},
channels=["FSC", "SSC"],
preprocessing_config=PreprocessingConfig(apply_log_transform=True)
)
RNA-seq input:
- Format: AnnData with PCA coordinates in .obsm['X_pca']
- Required: gene expression accessible in .X or .uns['original_expression']
import scanpy as sc
# Ensure PCA is computed
sc.pp.pca(rna_adata, n_comps=3)
assert 'X_pca' in rna_adata.obsm

Core classes:

| Class | Purpose |
|---|---|
| UniModalData | Single-modality flow cytometry analysis (temporal tracking, similarity) |
| MultiModalData | Multimodal integration (flow + RNA-seq) |

Configuration classes:

| Class | Purpose |
|---|---|
| GMMConfig | GMM parameters: n_components, covariance_type, BIC selection |
| OTConfig | Optimal transport: method ("bary", "emd", "sinkhorn"), epsilon |
| GateDefinition | Manual gate polygons for cell filtering |
| PreprocessingConfig | Data preprocessing: channel selection, transforms, filtering |
| PaddingConfig | Dimension padding for mismatched modalities |
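OTConfig exposes three transport methods. As an illustration of what "sinkhorn" (with its epsilon) and "bary" commonly mean in OT tooling, the NumPy sketch below computes an entropic Sinkhorn plan and then a barycentric projection of source points onto the target. It assumes that "bary" refers to barycentric projection and that epsilon is the entropic regularization strength; biscot relies on POT (where ot.emd and ot.sinkhorn provide exact and entropic solvers), so its internals may differ. The function names are hypothetical.

```python
import numpy as np


def sinkhorn(a, b, C, epsilon=0.05, n_iter=5000, tol=1e-10):
    """Entropic OT plan between weights a (n,) and b (m,) for a cost matrix C (n, m)."""
    K = np.exp(-C / epsilon)               # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                    # rescale rows toward marginal a
        v = b / (K.T @ u)                  # rescale columns to marginal b
        if np.max(np.abs(u * (K @ v) - a)) < tol:
            break                          # row marginals also match
    return u[:, None] * K * v[None, :]     # transport plan P, shape (n, m)


def barycentric_projection(P, Y):
    """Map each source point to the P-weighted average of target points Y (m, d)."""
    return (P @ Y) / P.sum(axis=1, keepdims=True)


# Usage: three 1D source points and the same points shifted by +0.5
X = np.array([[0.0], [1.0], [2.0]])
Y = X + 0.5
a = b = np.full(3, 1 / 3)
C = (X - Y.T) ** 2                         # squared distances, shape (3, 3)
C = C / C.max()                            # rescale so epsilon is unit-free
P = sinkhorn(a, b, C, epsilon=0.05)
mapped = barycentric_projection(P, Y)      # close to X + 0.5
print(P.round(3))
print(mapped.ravel())
```

Smaller epsilon values give a sharper, closer-to-exact plan at the cost of slower convergence; the exact ("emd") solution corresponds to the limit of no regularization.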
Data Loading:
- load_fcs() - Load a single FCS file
- load_fcs_batch() - Load multiple FCS files
- export_to_fcs() - Export AnnData to FCS
Analysis:
- temporal_analysis() - Run the temporal tracking workflow
- similarity_analysis() - Compute pairwise sample similarities
- cross_modal_mapping() - Cross-modal feature imputation
Model Selection:
- select_gmm_components_bic() - Automatic BIC-based component selection
Visualization:
- plot_fcm_gates() - Flow cytometry gate visualization
- plot_clusters() - Cluster scatter plots
- plot_imputed_expression() - Gene expression heatmaps
- plot_temporal_tracking() - Temporal population tracking
- plot_similarity_matrix() - Sample similarity heatmap
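The idea behind BIC-based component selection (select_gmm_components_bic, and GMMConfig with n_components="bic", k_range, bic_replicates and bic_use_elbow) can be sketched with scikit-learn's GaussianMixture. This is an illustration under stated assumptions, not biscot's implementation: the helper name select_k_by_bic is hypothetical, k_range is treated as inclusive, the lowest BIC across replicate fits is kept for each k, and the elbow rule (largest second difference of the BIC curve) is one common choice among several.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def select_k_by_bic(X, k_range=(5, 15), replicates=3, use_elbow=False,
                    covariance_type="full", seed=0):
    """Pick a GMM component count from BIC scores over an inclusive k range."""
    ks = list(range(k_range[0], k_range[1] + 1))
    bic = []
    for k in ks:
        # Keep the best (lowest) BIC across replicate fits with different seeds
        scores = [
            GaussianMixture(n_components=k, covariance_type=covariance_type,
                            random_state=seed + r).fit(X).bic(X)
            for r in range(replicates)
        ]
        bic.append(min(scores))
    bic = np.array(bic)
    if use_elbow and len(ks) >= 3:
        # Elbow: point of largest curvature (second difference) of the BIC curve
        second_diff = bic[:-2] - 2 * bic[1:-1] + bic[2:]
        return ks[int(np.argmax(second_diff)) + 1], bic
    return ks[int(np.argmin(bic))], bic


# Usage: three well-separated 2D populations
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([rng.normal(c, 1.0, size=(300, 2)) for c in centers])
best_k, bic = select_k_by_bic(X, k_range=(1, 6))
print(best_k, bic.round(1))
```

Keeping the minimum over replicates, rather than the mean, makes the curve less sensitive to an occasional poor EM initialization.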
Example notebooks:

| Notebook | Description |
|---|---|
| 01_wrapper_api.ipynb | Complete cross-modal workflow (recommended start) |
| 02_full_tutorial.ipynb | Detailed manual API tutorial |
| Notebook | Description |
|---|---|
| 01_temporal_automated.ipynb | Automated population tracking over time |
| 02_temporal_with_gates.ipynb | Temporal analysis with manual gates |
| 03_3d_tessellation.ipynb | 3D tessellation analysis |
| 04_spatial_biofilm.ipynb | Spatial biofilm analysis |
| 05_mixture_analysis.ipynb | Mixture similarity analysis |
| Notebook | Description |
|---|---|
| 01_baseline_2d.ipynb | 2D baseline cross-modal integration |
| 02_dimension_padding.ipynb | Handling dimension mismatches |
| 03_padding_evaluation.ipynb | Padding method comparison |
GMM configuration:
# Fixed number of components
gmm_config = GMMConfig(n_components=10)
# BIC-based selection over a search range of 5-15 components
gmm_config = GMMConfig(n_components="bic", k_range=(5, 15))
# With elbow detection
gmm_config = GMMConfig(
n_components="bic",
k_range=(5, 20),
bic_use_elbow=True,
bic_replicates=3
)
# Full covariance (default, most flexible)
GMMConfig(n_components=10, covariance_type="full")
# Diagonal (faster, fewer parameters)
GMMConfig(n_components=10, covariance_type="diag")
# Spherical (fastest, equal variance in all directions)
GMMConfig(n_components=10, covariance_type="spherical")
Define polygon gates for cell filtering:
from biscot import GateDefinition, UniModalData
# Define gates
gates = [
GateDefinition(
label="Population_A",
coords=[(0.5, 0.5), (0.5, 2.0), (2.0, 2.0), (2.0, 0.5)],
coordinate_space="log"
),
GateDefinition(
label="Population_B",
coords=[(2.0, 1.0), (2.0, 3.0), (4.0, 3.0), (4.0, 1.0)],
coordinate_space="log"
),
]
# Apply gates
data = UniModalData(channels=["FSC", "DAPI"], gates=gates)
data.add_dataset("sample1", adata)
data.apply_gates()
Troubleshooting:
"KeyError: 'X_pca'" (RNA data)
import scanpy as sc
sc.pp.pca(rna_adata, n_comps=3)
"ValueError: shape mismatch" (dimension mismatch)
# Use dimension padding for mismatched modalities
data.impute_missing_dimensions("flow", "rna", n_components=10)
"GMM did not converge"
# Try fewer components or diagonal covariance
gmm_config = GMMConfig(n_components=5, covariance_type="diag")
Data already log-transformed
import numpy as np
# Check the maximum value: if it is < 10, the data are likely already log-transformed
print(f"Max: {np.max(adata.X)}")  # raw intensities are typically > 10,000
# Disable the log transform
preprocessing_config = PreprocessingConfig(apply_log_transform=False)
Workflow overview:
Raw Data (FCS/H5AD)
|
v
Preprocessing (log transform, channel selection)
|
v
Automated Gating / GMM Clustering (identify populations)
|
v
GMM-OT Alignment (match populations via Optimal Transport)
|
v
KNN Imputation (transfer features across modalities)
|
v
Results (imputed gene expression, population tracking, similarity matrices)
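The KNN imputation step can be illustrated in a few lines of NumPy. The sketch assumes flow and RNA cells have already been placed in a shared coordinate space (for example after the alignment step) and imputes each flow cell's expression as the unweighted mean of its k nearest RNA cells; biscot's actual neighbor weighting and choice of k may differ, and knn_impute is a hypothetical name.

```python
import numpy as np


def knn_impute(target_coords, source_coords, source_expr, k=10):
    """Impute features for each target cell as the mean over its k nearest source cells.

    target_coords: (n_target, d) flow cells in the shared space
    source_coords: (n_source, d) RNA cells in the same space
    source_expr:   (n_source, n_genes) expression of the RNA cells
    """
    # Pairwise squared Euclidean distances, shape (n_target, n_source)
    d2 = ((target_coords[:, None, :] - source_coords[None, :, :]) ** 2).sum(axis=-1)
    k = min(k, source_coords.shape[0])
    # Indices of the k smallest distances per row (unordered within the k)
    nn = np.argpartition(d2, kth=k - 1, axis=1)[:, :k]
    return source_expr[nn].mean(axis=1)    # shape (n_target, n_genes)


# Usage: two RNA populations with distinct expression of two genes
rng = np.random.default_rng(0)
rna_xy = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
rna_expr = np.vstack([np.tile([1.0, 0.0], (50, 1)),    # population 1: gene A on
                      np.tile([0.0, 1.0], (50, 1))])   # population 2: gene B on
flow_xy = np.array([[0.1, -0.2], [4.9, 5.2]])
imputed = knn_impute(flow_xy, rna_xy, rna_expr, k=10)
print(imputed)
```

The brute-force distance matrix grows as n_flow x n_rna in memory; for high-throughput data, sklearn.neighbors.NearestNeighbors or a similar index is the usual replacement.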
Core Methodology (GMM-OT):
- Fit Gaussian Mixture Models to represent cell populations in each sample/modality
- Use Optimal Transport on GMM components for robust population alignment
- Apply k-nearest neighbor strategies to impute gene expression onto flow cytometry data
- Track population dynamics across time points or experimental conditions
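The first two steps of the methodology can be sketched as follows: fit one GMM per sample, measure the cost between every pair of components with the closed-form 2-Wasserstein distance between Gaussians, and solve a small discrete OT problem between the component weights. This mirrors the mixture-Wasserstein (GMM-OT) construction of Delon and Desolneux; it is an illustration under that assumption, and biscot's exact cost, solver and helper names (gmm_ot, gaussian_w2_sq and simulate here are hypothetical) may differ.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog
from sklearn.mixture import GaussianMixture


def gaussian_w2_sq(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    root2 = np.real(sqrtm(S2))
    cross = np.real(sqrtm(root2 @ S1 @ root2))
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross))


def gmm_ot(gmm_a, gmm_b):
    """Exact OT plan between the components of two fitted full-covariance GMMs."""
    wa = gmm_a.weights_
    wb = gmm_b.weights_ * wa.sum() / gmm_b.weights_.sum()  # equal total mass
    ka, kb = len(wa), len(wb)
    C = np.array([[gaussian_w2_sq(gmm_a.means_[i], gmm_a.covariances_[i],
                                  gmm_b.means_[j], gmm_b.covariances_[j])
                   for j in range(kb)] for i in range(ka)])
    # Plan P is flattened row-major: P[i, j] -> index i * kb + j
    A_eq = np.vstack([np.kron(np.eye(ka), np.ones((1, kb))),   # row sums = wa
                      np.kron(np.ones((1, ka)), np.eye(kb))])  # column sums = wb
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([wa, wb]),
                  bounds=(0, None), method="highs")
    return res.x.reshape(ka, kb), C


def simulate(shift, rng):
    """Two 2D populations centred at x = shift and x = 8 + shift."""
    return np.vstack([rng.normal([shift, 0.0], 0.5, (200, 2)),
                      rng.normal([8.0 + shift, 0.0], 0.5, (200, 2))])


# Usage: sample B is sample A shifted by +1 along x
rng = np.random.default_rng(0)
gA = GaussianMixture(n_components=2, random_state=0).fit(simulate(0.0, rng))
gB = GaussianMixture(n_components=2, random_state=0).fit(simulate(1.0, rng))
P, C = gmm_ot(gA, gB)
print(P.round(3))  # mass flows between components with nearby means
```

Because the OT problem has only k_a x k_b variables rather than one per pair of cells, it stays small even when each sample holds hundreds of thousands of events.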
If you use biscot in your research, please cite:
@article{biscot2025,
title={biscot: an Optimal Transport framework for multimodal and unimodal bacterial single-cell data analysis},
author={Feldl and others},
journal={},
year={2025}
}
Development:
# Clone and install in development mode
git clone https://github.com/bio-datascience/biscot.git
cd biscot
uv sync
# Run tests
uv run pytest
# Lint (with import sorting)
uv run ruff check --select I --fix
# Format code
uv run ruff format
# Build documentation
uv run mkdocs build
# Serve documentation locally
uv run mkdocs serve