generative-computing · aviv1ron1 · May 19, 2026
@@ -80,21 +80,23 @@ granite-switch/
 
 ## Installation (local/dev)
 
+This project uses [uv](https://docs.astral.sh/uv/getting-started/installation/).
+
 ```bash
 # Core package only (config)
-pip install -e .
+uv sync
 
 # With HuggingFace backend
-pip install -e ".[hf]"
+uv sync --extra hf
 
 # With vLLM backend
-pip install -e ".[vllm]"
+uv sync --extra vllm
 
 # With compose tools
-pip install -e ".[compose]"
+uv sync --extra compose
 
 # Everything (development)
-pip install -e ".[dev]"
+uv sync --extra dev
 ```
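`uv sync` accepts `--extra` more than once, so several extras can be installed in a single sync instead of in successive runs. The minimal Python sketch below assembles such a command. The helper names `uv_sync_argv` and `run_uv_sync` are hypothetical and are not part of granite-switch:

```python
import shlex
import subprocess


def uv_sync_argv(extras: list[str]) -> list[str]:
    """Build a `uv sync` argument vector that installs every requested extra."""
    argv = ["uv", "sync"]
    for extra in extras:
        # One --extra flag per optional-dependency group.
        argv += ["--extra", extra]
    return argv


def run_uv_sync(extras: list[str]) -> None:
    """Run the sync from the repository root (the directory holding pyproject.toml)."""
    subprocess.run(uv_sync_argv(extras), check=True)


print(shlex.join(uv_sync_argv(["hf", "compose"])))  # uv sync --extra hf --extra compose
```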
 
 ## Import Paths

@@ -9,10 +9,10 @@ Thank you for your interest in contributing to Granite Switch!
    ```bash
    git clone https://github.com/<your-username>/granite-switch.git
    cd granite-switch
-   pip install -e ".[dev]"
+   uv sync --extra dev
    ```
 3. Create a feature branch and make your changes
-4. Run tests: `pytest tests/ -v`
+4. Run tests: `uv run pytest tests/ -v`
 5. Submit a pull request
 
 ## Contribution Guidelines

@@ -1,7 +1,7 @@
 # Granite Switch — Build AI models like you build software
 
 [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)
-[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)
+[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
 
 | [**Browse Adapters**](https://huggingface.co/collections/ibm-granite/granite-libraries) | [Models on HF](https://huggingface.co/ibm-granite/granite-switch-4.1-8b-preview) | [Tutorials](tutorials/README.md) |
 
@@ -20,19 +20,28 @@ Browse available libraries in the [Granite Libraries collection](https://hugging
 
 ### Install
 
+This project uses [uv](https://docs.astral.sh/uv/getting-started/installation/) for dependency management. Install uv first (one-time setup), then:
+
 ```bash
-python -m venv venv && source venv/bin/activate
-
-# Granite-Switch installation is based on your usecase:
-pip install "granite-switch[compose]"   # Compose modular models
-pip install "granite-switch[hf]"        # HuggingFace inference
-pip install "granite-switch[vllm]"      # vLLM production inference (0.19.x)
-pip install "granite-switch[vllm20]"    # vLLM 0.20+ (requires CUDA 13+)
-pip install "granite-switch[dev]"       # Everything (uses vLLM 0.19.x by default)
-pip install "granite-switch[dev-vllm20]" # Dev environment with vLLM 0.20+
+git clone https://github.com/generative-computing/granite-switch.git
+cd granite-switch
+uv sync
 ```
 
-Requires Python 3.9+ and PyTorch 2.0+.
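+Then sync with the extra for your use case. Repeat `--extra` to combine extras (for example `uv sync --extra hf --extra compose`). Note that a later `uv sync` without a given extra removes that extra's packages: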
+
+| Extra | Command | Use case |
+|-------|---------|----------|
+| `compose` | `uv sync --extra compose` | Compose modular models |
+| `hf` | `uv sync --extra hf` | HuggingFace inference |
+| `vllm` | `uv sync --extra vllm` | vLLM 0.19.x inference (CUDA 12.x) |
+| `vllm20` | `uv sync --extra vllm20` | vLLM 0.20+ (CUDA 13+) |
+| `dev` | `uv sync --extra dev` | Full dev environment with vLLM 0.19.x (CUDA 12.x) |
+| `dev-vllm20` | `uv sync --extra dev-vllm20` | Full dev environment with vLLM 0.20+ (CUDA 13+) |
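For the vLLM rows, the choice in the table comes down to the CUDA major version. The sketch below encodes that decision. `pick_vllm_extra` is a hypothetical helper, and it assumes you already know your CUDA major version, for example from `torch.version.cuda` or `nvidia-smi`:

```python
def pick_vllm_extra(cuda_major: int, dev: bool = False) -> str:
    """Map a CUDA major version to the matching vLLM extra from the table."""
    # CUDA 13+ pairs with vLLM 0.20+; CUDA 12.x stays on the default vLLM 0.19.x.
    if cuda_major >= 13:
        return "dev-vllm20" if dev else "vllm20"
    if cuda_major == 12:
        return "dev" if dev else "vllm"
    raise ValueError(f"CUDA {cuda_major}.x is not covered by the documented extras")


print(pick_vllm_extra(12))            # vllm
print(pick_vllm_extra(13, dev=True))  # dev-vllm20
```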
+
+Requires Python 3.10+ and PyTorch 2.0+.
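The sketch below checks both minimums from inside the synced environment. It reads the installed PyTorch version through `importlib.metadata` and compares only the major and minor numbers, so local tags such as `+cu121` are ignored. The helper names are hypothetical:

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 0)


def major_minor(ver: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '2.11.0+cu130'."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def check_requirements() -> list[str]:
    """Return a list of unmet requirements (empty when everything is fine)."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {sys.version_info.major}.{sys.version_info.minor} < 3.10")
    try:
        torch_version = version("torch")
    except PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if major_minor(torch_version) < MIN_TORCH:
            problems.append(f"PyTorch {torch_version} < 2.0")
    return problems


print(check_requirements() or "OK")
```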
+
+> **Installing from PyPI instead?** Use `pip install "granite-switch[hf]"` or `uv pip install "granite-switch[hf]"` (swap `hf` for any extra above).
 
 > **vLLM version note:** This project currently defaults to vLLM 0.19.1 due to vLLM 0.20's
 > dependency on CUDA 13.0+ (via PyTorch 2.11), which is incompatible with many existing
@@ -62,10 +71,18 @@ For convenience, you can find already composed Granite Switch models for the Gra
 
 ### Run Inference
 
+> **Tip: pre-download the model for faster startup.** The first run will download several GB from Hugging Face, which can be slow. To download in advance using the fast transfer backend:
+> ```bash
+> uv pip install "huggingface_hub[hf_transfer]"
+> hf auth login                                  # one-time, if not already logged in
+> HF_HUB_ENABLE_HF_TRANSFER=1 hf download ibm-granite/granite-switch-4.1-3b-preview
+> ```
+> Subsequent runs will use the local cache automatically.
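You can also pre-download from Python with `huggingface_hub.snapshot_download`, which fills the same local cache. This is a minimal sketch. It assumes that `huggingface_hub` and `hf_transfer` are installed as shown above, and that your `huggingface_hub` version still honors `HF_HUB_ENABLE_HF_TRANSFER`. Set the variable before `huggingface_hub` is imported, because the library reads its settings at import time. `predownload` is a hypothetical helper:

```python
import os

# Enable the fast transfer backend (unless already set) before huggingface_hub is imported.
os.environ.setdefault("HF_HUB_ENABLE_HF_TRANSFER", "1")

MODEL_ID = "ibm-granite/granite-switch-4.1-3b-preview"


def predownload(repo_id: str = MODEL_ID) -> str:
    """Download every file of `repo_id` into the local HF cache and return its path."""
    from huggingface_hub import snapshot_download  # deferred so the env var is already set

    return snapshot_download(repo_id=repo_id)
```

Call `predownload()` once, for example from a notebook cell. Later loads of the same model id resolve from the cache.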
+
 **vLLM + Mellea (recommended):**
 
 ```bash
-pip install mellea
+uv pip install mellea
 # Example with the 3B model 
 python -m vllm.entrypoints.openai.api_server --model ibm-granite/granite-switch-4.1-3b-preview --port 8000
 ```
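Once the server is up, any OpenAI-compatible client can talk to it, and Mellea is the recommended route. For a dependency-free smoke test, here is a stdlib-only sketch against vLLM's `/v1/completions` endpoint. It assumes the server runs on `localhost:8000` as started above. `build_completion_request` and `complete` are hypothetical helpers:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"
MODEL_ID = "ibm-granite/granite-switch-4.1-3b-preview"


def build_completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible completions endpoint."""
    body = json.dumps({"model": MODEL_ID, "prompt": prompt, "max_tokens": max_tokens})
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_completion_request(prompt), timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["text"]
```

With the server running, `print(complete("What is Granite Switch?"))` returns the model's raw completion.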

@@ -62,7 +62,7 @@ Fixes #123
 
 Before committing:
 
-1. **Run tests**: `pytest tests/ -v`
+1. **Run tests**: `uv run pytest tests/ -v`
 2. **Check comments match code** — stale comments are worse than no comments
 3. **Update docs** if behavior changed
 

@@ -70,6 +70,14 @@ conflicts = [
         { extra = "dev-vllm20" },
         { extra = "vllm" },
     ],
+    [
+        { extra = "tutorials" },
+        { extra = "vllm20" },
+    ],
+    [
+        { extra = "tutorials" },
+        { extra = "dev-vllm20" },
+    ],
 ]
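uv refuses to resolve an environment that enables two extras from the same `conflicts` group. The sketch below checks a selection of extras against conflict groups before you run `uv sync`. It encodes only the two groups added in this hunk, and the full list lives in `pyproject.toml`. `find_conflicts` is a hypothetical helper:

```python
# Only the two groups added in this hunk; earlier groups in pyproject.toml are omitted.
CONFLICT_GROUPS: tuple[frozenset[str], ...] = (
    frozenset({"tutorials", "vllm20"}),
    frozenset({"tutorials", "dev-vllm20"}),
)


def find_conflicts(
    selected: set[str], groups: tuple[frozenset[str], ...] = CONFLICT_GROUPS
) -> list[frozenset[str]]:
    """Return every group in which two or more of the selected extras appear."""
    return [group for group in groups if len(group & selected) >= 2]


print(find_conflicts({"tutorials", "hf"}))  # []
```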
 
 [tool.setuptools.packages.find]

@@ -19,30 +19,34 @@ Python 3.10+ is required.
 
 ### Base Installation
 
+Install [uv](https://docs.astral.sh/uv/getting-started/installation/), then:
+
 ```bash
-pip install granite-switch
+git clone https://github.com/generative-computing/granite-switch.git
+cd granite-switch
+uv sync
 ```
 
 ### HuggingFace Backend
 
 For direct model inference with HuggingFace Transformers:
 
 ```bash
-pip install "granite-switch[hf,compose]"
+uv sync --extra hf --extra compose
 ```
 
 This includes:
 - `transformers` for model loading and generation
 - `torch` with CUDA support
-- `peft` for LoRA operations
 - Compose tools for model building
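The sketch below confirms which backends a given sync actually installed. It checks whether the top-level modules can be found without importing them, since importing `torch` or `vllm` can be slow. `torch`, `transformers`, and `vllm` are those libraries' standard import names. The helper `available_modules` is hypothetical:

```python
import importlib.util

BACKEND_MODULES = ("torch", "transformers", "vllm")


def available_modules(names: tuple[str, ...] = BACKEND_MODULES) -> dict[str, bool]:
    """Map each top-level module name to whether it can be found on sys.path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in available_modules().items():
    print(f"{name:<13} {'installed' if found else 'missing'}")
```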
 
 ### vLLM Backend
 
 For production inference with vLLM:
 
 ```bash
-pip install "granite-switch[vllm]"
+uv sync --extra vllm         # vLLM 0.19.x, CUDA 12.x
+uv sync --extra vllm20       # or vLLM 0.20+, CUDA 13+ (requires PyTorch 2.11+)
 ```
 
 This includes:
@@ -54,15 +58,15 @@ This includes:
 Mellea provides high-level intrinsic functions for adapter invocation:
 
 ```bash
-pip install mellea
+uv pip install mellea
 ```
 
 ### Notebook Dependencies
 
 For running Jupyter notebooks:
 
 ```bash
-pip install jupyter chromadb tqdm httpx python-dotenv
+uv pip install jupyter chromadb tqdm httpx python-dotenv
 ```
 
 ## Model Access

@@ -26,8 +26,8 @@ The notebook runs both servers sequentially on a single A100 GPU and produces
 - Two GPUs (one per server) for simultaneous mode, or one GPU for sequential mode
 - Install dependencies:
   ```bash
-  pip install -e ".[vllm]"
-  pip install mellea chromadb rich tqdm transformers httpx
+  uv sync --extra vllm
+  uv pip install mellea chromadb rich tqdm transformers httpx
   ```
 - Build the ChromaDB index (once):
   ```bash

@@ -39,7 +39,7 @@ See [PREREQUISITES.md](../PREREQUISITES.md) for detailed setup instructions.
 
 ```bash
 # Install Mellea from source
-pip install "git+https://github.com/generative-computing/mellea.git@main"
+uv pip install "git+https://github.com/generative-computing/mellea.git@main"
 ```
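After installing from `main`, you can read back the installed distribution's version to confirm which build is active. This sketch uses `importlib.metadata` and assumes the distribution is named `mellea`, as in the install command above. `installed_version` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "mellea") -> Optional[str]:
    """Return the installed version of `dist`, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


print(installed_version() or "mellea is not installed")
```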
 
 ## Quick Example