
Commit dd16a96

example demonstrating how to train CosmosReason2 Eagle3 (#965)
## What does this PR do?

**Type of change:** New example

**Overview:** Adds `examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb`, a step-by-step Jupyter notebook that walks through the full EAGLE3 draft-head training workflow for `nvidia/Cosmos-Reason2-8B`. The notebook covers:

1. Installing dependencies
2. Authenticating with Hugging Face
3. Preparing training data from the Nemotron-Post-Training-Dataset-v2 (chat split) using a curated row-selection mapping (`guides/nemotron_mapping.bin`)
4. Inspecting the bundled EAGLE3 config (`guides/CR2_eagle_config.json`) tuned for Cosmos-Reason2 (YaRN RoPE, FlexAttention, reduced draft vocabulary)
5. (Optional) Calibrating the draft vocabulary to 32k tokens for faster training and inference
6. Launching training via `launch_train.sh` with FSDP2 multi-GPU support
7. Exporting the checkpoint to HF format and serving with vLLM

Also includes `guides/nemotron_mapping.bin` and `guides/CR2_eagle_config.json` as companion files.

## Usage

Open and run `examples/speculative_decoding/guides/train_eagle_head_cosmos_reason2.ipynb` cell by cell. After training, serve the exported checkpoint with:

```
vllm serve nvidia/Cosmos-Reason2-8B \
    --host 0.0.0.0 \
    --port 8000 \
    --speculative_config '{"method": "eagle3", "model": "export/cosmos-reason2-8b-eagle3", "num_speculative_tokens": 3}' \
    --dtype bfloat16
```

## Testing

Tested end-to-end on 4x B100 GPUs. The exported checkpoint was validated with `specdec_bench`.

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes
- **Did you write any new necessary tests?**: No
- **Did you add or update any necessary documentation?**: Yes (the notebook is self-documenting)
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: No

## Additional Information

Cosmos-Reason2-8B requires at least one 80 GB GPU (H100/A100).

## Summary by CodeRabbit

* **New Features**
  * Added a speculative-decoding configuration for draft model and rotary/attention behavior.
  * Added an end-to-end training notebook demonstrating training, export, and deployment of a speculative-decoding draft head on Cosmos-Reason2.
  * Added a data-preparation tool to download, normalize, and convert Nemotron chat conversations into a standardized conversation format.
* **Documentation**
  * Notebook documents environment setup, data prep, training/validation cadence, export, and deployment steps.

Signed-off-by: Slawek Kierat <skierat@nvidia.com>
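The `--speculative_config` flag in the notebook's deployment step takes a single JSON string. A minimal sketch of building that string programmatically (the values mirror the example above; the export path is only valid after running the notebook):

```python
import json

# Speculative-decoding settings matching the vLLM serve example above.
spec_config = {
    "method": "eagle3",
    "model": "export/cosmos-reason2-8b-eagle3",
    "num_speculative_tokens": 3,
}

# vLLM expects this as one JSON string on the command line,
# typically wrapped in single quotes in the shell.
spec_json = json.dumps(spec_config)
print(spec_json)
```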
1 parent 31f0783 commit dd16a96

4 files changed

Lines changed: 516 additions & 0 deletions

Lines changed: 15 additions & 0 deletions
guides/CR2_eagle_config.json
@@ -0,0 +1,15 @@
{
  "draft_vocab_size": 32000,
  "initializer_range": 0.02,
  "rms_norm_eps": 1e-06,
  "_attn_implementation": "flex_attention",
  "rope_scaling": {
    "beta_fast": 32.0,
    "beta_slow": 1.0,
    "factor": 32.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "yarn",
    "truncate": false
  },
  "rope_theta": 150000
}
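The `rope_scaling` block uses YaRN with `factor: 32.0` over an original window of 8192 positions. As a rough sanity check, a minimal sketch computing the effective context length, assuming the usual YaRN convention that the usable window scales linearly with `factor`:

```python
import json

# Relevant fragment of the config above.
cfg = json.loads("""
{
  "rope_scaling": {
    "factor": 32.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "yarn"
  },
  "rope_theta": 150000
}
""")

rs = cfg["rope_scaling"]
# Under YaRN, the usable context is roughly the original window times `factor`.
effective_ctx = int(rs["factor"] * rs["original_max_position_embeddings"])
print(effective_ctx)  # 262144
```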
guides/nemotron_mapping.bin – 350 KB (binary file not shown)
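The binary above is the row-selection mapping described in the PR: packed little-endian `int32` dataset row indices. A toy sketch of that layout using only the standard library (the indices here are made up; the real file is written with `numpy.ndarray.tofile`, which produces the same raw bytes):

```python
import struct

# Hypothetical row indices; the real mapping has tens of thousands of entries.
indices = [0, 5, 42, 89510]

# Write packed little-endian int32 values, the layout used by nemotron_mapping.bin.
with open("mapping_demo.bin", "wb") as f:
    f.write(struct.pack(f"<{len(indices)}i", *indices))

# Read them back, as numpy.fromfile(..., dtype="<i4") would.
with open("mapping_demo.bin", "rb") as f:
    raw = f.read()
decoded = list(struct.unpack(f"<{len(raw) // 4}i", raw))
print(decoded)  # [0, 5, 42, 89510]
```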
Lines changed: 260 additions & 0 deletions
guides/train_eagle_head_cosmos_reason2.ipynb
@@ -0,0 +1,260 @@
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Training an EAGLE3 Draft Head for Cosmos-Reason2\n",
        "\n",
        "This notebook walks through the full workflow for training an EAGLE3 speculative-decoding draft head on top of [nvidia/Cosmos-Reason2-8B](https://huggingface.co/nvidia/Cosmos-Reason2-8B).\n",
        "\n",
        "**Workflow overview**\n",
        "\n",
        "| Step | Description |\n",
        "| :---: | :--- |\n",
        "| 1 | Install dependencies |\n",
        "| 2 | Authenticate with Hugging Face |\n",
        "| 3 | Prepare training data from the Nemotron dataset |\n",
        "| 4 | Calibrate the draft vocabulary |\n",
        "| 5 | Launch training |\n",
        "| 6 | Export checkpoint for deployment |\n",
        "\n",
        "> **Hardware requirement** – Cosmos-Reason2-8B requires at least one 80 GB GPU (e.g. H100/A100).\n",
        "> Multi-GPU training is supported automatically via FSDP2 when more than one GPU is available."
      ],
      "id": "efe23925"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Step 1 – Install Dependencies"
      ],
      "id": "e64d39b5"
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%%bash\n",
        "pip install -U nvidia-modelopt[hf]\n",
        "pip install -r ../requirements.txt"
      ],
      "execution_count": null,
      "outputs": [],
      "id": "f0049171"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Step 2 – Authenticate with Hugging Face\n",
        "\n",
        "Both `nvidia/Cosmos-Reason2-8B` and `nvidia/Nemotron-Post-Training-Dataset-v2` require accepting\n",
        "their licence agreements on the Hub. Run the cell below and follow the interactive prompt to log in:"
      ],
      "id": "fe68982a"
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%%bash\n",
        "hf auth login"
      ],
      "execution_count": null,
      "outputs": [],
      "id": "b62417b6"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Step 3 – Prepare Training Data\n",
        "\n",
        "We use a curated subset of [nvidia/Nemotron-Post-Training-Dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2)\n",
        "(chat split) for training. The `nemotron_mapping.bin` file (bundled alongside this notebook) selects the specific rows to use.\n",
        "It stores 0-based dataset row indices as packed `int32` values (little-endian, produced by `numpy.ndarray.tofile`).\n",
        "\n",
        "The script streams only the required parquet shards and writes a conversation file in the\n",
        "standard `jsonl` format expected by `launch_train.sh`."
      ],
      "id": "cdd4d470"
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%%bash\n",
        "python ../prepare_input_conversations/add_nemotron_chat.py \\\n",
        "    --mapping-file nemotron_mapping.bin"
      ],
      "execution_count": null,
      "outputs": [],
      "id": "32259e23"
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%%bash\n",
        "# Expect exactly 89511 conversations.\n",
        "count=$(wc -l < input_conversations/nemotron-chat.jsonl)\n",
        "echo \"${count} conversations in input_conversations/nemotron-chat.jsonl\"\n",
        "[ \"$count\" -eq 89511 ] || { echo \"ERROR: expected 89511, got ${count}\"; exit 1; }"
      ],
      "execution_count": null,
      "outputs": [],
      "id": "d05b97d3"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Step 4 – Calibrate the Draft Vocabulary\n",
        "\n",
        "`CR2_eagle_config.json` sets `\"draft_vocab_size\": 32000`. Using a compressed vocabulary\n",
        "speeds up training and inference, but requires a one-time calibration step that produces a\n",
        "token-mapping file (`d2t.pt`)."
      ],
      "id": "09717fcc"
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%%bash\n",
        "python ../scripts/calibrate_draft_vocab.py \\\n",
        "    --model nvidia/Cosmos-Reason2-8B \\\n",
        "    --data input_conversations/nemotron-chat.jsonl \\\n",
        "    --draft_vocab_size 32000 \\\n",
        "    --save_dir draft_vocab_cache"
      ],
      "execution_count": null,
      "outputs": [],
      "id": "388f6897"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Step 5 – Train the EAGLE3 Draft Head\n",
        "\n",
        "Training is launched via `launch_train.sh`, which internally calls `accelerate launch main.py`\n",
        "and sets up FSDP2 automatically when multiple GPUs are available.\n",
        "\n",
        "Key arguments used for Cosmos-Reason2:\n",
        "\n",
        "| Argument | Value | Notes |\n",
        "| :--- | :--- | :--- |\n",
        "| `--model` | `nvidia/Cosmos-Reason2-8B` | Target VLM |\n",
        "| `--data` | `guides/input_conversations/nemotron-chat.jsonl` | Training conversations |\n",
        "| `--eagle_config` | `guides/CR2_eagle_config.json` | Draft-head architecture |\n",
        "| `--draft_vocab_cache` | `guides/draft_vocab_cache/Cosmos-Reason2-8B/d2t.pt` | Token-mapping from Step 4 |\n",
        "| `--vlm_processor` | `nvidia/Cosmos-Reason2-8B` | VLM image processor |\n",
        "| `--vlm_img_dir` | `data/` | Directory containing referenced images |\n",
        "| `--training_seq_len` | `16384` | Max token length per sample (lower to save GPU memory or speed up training) |\n",
        "| `--lr` | `1.5e-4` | Learning rate |\n",
        "| `--num_epochs` | `20` | Training epochs |\n",
        "| `--train_bs` | `1` | Per-device batch size |\n",
        "| `--save_steps` | `1000` | Checkpoint frequency |\n",
        "| `--ar_validate_steps` | `1000000` | Effectively disables in-training AR validation |\n",
        "\n",
        "> **Tip** – Set `--ar_validate_steps` to a smaller value (e.g. `500`) to periodically measure\n",
        "> acceptance rate on MT-Bench during training."
      ],
      "id": "336c43b9"
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%%bash\n",
        "export WANDB_MODE=disabled\n",
        "OUTPUT_DIR=ckpts/cosmos-reason2-8b-eagle3\n",
        "EAGLE_CONFIG=guides/CR2_eagle_config.json\n",
        "DRAFT_VOCAB_CACHE=guides/draft_vocab_cache/Cosmos-Reason2-8B/d2t.pt\n",
        "\n",
        "# 20 epochs on 89k samples (4x B100): ~24 hours.\n",
        "cd ..; OUTPUT_DIR=$OUTPUT_DIR ./launch_train.sh \\\n",
        "    --model nvidia/Cosmos-Reason2-8B \\\n",
        "    --output_dir $OUTPUT_DIR \\\n",
        "    --data guides/input_conversations/nemotron-chat.jsonl \\\n",
        "    --lr 1.5e-4 \\\n",
        "    --num_epochs 20 \\\n",
        "    --train_bs 1 \\\n",
        "    --eagle_config $EAGLE_CONFIG \\\n",
        "    --draft_vocab_cache $DRAFT_VOCAB_CACHE \\\n",
        "    --training_seq_len 16384 \\\n",
        "    --save_steps 1000 \\\n",
        "    --ar_validate_steps 1000000 \\\n",
        "    --vlm_processor nvidia/Cosmos-Reason2-8B \\\n",
        "    --vlm_img_dir data/"
      ],
      "execution_count": null,
      "outputs": [],
      "id": "0380f773"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Step 6 – Export Checkpoint for Deployment\n",
        "\n",
        "After training completes, convert the ModelOpt checkpoint to the Hugging Face–compatible\n",
        "format expected by vLLM. Point `--model_path` to the desired checkpoint subdirectory\n",
        "(e.g. `checkpoint-110000`)."
      ],
      "id": "98e0f8c4"
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%%bash\n",
        "# Run from the example root, matching the training cell's working directory.\n",
        "cd ..\n",
        "CKPT_DIR=ckpts/cosmos-reason2-8b-eagle3/checkpoint-110000\n",
        "EXPORT_PATH=export/cosmos-reason2-8b-eagle3\n",
        "\n",
        "python scripts/export_hf_checkpoint.py \\\n",
        "    --model_path $CKPT_DIR \\\n",
        "    --export_path $EXPORT_PATH"
      ],
      "execution_count": null,
      "outputs": [],
      "id": "63880f67"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Deployment\n",
        "\n",
        "The exported checkpoint can be served directly with **vLLM**:\n",
        "\n",
        "```bash\n",
        "vllm serve nvidia/Cosmos-Reason2-8B \\\n",
        "    --host 0.0.0.0 \\\n",
        "    --port 8000 \\\n",
        "    --speculative_config '{\"method\": \"eagle3\", \"model\": \"export/cosmos-reason2-8b-eagle3\", \"num_speculative_tokens\": 3}'\n",
        "```\n",
        "\n",
        "Refer to the [vLLM speculative decoding docs](https://docs.vllm.ai/en/latest/features/spec_decode/) for the full list of options."
      ],
      "id": "413c4275"
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.10.0"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
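The calibration step in the notebook produces `d2t.pt`, a token-mapping file that translates ids in the reduced 32k draft vocabulary back into the target model's full vocabulary before verification. A toy illustration of that role, assuming a plain lookup table (the ids below are made up; the real artifact is a PyTorch tensor with 32000 entries):

```python
# Hypothetical draft-to-target vocabulary mapping (toy values).
d2t = {0: 11, 1: 57, 2: 198}

# Ids proposed by the draft head over its reduced vocabulary.
draft_ids = [2, 0, 1]

# Translate each draft id into the target model's vocabulary id.
target_ids = [d2t[i] for i in draft_ids]
print(target_ids)  # [198, 11, 57]
```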

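With `num_speculative_tokens` set to 3 as in the serve command, the payoff of a higher acceptance rate can be estimated with the standard speculative-decoding analysis. A minimal sketch, assuming each of the k drafted tokens is accepted independently with probability `alpha` (an assumed value; measure the real rate with `--ar_validate_steps` or `specdec_bench`):

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model forward pass.

    Uses the geometric model: E = (1 - alpha**(k+1)) / (1 - alpha),
    where k is the number of speculative tokens per step.
    """
    if alpha >= 1.0:
        return float(k + 1)  # every draft token accepted
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)

# num_speculative_tokens = 3, as in the vLLM serve command.
for alpha in (0.5, 0.7, 0.9):
    print(f"alpha={alpha}: {expected_tokens_per_step(alpha, 3):.2f} tokens/step")
```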