Deterministic LLM Kernel Visualization

This repository demonstrates how floating-point non-associativity and reduction order can introduce subtle non-determinism in common LLM operations such as RMSNorm, matrix multiplication, and attention. The Jupyter notebook visualizes the effect of “split reductions,” simulating fused GPU kernels and showing how tiny rounding differences propagate through network layers.
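As a quick, standalone illustration of the underlying effect (a sketch written for this README, not code taken from the notebook), summing the same float32 values in two different orders already produces slightly different results:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

# One long left-to-right reduction.
sequential = np.float32(0.0)
for v in x:
    sequential += v

# "Split" reduction: sum fixed-size chunks first, then combine the partial
# sums, mimicking how a fused GPU kernel accumulates per-block partials.
split = x.reshape(100, 100).sum(axis=1).sum()

print(sequential, split, sequential - split)  # typically differs in the last bits
```

The difference is tiny in isolation; the notebook shows how such discrepancies grow as they pass through successive layers.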


Repository Structure

.
├── Examples.ipynb        # Jupyter notebook with simulations and heatmaps
├── pyproject.toml        # Project and dependency configuration for UV
├── uv.lock               # UV lockfile for reproducible environment
└── README.md             # This file

Installation

  1. Clone the repository:

     git clone https://github.com/psmgeelen/DeterministicLLMs.git
     cd DeterministicLLMs

  2. Install dependencies using UV:

     uv sync

UV reads pyproject.toml and uv.lock to install all required packages automatically.


Usage

  1. Launch Jupyter:

     jupyter notebook

  2. Open Examples.ipynb and run the cells to simulate RMSNorm, Matmul, and Attention with both normal and split reductions (a simplified version of this experiment is sketched after this list).

  3. Observe the heatmaps, which visualize the tiny differences caused by rounding and by changes in reduction order.
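For reference, here is a minimal sketch of the kind of split-reduction RMSNorm experiment the notebook runs. The function and parameter names are illustrative and are not taken from Examples.ipynb:

```python
import numpy as np

def rmsnorm(x: np.ndarray, chunks: int) -> np.ndarray:
    """Toy RMSNorm whose mean-of-squares is reduced in `chunks` partial sums."""
    sq_sum = np.float32(0.0)
    for part in np.array_split(x * x, chunks):
        sq_sum += part.sum()  # per-chunk partial sum, then combine
    rms = np.sqrt(sq_sum / np.float32(x.size) + np.float32(1e-6))
    return x / rms

x = np.random.default_rng(1).standard_normal(4096).astype(np.float32)
y_mono = rmsnorm(x, chunks=1)   # single monolithic reduction
y_split = rmsnorm(x, chunks=8)  # split reduction, as in a fused kernel
print(np.max(np.abs(y_mono - y_split)))  # small but typically nonzero
```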


Key Insights

  • Floating-point arithmetic is non-associative, so the order of operations affects results.
  • Even deterministic kernels can produce different outputs if batch size, sequence length, or kernel implementation varies.
  • Tiny differences caused by precision and rounding propagate through network layers, causing observable non-determinism.
  • Controlling reduction strategies is critical for reproducibility, benchmarking, and deterministic inference; one simple approach is sketched below.
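A minimal sketch of that last point, assuming you control the reduction yourself (the helper name is hypothetical, not from this repo): fix the chunk size and combination order so the summation never depends on scheduling, making repeated runs on the same input bit-identical.

```python
import numpy as np

def fixed_order_sum(x: np.ndarray, chunk: int = 256) -> np.float32:
    """Hypothetical helper: reduce in fixed-size chunks, in a fixed order."""
    pad = (-x.size) % chunk
    xp = np.pad(x.astype(np.float32), (0, pad))  # zero-padding is exact in FP
    return xp.reshape(-1, chunk).sum(axis=1).sum()

x = np.random.default_rng(2).standard_normal(10_000).astype(np.float32)
assert fixed_order_sum(x) == fixed_order_sum(x)  # same order => same bits
# A different chunking is a different reduction order and may flip the last bits:
print(fixed_order_sum(x, chunk=256), fixed_order_sum(x, chunk=128))
```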

License

This repository is released under the Unlicense – do whatever you want.