# Clone the repository
git clone https://github.com/momalekpour/vortosql.git
cd vortosql
# Install dependencies (uv manages the virtualenv and Python version automatically)
# Install uv first if needed: https://docs.astral.sh/uv/getting-started/installation/
uv sync --all-extras
# (Optional) Install pre-commit hooks for auto linting (ruff) and formatting (black) for development
uv run pre-commit install# Copy the example env file and set your OPENAI_API_KEY
cp .env.example .env
# Web UI (recommended)
bash scripts/run_ui.sh
# CLI REPL
bash scripts/run_cli.shSee docs/architecture.md for full details. The application is built around a composable operator pipeline configured via config.yaml — each operator's model provider, technique, and behaviour is plug-and-play; the default setup is ready to run as-is. Each operator implements an execute(context) method that reads from and writes to a shared context dictionary. The pipeline runs the following operators in order:
- IntentGuardrail - LLM-based scope classifier; rejects out-of-scope questions via early-stop
- SchemaLinker - Resolves which tables/columns are relevant to the question
- ExampleSelector - Retrieves similar few-shot examples (skipped in zero-shot mode)
- SQLGenerator - LLM generates a SQL query from the question, schema, and examples
- SQLCorrector - Validates and auto-corrects SQL errors via retry loop
- SQLExecutor - Executes the final SQL; enforces department guardrails via AST-level injection
- AnswerGenerator - LLM summarises the query results into a natural language answer
Department is locked at session start and enforced across five independent layers:
- Layer 0 — Intent gate (IntentGuardrail, soft): LLM checks if the question is in-domain; triggers early-stop if not.
- Layer 1 — Schema restriction (SchemaLinker, hard):
schema_guardrailshides tables/columns from the LLM entirely. - Layer 2 — Prompt constraint (SQLGenerator, soft):
row_guardrailsare rendered as mandatory filter instructions in the prompt. - Layer 3a — Direct AST injection (SQLExecutor, hard): sqlglot injects a missing
WHEREon guardrailed columns (e.g.,Employee.Department = 'Engineering'). - Layer 3b — FK-aware AST injection (SQLExecutor, hard): sqlglot injects a subquery filter for child tables queried without a JOIN (e.g.,
Certification.EmployeeId IN (SELECT ...)).
Each pipeline run dumps its full config and context to logs/<timestamp>.json for traceability.