Token Factory in-pod shim

The pod-side runner for Saturn Cloud's Token Factory product — a no-code LoRA fine-tuning service.

This package is the tiny shim that runs inside a Token Factory fine-tune job pod. All training logic lives in axolotl; this shim's job is to:

1. Read a rendered axolotl YAML from `~/axolotl-config.yaml` (overridable via `TF_AXOLOTL_CONFIG_PATH`).
2. Run `axolotl train <config>`, teeing its stdout/stderr to both k8s logs and `<output_dir>/training.log`.
3. On success, parse `<output_dir>/trainer_state.json`, verify an adapter was written, and register the result as a `kind=checkpoint` Artifact via the Atlas API.
4. On failure, best-effort register an error Artifact.
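
The tee step can be sketched roughly as follows. This is a minimal illustration, not the shim's actual code: the function name `run_and_tee` is hypothetical, and it assumes that anything written to the process's stdout ends up in the k8s logs.

```python
import subprocess
import sys
from pathlib import Path
from typing import Sequence


def run_and_tee(cmd: Sequence[str], log_path: Path) -> int:
    """Run cmd, copying its merged stdout/stderr to our stdout and to log_path.

    Hypothetical helper: the real shim may structure this differently.
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log_file:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
            errors="replace",  # never crash on undecodable bytes
            bufsize=1,  # line-buffered so lines show up promptly
        )
        assert proc.stdout is not None
        for line in proc.stdout:
            # Assumed: our stdout is what the pod's log collector captures.
            sys.stdout.write(line)
            sys.stdout.flush()
            log_file.write(line)
            log_file.flush()
        # The child's exit status, passed through unchanged.
        return proc.wait()
```

With a helper like this, the training step would look something like `run_and_tee(["axolotl", "train", str(config_path)], output_dir / "training.log")`, where `config_path` and `output_dir` are placeholders for values the shim resolves itself.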

Atlas owns the YAML rendering. Re-runs are submitted as fresh AI Studio resources (no in-place retry). Resume-from-checkpoint is planned as a future Atlas-side feature, in which the renderer would set `lora_model_dir` to point at a previous run's output.

Environment

| Var | Required | Description |
| --- | --- | --- |
| `SATURN_RESOURCE_ID` | for callback | The pod's deployment id. Used as `producer.id` and as the Atlas idempotency key. |
| `SATURN_TOKEN` | for callback | Bearer JWT for Atlas. |
| `SATURN_BASE_URL` | for callback | Atlas base URL. |
| `TF_OUTPUT_SF_ID` | for callback | SharedFolder id backing the pod's output mount. |
| `TF_OUTPUT_SF_RELPATH` | for callback | Relative path within that SharedFolder where this job's output lives. |
| `TF_AXOLOTL_CONFIG_PATH` | no | Path to the rendered axolotl YAML. Defaults to `~/axolotl-config.yaml`. |
| `TF_IMAGE_TAG` | no | Image tag string echoed into the artifact metadata. |

If the `SATURN_*` env vars aren't set, the shim still runs axolotl but skips the Atlas callback (useful for local dev).
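
As a rough sketch of how this environment handling could look: the helper names below are hypothetical, and treating every "for callback" variable as a condition for the callback is an assumption, since the text above only states the behavior for the `SATURN_*` variables.

```python
from pathlib import Path
from typing import List, Mapping

# Variables marked "for callback" in the table above.
CALLBACK_VARS = (
    "SATURN_RESOURCE_ID",
    "SATURN_TOKEN",
    "SATURN_BASE_URL",
    "TF_OUTPUT_SF_ID",
    "TF_OUTPUT_SF_RELPATH",
)
DEFAULT_CONFIG_PATH = "~/axolotl-config.yaml"


def resolve_config_path(env: Mapping[str, str]) -> Path:
    """Return the axolotl YAML path, honoring TF_AXOLOTL_CONFIG_PATH if set."""
    raw = env.get("TF_AXOLOTL_CONFIG_PATH") or DEFAULT_CONFIG_PATH
    return Path(raw).expanduser()


def missing_callback_vars(env: Mapping[str, str]) -> List[str]:
    """List callback variables that are unset or empty (hypothetical helper)."""
    return [name for name in CALLBACK_VARS if not env.get(name)]


def callback_enabled(env: Mapping[str, str]) -> bool:
    """Assumption: the callback runs only when nothing it needs is missing."""
    return not missing_callback_vars(env)
```

In practice these would be called with `os.environ`; taking a plain mapping keeps them easy to exercise in tests.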

Atlas contract

- Single artifact registration per resource. The idempotency key is `SATURN_RESOURCE_ID`, so pod restarts for the same resource are deduplicated server-side.
- Two-call flow: create the artifact (the server stamps `status=pending`), then PATCH it to `ready` or `error`. Create is retried (5 attempts, jittered backoff); PATCH is best-effort (2 attempts).
- Hard failures (pod killed, OOM, eviction) produce no callback; Atlas's job reconciler converges the artifact row from k8s pod state.

The wire format and retry policy live in saturn_tokenfactory/atlas_client.py.
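
To illustrate the shape of the two-call flow and its retry budget, here is a minimal sketch. It does not reproduce `atlas_client.py`: the function names, the delay values, and the use of "full jitter" exponential backoff are all assumptions, and the actual HTTP calls are abstracted behind the `create` and `patch` callables.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn up to `attempts` times with a jittered, growing delay between tries."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # budget exhausted: surface the last error
            # "Full jitter": random delay up to an exponentially growing cap.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")


def register(
    create: Callable[[], str],
    patch: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Create the artifact (5 attempts), then PATCH its status (2 attempts, best-effort)."""
    artifact_id = call_with_retries(create, attempts=5, sleep=sleep)
    try:
        call_with_retries(lambda: patch(artifact_id), attempts=2, sleep=sleep)
    except Exception:
        # Best-effort: a failed PATCH is swallowed rather than failing the job.
        pass
    return artifact_id
```

Injecting `sleep` is a testing convenience in this sketch; a real client would also likely retry only on transient errors (timeouts, 5xx) rather than on every exception.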

Exit codes

| Code | Meaning |
| --- | --- |
| 0 | axolotl succeeded; artifact registered. |
| 1 | axolotl exited non-zero, or its outputs were incomplete. Best-effort error artifact registered. |
| 2 | Config error (missing env, missing/invalid YAML). |
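
The mapping from outcome to exit code might be sketched like this. The names are hypothetical, and the adapter filenames are an assumption based on what PEFT typically writes (`adapter_model.safetensors`, or `adapter_model.bin` in older versions); the shim's real completeness check may differ. The sketch also does not cover what happens if registration itself fails.

```python
import json
from enum import IntEnum
from pathlib import Path


class ExitCode(IntEnum):
    OK = 0            # axolotl succeeded and outputs look complete
    TRAIN_FAILED = 1  # axolotl exited non-zero, or outputs were incomplete
    CONFIG_ERROR = 2  # missing env, missing/invalid YAML


# Assumed adapter weight filenames (typical PEFT output).
ADAPTER_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def outputs_complete(output_dir: Path) -> bool:
    """True if trainer_state.json parses and an adapter weights file exists."""
    try:
        json.loads((output_dir / "trainer_state.json").read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing/unreadable file or invalid JSON
        return False
    return any((output_dir / name).is_file() for name in ADAPTER_FILES)


def exit_code_for(config_ok: bool, axolotl_returncode: int, output_dir: Path) -> ExitCode:
    """Translate the run outcome into the exit codes from the table above."""
    if not config_ok:
        return ExitCode.CONFIG_ERROR
    if axolotl_returncode != 0 or not outputs_complete(output_dir):
        return ExitCode.TRAIN_FAILED
    return ExitCode.OK
```

An entry point could then finish with `sys.exit(exit_code_for(...))`, since `IntEnum` members are accepted wherever a plain integer status is expected.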

Local development

make conda-update      # set up the conda env
make format-backend    # black + isort
make lint-backend      # black/isort/flake8/mypy
make test-backend      # pytest

To run end-to-end against a tiny dataset, set the env vars manually, drop an axolotl YAML at ~/axolotl-config.yaml, and run python -m saturn_tokenfactory.

Out of scope

The shim doesn't know about:

- Dataset formats: axolotl handles those via the YAML.
- Model families / LoRA `target_modules` / per-family quirks: handled by axolotl and Atlas's renderer.
- Experiment trackers (MLflow / W&B / Comet): configure via the YAML; axolotl talks to them directly. A tag-based deep link is left to the UI (search by `tags.saturn.resource_id`).
- Multi-GPU / multi-node launching: the rendered YAML carries the right DeepSpeed/FSDP config, and the pod's launch command picks the right `torchrun` invocation.
