Skip to content

Dstack-TEE/private-ai-gateway

Repository files navigation

private-ai-gateway

Developer-preview Rust implementation of the Attested Confidential Inference (ACI) gateway service.

The protocol it speaks is the draft ACI specification proposed in Dstack-TEE/dstack#694. This repo is the workload git-launcher can fetch, install, and run inside a dstack v2 application VM.

The next architecture target is documented in docs/frontend-middleware-backend.md: one gateway process owns the downstream ACI frontend and verified-provider backend, with an optional plaintext HTTP-over-UDS middleware slot for routing and business logic. Middleware developers should start with docs/middleware-integration.md.

Status

0.1.0 - developer preview. Production-blocking work is explicit below.

Surface Status
Canonical JSON (RFC 8785 subset) done
Workload identity / keyset digests done
Attestation report (assembly + endorsement) done
Inference receipts (event log, signing) done
Non-streaming POST /v1/chat/completions forwarding done
POST /v1/completions forwarding done
Streaming chat/completions forwarding done
GET /v1/attestation/report done
GET /v1/receipt/{chat_id} done
GET /v1/signature/{chat_id} alias done
Upstream verification fail-closed by default done
ECDSA-secp256k1 65-byte recoverable receipt sig done
Receipt owner auth + retained body endpoint done (in-memory; retention defaults to 0)
E2EE-header fail-closed guard done
dstack SDK quoter over HTTP(S) or Unix socket done
dstack KMS-backed identity + receipt + E2EE keys done
Client-facing E2EE v2 termination done for chat/completions
Client-facing E2EE v2 streaming done for chat/completions
vLLM-proxy-compatible ECDSA v1/v2 and Ed25519/X25519 E2EE done for chat/completions
/v1/models upstream proxying done
/v1/metrics gateway-owned Prometheus metrics done
Runtime upstream config file + admin API done
Per-upstream verifier done for ACI/DCAP, Tinfoil, NEAR AI gateway, and Chutes E2EE-key bindings
Chutes provider transport done for buffered and streaming E2EE over /e2e/invoke
Frontend/middleware/backend framework partial; runtime UDS middleware mode wired
Public receipt log not done
Replica-stable identity (KMS-released keys) done for configured dstack key paths

The binary has no ephemeral-key or stub-quote startup path. It loads identity, receipt-signing, and E2EE keys from dstack KMS through the Rust dstack SDK, and it uses the same SDK for TDX quotes.

Layout

src/
  lib.rs
  main.rs               // binary entrypoint
  dstack.rs             // dstack SDK KMS key provider + quote provider
  aci/
    canonical.rs        // JCS subset, UTF-16 key sort, sha256 helpers
    types.rs            // wire structs (WorkloadKeyset, Receipt, ...)
    identity.rs         // workload_id, keyset digest, report_data
    keys.rs             // KeyProvider / Quoter traits and signature verifiers
    receipt.rs          // ReceiptBuilder + signing-bytes function
    upstream.rs         // UpstreamBackend trait + OpenAI-compatible client
  aggregator/
    service.rs          // AciService: report, forward, receipt store
  http/
    app.rs              // axum router for the ACI/OpenAI-compatible endpoints

entrypoint.sh           // gateway-owned entry script the launcher exec's
scripts/
  phala_multi_upstream_smoke.sh // deploys two upstream ACI CVMs + one gateway CVM and asserts routing receipts
deploy/                 // launcher .conf and dstack compose example
  README.md             // launcher wiring and deployment notes

tests/
  canonical.rs          // JCS stability, UTF-16 sort, float rejection
  identity.rs           // workload_id excludes subject, keyset digest includes it
  receipt.rs            // event ordering, finalization, signing bytes
  ecdsa_recoverable.rs  // §9.4 65-byte recoverable, reject 64-byte, no double hash
  service.rs            // fail-closed defaults, X-Upstream-Verification: none
  http.rs               // end-to-end report / chat / receipt
  aggregator_scenarios.rs // current no-middleware gateway happy/error path scenarios
  auth_and_retention.rs // receipt owner auth, retained body expiry, ACI headers
  aci_service_surface.rs // implemented surfaces plus ignored future specs
  entrypoint.rs         // shellcheck-lints and shape-checks entrypoint.sh
  smoke_scripts.rs      // shellcheck + invariant checks for scripts/

Launcher wiring (entrypoint.sh)

This repo is designed to be launched by git-launcher. The launcher pulls the repo at a pinned commit, cds into the public gateway repo root, and runs the gateway-owned entrypoint.sh. Non-secret runtime config is passed through normal Docker Compose environment: entries; secrets should come from dstack encrypted secrets, KMS, or mounted secret files.

Ownership boundary. The launcher is generic and build-system agnostic; it does not know we are written in Rust. entrypoint.sh is owned by this gateway, and everything past bash entrypoint.sh — install, build, run — lives here. The launcher config stays minimal (REPO_URL, COMMIT_SHA, WORK_DIR); there is no INSTALL_CMD and no RUN_CMD.

What entrypoint.sh does (once the launcher invokes it):

  1. If cargo is not on PATH, this gateway installs a Rust toolchain via apt-get install -y --no-install-recommends ca-certificates rustup
    • rustup default stable. This is a gateway implementation choice for the first slice, not a launcher capability. Production should publish a Rust-capable gateway image (see deploy/README.md pattern B) so the toolchain is covered by a gateway-owned image digest.
  2. Runs cargo build --release --locked --bin private-ai-gateway. The --locked flag means a build that would change Cargo.lock is a hard failure, not silent dependency drift. Cargo, Rustup, and build target state live under /var/lib/private-ai-gateway/cache by default, outside the source checkout that git-launcher scrubs on every boot.
  3. execs the built binary.

See deploy/README.md for the launcher .conf, the Compose runtime env, the dstack compose example that puts both behind compose_hash, and the Rust-capable gateway image recipe.

Environment variables

Use the PRIVATE_AI_GATEWAY_* prefix for runtime configuration. The binary also accepts the older DSTACK_LLM_ROUTER_* names as compatibility aliases; the PRIVATE_AI_GATEWAY_* value wins when both are set.

Setting Name
Bind address PRIVATE_AI_GATEWAY_BIND
Upstream config file PRIVATE_AI_GATEWAY_UPSTREAM_CONFIG_PATH
Initial upstream config seed file PRIVATE_AI_GATEWAY_UPSTREAM_CONFIG_SEED_PATH
Admin API bearer token PRIVATE_AI_GATEWAY_ADMIN_TOKEN
Source-provenance repo URL PRIVATE_AI_GATEWAY_REPO_URL
Source-provenance commit PRIVATE_AI_GATEWAY_REPO_COMMIT
Body retention seconds PRIVATE_AI_GATEWAY_BODY_RETENTION_SECONDS
Receipt TTL seconds PRIVATE_AI_GATEWAY_RECEIPT_TTL_SECONDS
TLS certificate paths, comma-separated PRIVATE_AI_GATEWAY_TLS_CERT_PATHS
TLS SPKI SHA-256 digests, comma-separated PRIVATE_AI_GATEWAY_TLS_SPKI_SHA256
Upstream verifier mode: none, preverified, aci-dcap PRIVATE_AI_GATEWAY_UPSTREAM_VERIFIER
Accepted upstream workload IDs, comma-separated PRIVATE_AI_GATEWAY_UPSTREAM_ACCEPTED_WORKLOAD_IDS
Accepted upstream image digests, comma-separated PRIVATE_AI_GATEWAY_UPSTREAM_ACCEPTED_IMAGE_DIGESTS
Accepted upstream dstack KMS root public keys, comma-separated PRIVATE_AI_GATEWAY_UPSTREAM_DSTACK_KMS_ROOT_PUBLIC_KEYS
Upstream verifier PCCS URL PRIVATE_AI_GATEWAY_UPSTREAM_PCCS_URL
Upstream verifier cache seconds PRIVATE_AI_GATEWAY_UPSTREAM_VERIFIER_CACHE_SECONDS
Upstream TCP/TLS connect timeout seconds PRIVATE_AI_GATEWAY_UPSTREAM_CONNECT_TIMEOUT_SECONDS
Upstream read idle timeout seconds PRIVATE_AI_GATEWAY_UPSTREAM_READ_TIMEOUT_SECONDS
Upstream verifier request timeout seconds PRIVATE_AI_GATEWAY_UPSTREAM_VERIFIER_REQUEST_TIMEOUT_SECONDS
dstack SDK endpoint PRIVATE_AI_GATEWAY_DSTACK_ENDPOINT
Optional middleware Unix socket path PRIVATE_AI_GATEWAY_MIDDLEWARE_UDS_PATH
Internal backend Unix socket path in middleware mode PRIVATE_AI_GATEWAY_BACKEND_UDS_PATH

Provider-owned verifier bridges also read PRIVATE_AI_VERIFIER_DIR when they need the local private-ai-verifier checkout. If unset in this monorepo, the gateway uses the sibling ../private-ai-verifier path. Chutes credentials and E2EE tuning are upstream config fields, not deployment env vars: bearer_token, chutes_e2ee_api_base, chutes_chute_ids, chutes_e2ee_discovery_rounds, and chutes_e2ee_discovery_interval_seconds. The Rust adapter passes those values to the verifier bridge internally.

Prefer PRIVATE_AI_GATEWAY_TLS_CERT_PATHS: the gateway reads the mounted leaf certificate, computes sha256(SPKI), and publishes that digest in the attested keyset. PRIVATE_AI_GATEWAY_TLS_SPKI_SHA256 remains for manual or test deployments. Set only one of the two.

PRIVATE_AI_GATEWAY_DSTACK_ENDPOINT accepts an HTTP(S) endpoint or a Unix socket endpoint such as unix:/var/run/dstack.sock. If unset, the dstack SDK uses /var/run/dstack.sock or DSTACK_SIMULATOR_ENDPOINT. For local testing with an SSH-forwarded CVM socket, use unix:/tmp/aci-dstack-sock-dev.dstack.sock. The older PRIVATE_AI_GATEWAY_DSTACK_QUOTER_URL name is still accepted as a compatibility alias.

The default upstream-verification mode is none, while the request path is fail-closed by default. aci-dcap is only for upstreams that expose the ACI attestation report shape on dstack. Configure it with at least one accepted upstream workload ID or image digest, and the accepted dstack KMS root public key. The verifier fetches the upstream's /v1/attestation/report, validates the ACI workload/keyset binding, verifies the embedded Intel DCAP quote through dcap-qvl, replays the dstack event log against the quote's RTMR3, verifies the identity key's dstack KMS signature chain to the configured KMS root, and caches a successful result for 300 seconds unless overridden. Provider adapters are Rust implementations, not configured shell commands. The adapter owns the provider-specific transport path and may outsource attestation verification to provider-owned verifier logic. The call is selected by the Rust adapter, not by upstream config. Tinfoil uses the private-ai-verifier bridge and returns the TLS SPKI bound in the Tinfoil attestation document. NEAR AI requests its report with TLS fingerprint binding enabled and returns that SPKI once the dstack verification dependency is available. Chutes verifies the E2EE public key against TDX report_data, Intel DCAP status, Chutes' public measurement profiles, and NVIDIA NRAS nonce binding using the upstream config bearer_token. Its backend then fetches a live nonce/key batch, selects only an instance whose E2EE public key matches the verified binding, encrypts the OpenAI JSON body with Chutes' ML-KEM-768 + HKDF-SHA256 + ChaCha20-Poly1305 transport, and decrypts buffered or streaming responses before the receipt pipeline hashes them. The OpenAI-compatible backend enforces tls_spki_sha256 and tls_certificate_sha256 bindings against the actual upstream HTTPS handshake. preverified is only for explicit out-of-band trust during bring-up.

Upstream forwarding defaults to a 10 second connect timeout and a 600 second read idle timeout. The read timeout is not a total generation deadline; for streaming responses it bounds how long the gateway waits between upstream chunks. Upstream ACI/DCAP verification uses the same connect timeout and a 60 second total verification timeout by default, covering report fetch, collateral fetch, and quote checks. The timeout env vars set global defaults; per-upstream config can override them with connect_timeout_seconds, read_timeout_seconds, and verifier_request_timeout_seconds. Chutes verification does live evidence, DCAP, and NRAS checks; configure a higher per-upstream verifier timeout if the default is too low for the selected chute.

The gateway prewarms upstream verification at startup. It also proactively refreshes cached verification before expiry; by default the refresh loop runs at the verifier cache TTL minus 60 seconds, so the normal 300 second cache refreshes every 240 seconds. If multiple upstreams configure different positive refresh intervals, the loop uses the shortest active interval. External provider verifier refresh keeps the current good cache entry while the new evidence is fetched, so user requests can continue using the previous verified identity during refresh. Set an upstream's verification_refresh_seconds to 0 to skip that upstream during proactive verifier refresh.

Provider session material refreshes every 45 seconds by default for adapters that have session material today. Set an upstream's session_refresh_seconds to 0 to disable that loop. For Chutes, session refresh is lightweight: it reuses cached verified E2EE key bindings and only refills single-use invocation nonces, so user traffic does not have to wait for nonce discovery when the pool is low or expired.

The gateway has one upstream config file. Set PRIVATE_AI_GATEWAY_UPSTREAM_CONFIG_PATH to its path; if unset, the default is /var/lib/private-ai-gateway/upstreams.json. A missing, empty, or whitespace-only file is valid and means no upstreams are configured yet. The file contains a JSON array:

[
  {
    "name": "gpu-a",
    "provider": "aci-dcap",
    "base_url": "https://gpu-a.example",
    "models": {
      "public-model-a": "upstream-model-a"
    },
    "accepted_workload_ids": ["aci:workload:..."],
    "accepted_dstack_kms_root_public_keys": ["02..."],
    "connect_timeout_seconds": 10,
    "read_timeout_seconds": 600,
    "verifier_request_timeout_seconds": 60,
    "verification_refresh_seconds": 240
  }
]

For one-command compose deployments, set PRIVATE_AI_GATEWAY_UPSTREAM_CONFIG_SEED_PATH to a read-only seed file mounted by compose. If the mutable config path is missing or whitespace-only at startup, the gateway validates the seed and copies it there once. Existing admin-updated config is never overwritten.

provider defaults to openai-compatible. Supported values are openai-compatible, aci-dcap, chutes, tinfoil, and near-ai. The non-default providers select concrete Rust verifier adapters; they do not expose a configurable verifier command.

In no-middleware mode, the public model id is what clients send and what /v1/models returns. The gateway treats that id as the target route id and rewrites it to the upstream model id before upstream verification, forwarding, and receipt hashing. When PRIVATE_AI_GATEWAY_MIDDLEWARE_UDS_PATH is set, /v1/models passes through middleware over HTTP-on-UDS and middleware chooses a configured target route id by calling the internal backend UDS; the frontend still preserves the user model name for downstream E2EE AAD. See docs/frontend-middleware-backend.md and docs/middleware-integration.md. Per-upstream verifier fields override the global PRIVATE_AI_GATEWAY_UPSTREAM_ACCEPTED_* settings when present. Chutes upstreams should put the provider API key in bearer_token. Optional Chutes fields are chutes_e2ee_api_base, chutes_chute_ids, chutes_e2ee_discovery_rounds (default 3), chutes_e2ee_discovery_interval_seconds (default 0), and session_refresh_seconds (default 45). chutes_chute_ids maps configured upstream model ids to concrete Chutes chute_id UUIDs; production Chutes routes should use it instead of catalog name lookup.

When PRIVATE_AI_GATEWAY_ADMIN_TOKEN is set, an admin can inspect and replace the same config file at runtime:

curl -H "Authorization: Bearer $PRIVATE_AI_GATEWAY_ADMIN_TOKEN" \
  http://127.0.0.1:8086/v1/admin/upstreams

curl -X PUT -H "Authorization: Bearer $PRIVATE_AI_GATEWAY_ADMIN_TOKEN" \
  -H "content-type: application/json" \
  --data-binary @upstreams.json \
  http://127.0.0.1:8086/v1/admin/upstreams

The admin view redacts bearer tokens and returns the active config_digest. PUT validates the replacement config, writes it to the single configured file, and swaps the live upstream router/backend state. If no admin token is configured, the admin endpoint returns 404.

Dependencies

The dependency list is intentionally small. Each crate is named below with the reason it is in the tree:

Crate Role
serde, serde_json ACI wire types and JCS input. preserve_order so existing JSON structure is preserved through round-trips.
dstack-sdk dstack KMS key release, /Info, and /GetQuote over the guest-agent socket or HTTP endpoint.
sha2 SHA-256 for canonical digests, report data, receipt signing.
ed25519-dalek Workload identity / receipt Ed25519 signing.
k256 secp256k1 ECDSA recoverable signing per ACI §9.4.
curve25519-dalek, x25519-dalek dstack-vLLM-proxy Ed25519/X25519 E2EE compatibility profile.
aes-gcm, hkdf ACI E2EE field encryption.
base64, chacha20poly1305, flate2, ml-kem Chutes provider E2EE transport compatible with chutes-e2ee.
rand, rand_core Receipt id randomness.
hex Hex encoding for public keys, digests, signatures.
rustls-pemfile, x509-parser, rustls, webpki-roots Parse mounted TLS leaf certificates, publish attested SPKI digests, and enforce upstream SPKI bindings.
axum, tokio, tower HTTP server. Axum 0.7 + tokio multi-thread runtime.
reqwest (rustls-tls) Upstream HTTP client. Rustls avoids a system OpenSSL dependency inside the dstack image.
dcap-qvl Pure-Rust Intel DCAP quote verification for ACI upstream reports.
thiserror Library error types.
tracing, tracing-subscriber Structured logging.
prometheus Gateway-owned /v1/metrics counters. The service does not expose upstream metrics.
async-trait Async trait helpers on UpstreamBackend.

No NVIDIA / nvtrust crates. GPU attestation is a per-upstream concern and will arrive with the verifier traits.

Running

cargo test                                  # all unit + integration tests
cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings
printf '[{"name":"local","base_url":"http://127.0.0.1:9000","models":{"local-model":"local-model"}}]\n' \
  >/tmp/private-ai-gateway-upstreams.json
PRIVATE_AI_GATEWAY_DSTACK_ENDPOINT=unix:/tmp/aci-dstack-sock-dev.dstack.sock \
PRIVATE_AI_GATEWAY_REPO_URL=https://github.com/Dstack-TEE/private-ai-gateway.git \
PRIVATE_AI_GATEWAY_REPO_COMMIT=0123456789abcdef0123456789abcdef01234567 \
PRIVATE_AI_GATEWAY_UPSTREAM_CONFIG_PATH=/tmp/private-ai-gateway-upstreams.json \
cargo run --release

The dev binary listens on 127.0.0.1:8086 by default.

Phala multi-upstream smoke

Run the local Docker smoke first after changing routing, upstream verification, receipt hashing, dynamic upstream config, or model metrics:

scripts/local_multi_upstream_smoke.sh

It runs two upstream ACI services plus one gateway under local Docker Compose. All three mount the forwarded dstack socket from DSTACK_SOCK (default /tmp/aci-dstack-sock-dev.dstack.sock), so it exercises real dstack KMS keys and quotes while avoiding a full Phala deployment. The gateway starts with an empty config file, receives its upstream routes through PUT /v1/admin/upstreams, then performs the same routing, receipt, and metrics assertions as the Phala smoke.

Run the slower real Phala smoke when you need to validate the dstack deployment surface:

scripts/phala_multi_upstream_smoke.sh

It builds and pushes Dockerfile.smoke to ttl.sh, deploys two mocked upstream ACI services, fetches each upstream attestation report to derive the dstack KMS root policy, then deploys one gateway with a single upstream config file mounted into the CVM. It asserts:

  • /v1/models returns only public model ids
  • each public model id routes to the expected upstream
  • request.forwarded hashes the rewritten upstream request body
  • upstream.verified is recorded as verified with the upstream model id
  • metrics record upstream model ids and never public aliases

Artifacts are written to /tmp/private-ai-gateway-smoke-router by default. Set IMAGE_REF=<existing-image-ref> to skip the Docker build/push, or WORK_DIR=<path> to keep artifacts elsewhere.

Roadmap

The current pending-task list lives in docs/roadmap.md. The next major implementation item is hardening the frontend/middleware/backend framework: production compose wiring for a concrete middleware container. Middleware implementers can use docs/middleware-integration.md.

About

Private AI Gateway for Attested Confidential Inference

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors