Conductor

Conductor is a platform for managing models, model runners, and model configurations, and for combining them into virtual model runners that are exposed to the network through OpenAI, vLLM, Gemini, and Ollama APIs.

Features

Multi-tenant Architecture: Full tenant isolation with tenant-scoped data access
Model Runner Endpoints: Define and manage first-class endpoint types for OpenAI, vLLM, Gemini, and Ollama model runners
Model Definitions: Catalog your models with metadata like family, parameter size, and quantization
Model Configurations: Create reusable configurations with pinned properties for embeddings and completions
Virtual Model Runners: Combine endpoints and configurations into virtual endpoints with load balancing
Configuration Pinning: Automatically inject model parameters into requests (like OllamaFlow)
Session Affinity: Pin clients to specific backend endpoints based on IP address, API key, or custom headers to minimize context drops and model swapping
Load Balancing: Round-robin, random, or first-available endpoint selection with weighted distribution and optional session affinity
Health Checking: Automatic background health monitoring of endpoints with configurable thresholds
RigMonitor Telemetry: Optionally enrich endpoint health with cached host CPU, memory, disk, network, GPU, and Ollama telemetry from RigMonitor sidecars
Policy-Based Routing: Create first-class load-balancing policies that filter or rank endpoints using health, capacity, and RigMonitor metrics
Explainable Routing: Simulate representative requests, inspect candidate elimination, review policy evidence, and persist routing explanations into request history
Preflight Validation: Validate endpoints, model definitions, model configurations, load-balancing policies, and VMRs before saving them
Effective Configuration Preview: Resolve the endpoint set, request permissions, policy attachment, model pinning, and session-affinity settings that a VMR will actually use
Operational Metrics: Export Prometheus-friendly latency, denial, fallback, session-affinity, saturation, and telemetry-freshness signals
Drain And Quarantine Controls: Keep endpoints visible for health diagnostics while intentionally excluding them from new routing
Rate Limiting: Per-endpoint maximum parallel request limits with automatic capacity management
Request History: Optional per-VMR request/response capture for debugging and auditing with configurable retention, redaction, and metadata-only retention modes
React Dashboard: Full-featured UI for managing all entities including real-time health status

Quick Start

Using Docker Compose

cd docker
docker compose up -d

The server will be available at http://localhost:9000 and the dashboard at http://localhost:9100. The Compose file builds the server and dashboard from the local repository Dockerfiles.

Building from Source

Prerequisites

.NET 10 SDK
Node.js 20+

Build and Run Server

cd src/Conductor.Server
dotnet run

Build and Run Dashboard

cd dashboard
npm install
npm run dev

Testing

Conductor's automated tests use Touchstone so the same shared test cases can run through multiple hosts.

src/Test.Shared/ contains the authoritative test definitions.
src/Test.Xunit/ exposes the shared suite through xUnit.
src/Test.Nunit/ exposes the same suite through NUnit.
src/Test.Automated/ runs the suite through the Touchstone console runner.

Common commands:

# Run framework-hosted tests
dotnet test src/Conductor.sln

# Run the console host
dotnet run --project src/Test.Automated/Test.Automated.csproj

See TESTING.md for the full testing guide.

SDKs

Conductor ships lightweight SDKs for common management-plane workflows:

sdk/javascript/ for Node.js and browser-adjacent tooling
sdk/python/ for Python automation and ops scripts

Both SDKs include helpers for:

validation routes
VMR effective configuration preview
explain-routing simulations
endpoint drain, resume, and quarantine actions
request-history search, summary, detail, and bulk deletion
observability metrics summary and text export

API Overview

Supported Provider Types

Conductor currently supports four model runner provider types in both the backend proxy and the dashboard:

Provider Type	Runner Type in UI	Proxied API Shape	Notes
OpenAI	`OpenAI`	OpenAI REST API	Supports OpenAI-style chat, embeddings, and model listing
vLLM	`vLLM`	OpenAI-compatible REST API	First-class runner type in the UI; uses the OpenAI-compatible API surface
Gemini	`Gemini`	Gemini REST API	Supports Gemini-style `models/{model}:generateContent`, streaming, embeddings, and model listing
Ollama	`Ollama`	Ollama REST API	Supports Ollama-style `/api/generate`, `/api/chat`, and embeddings flows

Authentication

Conductor supports two authentication methods:

Header-based: Include x-tenant-id, x-email, and x-password headers
Bearer Token: Include Authorization: Bearer {token} header
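
As an illustration, the two header sets above could be assembled as follows. This is a minimal sketch: the function name `build_auth_headers` and its parameters are hypothetical, and only the header names come from the list above.

```python
def build_auth_headers(tenant_id=None, email=None, password=None, token=None):
    """Return request headers for one of the two documented auth methods."""
    if token is not None:
        # Bearer-token authentication.
        return {"Authorization": f"Bearer {token}"}
    if None in (tenant_id, email, password):
        raise ValueError("header-based auth needs tenant_id, email, and password")
    # Header-based authentication.
    return {"x-tenant-id": tenant_id, "x-email": email, "x-password": password}
```

The returned dictionary can be passed to any HTTP client as the request headers.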

User Permission Model

Users have three permission levels:

Permission	Description
Global Admin (`IsAdmin=true`)	Full cross-tenant access to all resources
Tenant Admin (`IsTenantAdmin=true`)	Can manage users and credentials within their own tenant
Standard User	Can only access model configurations, endpoints, runners, and virtual runners in their tenant

Global Admins can operate on any tenant by specifying TenantId in their requests
Tenant Admins have elevated privileges within their assigned tenant
Standard Users have read/write access to non-administrative resources

Endpoints

Entity	Prefix	API Endpoint
Administrator	`admin_`	`/v1.0/administrators`
Tenant	`ten_`	`/v1.0/tenants`
User	`usr_`	`/v1.0/users`
Credential	`cred_`	`/v1.0/credentials`
Model Runner Endpoint	`mre_`	`/v1.0/modelrunnerendpoints`
Model Definition	`md_`	`/v1.0/modeldefinitions`
Model Configuration	`mc_`	`/v1.0/modelconfigurations`
Load Balancing Policy	`lbp_`	`/v1.0/loadbalancingpolicies`
Virtual Model Runner	`vmr_`	`/v1.0/virtualmodelrunners`
Request History	`req_`	`/v1.0/requesthistory`
Request History Summary	-	`/v1.0/requesthistory/summary`
Observability Metrics	-	`/v1.0/observability/metrics`

RigMonitor And Policy Routing

Model runner endpoints can optionally declare a RigMonitor sidecar configuration. Conductor collects that data during the normal endpoint health-check loop, caches it in memory, and exposes it through endpoint health and telemetry routes. The proxy path never performs live RigMonitor calls while handling client traffic.

Endpoint RigMonitor Configuration

Each ModelRunnerEndpoint can include a RigMonitor object with fields such as:

Enabled
HostnameOverride
Port
UseSsl
TimeoutMs
CollectDuringHealthCheck
RequireReadyz
HealthAffectedByRigMonitor
MaxTelemetryAgeMs
CapabilitiesRefreshIntervalMs
TelemetryProfile
TelemetrySelectors

Useful routes:

GET /v1.0/modelrunnerendpoints/health
GET /v1.0/modelrunnerendpoints/{id}/health
GET /v1.0/modelrunnerendpoints/{id}/rigmonitor

First-Class Load Balancing Policies

Load-balancing policies are tenant-scoped resources attached to a VMR by LoadBalancingPolicyId.

Policy CRUD: /v1.0/loadbalancingpolicies
Metrics catalog: GET /v1.0/loadbalancingpolicies/metrics
VMR attachment: set LoadBalancingPolicyId on /v1.0/virtualmodelrunners

Policies combine:

Filters: hard eligibility checks such as health.isHealthy == true or rig.gpu.available == true
Ranking: weighted numeric comparisons such as lowest CPU, lowest GPU utilization, or fewest in-flight requests
FallbackMode: use the VMR's legacy load-balancing mode or fail closed
TieBreaker: round-robin, random, or first available when scores are equal

Example policy payload:

{
  "Name": "Lowest GPU Utilization",
  "MaxTelemetryAgeMs": 30000,
  "Filters": [
    { "Metric": "health.isHealthy", "Operator": "Equal", "ValueType": "Boolean", "Value": "true" },
    { "Metric": "health.hasCapacity", "Operator": "Equal", "ValueType": "Boolean", "Value": "true" },
    { "Metric": "rig.gpu.available", "Operator": "Equal", "ValueType": "Boolean", "Value": "true" }
  ],
  "Ranking": [
    { "Metric": "rig.gpu.avgUtilizationPercent", "Direction": "Ascending", "Weight": 1.0 }
  ],
  "FallbackMode": "UseLegacyLoadBalancingMode",
  "TieBreaker": "RoundRobin",
  "Active": true
}
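
To make the filter-then-rank idea concrete, the sketch below evaluates a payload of this shape against per-endpoint metric snapshots. It is an illustration only, not Conductor's implementation: the snapshot dictionaries, the function names, and the weighted-sum scoring are assumptions, and only the `Equal` operator and `Ascending` direction shown in the payload are handled. Tie-breaking and fallback are omitted.

```python
def passes_filters(metrics, filters):
    """True when every filter matches; a missing metric counts as a failure."""
    for f in filters:
        if f["Operator"] != "Equal":
            raise NotImplementedError(f["Operator"])  # only Equal is sketched
        expected = f["Value"]
        if f["ValueType"] == "Boolean":
            expected = f["Value"].lower() == "true"
        if metrics.get(f["Metric"]) != expected:
            return False
    return True


def order_endpoints(endpoints, policy):
    """Return eligible endpoint ids, best first (lowest weighted score)."""
    eligible = {
        eid: m for eid, m in endpoints.items()
        if passes_filters(m, policy["Filters"])
    }

    def score(eid):
        total = 0.0
        for r in policy["Ranking"]:
            if r["Direction"] != "Ascending":
                raise NotImplementedError(r["Direction"])
            total += r["Weight"] * eligible[eid][r["Metric"]]
        return total

    return sorted(eligible, key=score)
```

With two healthy endpoints at 80% and 20% GPU utilization and one unhealthy endpoint, the unhealthy one is filtered out and the 20% endpoint is ranked first.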

Example VMR attachment:

{
  "Name": "GPU Chat VMR",
  "BasePath": "/v1.0/api/gpu-chat/",
  "LoadBalancingMode": "RoundRobin",
  "LoadBalancingPolicyId": "lbp_example",
  "ModelRunnerEndpointIds": ["mre_a", "mre_b"],
  "Active": true
}

Explain, Validate, And Preview

The management plane now exposes first-class safety and explainability routes:

POST /v1.0/modelrunnerendpoints/validate
POST /v1.0/modeldefinitions/validate
POST /v1.0/modelconfigurations/validate
POST /v1.0/loadbalancingpolicies/validate
POST /v1.0/virtualmodelrunners/validate
GET /v1.0/virtualmodelrunners/{id}/effective
POST /v1.0/virtualmodelrunners/{id}/explain-routing

Recommended operator flow:

Validate drafts before saving.
Inspect the effective VMR preview to confirm endpoint coverage, request permissions, policy attachment, and model pinning.
Use explain-routing with a representative request body when you need to understand why a request would route, mutate, reuse a session pin, or be denied.
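
The first step of this flow might look like the sketch below, which builds (but does not send) a validation request with the Python standard library. It assumes the request body is the draft resource as JSON; the helper name `build_validate_request` and its parameters are hypothetical, and the response shape is not shown because it is not described here.

```python
import json
import urllib.request


def build_validate_request(base_url, resource, draft, headers):
    """Build a POST to /v1.0/{resource}/validate without sending it."""
    url = f"{base_url.rstrip('/')}/v1.0/{resource}/validate"
    body = json.dumps(draft).encode("utf-8")
    all_headers = {"Content-Type": "application/json", **headers}
    return urllib.request.Request(url, data=body, headers=all_headers, method="POST")
```

Passing the result to `urllib.request.urlopen` would send it to a running server.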

Request-history detail responses also expose the structured routing decision when history is enabled for the VMR.

Operator Notes

Keep unauthenticated RigMonitor sidecars on trusted networks only.
TelemetryProfile now defaults to Full; narrow it only if you need to reduce health-check telemetry cost.
Start with FallbackMode = UseLegacyLoadBalancingMode, then move selected VMRs to FailClosed once telemetry freshness and sidecar reliability are proven.
Stale or missing telemetry can make a telemetry-dependent endpoint ineligible for policy evaluation.

Virtual Model Runner Proxy

Virtual model runners expose an API at their configured base path. For example, a VMR with base path /v1.0/api/my-vmr/ would expose:

OpenAI API: /v1.0/api/my-vmr/v1/chat/completions, /v1.0/api/my-vmr/v1/embeddings
vLLM API: /v1.0/api/my-vmr/v1/chat/completions, /v1.0/api/my-vmr/v1/embeddings
Gemini API: /v1.0/api/my-vmr/v1beta/models/gemini-2.5-flash:generateContent, /v1.0/api/my-vmr/v1beta/models/text-embedding-004:embedContent
Ollama API: /v1.0/api/my-vmr/api/generate, /v1.0/api/my-vmr/api/chat
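
The path composition above can be sketched as a small helper. The function name `vmr_url` is hypothetical; the server origin and paths in the usage note are the ones used elsewhere in this README.

```python
def vmr_url(server, base_path, api_path):
    """Join the server origin, a VMR base path, and a provider-style API path."""
    return (
        server.rstrip("/")
        + "/" + base_path.strip("/")
        + "/" + api_path.lstrip("/")
    )
```

For example, `vmr_url("http://localhost:9000", "/v1.0/api/my-vmr/", "/v1/chat/completions")` yields the OpenAI-style chat URL for that VMR, so an existing OpenAI-compatible client only needs its base URL changed.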

Configuration

conductor.json

{
  "Webserver": {
    "Hostname": "localhost",
    "Port": 9000,
    "Ssl": false,
    "Cors": {
      "Enabled": true,
      "AllowedOrigins": ["http://localhost:9100"],
      "AllowedMethods": ["GET", "POST", "PUT", "DELETE", "OPTIONS"],
      "AllowedHeaders": ["Content-Type", "Authorization", "x-tenant-id", "x-email", "x-password", "x-admin-apikey", "x-admin-email", "x-admin-password"],
      "ExposedHeaders": [],
      "AllowCredentials": false,
      "MaxAgeSeconds": 86400
    }
  },
  "Database": {
    "Type": "Sqlite",
    "Filename": "./conductor.db"
  },
  "Logging": {
    "Servers": [],
    "LogDirectory": "./logs/",
    "LogFilename": "conductor.log",
    "ConsoleLogging": true,
    "MinimumSeverity": 0
  },
  "RequestHistory": {
    "Enabled": true,
    "Directory": "./request-history/",
    "RetentionDays": 7,
    "MetadataRetentionDays": 30,
    "BodyRetentionDays": 7,
    "CleanupIntervalMinutes": 60,
    "CaptureRequestBody": true,
    "CaptureResponseBody": true,
    "RedactedHeaders": ["authorization", "x-password", "x-admin-password", "x-goog-api-key"],
    "RedactedJsonFields": ["authorization", "api_key", "apikey", "password", "token", "bearertoken"],
    "MaxRequestBodyBytes": 65536,
    "MaxResponseBodyBytes": 65536
  }
}

Supported Databases

SQLite (default): "Type": "Sqlite", "Filename": "./conductor.db"
PostgreSQL: "Type": "PostgreSql", "ConnectionString": "Host=..."
SQL Server: "Type": "SqlServer", "ConnectionString": "Server=..."
MySQL: "Type": "MySql", "ConnectionString": "Server=..."

CORS Configuration

Cross-Origin Resource Sharing (CORS) can be enabled to allow browser-based applications to access the Conductor API.

Property	Type	Default	Description
`Enabled`	bool	`false`	Enable or disable CORS support
`AllowedOrigins`	string[]	`[]`	List of allowed origins. Use `["*"]` for all origins
`AllowedMethods`	string[]	`["GET", "POST", "PUT", "DELETE", "OPTIONS"]`	Allowed HTTP methods
`AllowedHeaders`	string[]	`["Content-Type", "Authorization", ...]`	Allowed request headers
`ExposedHeaders`	string[]	`[]`	Headers exposed to the browser
`AllowCredentials`	bool	`false`	Allow credentials (cookies, auth headers). Cannot be used with `AllowedOrigins: ["*"]`
`MaxAgeSeconds`	int	`86400`	Preflight cache duration (0-86400 seconds)

Example: Allow all origins (development)

{
  "Webserver": {
    "Cors": {
      "Enabled": true,
      "AllowedOrigins": ["*"]
    }
  }
}

Example: Restrict to specific origins (production)

{
  "Webserver": {
    "Cors": {
      "Enabled": true,
      "AllowedOrigins": ["https://app.example.com", "https://admin.example.com"],
      "AllowCredentials": true
    }
  }
}
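
The constraints in the table above can be checked before deploying a configuration. The following is a sketch of such a check, not Conductor's own validation; the function name `validate_cors` is hypothetical, and the defaults mirror the table.

```python
def validate_cors(cors):
    """Return a list of problems found in a Cors settings dictionary."""
    problems = []
    origins = cors.get("AllowedOrigins", [])
    if cors.get("AllowCredentials", False) and "*" in origins:
        problems.append("AllowCredentials cannot be combined with AllowedOrigins ['*']")
    max_age = cors.get("MaxAgeSeconds", 86400)
    if not 0 <= max_age <= 86400:
        problems.append("MaxAgeSeconds must be between 0 and 86400")
    return problems
```

An empty list means the settings satisfy both documented constraints.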

Request History Configuration

Request history captures request/response data for Virtual Model Runners with RequestHistoryEnabled set to true. This is useful for debugging, auditing, troubleshooting, and latency analysis. Each completed entry records total response time and time to first token/byte (FirstTokenTimeMs). For non-streaming responses, FirstTokenTimeMs is set to the same value as ResponseTimeMs.

Property	Type	Default	Description
`Enabled`	bool	`true`	Enable or disable request history globally
`Directory`	string	`"./request-history/"`	Directory for storing request detail JSON files
`RetentionDays`	int	`30`	Legacy retention knob used as a fallback when the newer retention settings are omitted
`MetadataRetentionDays`	int	`30`	Number of days to retain searchable ledger metadata before cleanup (1-365)
`BodyRetentionDays`	int	`30`	Number of days to retain request and response bodies inside detail files before they are scrubbed (1-365)
`CleanupIntervalMinutes`	int	`60`	Interval between cleanup runs in minutes (1-1440)
`CaptureRequestBody`	bool	`true`	Persist request bodies when request history is enabled for the VMR
`CaptureResponseBody`	bool	`true`	Persist response bodies when request history is enabled for the VMR
`RedactedHeaders`	string[]	built-in sensitive headers	Header names redacted before persistence
`RedactedJsonFields`	string[]	built-in sensitive JSON fields	JSON field names redacted recursively before persistence
`MaxRequestBodyBytes`	int	`65536`	Maximum request body bytes to capture (1-10485760)
`MaxResponseBodyBytes`	int	`65536`	Maximum response body bytes to capture (1-10485760)

Note: Request history must be enabled both globally (in conductor.json) and per-VMR (via the RequestHistoryEnabled property).

Captured request history entries include the VMR, routed model runner endpoint, matched model definition, matched model configuration, policy attachment, requested/effective model names, routing outcome, denial reason, mutation summary, HTTP status, body lengths, transfer type, total response time (ResponseTimeMs), and time to first token/byte (FirstTokenTimeMs).

When BodyRetentionDays is shorter than MetadataRetentionDays, Conductor scrubs request and response bodies from detail files while preserving the searchable routing and latency ledger.
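
The interaction of the two retention windows can be sketched as a per-entry decision. This is an illustration under assumptions: the action names and the exact boundary comparison are hypothetical, and Conductor's cleanup task may differ in detail.

```python
from datetime import datetime, timedelta, timezone


def retention_action(created_utc, now_utc, body_retention_days=30, metadata_retention_days=30):
    """Decide what a cleanup pass would do with one request-history entry."""
    age = now_utc - created_utc
    if age > timedelta(days=metadata_retention_days):
        return "delete"        # the ledger metadata itself has expired
    if age > timedelta(days=body_retention_days):
        return "scrub_bodies"  # keep routing and latency metadata, drop bodies
    return "keep"
```

With `BodyRetentionDays` of 7 and `MetadataRetentionDays` of 30, a ten-day-old entry keeps its searchable metadata but loses its request and response bodies.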

Request History Summary API

The summary endpoint returns aggregated request counts grouped by time buckets, useful for charting request volume and success/failure rates over time.

GET /v1.0/requesthistory/summary?startUtc={ISO8601}&endUtc={ISO8601}&interval={minute|15minute|hour|6hour|day}&vmrGuid={guid}

Parameter	Type	Required	Description
`startUtc`	string	No	Start of time range (UTC, ISO 8601). Default: 1 hour ago
`endUtc`	string	No	End of time range (UTC, ISO 8601). Default: now
`interval`	string	No	Bucket interval: `minute`, `15minute`, `hour`, `6hour`, or `day`. Default: `hour`
`vmrGuid`	string	No	Filter by Virtual Model Runner GUID
`endpointGuid`	string	No	Filter by routed model runner endpoint GUID
`requestorUserGuid`	string	No	Filter by authenticated user GUID
`credentialGuid`	string	No	Filter by credential GUID
`loadBalancingPolicyGuid`	string	No	Filter by attached load-balancing policy GUID
`modelName`	string	No	Filter by requested or effective model
`mutationSummary`	string	No	Filter by mutation-summary substring
`denialReasonCode`	string	No	Filter by denial reason
`sessionAffinityOutcome`	string	No	Filter by session-affinity outcome
`statusClass`	string	No	Filter by status class such as `2xx`, `4xx`, or `5xx`
`sourceIp`	string	No	Filter by requestor source IP
`httpStatus`	integer	No	Filter by exact HTTP status code

Response:

{
  "Data": [
    {
      "TimestampUtc": "2026-03-20T10:00:00Z",
      "SuccessCount": 42,
      "FailureCount": 3,
      "TotalCount": 45
    }
  ],
  "StartUtc": "2026-03-20T10:00:00Z",
  "EndUtc": "2026-03-20T11:00:00Z",
  "Interval": "hour",
  "TotalSuccess": 42,
  "TotalFailure": 3,
  "StatusClassCounts": {
    "2xx": 42,
    "5xx": 3
  },
  "DenialReasonCounts": {
    "AllEndpointsAtCapacity": 2,
    "PolicyRejected": 1
  },
  "SessionAffinityOutcomeCounts": {
    "Hit": 20,
    "Miss": 25
  },
  "TotalRequests": 45
}

Success is defined as HTTP status 100-399; failure is HTTP status 400-599 or null (incomplete requests).
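
A short sketch of building a summary query and applying the success rule above on the client side. The helper names `summary_url` and `is_success` are hypothetical; the query parameter names come from the table.

```python
from urllib.parse import urlencode


def summary_url(base_url, **params):
    """Build the summary URL, skipping parameters that are None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url.rstrip('/')}/v1.0/requesthistory/summary"
    return f"{url}?{query}" if query else url


def is_success(http_status):
    """Success is HTTP 100-399; 400-599 and None (incomplete) are failures."""
    return http_status is not None and 100 <= http_status <= 399
```

For example, `summary_url("http://localhost:9000", interval="hour", statusClass="5xx")` produces a URL that charts only server-side failures per hour.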

Configuration Pinning

Model configurations can define pinned properties that are automatically merged into incoming requests:

{
  "Name": "Low Temperature Config",
  "PinnedCompletionsProperties": {
    "temperature": 0.3,
    "top_p": 0.9,
    "max_tokens": 2048
  },
  "PinnedEmbeddingsProperties": {
    "model": "text-embedding-ada-002"
  }
}

When a request comes through a virtual model runner, the pinned properties are merged with the request body, allowing you to enforce specific model parameters.
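
The merge can be pictured with the sketch below. It assumes a shallow merge in which pinned values override values sent by the client, which is what enforcing parameters implies; the function name `apply_pinned` is hypothetical, and Conductor's actual merge rules may be more detailed.

```python
def apply_pinned(request_body, pinned):
    """Return a copy of the request body with pinned properties applied."""
    merged = dict(request_body)
    merged.update(pinned)  # assumption: pinned values win over client values
    return merged
```

A client request with `"temperature": 1.0` routed through the configuration above would therefore reach the backend with `"temperature": 0.3`, and with `top_p` and `max_tokens` added.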

Health Checking & Rate Limiting

Endpoint Health Configuration

Model Runner Endpoints support comprehensive health checking with the following properties:

Property	Type	Default	Description
`HealthCheckUrl`	string	`/`	URL path appended to endpoint base URL for health checks
`HealthCheckMethod`	enum	`GET`	HTTP method (`GET` or `HEAD`)
`HealthCheckIntervalMs`	int	`5000`	Milliseconds between health checks
`HealthCheckTimeoutMs`	int	`5000`	Timeout for health check requests
`HealthCheckExpectedStatusCode`	int	`200`	Expected HTTP status code for healthy
`UnhealthyThreshold`	int	`2`	Consecutive failures before marking unhealthy
`HealthyThreshold`	int	`2`	Consecutive successes before marking healthy
`HealthCheckUseAuth`	bool	`false`	Include API key (Bearer token) in health check requests
`MaxParallelRequests`	int	`4`	Maximum concurrent requests (0 = unlimited)
`Weight`	int	`1`	Relative weight for load balancing (1-1000)
`ServiceState`	enum	`Normal`	Operator-controlled traffic state: `Normal`, `Draining`, or `Quarantined`

Note for OpenAI and vLLM APIs: When using api.openai.com or another OpenAI-compatible backend that requires authentication for model listing, set HealthCheckUseAuth to true and HealthCheckUrl to /v1/models.

Note for Gemini API: When using generativelanguage.googleapis.com, set HealthCheckUseAuth to true and HealthCheckUrl to /v1beta/models. Gemini uses the x-goog-api-key header rather than bearer token authentication.

Health Check Behavior

Endpoints start in an unhealthy state and transition to healthy after meeting the HealthyThreshold
Background tasks continuously monitor each active endpoint at the configured interval
The proxy automatically excludes unhealthy endpoints from request routing
Draining endpoints continue to be probed and remain available for already-pinned session-affinity traffic, but they do not receive new assignments
Quarantined endpoints continue to be probed for diagnostics, but they are excluded from all routing, including pinned-session reuse
When all endpoints are unhealthy, requests return 502 Bad Gateway
When all endpoints are at capacity, requests return 429 Too Many Requests
When all configured endpoints are quarantined or draining, requests are denied with an explicit service-state-specific error
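
The threshold behavior described above amounts to a small state machine. The class below is a sketch of that logic with the documented defaults, not Conductor's implementation; the class and method names are hypothetical.

```python
class HealthTracker:
    """Track consecutive probe results against the two thresholds."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=2):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = False  # endpoints start in an unhealthy state
        self._successes = 0
        self._failures = 0

    def record(self, success):
        """Record one probe result and return the resulting health state."""
        if success:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.unhealthy_threshold:
                self.healthy = False
        return self.healthy
```

A single failed probe does not remove a healthy endpoint from routing; only `UnhealthyThreshold` consecutive failures do, which avoids flapping on transient errors.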

Rate Limiting

Each endpoint tracks in-flight requests in real-time
The MaxParallelRequests property enforces a per-endpoint concurrency limit
Set MaxParallelRequests to 0 for unlimited concurrent requests
Requests are counted from start until the response completes (including streaming)
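
A per-endpoint concurrency limit of this kind can be sketched as a guarded counter. The class name `InFlightLimiter` and its methods are hypothetical; the default of 4 and the meaning of 0 follow the MaxParallelRequests property.

```python
import threading


class InFlightLimiter:
    """Count in-flight requests for one endpoint and enforce a ceiling."""

    def __init__(self, max_parallel_requests=4):
        self.max_parallel = max_parallel_requests
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        """Reserve a slot; return False when the endpoint is at capacity."""
        with self._lock:
            if self.max_parallel != 0 and self.in_flight >= self.max_parallel:
                return False
            self.in_flight += 1
            return True

    def release(self):
        """Free a slot once the response, including any stream, completes."""
        with self._lock:
            if self.in_flight > 0:
                self.in_flight -= 1
```

When `try_acquire` fails for every candidate endpoint, the caller is in the "all endpoints at capacity" case.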

Weighted Load Balancing

The Weight property influences endpoint selection in round-robin and random modes
Higher weight = more traffic directed to that endpoint
Example: Endpoint A (weight=3) receives three times as much traffic as Endpoint B (weight=1)
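
One simple way to obtain this proportion in round-robin mode is to repeat each endpoint in the rotation according to its weight. The sketch below shows that naive expansion; it is an assumption for illustration, since the selection algorithm Conductor actually uses is not described here, and the function names are hypothetical.

```python
import itertools


def weighted_round_robin(weights):
    """Cycle through endpoint ids, each appearing `weight` times per cycle."""
    expanded = [eid for eid, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


def sample(weights, n):
    """Return the first n selections for the given weights."""
    return list(itertools.islice(weighted_round_robin(weights), n))
```

Over any whole number of cycles, an endpoint with weight 3 is selected three times for every selection of an endpoint with weight 1.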

Health Status API

Monitor endpoint health via the REST API:

# Health of all endpoints in tenant
GET /v1.0/modelrunnerendpoints/health

# Put an endpoint into maintenance drain mode
POST /v1.0/modelrunnerendpoints/{id}/drain

# Resume normal traffic
POST /v1.0/modelrunnerendpoints/{id}/resume

# Exclude an endpoint from all routing while keeping health visibility
POST /v1.0/modelrunnerendpoints/{id}/quarantine

# Health of endpoints for a specific VMR
GET /v1.0/virtualmodelrunners/{id}/health

Response includes:

Current health state (healthy/unhealthy)
Operator-managed service state (Normal, Draining, Quarantined)
In-flight request count
Total uptime/downtime
Uptime percentage
Last check timestamp
Last error message (if any)

Docker

The included Docker Compose setup uses local build contexts:

Server: src/Conductor.Server/Dockerfile
Dashboard: dashboard/Dockerfile

Building Docker Images

# Build server
./build-server.sh  # or build-server.bat on Windows

# Build dashboard
./build-dashboard.sh  # or build-dashboard.bat on Windows

License

MIT License - see LICENSE.md for details.

Attributions

Music icons created by Freepik - Flaticon
