Commit 2cabb10

Custom OpenAI-compatible provider (#79)
* Add an openai_custom backend.
* Add sample dump.
1 parent f7fc7d3 commit 2cabb10

14 files changed

Lines changed: 2518 additions & 24 deletions

.gitignore

Lines changed: 1 addition & 1 deletion
@@ -172,7 +172,7 @@ dist/
 # WebSocket and transcript dumps
 media.jsonl
 transcript.jsonl
-*.jsonl
+#*.jsonl

 tmp/

BACKENDS.md

Lines changed: 36 additions & 0 deletions
@@ -10,6 +10,42 @@ opus-transcriber-proxy uses an abstract backend system that allows you to choose
 ### OpenAI (Default)
 Uses OpenAI's Realtime API for low-latency streaming transcription.

+### OpenAI Custom
+Re-uses the OpenAI Realtime API backend but connects to a custom WebSocket URL with per-request credentials. Useful for proxies, self-hosted compatible endpoints, or when different sessions need different API keys.
+
+**How it works:**
+- Identical to the `openai` backend in all respects (same protocol, same audio format, same session configuration)
+- The WebSocket URL and API key are supplied per-request rather than from environment variables
+
+**Per-request configuration:**
+| Source | Parameter | Description |
+|--------|-----------|-------------|
+| URL query param | `openaiCustomUrl` | WebSocket URL to connect to (e.g. `wss://your-proxy/v1/realtime?intent=transcription`) |
+| HTTP header | `X-Custom-Openai-Api-Key` | API key for authentication |
+
+Both values are required; if either is missing the backend connection will fail.
+
+**Configuration:**
+```bash
+# Enable the openai_custom provider (required)
+ENABLE_OPENAI_CUSTOM_PROVIDER=true
+
+# Require wss:// scheme for the openaiCustomUrl parameter (default: true)
+# Set to false to allow unencrypted ws:// connections (not recommended in production)
+OPENAI_CUSTOM_REQUIRE_WSS=false
+
+# Optionally set openai_custom as the default provider
+PROVIDERS_PRIORITY=openai_custom,openai,deepgram,gemini
+```
+
+**Usage (per-session via URL):**
+```
+ws://host/transcribe?sendBack=true&provider=openai_custom&openaiCustomUrl=wss://...
+# Also pass the X-Custom-Openai-Api-Key HTTP header on the WebSocket upgrade request
+```
+
+The global `OPENAI_MODEL` and `OPENAI_TRANSCRIPTION_PROMPT` environment variables are used as defaults for model and prompt, same as for the `openai` provider.
+
 **Features:**
 - WebSocket-based streaming
 - Interim and final transcriptions
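
The per-request parameters documented in the BACKENDS.md change above could be assembled on the client roughly as follows. This is a minimal sketch and not code from the commit: `buildTranscribeUrl` and `buildUpgradeHeaders` are hypothetical helper names, and the host and key are placeholders. The detail it illustrates is that the `openaiCustomUrl` value contains its own `?` and `=`, so it should be percent-encoded when placed in the query string.

```typescript
// Hypothetical helper (not part of the commit): builds the proxy URL for a
// session that should be routed to the openai_custom provider.
function buildTranscribeUrl(proxyBase: string, customUrl: string): string {
  const url = new URL("/transcribe", proxyBase);
  url.searchParams.set("sendBack", "true");
  url.searchParams.set("provider", "openai_custom");
  // searchParams.set percent-encodes the nested URL, including its "?" and "=".
  url.searchParams.set("openaiCustomUrl", customUrl);
  return url.toString();
}

// Hypothetical helper: the header to send on the WebSocket upgrade request.
function buildUpgradeHeaders(apiKey: string): Record<string, string> {
  return { "X-Custom-Openai-Api-Key": apiKey };
}
```

Browser `WebSocket` constructors do not accept custom headers, so the header has to be sent by a non-browser client; with the Node `ws` package, for example, the headers can be passed as `new WebSocket(url, { headers })`.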

CLAUDE.md

Lines changed: 3 additions & 0 deletions
@@ -373,6 +373,8 @@ See README.md for complete list. Key ones:

 - `PROVIDERS_PRIORITY` - Provider priority order (default: openai,deepgram,gemini)
 - `OPENAI_API_KEY`, `DEEPGRAM_API_KEY`, `GEMINI_API_KEY` - API keys
+- `ENABLE_OPENAI_CUSTOM_PROVIDER` - Enable the openai_custom provider (default: false)
+- `OPENAI_CUSTOM_REQUIRE_WSS` - Require wss:// for openaiCustomUrl (default: true; set false to allow ws://)
 - `PORT`, `HOST` - Server listen config
 - `FORCE_COMMIT_TIMEOUT` - Seconds before finalizing pending audio (default: 2)
 - `SESSION_RESUME_ENABLED` - Enable session resumption (default: true)
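
The `OPENAI_CUSTOM_REQUIRE_WSS` setting listed above amounts to a scheme check on the `openaiCustomUrl` value. The commit's own validation code is not visible in this part of the diff, so the following is only a sketch of what such a check might look like. Both function names are hypothetical, and treating any value other than the literal string `false` as "required" is an assumption about how the variable is parsed.

```typescript
// Assumption: the requirement is on unless the variable is exactly "false".
function requireWssFromEnv(value: string | undefined): boolean {
  return value !== "false";
}

// Hypothetical check: accepts wss:// always, ws:// only when not required.
function isAllowedCustomUrl(rawUrl: string, requireWss: boolean): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (parsed.protocol === "wss:") return true;
  // ws:// is accepted only when the wss requirement is switched off.
  return parsed.protocol === "ws:" && !requireWss;
}
```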
@@ -398,6 +400,7 @@ Do not leave stale descriptions. If a note says "only X happens" and you change
 - Each participant creates its own `OutgoingConnection` and backend connection to the provider.
 - The `tag` field identifies a participant within a session. Format can be `{id}-{ssrc}` or just `{id}`.
 - Deepgram is the only backend that supports raw Opus/Ogg pass-through (controlled by `DEEPGRAM_ENCODING`, default `opus`). It returns the input encoding unchanged from `getDesiredAudioFormat()` when pass-through is active. The old `wantsRawOpus()` method has been replaced by `getDesiredAudioFormat()`.
+- `openai_custom` is a provider that reuses `OpenAIBackend` but with a per-request WebSocket URL (from the `openaiCustomUrl` URL query parameter) and API key (from the `X-Custom-Openai-Api-Key` HTTP header). It is gated by `ENABLE_OPENAI_CUSTOM_PROVIDER=true` (similar to `ENABLE_DUMMY_PROVIDER`). The URL and key are stored in `TranscriberProxyOptions` (`openaiCustomUrl`, `openaiCustomApiKey`) and passed to `BackendFactory.createBackend` via `OpenAICustomOptions`. `BackendFactory` instantiates `OpenAIBackend(tag, participantInfo, wsUrl, apiKey)` for this provider.
 - `DecodedAudio.audioData` is a `Uint8Array` of raw bytes (PCM for decoded audio, raw frames for pass-through). The old `pcmData: Int16Array` field no longer exists.
 - When adding a new backend, implement `getDesiredAudioFormat(inputFormat): AudioFormat`. Return `{ encoding: 'l16', sampleRate: 24000 }` for PCM or `{ ...inputFormat }` (shallow copy) for raw pass-through. Do not return the `inputFormat` reference directly. This method is called on every `reinitializeDecoder` call (not just once at construction), so it must be a pure function of `inputFormat` for a given backend configuration. If the method has connect-time side effects (like `DeepgramBackend` storing `negotiatedFormat`), it will also be called on any new backend instance before `connect()`, so those side effects will be applied correctly.
 - `AudioFormat.encoding` is a lowercase union type: `'opus' | 'ogg' | 'l16'`. The client-facing `'ogg-opus'` value is normalised to `'ogg'` by `validateAudioFormat()`, and all incoming encodings are lowercased before validation so case-insensitive client values are accepted.
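
The `getDesiredAudioFormat` contract described in the notes above can be illustrated with two small functions. This is a sketch under assumptions: the `AudioFormat` shape is inferred from the notes (only `encoding` and `sampleRate` are mentioned, and the real type may differ), and the two function names are hypothetical standalone stand-ins for the backend methods.

```typescript
// Assumed shape, inferred from the notes; the real interface may differ.
type AudioEncoding = "opus" | "ogg" | "l16";
interface AudioFormat {
  encoding: AudioEncoding;
  sampleRate?: number;
}

// PCM-style backend: always asks for 24 kHz linear PCM, whatever the input is.
function desiredFormatPcm(_inputFormat: AudioFormat): AudioFormat {
  return { encoding: "l16", sampleRate: 24000 };
}

// Pass-through backend: returns a shallow copy, never the caller's object,
// so later mutation of the result cannot leak back into the input format.
function desiredFormatPassThrough(inputFormat: AudioFormat): AudioFormat {
  return { ...inputFormat };
}
```

Both functions depend only on their argument, which is what makes them safe to call on every `reinitializeDecoder`.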
