### OpenAI (Default)

Uses OpenAI's Realtime API for low-latency streaming transcription.

### OpenAI Custom

Reuses the OpenAI Realtime API backend but connects to a custom WebSocket URL with per-request credentials. This is useful for proxies, self-hosted compatible endpoints, or sessions that need different API keys.

**How it works:**

- Identical to the `openai` backend in all respects (same protocol, same audio format, same session configuration)
- The WebSocket URL and API key are supplied per request rather than from environment variables

**Per-request configuration:**

| Source | Parameter | Description |
|--------|-----------|-------------|
| URL query param | `openaiCustomUrl` | WebSocket URL to connect to (e.g. `wss://your-proxy/v1/realtime?intent=transcription`) |
| HTTP header | `X-Custom-Openai-Api-Key` | API key for authentication |

Both values are required; if either is missing, the backend connection will fail.
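A client that uses this provider puts the upstream URL in the query string of its WebSocket upgrade request and the key in a header of that request. The TypeScript sketch below shows one way to build both. `buildOpenAICustomRequest` and the `wss://transcriber.example.com/transcribe` endpoint are hypothetical placeholders. Any other parameters the proxy may expect, such as provider selection, are not covered here.

```typescript
// Hypothetical client-side helper for the openai_custom provider.
// Only the query parameter name and the header name come from the table above;
// proxyBaseUrl is a placeholder for wherever the proxy is actually served.
interface UpgradeRequest {
  url: string;
  headers: Record<string, string>;
}

function buildOpenAICustomRequest(
  proxyBaseUrl: string,
  customWsUrl: string,
  apiKey: string,
): UpgradeRequest {
  // Both values are required, so fail early on the client side as well.
  if (!customWsUrl || !apiKey) {
    throw new Error("openaiCustomUrl and X-Custom-Openai-Api-Key are both required");
  }
  const url = new URL(proxyBaseUrl);
  // searchParams.set percent-encodes the nested URL, including its own "?" and "=".
  url.searchParams.set("openaiCustomUrl", customWsUrl);
  return {
    url: url.toString(),
    headers: { "X-Custom-Openai-Api-Key": apiKey },
  };
}

// Example with placeholder values.
const request = buildOpenAICustomRequest(
  "wss://transcriber.example.com/transcribe",
  "wss://your-proxy/v1/realtime?intent=transcription",
  "sk-example",
);
console.log(request.url);
console.log(request.headers);
```

With the Node `ws` package, the `headers` object can be passed as the `headers` option of the `WebSocket` constructor. The browser `WebSocket` API cannot set custom headers on the upgrade request, so a browser client would need a server-side component to add the key.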

**Configuration:**

```bash
# Enable the openai_custom provider (required)
ENABLE_OPENAI_CUSTOM_PROVIDER=true

# Require wss:// scheme for the openaiCustomUrl parameter (default: true)
# Set to false to allow unencrypted ws:// connections (not recommended in production)
OPENAI_CUSTOM_REQUIRE_WSS=false

# Optionally set openai_custom as the default provider
# Also pass the X-Custom-Openai-Api-Key HTTP header on the WebSocket upgrade request
```

The global `OPENAI_MODEL` and `OPENAI_TRANSCRIPTION_PROMPT` environment variables are used as defaults for model and prompt, same as for the `openai` provider.
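Taken together, the enable flag, the rule that both per-request values are required, and `OPENAI_CUSTOM_REQUIRE_WSS` amount to a small validation step that runs before the backend connects. The following is a minimal sketch of that step. It assumes environment values arrive as strings, so the flags are compared with `"true"` and `"false"`. `resolveOpenAICustomSettings`, `CustomProviderSettings` and `Env` are hypothetical names, not the proxy's actual code.

```typescript
// Hypothetical server-side check for the openai_custom provider.
// The environment variable names come from this document; the function name,
// the Env type and the error messages are illustrative assumptions.
type Env = Record<string, string | undefined>;

interface CustomProviderSettings {
  wsUrl: string;
  apiKey: string;
}

// Returns null instead of throwing when the string is not a valid URL.
function tryParseUrl(value: string): URL | null {
  try {
    return new URL(value);
  } catch {
    return null;
  }
}

function resolveOpenAICustomSettings(
  env: Env,
  customUrl: string | null,
  apiKey: string | null,
): CustomProviderSettings {
  // The provider stays disabled unless the flag is exactly "true".
  if (env.ENABLE_OPENAI_CUSTOM_PROVIDER !== "true") {
    throw new Error("openai_custom provider is not enabled");
  }
  if (!customUrl || !apiKey) {
    throw new Error("openaiCustomUrl and X-Custom-Openai-Api-Key are both required");
  }
  const parsed = tryParseUrl(customUrl);
  if (parsed === null) {
    throw new Error("openaiCustomUrl is not a valid URL");
  }
  // OPENAI_CUSTOM_REQUIRE_WSS defaults to true; only the value "false" relaxes it.
  const requireWss = env.OPENAI_CUSTOM_REQUIRE_WSS !== "false";
  const allowed = requireWss ? ["wss:"] : ["wss:", "ws:"];
  if (!allowed.includes(parsed.protocol)) {
    throw new Error(`openaiCustomUrl must use one of: ${allowed.join(", ")}`);
  }
  return { wsUrl: customUrl, apiKey };
}
```

On a Fetch-style `Request`, the two inputs could come from `new URL(request.url).searchParams.get("openaiCustomUrl")` and `request.headers.get("X-Custom-Openai-Api-Key")`. Both return `null` when the value is absent, which is why the sketch accepts `string | null`.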

- Each participant creates its own `OutgoingConnection` and backend connection to the provider.
- The `tag` field identifies a participant within a session. Format can be `{id}-{ssrc}` or just `{id}`.
- Deepgram is the only backend that supports raw Opus/Ogg pass-through (controlled by `DEEPGRAM_ENCODING`, default `opus`). It returns the input encoding unchanged from `getDesiredAudioFormat()` when pass-through is active. The old `wantsRawOpus()` method has been replaced by `getDesiredAudioFormat()`.
- `openai_custom` is a provider that reuses `OpenAIBackend` but with a per-request WebSocket URL (from the `openaiCustomUrl` URL query parameter) and API key (from the `X-Custom-Openai-Api-Key` HTTP header). It is gated by `ENABLE_OPENAI_CUSTOM_PROVIDER=true` (similar to `ENABLE_DUMMY_PROVIDER`). The URL and key are stored in `TranscriberProxyOptions` (`openaiCustomUrl`, `openaiCustomApiKey`) and passed to `BackendFactory.createBackend` via `OpenAICustomOptions`. `BackendFactory` instantiates `OpenAIBackend(tag, participantInfo, wsUrl, apiKey)` for this provider.
- `DecodedAudio.audioData` is a `Uint8Array` of raw bytes (PCM for decoded audio, raw frames for pass-through). The old `pcmData: Int16Array` field no longer exists.
- When adding a new backend, implement `getDesiredAudioFormat(inputFormat): AudioFormat`. Return `{ encoding: 'l16', sampleRate: 24000 }` for PCM or `{ ...inputFormat }` (shallow copy) for raw pass-through. Do not return the `inputFormat` reference directly. This method is called on every `reinitializeDecoder` call (not just once at construction), so it must be a pure function of `inputFormat` for a given backend configuration. If the method has connect-time side effects (like `DeepgramBackend` storing `negotiatedFormat`), it will also be called on any new backend instance before `connect()`, so those side effects will be applied correctly.
- `AudioFormat.encoding` is a lowercase union type: `'opus' | 'ogg' | 'l16'`. The client-facing `'ogg-opus'` value is normalised to `'ogg'` by `validateAudioFormat()`, and all incoming encodings are lowercased before validation so case-insensitive client values are accepted.
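The audio-format rules above can be illustrated with standalone functions. This sketch assumes `AudioFormat` carries only `encoding` and `sampleRate`. `normaliseEncoding`, `pcmDesiredFormat` and `passThroughDesiredFormat` are hypothetical names. They stand in for `validateAudioFormat()` and for the two kinds of `getDesiredAudioFormat()` implementation; they are not methods of the real backends.

```typescript
// Sketch of the documented audio-format rules. The Encoding union matches the
// documented AudioFormat.encoding; the two-field AudioFormat shape and all
// function names are assumptions for illustration.
type Encoding = "opus" | "ogg" | "l16";

interface AudioFormat {
  encoding: Encoding;
  sampleRate: number;
}

const ENCODINGS: readonly string[] = ["opus", "ogg", "l16"];

function isEncoding(value: string): value is Encoding {
  return ENCODINGS.includes(value);
}

// Hypothetical stand-in for validateAudioFormat(): lowercase first, then map
// the client-facing "ogg-opus" alias to "ogg".
function normaliseEncoding(raw: string): Encoding {
  const lower = raw.toLowerCase();
  const mapped = lower === "ogg-opus" ? "ogg" : lower;
  if (!isEncoding(mapped)) {
    throw new Error(`unsupported encoding: ${raw}`);
  }
  return mapped;
}

// A PCM backend asks for 24 kHz linear16 regardless of the input format.
function pcmDesiredFormat(_input: AudioFormat): AudioFormat {
  return { encoding: "l16", sampleRate: 24000 };
}

// A pass-through backend returns a shallow copy, never the input reference itself.
function passThroughDesiredFormat(input: AudioFormat): AudioFormat {
  return { ...input };
}

const input: AudioFormat = { encoding: normaliseEncoding("OGG-Opus"), sampleRate: 48000 };
const pcm = pcmDesiredFormat(input);
const raw = passThroughDesiredFormat(input);
console.log(input.encoding, pcm, raw === input);
```

In this sketch, both desired-format functions depend only on their argument, so calling them again on every decoder reinitialisation gives the same answer. The spread copy keeps the returned object independent of the caller's input format.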