---
title: ai-proxy
keywords:
description: The ai-proxy Plugin simplifies access to LLM and embedding model providers by converting Plugin configurations into the required request format for OpenAI, DeepSeek, Azure, AIMLAPI, Anthropic, OpenRouter, Gemini, Vertex AI, Amazon Bedrock, and other OpenAI-compatible APIs.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
The ai-proxy Plugin simplifies access to LLM and embedding models by transforming Plugin configurations into the designated request format. It supports integration with OpenAI, DeepSeek, Azure, AIMLAPI, Anthropic, OpenRouter, Gemini, Vertex AI, Amazon Bedrock, and other OpenAI-compatible APIs.

The Plugin also logs LLM request information in the access log, such as token usage, model, time to first response, and more. These log entries can be consumed by logging plugins such as http-logger and kafka-logger. These options do not affect error.log.
| Name | Type | Required | Description |
|---|---|---|---|
| messages | Array | True | An array of message objects. |
| messages.role | String | True | Role of the message (system, user, assistant). |
| messages.content | String | True | Content of the message. |
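For instance, a request body in this format might look like the following sketch, where the /anything Route URI and the prompts are placeholders:

```shell
# A minimal request in the expected format; /anything is a placeholder Route URI
curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Summarize Apache APISIX in one sentence." }
    ]
  }'
```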
When provider is set to bedrock, the Plugin expects requests in the Bedrock Converse API format. The request URI must end with /converse and the body must contain a messages array.
| Name | Type | Required | Description |
|---|---|---|---|
| messages | Array | True | An array of message objects. |
| messages.role | String | True | Role of the message (user, assistant). |
| messages.content | Array | True | An array of content blocks. Each block contains a text field (e.g., [{"text": "What is 1+1?"}]). |
| system | Array | False | Optional system prompt blocks (e.g., [{"text": "You are a helpful assistant."}]). |
| inferenceConfig | Object | False | Optional inference parameters such as maxTokens, temperature, topP, etc. |
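For instance, a Converse-format request body covering these fields might look like the following sketch, where the /bedrock/converse Route URI and the prompts are placeholders:

```shell
# Converse-format body with the optional system and inferenceConfig fields
curl "http://127.0.0.1:9080/bedrock/converse" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "system": [{"text": "You are a helpful assistant."}],
    "messages": [
      {"role": "user", "content": [{"text": "What is 1+1?"}]}
    ],
    "inferenceConfig": {"maxTokens": 256, "temperature": 0.2}
  }'
```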
| Name | Type | Required | Default | Valid values | Description |
|---|---|---|---|---|---|
| provider | string | True | | [openai, deepseek, azure-openai, aimlapi, anthropic, openrouter, gemini, vertex-ai, bedrock, openai-compatible] | LLM service provider. When set to openai, the Plugin will proxy the request to https://api.openai.com/chat/completions. When set to deepseek, the Plugin will proxy the request to https://api.deepseek.com/chat/completions. When set to aimlapi, the Plugin uses the OpenAI-compatible driver and proxies the request to https://api.aimlapi.com/v1/chat/completions by default. When set to anthropic, the Plugin will proxy the request to https://api.anthropic.com/v1/chat/completions by default. When set to openrouter, the Plugin uses the OpenAI-compatible driver and proxies the request to https://openrouter.ai/api/v1/chat/completions by default. When set to gemini, the Plugin uses the OpenAI-compatible driver and proxies the request to https://generativelanguage.googleapis.com/v1beta/openai/chat/completions by default. When set to vertex-ai, the Plugin will proxy the request to https://aiplatform.googleapis.com by default and requires provider_conf or override. When set to bedrock, the Plugin will proxy the request to the AWS Bedrock Converse API (https://bedrock-runtime.<region>.amazonaws.com) and signs the request with AWS SigV4. When set to openai-compatible, the Plugin will proxy the request to the custom endpoint configured in override. |
| provider_conf | object | False | | | Configuration for the specific provider. Required when provider is set to vertex-ai and override is not configured. Required when provider is set to bedrock. |
| provider_conf.project_id | string | True | | | Google Cloud Project ID. |
| provider_conf.region | string | True (depending on provider) | | minLength = 1 (for Bedrock) | When provider is vertex-ai, this is the Google Cloud Region. When provider is bedrock, this is the AWS region used to construct the Bedrock endpoint and to sign the request with SigV4 (required, must be non-empty). |
| auth | object | True | | | Authentication configurations. |
| auth.header | object | False | | | Authentication headers. At least one of header or query must be configured. |
| auth.query | object | False | | | Authentication query parameters. At least one of header or query must be configured. |
| auth.gcp | object | False | | | Configuration for Google Cloud Platform (GCP) authentication. |
| auth.gcp.service_account_json | string | False | | | Content of the GCP service account JSON file. This can also be configured by setting the GCP_SERVICE_ACCOUNT environment variable. |
| auth.gcp.max_ttl | integer | False | | minimum = 1 | Maximum TTL (in seconds) for caching the GCP access token. |
| auth.gcp.expire_early_secs | integer | False | 60 | minimum = 0 | Seconds to expire the access token before its actual expiration time to avoid edge cases. |
| auth.aws | object | False | | | Configuration for AWS authentication. Required when provider is bedrock. |
| auth.aws.access_key_id | string | True | | minLength = 1 | AWS access key ID used for SigV4 signing. |
| auth.aws.secret_access_key | string | True | | minLength = 1 | AWS secret access key used for SigV4 signing. Stored encrypted. |
| auth.aws.session_token | string | False | | minLength = 1 | Optional AWS session token for temporary credentials (e.g., from STS or assumed roles). Stored encrypted. |
| options | object | False | | | Model configurations. In addition to model, you can configure additional parameters and they will be forwarded to the upstream LLM service in the request body. For instance, if you are working with OpenAI, you can configure additional parameters such as temperature, top_p, and stream. See your LLM provider's API documentation for more available options. |
| options.model | string | False | | | Name of the LLM model, such as gpt-4 or gpt-3.5. Refer to the LLM provider's API documentation for available models. When provider is bedrock and override.endpoint is not configured, model is required and may be a foundation model ID (e.g., anthropic.claude-3-5-sonnet-20240620-v1:0), a cross-region inference profile ID (e.g., us.anthropic.claude-3-5-sonnet-20240620-v1:0), or an application inference profile ARN (e.g., arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/abc123). |
| override | object | False | | | Override settings. |
| override.endpoint | string | False | | | Custom LLM provider endpoint, required when provider is openai-compatible. When provider is bedrock, this can be set to a custom Bedrock endpoint. If the override URL includes a path containing reserved characters (e.g., Bedrock inference profile ARNs containing : or /), those characters must be URL-encoded (: → %3A, / → %2F) so that the model ID is preserved as a single path segment. |
| override.llm_options | object | False | | | Provider-aware LLM options. See Provider-aware max_tokens mapping. |
| override.llm_options.max_tokens | integer | False | | ≥ 1 | Maximum number of output tokens. APISIX automatically maps this to the provider-specific field name (e.g., max_completion_tokens for OpenAI Chat Completions, max_output_tokens for the OpenAI Responses API, max_tokens for most other providers). Always force-overwrites the client value. |
| override.request_body | object | False | | | Per-target-protocol request body overrides. Keys are target protocol names (openai-chat, openai-responses, openai-embeddings, anthropic-messages, bedrock-converse); values are partial request bodies that are deep-merged into the outgoing body (objects merged recursively, arrays and scalars replaced wholesale). See Per-protocol request body override. |
| override.request_body_force_override | boolean | False | false | | When false (default), client request body fields take priority and override.request_body values only fill in missing fields. When true, override.request_body values forcefully overwrite client fields. Does not affect override.llm_options, which always force-overwrites. |
| logging | object | False | | | Logging configurations. Does not affect error.log. |
| logging.summaries | boolean | False | false | | If true, logs the request LLM model, duration, and request and response tokens. |
| logging.payloads | boolean | False | false | | If true, logs the request and response payloads. |
| timeout | integer | False | 30000 | 1 - 600000 | Request timeout in milliseconds when requesting the LLM service. |
| keepalive | boolean | False | true | | If true, keeps the connection alive when requesting the LLM service. |
| keepalive_timeout | integer | False | 60000 | ≥ 1000 | Keepalive timeout in milliseconds when connecting to the LLM service. |
| keepalive_pool | integer | False | 30 | ≥ 1 | Keepalive pool size for the LLM service connection. |
| ssl_verify | boolean | False | true | | If true, verifies the LLM service's certificate. |
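Putting several of these attributes together, a Route-level configuration that tunes the connection and logging behavior might look like the following sketch; the OpenAI provider, the gpt-4 model, and the /anything URI are illustrative placeholders:

```shell
# A sketch: tune timeout, keepalive_pool, and logging alongside the basic provider settings
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4",
          "temperature": 0.7
        },
        "timeout": 60000,
        "keepalive_pool": 50,
        "logging": {
          "summaries": true
        }
      }
    }
  }'
```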
LLM providers and API endpoints disagree on the field name used to cap the number of output tokens. Configuring override.llm_options.max_tokens lets you set a single value in APISIX and have it forwarded under the field name expected by each provider/endpoint. llm_options always force-overwrites the client value.
The table below shows, for each provider and target API endpoint, the upstream field name APISIX rewrites max_tokens to. A — means the provider does not expose that endpoint.
| Provider | OpenAI Chat Completions | OpenAI Responses API | Anthropic Messages |
|---|---|---|---|
| openai | max_completion_tokens ¹ | max_output_tokens | — |
| openai-compatible | max_tokens | max_output_tokens | — |
| azure-openai | max_tokens | — | — |
| deepseek | max_tokens | — | — |
| aimlapi | max_tokens | — | — |
| openrouter | max_tokens | — | — |
| gemini | max_completion_tokens | — | — |
| vertex-ai | max_completion_tokens | — | — |
| anthropic | max_tokens | — | max_tokens |
¹ When provider is openai and the target is the Chat Completions endpoint, APISIX always rewrites to max_completion_tokens and removes any max_tokens field from the request body — max_tokens has been deprecated in favor of max_completion_tokens by OpenAI.
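For example, the following sketch caps output at 512 tokens; because the provider is openai and the target is the Chat Completions endpoint, APISIX forwards the value as max_completion_tokens. The Route URI and model are placeholders:

```shell
# llm_options.max_tokens is mapped to max_completion_tokens for OpenAI Chat Completions
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4"
        },
        "override": {
          "llm_options": {
            "max_tokens": 512
          }
        }
      }
    }
  }'
```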
override.request_body provides fine-grained, per-protocol control over the outgoing request body. Keys are target protocol names (openai-chat, openai-responses, openai-embeddings, anthropic-messages); values are partial JSON objects that are deep-merged into the outgoing body after protocol conversion.
Merge semantics:
- Both sides are plain objects (string-keyed) → recursive merge.
- Otherwise (scalar, array, type mismatch) → patch value replaces target value wholesale.
Priority between client request and override is controlled by override.request_body_force_override:
- false (default): if the client request body already sets the field, it is preserved; the override value only fills in when the field is missing.
- true: the override value forcefully overwrites the client field.
When both llm_options and request_body are configured, llm_options is applied first (always force), then request_body deep-merges on top. This means request_body can override fields set by llm_options.
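As an illustration, the following sketch injects a default temperature into requests converted to the OpenAI Chat Completions protocol; with request_body_force_override left at its default of false, a client-supplied temperature still takes priority. The Route URI and model are placeholders:

```shell
# Deep-merges {"temperature": 0.2} into outgoing openai-chat request bodies
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4"
        },
        "override": {
          "request_body": {
            "openai-chat": {
              "temperature": 0.2
            }
          }
        }
      }
    }
  }'
```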
The examples below demonstrate how you can configure ai-proxy for different scenarios.
:::note
You can fetch the admin_key from config.yaml and save it to an environment variable with the following command:

admin_key=$(yq '.deployment.admin.admin_key[0].key' conf/config.yaml | sed 's/"//g')

:::
The following example demonstrates how you can configure the API key, model, and other parameters in the ai-proxy Plugin and configure the Plugin on a Route to proxy user prompts to OpenAI.
Obtain the OpenAI API key and save it to an environment variable:
export OPENAI_API_KEY=<your-api-key>

<Tabs groupId="api" defaultValue="admin-api" values={[ {label: 'Admin API', value: 'admin-api'}, {label: 'ADC', value: 'adc'}, {label: 'Ingress Controller', value: 'aic'} ]}>
Create a Route and configure the ai-proxy Plugin as such:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options":{
"model": "gpt-4"
}
}
}
}'

Create a Route with the ai-proxy Plugin configured as such:
services:
- name: openai-service
routes:
- name: openai-route
uris:
- /anything
methods:
- POST
plugins:
ai-proxy:
provider: openai
auth:
header:
Authorization: "Bearer ${OPENAI_API_KEY}"
options:
model: gpt-4

Synchronize the configuration to the gateway:
adc sync -f adc.yaml

<Tabs groupId="k8s-api" defaultValue="gateway-api" values={[ {label: 'Gateway API', value: 'gateway-api'}, {label: 'APISIX CRD', value: 'apisix-crd'} ]}>
apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
namespace: aic
name: ai-proxy-plugin-config
spec:
plugins:
- name: ai-proxy
config:
provider: openai
auth:
header:
Authorization: "Bearer your-api-key"
options:
model: gpt-4
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
namespace: aic
name: openai-route
spec:
parentRefs:
- name: apisix
rules:
- matches:
- path:
type: Exact
value: /anything
method: POST
filters:
- type: ExtensionRef
extensionRef:
group: apisix.apache.org
kind: PluginConfig
name: ai-proxy-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
namespace: aic
name: openai-route
spec:
ingressClassName: apisix
http:
- name: openai-route
match:
paths:
- /anything
methods:
- POST
plugins:
- name: ai-proxy
enable: true
config:
provider: openai
auth:
header:
Authorization: "Bearer your-api-key"
options:
model: gpt-4

Apply the configuration to your cluster:
kubectl apply -f ai-proxy-ic.yaml

Send a POST request to the Route with a system prompt and a sample user question in the request body:
curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-H "Host: api.openai.com" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:
{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to DeepSeek.
Obtain the DeepSeek API key and save it to an environment variable:
export DEEPSEEK_API_KEY=<your-api-key>

<Tabs groupId="api" defaultValue="admin-api" values={[ {label: 'Admin API', value: 'admin-api'}, {label: 'ADC', value: 'adc'}, {label: 'Ingress Controller', value: 'aic'} ]}>
Create a Route and configure the ai-proxy Plugin as such:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "deepseek",
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
}
}
}'

Create a Route with the ai-proxy Plugin configured as such:
services:
- name: deepseek-service
routes:
- name: deepseek-route
uris:
- /anything
methods:
- POST
plugins:
ai-proxy:
provider: deepseek
auth:
header:
Authorization: "Bearer ${DEEPSEEK_API_KEY}"
options:
model: deepseek-chat

Synchronize the configuration to the gateway:
adc sync -f adc.yaml

<Tabs groupId="k8s-api" defaultValue="gateway-api" values={[ {label: 'Gateway API', value: 'gateway-api'}, {label: 'APISIX CRD', value: 'apisix-crd'} ]}>
apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
namespace: aic
name: ai-proxy-plugin-config
spec:
plugins:
- name: ai-proxy
config:
provider: deepseek
auth:
header:
Authorization: "Bearer your-api-key"
options:
model: deepseek-chat
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
namespace: aic
name: deepseek-route
spec:
parentRefs:
- name: apisix
rules:
- matches:
- path:
type: Exact
value: /anything
method: POST
filters:
- type: ExtensionRef
extensionRef:
group: apisix.apache.org
kind: PluginConfig
name: ai-proxy-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
namespace: aic
name: deepseek-route
spec:
ingressClassName: apisix
http:
- name: deepseek-route
match:
paths:
- /anything
methods:
- POST
plugins:
- name: ai-proxy
enable: true
config:
provider: deepseek
auth:
header:
Authorization: "Bearer your-api-key"
options:
model: deepseek-chat

Apply the configuration to your cluster:
kubectl apply -f deepseek-ic.yaml

Send a POST request to the Route with a sample question in the request body:
curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "system",
"content": "You are an AI assistant that helps people find information."
},
{
"role": "user",
"content": "Write me a 50-word introduction for Apache APISIX."
}
]
}'

You should receive a response similar to the following:
{
...
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Apache APISIX is a dynamic, real-time, high-performance API gateway and cloud-native platform. It provides rich traffic management features like load balancing, dynamic upstream, canary release, circuit breaking, authentication, observability, and more. Designed for microservices and serverless architectures, APISIX ensures scalability, security, and seamless integration with modern DevOps workflows."
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to other LLM services, such as Azure OpenAI.
Obtain the Azure OpenAI API key and save it to an environment variable:
export AZ_OPENAI_API_KEY=<your-api-key>

<Tabs groupId="api" defaultValue="admin-api" values={[ {label: 'Admin API', value: 'admin-api'}, {label: 'ADC', value: 'adc'}, {label: 'Ingress Controller', value: 'aic'} ]}>
Create a Route and configure the ai-proxy Plugin as such:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "azure-openai",
"auth": {
"header": {
"api-key": "'"$AZ_OPENAI_API_KEY"'"
}
},
"options":{
"model": "gpt-4"
},
"override": {
"endpoint": "https://api7-azure-openai.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"
}
}
}
}'

Create a Route with the ai-proxy Plugin configured as such:
services:
- name: azure-openai-service
routes:
- name: azure-openai-route
uris:
- /anything
methods:
- POST
plugins:
ai-proxy:
provider: azure-openai
auth:
header:
api-key: "${AZ_OPENAI_API_KEY}"
options:
model: gpt-4
override:
endpoint: "https://api7-azure-openai.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"Synchronize the configuration to the gateway:
adc sync -f adc.yaml

<Tabs groupId="k8s-api" defaultValue="gateway-api" values={[ {label: 'Gateway API', value: 'gateway-api'}, {label: 'APISIX CRD', value: 'apisix-crd'} ]}>
apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
namespace: aic
name: ai-proxy-plugin-config
spec:
plugins:
- name: ai-proxy
config:
provider: azure-openai
auth:
header:
api-key: "your-api-key"
options:
model: gpt-4
override:
endpoint: "https://api7-azure-openai.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
namespace: aic
name: azure-openai-route
spec:
parentRefs:
- name: apisix
rules:
- matches:
- path:
type: Exact
value: /anything
method: POST
filters:
- type: ExtensionRef
extensionRef:
group: apisix.apache.org
kind: PluginConfig
name: ai-proxy-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
namespace: aic
name: azure-openai-route
spec:
ingressClassName: apisix
http:
- name: azure-openai-route
match:
paths:
- /anything
methods:
- POST
plugins:
- name: ai-proxy
enable: true
config:
provider: azure-openai
auth:
header:
api-key: "your-api-key"
options:
model: gpt-4
override:
endpoint: "https://api7-azure-openai.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"Apply the configuration to your cluster:
kubectl apply -f azure-openai-ic.yaml

Send a POST request to the Route with a sample question in the request body:
curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{
"role": "system",
"content": "You are an AI assistant that helps people find information."
},
{
"role": "user",
"content": "Write me a 50-word introduction for Apache APISIX."
}
],
"max_tokens": 800,
"temperature": 0.7,
"frequency_penalty": 0,
"presence_penalty": 0,
"top_p": 0.95,
"stop": null
}'

You should receive a response similar to the following:
{
"choices": [
{
...,
"message": {
"content": "Apache APISIX is a modern, cloud-native API gateway built to handle high-performance and low-latency use cases. It offers a wide range of features, including load balancing, rate limiting, authentication, and dynamic routing, making it an ideal choice for microservices and cloud-native architectures.",
"role": "assistant"
}
}
],
...
}

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to Amazon Bedrock using the Converse API. The Plugin signs the upstream request using AWS SigV4 with the credentials configured in auth.aws.
Save your AWS credentials to environment variables:
export AWS_ACCESS_KEY_ID=<your-aws-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-aws-secret-access-key>

Create a Route and configure the ai-proxy Plugin as such:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-route",
"uri": "/bedrock/converse",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "bedrock",
"auth": {
"aws": {
"access_key_id": "'"$AWS_ACCESS_KEY_ID"'",
"secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'"
}
},
"options": {
"model": "anthropic.claude-3-5-sonnet-20240620-v1:0"
},
"provider_conf": {
"region": "us-east-1"
}
}
}
}'

Send a POST request to the Route in Bedrock Converse format. Note that the URI must end with /converse:
curl "http://127.0.0.1:9080/bedrock/converse" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user", "content": [{"text": "What is 1+1?"}]}
],
"inferenceConfig": {"maxTokens": 256}
}'

You should receive a Bedrock Converse response similar to the following:
{
"output": {
"message": {
"role": "assistant",
"content": [
{"text": "1 + 1 = 2."}
]
}
},
"stopReason": "end_turn",
"usage": {
"inputTokens": 14,
"outputTokens": 9,
"totalTokens": 23
},
...
}

If you need to call an application inference profile by ARN through override.endpoint, the reserved characters in the ARN (: and /) must be URL-encoded as %3A and %2F, for example:
https://bedrock-runtime.us-east-1.amazonaws.com/model/arn%3Aaws%3Abedrock%3Aus-east-1%3A123456789012%3Aapplication-inference-profile%2Fabc123/converse
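A minimal sketch of how such an encoded endpoint could be produced from the raw ARN; the account ID and profile ID are placeholders:

```shell
# Application inference profile ARN (placeholder account and profile IDs)
ARN="arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/abc123"

# Percent-encode the reserved characters so the ARN stays a single path segment
ENCODED_ARN=$(printf '%s' "$ARN" | sed -e 's/:/%3A/g' -e 's#/#%2F#g')

echo "https://bedrock-runtime.us-east-1.amazonaws.com/model/${ENCODED_ARN}/converse"
```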
:::note
If auth.aws.session_token is set, it is used for temporary credentials (e.g., obtained from AWS STS or an assumed role) and will be added to the SigV4-signed request automatically. Both auth.aws.secret_access_key and auth.aws.session_token are stored encrypted.
Streaming responses (Bedrock ConverseStream) are not yet supported by the Plugin.
:::
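For instance, a sketch of the Bedrock Route above extended with temporary credentials might look like the following, assuming the AWS_SESSION_TOKEN environment variable holds the token returned by STS:

```shell
# Assumes AWS_SESSION_TOKEN has been exported, e.g. from an STS response
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-route",
    "uri": "/bedrock/converse",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "bedrock",
        "auth": {
          "aws": {
            "access_key_id": "'"$AWS_ACCESS_KEY_ID"'",
            "secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'",
            "session_token": "'"$AWS_SESSION_TOKEN"'"
          }
        },
        "options": {
          "model": "anthropic.claude-3-5-sonnet-20240620-v1:0"
        },
        "provider_conf": {
          "region": "us-east-1"
        }
      }
    }
  }'
```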
The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to embedding models. This example will use the OpenAI embedding model endpoint.
Obtain the OpenAI API key and save it to an environment variable:
export OPENAI_API_KEY=<your-api-key>

<Tabs groupId="api" defaultValue="admin-api" values={[ {label: 'Admin API', value: 'admin-api'}, {label: 'ADC', value: 'adc'}, {label: 'Ingress Controller', value: 'aic'} ]}>
Create a Route and configure the ai-proxy Plugin as such:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-route",
"uri": "/embeddings",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options":{
"model": "text-embedding-3-small",
"encoding_format": "float"
},
"override": {
"endpoint": "https://api.openai.com/v1/embeddings"
}
}
}
}'

Create a Route with the ai-proxy Plugin configured as such:
services:
- name: openai-embeddings-service
routes:
- name: openai-embeddings-route
uris:
- /embeddings
methods:
- POST
plugins:
ai-proxy:
provider: openai
auth:
header:
Authorization: "Bearer ${OPENAI_API_KEY}"
options:
model: text-embedding-3-small
encoding_format: float
override:
endpoint: "https://api.openai.com/v1/embeddings"Synchronize the configuration to the gateway:
adc sync -f adc.yaml

<Tabs groupId="k8s-api" defaultValue="gateway-api" values={[ {label: 'Gateway API', value: 'gateway-api'}, {label: 'APISIX CRD', value: 'apisix-crd'} ]}>
apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
namespace: aic
name: ai-proxy-plugin-config
spec:
plugins:
- name: ai-proxy
config:
provider: openai
auth:
header:
Authorization: "Bearer your-api-key"
options:
model: text-embedding-3-small
encoding_format: float
override:
endpoint: "https://api.openai.com/v1/embeddings"
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
namespace: aic
name: openai-embeddings-route
spec:
parentRefs:
- name: apisix
rules:
- matches:
- path:
type: Exact
value: /embeddings
method: POST
filters:
- type: ExtensionRef
extensionRef:
group: apisix.apache.org
kind: PluginConfig
name: ai-proxy-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
namespace: aic
name: openai-embeddings-route
spec:
ingressClassName: apisix
http:
- name: openai-embeddings-route
match:
paths:
- /embeddings
methods:
- POST
plugins:
- name: ai-proxy
enable: true
config:
provider: openai
auth:
header:
Authorization: "Bearer your-api-key"
options:
model: text-embedding-3-small
encoding_format: float
override:
endpoint: "https://api.openai.com/v1/embeddings"Apply the configuration to your cluster:
kubectl apply -f openai-embeddings-ic.yaml

Send a POST request to the Route with an input string:
curl "http://127.0.0.1:9080/embeddings" -X POST \
-H "Content-Type: application/json" \
-d '{
"input": "hello world"
}'

You should receive a response similar to the following:
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [
-0.0067144386,
-0.039197803,
0.034177095,
0.028763203,
-0.024785956,
-0.04201061,
...
],
}
],
"model": "text-embedding-3-small",
"usage": {
"prompt_tokens": 2,
"total_tokens": 2
}
}

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to Anthropic's Claude API for chat completion.
Obtain an Anthropic API key and save it to an environment variable:
export ANTHROPIC_API_KEY=<your-api-key>

<Tabs groupId="api" defaultValue="admin-api" values={[ {label: 'Admin API', value: 'admin-api'}, {label: 'ADC', value: 'adc'}, {label: 'Ingress Controller', value: 'aic'} ]}>
Create a Route and configure the ai-proxy Plugin as such:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-anthropic-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "anthropic",
"auth": {
"header": {
"x-api-key": "'"$ANTHROPIC_API_KEY"'"
}
},
"options": {
"model": "claude-sonnet-4-20250514"
}
}
}
}'

Create a Route with the ai-proxy Plugin configured as such:
services:
- name: anthropic-service
routes:
- name: anthropic-route
uris:
- /anything
methods:
- POST
plugins:
ai-proxy:
provider: anthropic
auth:
header:
x-api-key: "${ANTHROPIC_API_KEY}"
options:
model: claude-sonnet-4-20250514

Synchronize the configuration to the gateway:
adc sync -f adc.yaml

<Tabs groupId="k8s-api" defaultValue="gateway-api" values={[ {label: 'Gateway API', value: 'gateway-api'}, {label: 'APISIX CRD', value: 'apisix-crd'} ]}>
apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
namespace: aic
name: ai-proxy-plugin-config
spec:
plugins:
- name: ai-proxy
config:
provider: anthropic
auth:
header:
x-api-key: "your-api-key"
options:
model: claude-sonnet-4-20250514
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
namespace: aic
name: anthropic-route
spec:
parentRefs:
- name: apisix
rules:
- matches:
- path:
type: Exact
value: /anything
method: POST
filters:
- type: ExtensionRef
extensionRef:
group: apisix.apache.org
kind: PluginConfig
name: ai-proxy-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
namespace: aic
name: anthropic-route
spec:
ingressClassName: apisix
http:
- name: anthropic-route
match:
paths:
- /anything
methods:
- POST
plugins:
- name: ai-proxy
enable: true
config:
provider: anthropic
auth:
header:
x-api-key: "your-api-key"
options:
model: claude-sonnet-4-20250514

Apply the configuration to your cluster:
kubectl apply -f anthropic-ic.yaml

The configuration above specifies anthropic as the provider and attaches the Anthropic API key in the x-api-key header.
Send a POST request to the Route with a system prompt and a sample user question in the request body:
curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "1+1 equals 2."
}
],
"model": "claude-sonnet-4-20250514",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 19,
"output_tokens": 11
}
}

The following example demonstrates how the ai-proxy Plugin can accept requests in the Anthropic Messages API format and automatically convert them to the OpenAI-compatible format before forwarding to any OpenAI-compatible backend (such as OpenAI, DeepSeek, or other compatible services). This is useful when client applications send Anthropic-formatted requests but you want to use a different LLM backend.
The protocol conversion is triggered automatically when the Route URI is set to /v1/messages (the Anthropic Messages API endpoint). The Plugin will convert Anthropic-formatted requests to OpenAI-compatible format and transform the responses back to Anthropic format.
Obtain an API key for your chosen OpenAI-compatible backend service and save it to an environment variable. This example uses OpenAI:
export BACKEND_API_KEY=<your-api-key>

<Tabs groupId="api" defaultValue="admin-api" values={[ {label: 'Admin API', value: 'admin-api'}, {label: 'ADC', value: 'adc'}, {label: 'Ingress Controller', value: 'aic'} ]}>
Create a Route with the URI set to /v1/messages to trigger automatic Anthropic protocol conversion, and configure the ai-proxy Plugin as such:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-anthropic-convert-route",
"uri": "/v1/messages",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$BACKEND_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
}
}
}'

Create a Route with the URI set to /v1/messages to trigger automatic Anthropic protocol conversion, and configure the ai-proxy Plugin as such:
services:
- name: anthropic-convert-service
routes:
- name: anthropic-convert-route
uris:
- /v1/messages
methods:
- POST
plugins:
ai-proxy:
provider: openai
auth:
header:
Authorization: "Bearer ${BACKEND_API_KEY}"
options:
model: gpt-4

Synchronize the configuration to the gateway:
adc sync -f adc.yaml

<Tabs groupId="k8s-api" defaultValue="gateway-api" values={[ {label: 'Gateway API', value: 'gateway-api'}, {label: 'APISIX CRD', value: 'apisix-crd'} ]}>
apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
namespace: aic
name: ai-proxy-plugin-config
spec:
plugins:
- name: ai-proxy
config:
provider: openai
auth:
header:
Authorization: "Bearer your-api-key"
options:
model: gpt-4
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
namespace: aic
name: anthropic-convert-route
spec:
parentRefs:
- name: apisix
rules:
- matches:
- path:
type: Exact
value: /v1/messages
method: POST
filters:
- type: ExtensionRef
extensionRef:
group: apisix.apache.org
kind: PluginConfig
name: ai-proxy-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
namespace: aic
name: anthropic-convert-route
spec:
ingressClassName: apisix
http:
- name: anthropic-convert-route
match:
paths:
- /v1/messages
methods:
- POST
plugins:
- name: ai-proxy
enable: true
config:
provider: openai
auth:
header:
Authorization: "Bearer your-api-key"
options:
model: gpt-4

Apply the configuration to your cluster:
kubectl apply -f anthropic-convert-ic.yaml

The backend provider can be any OpenAI-compatible provider, such as openai, deepseek, or others.
Send a POST request to the Route in Anthropic Messages API format:
curl "http://127.0.0.1:9080/v1/messages" -X POST \
-H "Content-Type: application/json" \
-H "x-api-key: ${BACKEND_API_KEY}" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "gpt-4",
"max_tokens": 1024,
"messages": [
{ "role": "user", "content": "What is 1+1?" }
]
}'

Although the request is sent in Anthropic format, it will be automatically converted to OpenAI format and forwarded to the backend. The response is converted back to Anthropic format:
{
"id": "msg_01XFDUDYJgAACzvnptvVoYEL",
"type": "message",
"role": "assistant",
"content": [
{
"type": "text",
"text": "1+1 equals 2."
}
],
"model": "gpt-4",
"stop_reason": "end_turn",
"usage": {
"input_tokens": 12,
"output_tokens": 8
}
}

The Plugin supports all features of the Anthropic Messages API, including streaming (SSE), system prompts, and tool use (function calling). The protocol conversion handles the bidirectional mapping between Anthropic and OpenAI formats transparently.
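For example, a streaming request against the Route above might look like the following sketch; the converted response is returned as Server-Sent Events:

```shell
# "stream": true requests an SSE response; the gpt-4 model mirrors the Route configured above
curl "http://127.0.0.1:9080/v1/messages" -X POST \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "gpt-4",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      { "role": "user", "content": "Count from 1 to 5." }
    ]
  }'
```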
The following example demonstrates how you can proxy requests to different models on the same URI, based on the user-specified model in the user requests. You will be using the post_arg.* variable to fetch the value of the request body parameter.
The example will use OpenAI and DeepSeek as the example LLM services. Obtain the OpenAI and DeepSeek API keys and save them to environment variables:
export OPENAI_API_KEY=<your-api-key>
export DEEPSEEK_API_KEY=<your-api-key>

<Tabs groupId="api" defaultValue="admin-api" values={[ {label: 'Admin API', value: 'admin-api'}, {label: 'ADC', value: 'adc'}, {label: 'Ingress Controller', value: 'aic'} ]}>
Create a Route to the OpenAI API with the ai-proxy Plugin. The Route URI is /anything and it matches requests where the body parameter model is set to openai:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-openai-route",
"uri": "/anything",
"methods": ["POST"],
"vars": [[ "post_arg.model", "==", "openai" ]],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
}
}
}
}'

Create another Route /anything to the DeepSeek API with the ai-proxy Plugin. This Route matches requests where the body parameter model is set to deepseek:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-deepseek-route",
"uri": "/anything",
"methods": ["POST"],
"vars": [[ "post_arg.model", "==", "deepseek" ]],
"plugins": {
"ai-proxy": {
"provider": "deepseek",
"auth": {
"header": {
"Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
}
},
"options": {
"model": "deepseek-chat"
}
}
}
}'

Create two Routes with the ai-proxy Plugin configured for different providers:
services:
- name: multi-model-service
routes:
- name: openai-route
uris:
- /anything
methods:
- POST
vars:
- - post_arg.model
- ==
- openai
plugins:
ai-proxy:
provider: openai
auth:
header:
Authorization: "Bearer ${OPENAI_API_KEY}"
options:
model: gpt-4
- name: deepseek-route
uris:
- /anything
methods:
- POST
vars:
- - post_arg.model
- ==
- deepseek
plugins:
ai-proxy:
provider: deepseek
auth:
header:
Authorization: "Bearer ${DEEPSEEK_API_KEY}"
options:
model: deepseek-chat

Synchronize the configuration to the gateway:
adc sync -f adc.yaml

<Tabs groupId="k8s-api" defaultValue="gateway-api" values={[ {label: 'Gateway API', value: 'gateway-api'}, {label: 'APISIX CRD', value: 'apisix-crd'} ]}>
Body parameter matching is not supported in HTTPRoute. The supported matching mechanisms are path, method, headers, and queryParams. This example cannot be completed with Gateway API.
Body parameter matching is currently not supported in ApisixRoute. The supported matching mechanisms are based on Header, Query, or Path. This example cannot be completed with APISIX CRDs.
Send a POST request to the Route with model set to openai:
curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"model": "openai",
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:
{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

Send a POST request to the Route with model set to deepseek:
curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek",
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:
{
...,
"model": "deepseek-chat",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The sum of 1 and 1 is 2. This is a basic arithmetic operation where you combine two units to get a total of two units."
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

You can also configure post_arg.* to fetch nested request body parameters. For instance, if the request format is:
curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"model": {
"name": "openai"
},
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You can configure the vars on the Route to be [[ "post_arg.model.name", "==", "openai" ]].
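A sketch of the corresponding Route, which mirrors the earlier OpenAI Route and only changes the vars expression:

```shell
# Matches requests where the nested body field model.name equals "openai"
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-openai-route",
    "uri": "/anything",
    "methods": ["POST"],
    "vars": [[ "post_arg.model.name", "==", "openai" ]],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4"
        }
      }
    }
  }'
```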
The following example demonstrates how you can log request and response information, including LLM model, token, and payload, and push them to a logger. Before proceeding, you should first set up a logger, such as Kafka. See kafka-logger for more information.
<Tabs groupId="api" defaultValue="admin-api" values={[ {label: 'Admin API', value: 'admin-api'}, {label: 'ADC', value: 'adc'}, {label: 'Ingress Controller', value: 'aic'} ]}>
Create a Route to your LLM service and configure logging details. Enable summaries to log request LLM model, duration, request and response tokens. Enable payloads to log request and response payload. Update the kafka-logger configuration with your Kafka address, topic, and key:
curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
-H "X-API-KEY: ${admin_key}" \
-d '{
"id": "ai-proxy-openai-route",
"uri": "/anything",
"methods": ["POST"],
"plugins": {
"ai-proxy": {
"provider": "openai",
"auth": {
"header": {
"Authorization": "Bearer '"$OPENAI_API_KEY"'"
}
},
"options": {
"model": "gpt-4"
},
"logging": {
"summaries": true,
"payloads": true
}
},
"kafka-logger": {
"brokers": [
{
"host": "127.0.0.1",
"port": 9092
}
],
"kafka_topic": "test2",
"key": "key1",
"batch_max_size": 1
}
}
}'

Create a Route with both ai-proxy and kafka-logger Plugins. Enable summaries to log request LLM model, duration, request and response tokens. Enable payloads to log request and response payload. Update the kafka-logger configuration with your Kafka address, topic, and key:
services:
- name: logging-service
routes:
- name: logging-route
uris:
- /anything
methods:
- POST
plugins:
ai-proxy:
provider: openai
auth:
header:
Authorization: "Bearer ${OPENAI_API_KEY}"
options:
model: gpt-4
logging:
summaries: true
payloads: true
kafka-logger:
brokers:
- host: 127.0.0.1
port: 9092
kafka_topic: test2
key: key1
batch_max_size: 1

Synchronize the configuration to the gateway:
adc sync -f adc.yaml

<Tabs groupId="k8s-api" defaultValue="gateway-api" values={[ {label: 'Gateway API', value: 'gateway-api'}, {label: 'APISIX CRD', value: 'apisix-crd'} ]}>
apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
namespace: aic
name: ai-proxy-logging-plugin-config
spec:
plugins:
- name: ai-proxy
config:
provider: openai
auth:
header:
Authorization: "Bearer your-api-key"
options:
model: gpt-4
logging:
summaries: true
payloads: true
- name: kafka-logger
config:
brokers:
- host: kafka.aic.svc.cluster.local
port: 9092
kafka_topic: test2
key: key1
batch_max_size: 1
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
namespace: aic
name: logging-route
spec:
parentRefs:
- name: apisix
rules:
- matches:
- path:
type: Exact
value: /anything
method: POST
filters:
- type: ExtensionRef
extensionRef:
group: apisix.apache.org
kind: PluginConfig
name: ai-proxy-logging-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
namespace: aic
name: logging-route
spec:
ingressClassName: apisix
http:
- name: logging-route
match:
paths:
- /anything
methods:
- POST
plugins:
- name: ai-proxy
enable: true
config:
provider: openai
auth:
header:
Authorization: "Bearer your-api-key"
options:
model: gpt-4
logging:
summaries: true
payloads: true
- name: kafka-logger
enable: true
config:
brokers:
- host: kafka.aic.svc.cluster.local
port: 9092
kafka_topic: test2
key: key1
batch_max_size: 1

Apply the configuration to your cluster:
kubectl apply -f logging-ic.yaml

Send a POST request to the Route:
curl "http://127.0.0.1:9080/anything" -X POST \
-H "Content-Type: application/json" \
-d '{
"messages": [
{ "role": "system", "content": "You are a mathematician" },
{ "role": "user", "content": "What is 1+1?" }
]
}'

You should receive a response similar to the following:
{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null
},
"logprobs": null,
"finish_reason": "stop"
}
],
...
}

In the Kafka topic, you should also see a log entry corresponding to the request with the LLM summary and request/response payload.
The following example demonstrates how you can log LLM request-related information in the gateway's access log to improve analytics and auditing. The following variables are available:

- request_llm_model: LLM model name specified in the request.
- apisix_upstream_response_time: Time taken for APISIX to send the request to the upstream service and receive the full response.
- request_type: Type of request, where the value could be traditional_http, ai_chat, or ai_stream.
- llm_time_to_first_token: Duration from sending the request to receiving the first token from the LLM service, in milliseconds.
- llm_model: LLM model.
- llm_prompt_tokens: Number of tokens in the prompt.
- llm_completion_tokens: Number of tokens in the chat completion (response).
Update the access log format in your configuration file to include additional LLM-related variables:

nginx_config:
  http:
    access_log_format: "$remote_addr - $remote_user [$time_local] $http_host \"$request_line\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $apisix_upstream_response_time \"$upstream_scheme://$upstream_host$upstream_uri\" \"$apisix_request_id\" \"$request_type\" \"$llm_time_to_first_token\" \"$llm_model\" \"$request_llm_model\" \"$llm_prompt_tokens\" \"$llm_completion_tokens\""

Reload APISIX for configuration changes to take effect.
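For example, if APISIX was installed from a package or source, you can reload it with the APISIX CLI; adjust this step to your deployment method, such as restarting the container when running in Docker:

```shell
# Reload APISIX so the new access_log_format takes effect
apisix reload
```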
Now if you create a Route and send a request following the Proxy to OpenAI example, you should receive a response similar to the following:
{
...,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "1+1 equals 2.",
"refusal": null,
"annotations": []
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 23,
"completion_tokens": 8,
"total_tokens": 31,
"prompt_tokens_details": {
"cached_tokens": 0,
"audio_tokens": 0
},
...
},
"service_tier": "default",
"system_fingerprint": null
}

In the gateway's access log, you should see a log entry similar to the following:
192.168.215.1 - - [21/Mar/2025:04:28:03 +0000] api.openai.com "POST /anything HTTP/1.1" 200 804 2.858 "-" "curl/8.6.0" - - - 5765 "http://api.openai.com" "5c5e0b95f8d303cb81e4dc456a4b12d9" "ai_chat" "2858" "gpt-4" "gpt-4" "23" "8"
The access log entry shows that the request type is ai_chat, the APISIX upstream response time is 5765 milliseconds, the time to first token is 2858 milliseconds, the requested LLM model is gpt-4, the LLM model is gpt-4, the prompt token usage is 23, and the completion token usage is 8.