apisix/docs/en/latest/plugins/ai-proxy.md at master · apache/apisix

title

ai-proxy

keywords

Apache APISIX

API Gateway

Plugin

ai-proxy

LLM

description

The ai-proxy Plugin simplifies access to LLM and embedding models providers by converting Plugin configurations into the required request format for OpenAI, DeepSeek, Azure, AIMLAPI, Anthropic, OpenRouter, Gemini, Vertex AI, Amazon Bedrock, and other OpenAI-compatible APIs.

import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem';

Description

The ai-proxy Plugin simplifies access to LLM and embedding models by transforming Plugin configurations into the designated request format. It supports the integration with OpenAI, DeepSeek, Azure, AIMLAPI, Anthropic, OpenRouter, Gemini, Vertex AI, Amazon Bedrock, and other OpenAI-compatible APIs.

In addition, the Plugin also supports logging LLM request information in the access log, such as token usage, model, time to the first response, and more. These log entries are also consumed by logging plugins such as http-logger and kafka-logger. These options do not affect error.log.

Request Format

Name	Type	Required	Description
`messages`	Array	True	An array of message objects.
`messages.role`	String	True	Role of the message (`system`, `user`, `assistant`).
`messages.content`	String	True	Content of the message.

Bedrock Converse Request Format

When provider is set to bedrock, the Plugin expects requests in the Bedrock Converse API format. The request URI must end with /converse and the body must contain a messages array.

Name	Type	Required	Description
`messages`	Array	True	An array of message objects.
`messages.role`	String	True	Role of the message (`user`, `assistant`).
`messages.content`	Array	True	An array of content blocks. Each block contains a `text` field (e.g., `[{"text": "What is 1+1?"}]`).
`system`	Array	False	Optional system prompt blocks (e.g., `[{"text": "You are a helpful assistant."}]`).
`inferenceConfig`	Object	False	Optional inference parameters such as `maxTokens`, `temperature`, `topP`, etc.

Attributes

Name	Type	Required	Default	Valid values	Description
provider	string	True		[openai, deepseek, azure-openai, aimlapi, anthropic, openrouter, gemini, vertex-ai, bedrock, openai-compatible]	LLM service provider. When set to `openai`, the Plugin will proxy the request to `https://api.openai.com/chat/completions`. When set to `deepseek`, the Plugin will proxy the request to `https://api.deepseek.com/chat/completions`. When set to `aimlapi`, the Plugin uses the OpenAI-compatible driver and proxies the request to `https://api.aimlapi.com/v1/chat/completions` by default. When set to `anthropic`, the Plugin will proxy the request to `https://api.anthropic.com/v1/chat/completions` by default. When set to `openrouter`, the Plugin uses the OpenAI-compatible driver and proxies the request to `https://openrouter.ai/api/v1/chat/completions` by default. When set to `gemini`, the Plugin uses the OpenAI-compatible driver and proxies the request to `https://generativelanguage.googleapis.com/v1beta/openai/chat/completions` by default. When set to `vertex-ai`, the Plugin will proxy the request to `https://aiplatform.googleapis.com` by default and requires `provider_conf` or `override`. When set to `bedrock`, the Plugin will proxy the request to the AWS Bedrock Converse API (`https://bedrock-runtime.<region>.amazonaws.com`) and signs the request with AWS SigV4. When set to `openai-compatible`, the Plugin will proxy the request to the custom endpoint configured in `override`.
provider_conf	object	False			Configuration for the specific provider. Required when `provider` is set to `vertex-ai` and `override` is not configured. Required when `provider` is set to `bedrock`.
provider_conf.project_id	string	True			Google Cloud Project ID.
provider_conf.region	string	True (depending on provider)		minLength = 1 (for Bedrock)	When `provider` is `vertex-ai`, this is the Google Cloud Region. When `provider` is `bedrock`, this is the AWS region used to construct the Bedrock endpoint and to sign the request with SigV4 (required, must be non-empty).
auth	object	True			Authentication configurations.
auth.header	object	False			Authentication headers. At least one of `header` or `query` must be configured.
auth.query	object	False			Authentication query parameters. At least one of `header` or `query` must be configured.
auth.gcp	object	False			Configuration for Google Cloud Platform (GCP) authentication.
auth.gcp.service_account_json	string	False			Content of the GCP service account JSON file. This can also be configured by setting the `GCP_SERVICE_ACCOUNT` environment variable.
auth.gcp.max_ttl	integer	False		minimum = 1	Maximum TTL (in seconds) for caching the GCP access token.
auth.gcp.expire_early_secs	integer	False	60	minimum = 0	Seconds to expire the access token before its actual expiration time to avoid edge cases.
auth.aws	object	False			Configuration for AWS authentication. Required when `provider` is `bedrock`.
auth.aws.access_key_id	string	True		minLength = 1	AWS access key ID used for SigV4 signing.
auth.aws.secret_access_key	string	True		minLength = 1	AWS secret access key used for SigV4 signing. Stored encrypted.
auth.aws.session_token	string	False		minLength = 1	Optional AWS session token for temporary credentials (e.g., from STS or assumed roles). Stored encrypted.
options	object	False			Model configurations. In addition to `model`, you can configure additional parameters and they will be forwarded to the upstream LLM service in the request body. For instance, if you are working with OpenAI, you can configure additional parameters such as `temperature`, `top_p`, and `stream`. See your LLM provider's API documentation for more available options.
options.model	string	False			Name of the LLM model, such as `gpt-4` or `gpt-3.5`. Refer to the LLM provider's API documentation for available models. When `provider` is `bedrock` and `override.endpoint` is not configured, `model` is required and may be a foundation model ID (e.g., `anthropic.claude-3-5-sonnet-20240620-v1:0`), a cross-region inference profile ID (e.g., `us.anthropic.claude-3-5-sonnet-20240620-v1:0`), or an application inference profile ARN (e.g., `arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/abc123`).
override	object	False			Override setting.
override.endpoint	string	False			Custom LLM provider endpoint, required when `provider` is `openai-compatible`. When `provider` is `bedrock`, this can be set to a custom Bedrock endpoint. If the override URL includes a path containing reserved characters (e.g., Bedrock inference profile ARNs containing `:` or `/`), those characters MUST be URL-encoded (`:` → `%3A`, `/` → `%2F`) so the model ID is preserved as a single path segment.
override.llm_options	object	False			Provider-aware LLM options. See Provider-aware `max_tokens` mapping.
override.llm_options.max_tokens	integer	False		≥ 1	Maximum number of output tokens. APISIX automatically maps this to the provider-specific field name (e.g. `max_completion_tokens` for OpenAI Chat Completions, `max_output_tokens` for OpenAI Responses API, `max_tokens` for most other providers). Always force-overwrites the client value.
override.request_body	object	False			Per target-protocol request body overrides. Keys are target protocol names (`openai-chat`, `openai-responses`, `openai-embeddings`, `anthropic-messages`, `bedrock-converse`); values are partial request bodies that are deep-merged into the outgoing body (objects merged recursively, arrays and scalars replaced wholesale). See Per-protocol request body override.
override.request_body_force_override	boolean	False	false		When `false` (default), client request body fields take priority and `override.request_body` values only fill in missing fields. When `true`, `override.request_body` values forcefully overwrite client fields. Does not affect `override.llm_options`, which always force-overwrites.
logging	object	False			Logging configurations. Does not affect `error.log`.
logging.summaries	boolean	False	false		If true, logs request LLM model, duration, request, and response tokens.
logging.payloads	boolean	False	false		If true, logs request and response payload.
timeout	integer	False	30000	1 - 600000	Request timeout in milliseconds when requesting the LLM service.
keepalive	boolean	False	true		If true, keeps the connection alive when requesting the LLM service.
keepalive_timeout	integer	False	60000	≥ 1000	Keepalive timeout in milliseconds when connecting to the LLM service.
keepalive_pool	integer	False	30	≥ 1	Keepalive pool size for the LLM service connection.
ssl_verify	boolean	False	true		If true, verifies the LLM service's certificate.

Provider-aware `max_tokens` mapping

LLM providers and API endpoints disagree on the field name used to cap the number of output tokens. Configuring override.llm_options.max_tokens lets you set a single value in APISIX and have it forwarded under the field name expected by each provider/endpoint. llm_options always force-overwrites the client value.

The table below shows, for each provider and target API endpoint, the upstream field name APISIX rewrites max_tokens to. A — means the provider does not expose that endpoint.

Provider	OpenAI Chat Completions	OpenAI Responses API	Anthropic Messages
`openai`	`max_completion_tokens` ¹	`max_output_tokens`	—
`openai-compatible`	`max_tokens`	`max_output_tokens`	—
`azure-openai`	`max_tokens`	—	—
`deepseek`	`max_tokens`	—	—
`aimlapi`	`max_tokens`	—	—
`openrouter`	`max_tokens`	—	—
`gemini`	`max_completion_tokens`	—	—
`vertex-ai`	`max_completion_tokens`	—	—
`anthropic`	`max_tokens`	—	`max_tokens`

¹ When provider is openai and the target is the Chat Completions endpoint, APISIX always rewrites to max_completion_tokens and removes any max_tokens field from the request body — max_tokens has been deprecated in favor of max_completion_tokens by OpenAI.

Per-protocol request body override

override.request_body provides fine-grained, per-protocol control over the outgoing request body. Keys are target protocol names (openai-chat, openai-responses, openai-embeddings, anthropic-messages); values are partial JSON objects that are deep-merged into the outgoing body after protocol conversion.

Merge semantics:

Both sides are plain objects (string-keyed) → recursive merge.
Otherwise (scalar, array, type mismatch) → patch value replaces target value wholesale.

Priority between client request and override is controlled by override.request_body_force_override:

false (default): if the client request body already sets the field, it is preserved; the override value only fills in when the field is missing.
true: the override value forcefully overwrites the client field.

When both llm_options and request_body are configured, llm_options is applied first (always force), then request_body deep-merges on top. This means request_body can override fields set by llm_options.

Examples

The examples below demonstrate how you can configure ai-proxy for different scenarios.

:::note

You can fetch the admin_key from config.yaml and save to an environment variable with the following command:

admin_key=$(yq '.deployment.admin.admin_key[0].key' conf/config.yaml | sed 's/"//g')

:::

Proxy to OpenAI

The following example demonstrates how you can configure the API key, model, and other parameters in the ai-proxy Plugin and configure the Plugin on a Route to proxy user prompts to OpenAI.

Obtain the OpenAI API key and save it to an environment variable:

export OPENAI_API_KEY=<your-api-key>

Create a Route and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options":{
          "model": "gpt-4"
        }
      }
    }
  }'

Create a Route with the ai-proxy Plugin configured as such:

services:
  - name: openai-service
    routes:
      - name: openai-route
        uris:
          - /anything
        methods:
          - POST
        plugins:
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: gpt-4

Synchronize the configuration to the gateway:

adc sync -f adc.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-proxy-plugin-config
spec:
  plugins:
    - name: ai-proxy
      config:
        provider: openai
        auth:
          header:
            Authorization: "Bearer your-api-key"
        options:
          model: gpt-4
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: openai-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-proxy-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: openai-route
spec:
  ingressClassName: apisix
  http:
    - name: openai-route
      match:
        paths:
          - /anything
        methods:
          - POST
      plugins:
      - name: ai-proxy
        enable: true
        config:
          provider: openai
          auth:
            header:
              Authorization: "Bearer your-api-key"
          options:
            model: gpt-4

Apply the configuration to your cluster:

kubectl apply -f ai-proxy-ic.yaml

Send a POST request to the Route with a system prompt and a sample user question in the request body:

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -H "Host: api.openai.com" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

You should receive a response similar to the following:

{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}

Proxy to DeepSeek

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to DeepSeek.

Obtain the DeepSeek API key and save it to an environment variable:

export DEEPSEEK_API_KEY=<your-api-key>

Create a Route and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "deepseek",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
          }
        },
        "options": {
          "model": "deepseek-chat"
        }
      }
    }
  }'

Create a Route with the ai-proxy Plugin configured as such:

services:
  - name: deepseek-service
    routes:
      - name: deepseek-route
        uris:
          - /anything
        methods:
          - POST
        plugins:
          ai-proxy:
            provider: deepseek
            auth:
              header:
                Authorization: "Bearer ${DEEPSEEK_API_KEY}"
            options:
              model: deepseek-chat

Synchronize the configuration to the gateway:

adc sync -f adc.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-proxy-plugin-config
spec:
  plugins:
    - name: ai-proxy
      config:
        provider: deepseek
        auth:
          header:
            Authorization: "Bearer your-api-key"
        options:
          model: deepseek-chat
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: deepseek-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-proxy-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: deepseek-route
spec:
  ingressClassName: apisix
  http:
    - name: deepseek-route
      match:
        paths:
          - /anything
        methods:
          - POST
      plugins:
      - name: ai-proxy
        enable: true
        config:
          provider: deepseek
          auth:
            header:
              Authorization: "Bearer your-api-key"
          options:
            model: deepseek-chat

Apply the configuration to your cluster:

kubectl apply -f deepseek-ic.yaml

Send a POST request to the Route with a sample question in the request body:

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are an AI assistant that helps people find information."
      },
      {
        "role": "user",
        "content": "Write me a 50-word introduction for Apache APISIX."
      }
    ]
  }'

You should receive a response similar to the following:

{
  ...
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Apache APISIX is a dynamic, real-time, high-performance API gateway and cloud-native platform. It provides rich traffic management features like load balancing, dynamic upstream, canary release, circuit breaking, authentication, observability, and more. Designed for microservices and serverless architectures, APISIX ensures scalability, security, and seamless integration with modern DevOps workflows."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}

Proxy to Azure OpenAI

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to other LLM services, such as Azure OpenAI.

Obtain the Azure OpenAI API key and save it to an environment variable:

export AZ_OPENAI_API_KEY=<your-api-key>

Create a Route and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "azure-openai",
        "auth": {
          "header": {
            "api-key": "'"$AZ_OPENAI_API_KEY"'"
          }
        },
        "options":{
          "model": "gpt-4"
        },
        "override": {
          "endpoint": "https://api7-azure-openai.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"
        }
      }
    }
  }'

Create a Route with the ai-proxy Plugin configured as such:

services:
  - name: azure-openai-service
    routes:
      - name: azure-openai-route
        uris:
          - /anything
        methods:
          - POST
        plugins:
          ai-proxy:
            provider: azure-openai
            auth:
              header:
                api-key: "${AZ_OPENAI_API_KEY}"
            options:
              model: gpt-4
            override:
              endpoint: "https://api7-azure-openai.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"

Synchronize the configuration to the gateway:

adc sync -f adc.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-proxy-plugin-config
spec:
  plugins:
    - name: ai-proxy
      config:
        provider: azure-openai
        auth:
          header:
            api-key: "your-api-key"
        options:
          model: gpt-4
        override:
          endpoint: "https://api7-azure-openai.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: azure-openai-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-proxy-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: azure-openai-route
spec:
  ingressClassName: apisix
  http:
    - name: azure-openai-route
      match:
        paths:
          - /anything
        methods:
          - POST
      plugins:
      - name: ai-proxy
        enable: true
        config:
          provider: azure-openai
          auth:
            header:
              api-key: "your-api-key"
          options:
            model: gpt-4
          override:
            endpoint: "https://api7-azure-openai.openai.azure.com/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview"

Apply the configuration to your cluster:

kubectl apply -f azure-openai-ic.yaml

Send a POST request to the Route with a sample question in the request body:

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are an AI assistant that helps people find information."
      },
      {
        "role": "user",
        "content": "Write me a 50-word introduction for Apache APISIX."
      }
    ],
    "max_tokens": 800,
    "temperature": 0.7,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "top_p": 0.95,
    "stop": null
  }'

You should receive a response similar to the following:

{
  "choices": [
    {
      ...,
      "message": {
        "content": "Apache APISIX is a modern, cloud-native API gateway built to handle high-performance and low-latency use cases. It offers a wide range of features, including load balancing, rate limiting, authentication, and dynamic routing, making it an ideal choice for microservices and cloud-native architectures.",
        "role": "assistant"
      }
    }
  ],
  ...
}

Proxy to Amazon Bedrock

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to Amazon Bedrock using the Converse API. The Plugin signs the upstream request using AWS SigV4 with the credentials configured in auth.aws.

Save your AWS credentials to environment variables:

export AWS_ACCESS_KEY_ID=<your-aws-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-aws-secret-access-key>

Create a Route and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-route",
    "uri": "/bedrock/converse",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "bedrock",
        "auth": {
          "aws": {
            "access_key_id": "'"$AWS_ACCESS_KEY_ID"'",
            "secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'"
          }
        },
        "options": {
          "model": "anthropic.claude-3-5-sonnet-20240620-v1:0"
        },
        "provider_conf": {
          "region": "us-east-1"
        }
      }
    }
  }'

Send a POST request to the Route in Bedrock Converse format. Note that the URI must end with /converse:

curl "http://127.0.0.1:9080/bedrock/converse" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": [{"text": "What is 1+1?"}]}
    ],
    "inferenceConfig": {"maxTokens": 256}
  }'

You should receive a Bedrock Converse response similar to the following:

{
  "output": {
    "message": {
      "role": "assistant",
      "content": [
        {"text": "1 + 1 = 2."}
      ]
    }
  },
  "stopReason": "end_turn",
  "usage": {
    "inputTokens": 14,
    "outputTokens": 9,
    "totalTokens": 23
  },
  ...
}

If you need to call an application inference profile by ARN through override.endpoint, the reserved characters in the ARN (: and /) must be URL-encoded as %3A and %2F, for example:

https://bedrock-runtime.us-east-1.amazonaws.com/model/arn%3Aaws%3Abedrock%3Aus-east-1%3A123456789012%3Aapplication-inference-profile%2Fabc123/converse

:::note

If auth.aws.session_token is set, it is used for temporary credentials (e.g., obtained from AWS STS or an assumed role) and will be added to the SigV4-signed request automatically. Both auth.aws.secret_access_key and auth.aws.session_token are stored encrypted.

Streaming responses (Bedrock ConverseStream) are not yet supported by the Plugin.

:::

Proxy to OpenAI Embedding Models

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to embedding models. This example will use the OpenAI embedding model endpoint.

Obtain the OpenAI API key and save it to an environment variable:

export OPENAI_API_KEY=<your-api-key>

Create a Route and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-route",
    "uri": "/embeddings",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options":{
          "model": "text-embedding-3-small",
          "encoding_format": "float"
        },
        "override": {
          "endpoint": "https://api.openai.com/v1/embeddings"
        }
      }
    }
  }'

Create a Route with the ai-proxy Plugin configured as such:

services:
  - name: openai-embeddings-service
    routes:
      - name: openai-embeddings-route
        uris:
          - /embeddings
        methods:
          - POST
        plugins:
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: text-embedding-3-small
              encoding_format: float
            override:
              endpoint: "https://api.openai.com/v1/embeddings"

Synchronize the configuration to the gateway:

adc sync -f adc.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-proxy-plugin-config
spec:
  plugins:
    - name: ai-proxy
      config:
        provider: openai
        auth:
          header:
            Authorization: "Bearer your-api-key"
        options:
          model: text-embedding-3-small
          encoding_format: float
        override:
          endpoint: "https://api.openai.com/v1/embeddings"
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: openai-embeddings-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /embeddings
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-proxy-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: openai-embeddings-route
spec:
  ingressClassName: apisix
  http:
    - name: openai-embeddings-route
      match:
        paths:
          - /embeddings
        methods:
          - POST
      plugins:
      - name: ai-proxy
        enable: true
        config:
          provider: openai
          auth:
            header:
              Authorization: "Bearer your-api-key"
          options:
            model: text-embedding-3-small
            encoding_format: float
          override:
            endpoint: "https://api.openai.com/v1/embeddings"

Apply the configuration to your cluster:

kubectl apply -f openai-embeddings-ic.yaml

Send a POST request to the Route with an input string:

curl "http://127.0.0.1:9080/embeddings" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "input": "hello world"
  }'

You should receive a response similar to the following:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.0067144386,
        -0.039197803,
        0.034177095,
        0.028763203,
        -0.024785956,
        -0.04201061,
        ...
      ],
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 2,
    "total_tokens": 2
  }
}

Proxy to Anthropic

The following example demonstrates how you can configure the ai-proxy Plugin to proxy requests to Anthropic's Claude API for chat completion.

Obtain an Anthropic API key and save it to an environment variable:

export ANTHROPIC_API_KEY=<your-api-key>

Create a Route and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-anthropic-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "anthropic",
        "auth": {
          "header": {
            "x-api-key": "'"$ANTHROPIC_API_KEY"'"
          }
        },
        "options": {
          "model": "claude-sonnet-4-20250514"
        }
      }
    }
  }'

Create a Route with the ai-proxy Plugin configured as such:

services:
  - name: anthropic-service
    routes:
      - name: anthropic-route
        uris:
          - /anything
        methods:
          - POST
        plugins:
          ai-proxy:
            provider: anthropic
            auth:
              header:
                x-api-key: "${ANTHROPIC_API_KEY}"
            options:
              model: claude-sonnet-4-20250514

Synchronize the configuration to the gateway:

adc sync -f adc.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-proxy-plugin-config
spec:
  plugins:
    - name: ai-proxy
      config:
        provider: anthropic
        auth:
          header:
            x-api-key: "your-api-key"
        options:
          model: claude-sonnet-4-20250514
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: anthropic-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-proxy-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: anthropic-route
spec:
  ingressClassName: apisix
  http:
    - name: anthropic-route
      match:
        paths:
          - /anything
        methods:
          - POST
      plugins:
      - name: ai-proxy
        enable: true
        config:
          provider: anthropic
          auth:
            header:
              x-api-key: "your-api-key"
          options:
            model: claude-sonnet-4-20250514

Apply the configuration to your cluster:

kubectl apply -f anthropic-ic.yaml

The configuration above specifies anthropic as the provider and attaches the Anthropic API key in the x-api-key header.

Send a POST request to the Route with a system prompt and a sample user question in the request body:

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

You should receive a response similar to the following:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "1+1 equals 2."
    }
  ],
  "model": "claude-sonnet-4-20250514",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 19,
    "output_tokens": 11
  }
}

Convert Anthropic Requests to OpenAI-Compatible Backend

The following example demonstrates how the ai-proxy Plugin can accept requests in the Anthropic Messages API format and automatically convert them to the OpenAI-compatible format before forwarding to any OpenAI-compatible backend (such as OpenAI, DeepSeek, or other compatible services). This is useful when client applications send Anthropic-formatted requests but you want to use a different LLM backend.

The protocol conversion is triggered automatically when the Route URI is set to /v1/messages (the Anthropic Messages API endpoint). The Plugin will convert Anthropic-formatted requests to OpenAI-compatible format and transform the responses back to Anthropic format.

Obtain an API key for your chosen OpenAI-compatible backend service and save it to an environment variable. This example uses OpenAI:

export BACKEND_API_KEY=<your-api-key>

Create a Route with the URI set to /v1/messages to trigger automatic Anthropic protocol conversion, and configure the ai-proxy Plugin as such:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-anthropic-convert-route",
    "uri": "/v1/messages",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$BACKEND_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4"
        }
      }
    }
  }'

Create a Route with the URI set to /v1/messages to trigger automatic Anthropic protocol conversion, and configure the ai-proxy Plugin as such:

services:
  - name: anthropic-convert-service
    routes:
      - name: anthropic-convert-route
        uris:
          - /v1/messages
        methods:
          - POST
        plugins:
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${BACKEND_API_KEY}"
            options:
              model: gpt-4

Synchronize the configuration to the gateway:

adc sync -f adc.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-proxy-plugin-config
spec:
  plugins:
    - name: ai-proxy
      config:
        provider: openai
        auth:
          header:
            Authorization: "Bearer your-api-key"
        options:
          model: gpt-4
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: anthropic-convert-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /v1/messages
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-proxy-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: anthropic-convert-route
spec:
  ingressClassName: apisix
  http:
    - name: anthropic-convert-route
      match:
        paths:
          - /v1/messages
        methods:
          - POST
      plugins:
      - name: ai-proxy
        enable: true
        config:
          provider: openai
          auth:
            header:
              Authorization: "Bearer your-api-key"
          options:
            model: gpt-4

Apply the configuration to your cluster:

kubectl apply -f anthropic-convert-ic.yaml

The backend provider can be any OpenAI-compatible provider, such as openai, deepseek, or others.

Send a POST request to the Route in Anthropic Messages API format:

curl "http://127.0.0.1:9080/v1/messages" -X POST \
  -H "Content-Type: application/json" \
  -H "x-api-key: ${BACKEND_API_KEY}" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "gpt-4",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

Although the request is sent in Anthropic format, it will be automatically converted to OpenAI format and forwarded to the backend. The response is converted back to Anthropic format:

{
  "id": "msg_01XFDUDYJgAACzvnptvVoYEL",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "1+1 equals 2."
    }
  ],
  "model": "gpt-4",
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 12,
    "output_tokens": 8
  }
}

The Plugin supports all features of the Anthropic Messages API, including streaming (SSE), system prompts, and tool use (function calling). The protocol conversion handles the bidirectional mapping between Anthropic and OpenAI formats transparently.

Proxy to Selected Model using Request Body Parameter

The following example demonstrates how you can proxy requests to different models on the same URI, based on the user-specified model in the user requests. You will be using the post_arg.* variable to fetch the value of the request body parameter.

The example will use OpenAI and DeepSeek as the example LLM services. Obtain the OpenAI and DeepSeek API keys and save them to environment variables:

export OPENAI_API_KEY=<your-api-key>
export DEEPSEEK_API_KEY=<your-api-key>

Create a Route to the OpenAI API with the ai-proxy Plugin. The Route URI is /anything and it matches requests where the body parameter model is set to openai:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-openai-route",
    "uri": "/anything",
    "methods": ["POST"],
    "vars": [[ "post_arg.model", "==", "openai" ]],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4"
        }
      }
    }
  }'

Create another Route /anything to the DeepSeek API with the ai-proxy Plugin. This Route matches requests where the body parameter model is set to deepseek:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-deepseek-route",
    "uri": "/anything",
    "methods": ["POST"],
    "vars": [[ "post_arg.model", "==", "deepseek" ]],
    "plugins": {
      "ai-proxy": {
        "provider": "deepseek",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$DEEPSEEK_API_KEY"'"
          }
        },
        "options": {
          "model": "deepseek-chat"
        }
      }
    }
  }'

Create two Routes with the ai-proxy Plugin configured for different providers:

services:
  - name: multi-model-service
    routes:
      - name: openai-route
        uris:
          - /anything
        methods:
          - POST
        vars:
          - - post_arg.model
            - ==
            - openai
        plugins:
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: gpt-4
      - name: deepseek-route
        uris:
          - /anything
        methods:
          - POST
        vars:
          - - post_arg.model
            - ==
            - deepseek
        plugins:
          ai-proxy:
            provider: deepseek
            auth:
              header:
                Authorization: "Bearer ${DEEPSEEK_API_KEY}"
            options:
              model: deepseek-chat

Synchronize the configuration to the gateway:

adc sync -f adc.yaml

Body parameter matching is not supported in HTTPRoute. The supported matching mechanisms are path, method, headers, and queryParams. This example cannot be completed with Gateway API.

Body parameter matching is currently not supported in ApisixRoute. The supported matching mechanisms are based on Header, Query, or Path. This example cannot be completed with APISIX CRDs.

Send a POST request to the Route with model set to openai:

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai",
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

You should receive a response similar to the following:

{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}

Send a POST request to the Route with model set to deepseek:

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek",
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

You should receive a response similar to the following:

{
  ...,
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The sum of 1 and 1 is 2. This is a basic arithmetic operation where you combine two units to get a total of two units."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}

You can also configure post_arg.* to fetch nested request body parameter. For instance, if the request format is:

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "model": {
      "name": "openai"
    },
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

You can configure the vars on the Route to be [[ "post_arg.model.name", "==", "openai" ]].

Send Request Log to Logger

The following example demonstrates how you can log request and response information, including LLM model, token, and payload, and push them to a logger. Before proceeding, you should first set up a logger, such as Kafka. See kafka-logger for more information.

Create a Route to your LLM service and configure logging details. Enable summaries to log request LLM model, duration, request and response tokens. Enable payloads to log request and response payload. Update the kafka-logger configuration with your Kafka address, topic, and key:

curl "http://127.0.0.1:9180/apisix/admin/routes" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "id": "ai-proxy-openai-route",
    "uri": "/anything",
    "methods": ["POST"],
    "plugins": {
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4"
        },
        "logging": {
          "summaries": true,
          "payloads": true
        }
      },
      "kafka-logger": {
        "brokers": [
          {
            "host": "127.0.0.1",
            "port": 9092
          }
        ],
        "kafka_topic": "test2",
        "key": "key1",
        "batch_max_size": 1
      }
    }
  }'

Create a Route with both ai-proxy and kafka-logger Plugins. Enable summaries to log request LLM model, duration, request and response tokens. Enable payloads to log request and response payload. Update the kafka-logger configuration with your Kafka address, topic, and key:

services:
  - name: logging-service
    routes:
      - name: logging-route
        uris:
          - /anything
        methods:
          - POST
        plugins:
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: gpt-4
            logging:
              summaries: true
              payloads: true
          kafka-logger:
            brokers:
              - host: 127.0.0.1
                port: 9092
            kafka_topic: test2
            key: key1
            batch_max_size: 1

Synchronize the configuration to the gateway:

adc sync -f adc.yaml

apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-proxy-logging-plugin-config
spec:
  plugins:
    - name: ai-proxy
      config:
        provider: openai
        auth:
          header:
            Authorization: "Bearer your-api-key"
        options:
          model: gpt-4
        logging:
          summaries: true
          payloads: true
    - name: kafka-logger
      config:
        brokers:
          - host: kafka.aic.svc.cluster.local
            port: 9092
        kafka_topic: test2
        key: key1
        batch_max_size: 1
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: logging-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /anything
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-proxy-logging-plugin-config

apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: logging-route
spec:
  ingressClassName: apisix
  http:
    - name: logging-route
      match:
        paths:
          - /anything
        methods:
          - POST
      plugins:
      - name: ai-proxy
        enable: true
        config:
          provider: openai
          auth:
            header:
              Authorization: "Bearer your-api-key"
          options:
            model: gpt-4
          logging:
            summaries: true
            payloads: true
      - name: kafka-logger
        enable: true
        config:
          brokers:
            - host: kafka.aic.svc.cluster.local
              port: 9092
          kafka_topic: test2
          key: key1
          batch_max_size: 1

Apply the configuration to your cluster:

kubectl apply -f logging-ic.yaml

Send a POST request to the Route:

curl "http://127.0.0.1:9080/anything" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'

You should receive a response similar to the following:

{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}

In the Kafka topic, you should also see a log entry corresponding to the request with the LLM summary and request/response payload.

Include LLM Information in Access Log

The following example demonstrates how you can log LLM request related information in the gateway's access log to improve analytics and audit. The following variables are available:

request_llm_model: LLM model name specified in the request.
apisix_upstream_response_time: Time taken for APISIX to send the request to the upstream service and receive the full response.
request_type: Type of request, where the value could be traditional_http, ai_chat, or ai_stream.
llm_time_to_first_token: Duration from request sending to the first token received from the LLM service, in milliseconds.
llm_model: LLM model.
llm_prompt_tokens: Number of tokens in the prompt.
llm_completion_tokens: Number of chat completion tokens in the prompt.

Update the access log format in your configuration file to include additional LLM related variables:

nginx_config:
  http:
    access_log_format: "$remote_addr - $remote_user [$time_local] $http_host \"$request_line\" $status $body_bytes_sent $request_time \"$http_referer\" \"$http_user_agent\" $upstream_addr $upstream_status $apisix_upstream_response_time \"$upstream_scheme://$upstream_host$upstream_uri\" \"$apisix_request_id\" \"$request_type\" \"$llm_time_to_first_token\" \"$llm_model\" \"$request_llm_model\"  \"$llm_prompt_tokens\" \"$llm_completion_tokens\""

Reload APISIX for configuration changes to take effect.

Now if you create a Route and send a request following the Proxy to OpenAI example, you should receive a response similar to the following:

{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 23,
    "completion_tokens": 8,
    "total_tokens": 31,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    ...
  },
  "service_tier": "default",
  "system_fingerprint": null
}

In the gateway's access log, you should see a log entry similar to the following:

192.168.215.1 - - [21/Mar/2025:04:28:03 +0000] api.openai.com "POST /anything HTTP/1.1" 200 804 2.858 "-" "curl/8.6.0" - - - 5765 "http://api.openai.com" "5c5e0b95f8d303cb81e4dc456a4b12d9" "ai_chat" "2858" "gpt-4" "gpt-4" "23" "8"

The access log entry shows the request type is ai_chat, Apisix upstream response time is 5765 milliseconds, time to first token is 2858 milliseconds, Requested LLM model is gpt-4. LLM model is gpt-4, prompt token usage is 23, and completion token usage is 8.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Description

Request Format

Bedrock Converse Request Format

Attributes

Provider-aware `max_tokens` mapping

Per-protocol request body override

Examples

Proxy to OpenAI

Proxy to DeepSeek

Proxy to Azure OpenAI

Proxy to Amazon Bedrock

Proxy to OpenAI Embedding Models

Proxy to Anthropic

Convert Anthropic Requests to OpenAI-Compatible Backend

Proxy to Selected Model using Request Body Parameter

Send Request Log to Logger

Include LLM Information in Access Log

FilesExpand file tree

ai-proxy.md

Latest commit

History

ai-proxy.md

File metadata and controls

Description

Request Format

Bedrock Converse Request Format

Attributes

Provider-aware max_tokens mapping

Per-protocol request body override

Examples

Proxy to OpenAI

Proxy to DeepSeek

Proxy to Azure OpenAI

Proxy to Amazon Bedrock

Proxy to OpenAI Embedding Models

Proxy to Anthropic

Convert Anthropic Requests to OpenAI-Compatible Backend

Proxy to Selected Model using Request Body Parameter

Send Request Log to Logger

Include LLM Information in Access Log

Provider-aware `max_tokens` mapping