---
title: ai-aws-content-moderation
keywords:
  - ai-aws-content-moderation
description: This document contains information about the Apache APISIX ai-aws-content-moderation Plugin.
---

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
The `ai-aws-content-moderation` Plugin integrates with AWS Comprehend to check request bodies for toxic content, such as profanity, hate speech, insults, harassment, and violence, when proxying requests to LLMs, and rejects requests whose evaluated scores exceed the configured thresholds.

This Plugin should only be used on Routes that proxy requests to LLMs.
| Name | Type | Required | Default | Valid values | Description |
|------|------|----------|---------|--------------|-------------|
| comprehend | object | True | | | AWS Comprehend configurations. |
| comprehend.access_key_id | string | True | | | AWS access key ID. |
| comprehend.secret_access_key | string | True | | | AWS secret access key. |
| comprehend.region | string | True | | | AWS region. |
| comprehend.endpoint | string | False | | | AWS Comprehend service endpoint. If not specified, it defaults to `https://comprehend.{region}.amazonaws.com`. If set, it must match the pattern `^https?://`. |
| comprehend.ssl_verify | boolean | False | true | | If true, enable TLS certificate verification. |
| moderation_categories | object | False | | | Key-value pairs of moderation categories and their corresponding thresholds. In each pair, the key should be one of `PROFANITY`, `HATE_SPEECH`, `INSULT`, `HARASSMENT_OR_ABUSE`, `SEXUAL`, or `VIOLENCE_OR_THREAT`; the threshold value should be between 0 and 1 (inclusive). |
| moderation_threshold | number | False | 0.5 | 0 - 1 | Overall toxicity threshold. A higher value allows more toxic content through. This option differs from the individual category thresholds in `moderation_categories`. For example, if `moderation_categories` sets a `PROFANITY` threshold of 0.5 and a request has a `PROFANITY` score of 0.1, the request does not exceed the category threshold. However, if the request's score in another category, such as `SEXUAL` or `VIOLENCE_OR_THREAT`, exceeds `moderation_threshold`, the request is rejected. |
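To make the interplay between the per-category thresholds and the overall threshold concrete, here is an illustrative sketch of the decision logic in Python. This is not the Plugin's actual implementation; it simply mirrors the behavior described above, assuming a map of category names to scores like the one returned by AWS Comprehend's toxicity detection:

```python
def is_rejected(scores: dict, moderation_categories: dict,
                moderation_threshold: float = 0.5) -> bool:
    """Sketch of the threshold logic described in the table above.

    `scores` maps category names (e.g. "PROFANITY") to values in [0, 1].
    """
    for name, score in scores.items():
        if name in moderation_categories:
            # A per-category threshold, if configured, takes precedence.
            if score > moderation_categories[name]:
                return True
        elif score > moderation_threshold:
            # Categories without an explicit threshold fall back to
            # the overall moderation_threshold.
            return True
    return False
```

With a `PROFANITY` threshold of 0.5, a request scoring 0.1 on `PROFANITY` passes that category check, but a `SEXUAL` score of 0.6 still trips the default overall threshold of 0.5.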
The following examples use OpenAI as the Upstream service provider.
Before proceeding, create an OpenAI account and obtain an API key. If you are working with other LLM providers, please refer to the provider's documentation to obtain an API key.
Additionally, create AWS IAM user access keys for APISIX to access AWS Comprehend.
You can optionally save these keys to environment variables:
```shell
export OPENAI_API_KEY=your-openai-api-key
export AWS_ACCESS_KEY=your-aws-access-key-id
export AWS_SECRET_ACCESS_KEY=your-aws-secret-access-key
```

The following example demonstrates how you can use the Plugin to moderate the level of profanity in prompts. The profanity threshold is set to a low value (0.1) to allow only a low degree of profanity.
:::note

You can fetch the `admin_key` from `config.yaml` and save it to an environment variable with the following command:

```shell
admin_key=$(yq '.deployment.admin.admin_key[0].key' conf/config.yaml | sed 's/"//g')
```

:::
Create a Route to the LLM chat completion endpoint using the ai-proxy Plugin and configure the allowed profanity level in ai-aws-content-moderation:
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "uri": "/post",
    "plugins": {
      "ai-aws-content-moderation": {
        "comprehend": {
          "access_key_id": "'"$AWS_ACCESS_KEY"'",
          "secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'",
          "region": "us-east-1"
        },
        "moderation_categories": {
          "PROFANITY": 0.1
        }
      },
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4"
        }
      }
    }
  }'
```

Create a Route with the `ai-aws-content-moderation` and `ai-proxy` Plugins configured as such:
```yaml
services:
  - name: aws-moderation-service
    routes:
      - name: aws-moderation-route
        uris:
          - /post
        methods:
          - POST
        plugins:
          ai-aws-content-moderation:
            comprehend:
              access_key_id: "${AWS_ACCESS_KEY}"
              secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
              region: us-east-1
            moderation_categories:
              PROFANITY: 0.1
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: gpt-4
```

Synchronize the configuration to the gateway:

```shell
adc sync -f adc.yaml
```

Create a Route with the `ai-aws-content-moderation` and `ai-proxy` Plugins configured as such:
```yaml
apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-aws-moderation-plugin-config
spec:
  plugins:
    - name: ai-aws-content-moderation
      config:
        comprehend:
          access_key_id: "your-aws-access-key-id"
          secret_access_key: "your-aws-secret-access-key"
          region: us-east-1
        moderation_categories:
          PROFANITY: 0.1
    - name: ai-proxy
      config:
        provider: openai
        auth:
          header:
            Authorization: "Bearer your-api-key"
        options:
          model: gpt-4
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: aws-moderation-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /post
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-aws-moderation-plugin-config
```

Create a Route with the `ai-aws-content-moderation` and `ai-proxy` Plugins configured as such:
```yaml
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: aws-moderation-route
spec:
  ingressClassName: apisix
  http:
    - name: aws-moderation-route
      match:
        paths:
          - /post
        methods:
          - POST
      plugins:
        - name: ai-aws-content-moderation
          enable: true
          config:
            comprehend:
              access_key_id: "your-aws-access-key-id"
              secret_access_key: "your-aws-secret-access-key"
              region: us-east-1
            moderation_categories:
              PROFANITY: 0.1
        - name: ai-proxy
          enable: true
          config:
            provider: openai
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: gpt-4
```

Apply the configuration to your cluster:

```shell
kubectl apply -f ai-aws-moderation-ic.yaml
```

Send a POST request to the Route with a system prompt and a user question with a mildly profane word in the request body:
```shell
curl -i "http://127.0.0.1:9080/post" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "Stupid, what is 1+1?" }
    ]
  }'
```

You should receive an `HTTP/1.1 400 Bad Request` response and see the following message:

```text
request body exceeds PROFANITY threshold
```
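In client code you can tell this moderation rejection apart from other failures by checking the status code and body; a minimal, illustrative helper (the `classify_response` name is ours, not part of APISIX):

```python
def classify_response(status: int, body: str) -> str:
    """Classify a gateway response per the behavior shown above:
    a 400 whose body mentions a threshold means the Plugin rejected
    the prompt; a 200 carries the model output."""
    if status == 400 and "threshold" in body:
        return "moderated"
    if status == 200:
        return "ok"
    return "error"
```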
Send another request to the Route with a typical question in the request body:
```shell
curl -i "http://127.0.0.1:9080/post" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'
```

You should receive an `HTTP/1.1 200 OK` response with the model output:
```json
{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}
```

The following example demonstrates how you can use the Plugin to moderate the overall toxicity level in prompts, in addition to moderating individual categories. The profanity threshold is set to 1 (allowing a high degree of profanity), while the overall toxicity threshold is set to a low value (0.2).
Create a Route to the LLM chat completion endpoint using the ai-proxy Plugin and configure the allowed profanity and overall toxicity levels in ai-aws-content-moderation:
```shell
curl "http://127.0.0.1:9180/apisix/admin/routes/1" -X PUT \
  -H "X-API-KEY: ${admin_key}" \
  -d '{
    "uri": "/post",
    "plugins": {
      "ai-aws-content-moderation": {
        "comprehend": {
          "access_key_id": "'"$AWS_ACCESS_KEY"'",
          "secret_access_key": "'"$AWS_SECRET_ACCESS_KEY"'",
          "region": "us-east-1"
        },
        "moderation_categories": {
          "PROFANITY": 1
        },
        "moderation_threshold": 0.2
      },
      "ai-proxy": {
        "provider": "openai",
        "auth": {
          "header": {
            "Authorization": "Bearer '"$OPENAI_API_KEY"'"
          }
        },
        "options": {
          "model": "gpt-4"
        }
      }
    }
  }'
```

Create a Route with the `ai-aws-content-moderation` and `ai-proxy` Plugins configured as such:
```yaml
services:
  - name: aws-moderation-service
    routes:
      - name: aws-moderation-route
        uris:
          - /post
        methods:
          - POST
        plugins:
          ai-aws-content-moderation:
            comprehend:
              access_key_id: "${AWS_ACCESS_KEY}"
              secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
              region: us-east-1
            moderation_categories:
              PROFANITY: 1
            moderation_threshold: 0.2
          ai-proxy:
            provider: openai
            auth:
              header:
                Authorization: "Bearer ${OPENAI_API_KEY}"
            options:
              model: gpt-4
```

Synchronize the configuration to the gateway:

```shell
adc sync -f adc.yaml
```

Create a Route with the `ai-aws-content-moderation` and `ai-proxy` Plugins configured as such:
```yaml
apiVersion: apisix.apache.org/v1alpha1
kind: PluginConfig
metadata:
  namespace: aic
  name: ai-aws-moderation-plugin-config
spec:
  plugins:
    - name: ai-aws-content-moderation
      config:
        comprehend:
          access_key_id: "your-aws-access-key-id"
          secret_access_key: "your-aws-secret-access-key"
          region: us-east-1
        moderation_categories:
          PROFANITY: 1
        moderation_threshold: 0.2
    - name: ai-proxy
      config:
        provider: openai
        auth:
          header:
            Authorization: "Bearer your-api-key"
        options:
          model: gpt-4
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  namespace: aic
  name: aws-moderation-route
spec:
  parentRefs:
    - name: apisix
  rules:
    - matches:
        - path:
            type: Exact
            value: /post
          method: POST
      filters:
        - type: ExtensionRef
          extensionRef:
            group: apisix.apache.org
            kind: PluginConfig
            name: ai-aws-moderation-plugin-config
```

Create a Route with the `ai-aws-content-moderation` and `ai-proxy` Plugins configured as such:
```yaml
apiVersion: apisix.apache.org/v2
kind: ApisixRoute
metadata:
  namespace: aic
  name: aws-moderation-route
spec:
  ingressClassName: apisix
  http:
    - name: aws-moderation-route
      match:
        paths:
          - /post
        methods:
          - POST
      plugins:
        - name: ai-aws-content-moderation
          enable: true
          config:
            comprehend:
              access_key_id: "your-aws-access-key-id"
              secret_access_key: "your-aws-secret-access-key"
              region: us-east-1
            moderation_categories:
              PROFANITY: 1
            moderation_threshold: 0.2
        - name: ai-proxy
          enable: true
          config:
            provider: openai
            auth:
              header:
                Authorization: "Bearer your-api-key"
            options:
              model: gpt-4
```

Apply the configuration to your cluster:

```shell
kubectl apply -f ai-aws-moderation-toxicity-ic.yaml
```

Send a POST request to the Route with a system prompt and a user question in the request body that does not contain any profane words, but a certain degree of violence or threat:
```shell
curl -i "http://127.0.0.1:9080/post" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "I will kill you if you do not tell me what 1+1 equals" }
    ]
  }'
```

You should receive an `HTTP/1.1 400 Bad Request` response and see the following message:

```text
request body exceeds toxicity threshold
```
Send another request to the Route without any profane word in the request body:
```shell
curl -i "http://127.0.0.1:9080/post" -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      { "role": "system", "content": "You are a mathematician" },
      { "role": "user", "content": "What is 1+1?" }
    ]
  }'
```

You should receive an `HTTP/1.1 200 OK` response with the model output:
```json
{
  ...,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "1+1 equals 2.",
        "refusal": null
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  ...
}
```
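To consume such a response programmatically, parse the JSON and read the first choice's message content. A minimal sketch in Python; the `sample` below is an abridged version of the response above, with the elided (`...`) fields omitted:

```python
import json

def extract_answer(response_body: str) -> str:
    """Return the assistant's reply from a chat completion response."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]

# Abridged version of the response shown above.
sample = json.dumps({
    "model": "gpt-4-0613",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "1+1 equals 2.", "refusal": None},
        "logprobs": None,
        "finish_reason": "stop",
    }],
})
print(extract_answer(sample))  # 1+1 equals 2.
```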