Prompt injection
Before you begin
Follow the steps in the Install with Helm topic to run F5 AI Gateway.
Overview
The F5 prompt injection processor runs in the AI Gateway processors container. This processor detects and optionally blocks prompt injection attacks.
| Processor details | Supported |
|---|---|
|  | No |
|  | Yes |
| Base Memory Requirement | 747 MB |
| Input stage | Yes |
| Response stage | No |
|  | Beginning |
| Supported language(s) | English |
Required processor order
The prompt-injection processor only supports English-language prompts; prompt injection attacks crafted in any other language will not be detected.
For protection against prompt injection attacks, the F5 language-id processor must run in a stage before the prompt-injection processor and be configured with `reject: true` and `allowed_languages: ["en"]`. This ensures that only prompts detected as English are allowed to proceed through the processor pipeline to the prompt-injection processor before being sent to a configured Service.
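For illustration, the two processor definitions might be declared together as shown in the sketch below. This mirrors the configuration format used later in this topic and assumes the language-id processor is served from the same F5 processors endpoint; the language-id processor must still be referenced in an earlier pipeline stage than the prompt-injection processor.

```yaml
processors:
  # Runs first: rejects any prompt that is not detected as English.
  - name: language-id
    type: external
    config:
      endpoint: https://aigw-processors-f5.ai-gateway.svc.cluster.local
      namespace: f5
      version: 1
    params:
      reject: true
      allowed_languages: ["en"]
  # Runs after language-id: detects (and optionally blocks) prompt injection.
  - name: prompt-injection
    type: external
    config:
      endpoint: https://aigw-processors-f5.ai-gateway.svc.cluster.local
      namespace: f5
      version: 1
    params:
      reject: true
```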
Configuration

```yaml
processors:
  - name: prompt-injection
    type: external
    config:
      endpoint: https://aigw-processors-f5.ai-gateway.svc.cluster.local
      namespace: f5
      version: 1
    params:
      reject: true
      threshold: 0.95
```
Parameters

| Parameters | Description | Type | Required | Defaults | Examples |
|---|---|---|---|---|---|
|  | Should the processor skip any system messages provided when checking for prompt injection. |  | No |  |  |
| `threshold` | Minimum confidence score required to treat the prompt as an injection attack. Lower values will make the processor more strict, but more likely to trigger false positives. | float | No |  | `0.95` |
When `reject` is set to `true`, this processor rejects the request when an injection attack is detected; otherwise, it adds to the `attacks-detected` tag.
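For example, a tag-only configuration (a sketch; only the `params` block differs from the configuration example above) leaves requests untouched and records detections through the `attacks-detected` tag:

```yaml
params:
  # Do not block the request; record detections in the attacks-detected tag instead.
  reject: false
  threshold: 0.95
```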
Chunking input and batch processing
The prompt injection processor splits inputs and responses into overlapping chunks and performs inference on these chunks in batches. Chunks overlap so that context is preserved across chunk boundaries: if a prompt injection attack falls on the boundary between two chunks, an overlapping intermediate chunk will still contain it.
Note
Always perform empirical tests on hardware with real or representative data. Profiling is the best way to see how changing chunk and/or batch sizes impacts performance.
Chunking input
Chunk size determines how much data from a single input is fed to the model at once. It’s driven by the underlying model constraint on maximum sequence length and task needs for context. It directly impacts memory usage per inference call and can affect latency if chunks are too large.
The prompt injection processor splits its input into chunks of a variable number of tokens, between `32` and `512` (default: `128`). The number of tokens is configurable by setting `PROMPT_INJECTION_PROCESSOR_CHUNK_SIZE` in the `processors.f5.env` section of the AI Gateway Helm chart.
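A sketch of the corresponding Helm values follows; the surrounding layout of the values file may differ in your chart version, and only the `processors.f5.env` setting is taken from this topic.

```yaml
processors:
  f5:
    env:
      # Reduce the maximum chunk size from the default of 128 tokens to 64.
      # Quoted so the value is passed to the container environment as a string.
      PROMPT_INJECTION_PROCESSOR_CHUNK_SIZE: "64"
```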
The prompt injection processor implements a sliding window (overlap) for chunking input. A sliding window refers to the practice of dividing longer text into overlapping chunks so that a model can capture context that spans chunk boundaries. During inference, each chunk is fed separately into the classification model. Because each chunk is passed through the model (a forward pass), the process can increase memory usage as more chunks are generated and processed.

Too much overlap can lead to repeated processing of the same tokens, which might not improve prediction efficacy and could even introduce redundancy in the predictions. Decreased overlap reduces redundancy in the processed data, but with little or no overlap the model might miss contextual cues that lie near the chunk boundaries, potentially reducing prediction consistency across segments.
The default chunk overlap size in tokens is half the value of the chunk size setting; to disable overlapping, set the environment variable `PROMPT_INJECTION_PROCESSOR_CHUNK_OVERLAP: 0`. This value must not be larger than `chunk_size - 1`.
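As an illustrative calculation (not output from the processor): with the default chunk size of 128 tokens and the default overlap of 64 tokens, a new chunk starts every 64 tokens, so a 256-token prompt would be split into chunks covering tokens 0-127, 64-191, and 128-255. An attack that straddles the boundary at token 128 would still fall entirely within the middle chunk.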
Batch processing
Batch size determines how many separate inputs (or chunks) are processed simultaneously. Larger batch sizes can improve performance by taking advantage of parallel processing, but can also saturate the GPU. The default batch size is `16`. There is no upper limit, but the value should be greater than or equal to `1`. You can override the default by setting the environment variable, for example `PROMPT_INJECTION_PROCESSOR_BATCH_SIZE: 32`, in `processors.f5.env`.
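Putting the chunking and batching settings together, a Helm values sketch (same assumptions as the earlier snippet) might look like this:

```yaml
processors:
  f5:
    env:
      # Larger chunks mean fewer forward passes but more memory per inference call.
      PROMPT_INJECTION_PROCESSOR_CHUNK_SIZE: "256"
      # Overlap must stay below the chunk size; 128 keeps the default one-half ratio.
      PROMPT_INJECTION_PROCESSOR_CHUNK_OVERLAP: "128"
      # Process more chunks per batch; profile to confirm the GPU is not saturated.
      PROMPT_INJECTION_PROCESSOR_BATCH_SIZE: "32"
```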