Prompt injection¶
Before you begin¶
Follow the steps in the Install with Helm topic to run F5 AI Gateway.
Overview¶
The F5 prompt injection processor runs in the AI Gateway processors container. This processor detects and optionally blocks prompt injection attacks.
| Processor details | Supported |
|---|---|
| Multi-language | No |
| GPU acceleration | Yes |
| Base Memory Requirement | 747 MB |
| Input stage | Yes |
| Response stage | No |
| Recommended position in stage | Beginning |
| Supported language(s) | English |
Configuration¶
processors:
  - name: prompt-injection
    type: external
    config:
      endpoint: https://aigw-processors-f5.ai-gateway.svc.cluster.local
      namespace: f5
      version: 1
    params:
      reject: true
      threshold: 0.95
Parameters¶
| Parameters | Description | Type | Required | Defaults | Examples |
|---|---|---|---|---|---|
| skip_system_messages | Whether the processor should skip any provided system messages when checking for prompt injection. | bool | No | | |
| threshold | Minimum confidence score required to treat the prompt as an injection attack. Lower values make the processor stricter but more likely to trigger false positives. | float | No | 0.95 | |
When reject is set to true, this processor rejects the request when an injection attack is detected; otherwise, it adds to the attacks-detected tag.
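For example, the following variation of the configuration shown above (an illustrative sketch, not the only valid layout) tags detected attacks instead of rejecting the request:

processors:
  - name: prompt-injection
    type: external
    config:
      endpoint: https://aigw-processors-f5.ai-gateway.svc.cluster.local
      namespace: f5
      version: 1
    params:
      reject: false      # detected attacks add to the attacks-detected tag; the request is not rejected
      threshold: 0.95    # minimum confidence score required to treat the prompt as an injection attack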
Chunking input and batch processing¶
The prompt injection processor splits inputs and responses into overlapping chunks and performs inference on these chunks in batches. Chunks overlap so that context is maintained between them: if a prompt injection attack falls on the boundary between two chunks, an overlapping interim chunk will still contain it.
Note
Always perform empirical tests on hardware with real or representative data. Profiling is the best way to see how changing chunk and/or batch sizes impacts performance.
Chunking input¶
Chunk size determines how much data from a single input is fed to the model at once. It is driven by the underlying model's maximum sequence length and by how much context the task needs. It directly impacts memory usage per inference call and can increase latency if chunks are too large.
The maximum sequence length for the prompt injection processor is 512 tokens, so chunk size should not exceed 512. The lowest possible value is 1, but this would leave the underlying model unable to reliably classify the input. The default chunk size is 128 tokens, and it should not be set lower than 32. You can override this value by setting the environment variable PROMPT_INJECTION_PROCESSOR_CHUNK_SIZE: 256 in processors.f5.env.
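For example, a Helm values sketch along these lines could raise the chunk size; this assumes that processors.f5.env accepts a map of environment variable names to values, as the setting above suggests:

processors:
  f5:
    env:
      PROMPT_INJECTION_PROCESSOR_CHUNK_SIZE: 256   # tokens per chunk; keep between 32 and 512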
The prompt injection processor implements a sliding window for chunking input to maintain context between chunks. A sliding window refers to the practice of dividing longer text into overlapping chunks so that a model can capture context that spans chunk boundaries. During inference, each chunk is fed separately into the classification model. Because each segment undergoes a forward pass, the process can increase memory usage as more segments are generated and processed.
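As an illustration, assuming the stride between chunk starts is the chunk size minus the overlap (the usual definition of a sliding window), the default chunk size of 128 tokens with an overlap of 64 tokens gives a stride of 64 tokens, so a 512-token input is covered by (512 - 128) / 64 + 1 = 7 overlapping chunks rather than the 4 chunks produced without overlap.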
The default chunk overlap size is 64 tokens; to disable overlapping, set the environment variable PROMPT_INJECTION_PROCESSOR_CHUNK_OVERLAP: 0. This value should not be set larger than chunk_size - 1. A lower overlap number results in less overlap between chunks and a smaller total chunk count; a higher overlap number results in more overlap between chunks and a larger total chunk count.
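Using the same assumed processors.f5.env layout as in the sketch above, overlapping could be disabled like this:

processors:
  f5:
    env:
      PROMPT_INJECTION_PROCESSOR_CHUNK_OVERLAP: 0   # 0 disables overlap; otherwise keep the value at or below chunk_size - 1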
Batch processing¶
Batch size determines how many separate inputs (or chunks) are processed simultaneously. Larger batch sizes can improve performance by taking advantage of parallel processing, but they can also saturate the GPU. The default batch size is 16. There is no upper limit, but the value must be greater than or equal to 1. You can override this value by setting the environment variable LANGUAGE_ID_PROCESSOR_BATCH_SIZE: 32 in processors.f5.env.
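A comparable sketch, again assuming the processors.f5.env layout shown earlier and using the batch-size variable named above, would raise the batch size:

processors:
  f5:
    env:
      LANGUAGE_ID_PROCESSOR_BATCH_SIZE: 32   # must be >= 1; the default is 16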