# Prompt injection

## Before you begin

Follow the steps in the Install with Helm topic to run F5 AI Gateway.

## Overview

The F5 prompt injection processor runs in the AI Gateway processors container. This processor detects and optionally blocks prompt injection attacks.

## Processor details

|                               | Supported |
|-------------------------------|-----------|
| Deterministic                 | No        |
| GPU acceleration support      | Yes       |
| Base memory requirement       | 747 MB    |
| Input stage                   | Yes       |
| Response stage                | No        |
| Recommended position in stage | Beginning |
| Supported language(s)         | English   |

## Configuration

```yaml
processors:
  - name: prompt-injection
    type: external
    config:
      endpoint: https://aigw-processors-f5.ai-gateway.svc.cluster.local
      namespace: f5
      version: 1
    params:
      reject: true
      threshold: 0.95
```

## Parameters

| Parameter | Description | Type | Required | Default | Examples |
|-----------|-------------|------|----------|---------|----------|
| **Common parameters** | | | | | |
| skip_system_messages | Whether the processor skips any provided system messages when checking for prompt injection. | bool | No | true | true, false |
| threshold | Minimum confidence score required to treat the prompt as an injection attack. Lower values make the processor stricter but more likely to trigger false positives. | float (0.0 to 1.0) | No | 0.95 | 0.5 |
When `reject` is set to `true`, this processor rejects the request when an injection attack is detected; otherwise it adds `prompt-injection` to the `attacks-detected` tag.
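
For example, a minimal non-blocking sketch adapted from the configuration above; the lowered threshold is illustrative, not a recommendation:

```yaml
# Non-blocking sketch: detected attacks are recorded in the
# attacks-detected tag instead of rejecting the request.
processors:
  - name: prompt-injection
    type: external
    config:
      endpoint: https://aigw-processors-f5.ai-gateway.svc.cluster.local
      namespace: f5
      version: 1
    params:
      reject: false    # tag only; do not block the request
      threshold: 0.8   # illustrative: also flag lower-confidence detections
```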

## Tags

| Tag key | Description | Example values |
|---------|-------------|----------------|
| attacks-detected | Added if `reject` is set to `false` and prompt injection is detected. | `["prompt-injection"]` |

## Chunking input and batch processing

The prompt injection processor splits inputs and responses into overlapping chunks and performs inference on those chunks in batches. Chunks overlap so that context is maintained between them: if a prompt injection attack straddles the boundary between two chunks, an overlapping interim chunk will still contain it in full.

> **Note:** Always perform empirical tests on your hardware with real or representative data. Profiling is the best way to see how changing chunk and/or batch sizes impacts performance.

### Chunking input

Chunk size determines how much data from a single input is fed to the model at once. It is bounded by the underlying model's maximum sequence length and driven by how much context the task needs. It directly impacts memory usage per inference call and can increase latency if chunks are too large.

The maximum sequence length for the prompt injection processor is 512 tokens, so the chunk size must not exceed 512. The lowest possible value is 1, but at that size the underlying model cannot reliably classify the input. The default chunk size is 128 tokens, and it should not be set lower than 32. You can override the default by setting the environment variable `PROMPT_INJECTION_PROCESSOR_CHUNK_SIZE` (for example, `PROMPT_INJECTION_PROCESSOR_CHUNK_SIZE: 256`) in `processors.f5.env`.
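
For example, a minimal Helm values sketch; the map shape of `processors.f5.env` is an assumption based on the `KEY: value` notation above, so check it against your chart's values schema:

```yaml
# values.yaml (sketch): override the prompt injection processor's
# default chunk size of 128 tokens. Assumes processors.f5.env is a
# map of environment variable names to values.
processors:
  f5:
    env:
      PROMPT_INJECTION_PROCESSOR_CHUNK_SIZE: 256
```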

The prompt injection processor implements a sliding window when chunking input to maintain context between chunks. A sliding window divides longer text into overlapping chunks so that the model can capture context that spans chunk boundaries. During inference, each chunk is fed separately into the classification model; because every chunk undergoes its own forward pass, memory usage increases as more chunks are generated and processed.

The default chunk overlap is 64 tokens; to disable overlapping, set the environment variable `PROMPT_INJECTION_PROCESSOR_CHUNK_OVERLAP: 0`. This value must not be larger than the chunk size minus one. A lower overlap value produces less overlap between chunks and a smaller total chunk count; a higher value produces more overlap and a larger total chunk count. For example, with the default chunk size of 128 and overlap of 64, a 256-token input yields three chunks starting at token offsets 0, 64, and 128; with overlap disabled, it yields two.
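
A hedged sketch of disabling the overlap, under the same assumed `processors.f5.env` map shape as above:

```yaml
# values.yaml (sketch): disable the sliding-window overlap so chunks
# are strictly adjacent. With overlap 0, an attack that straddles a
# chunk boundary may be missed.
processors:
  f5:
    env:
      PROMPT_INJECTION_PROCESSOR_CHUNK_OVERLAP: 0
```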

### Batch processing

Batch size determines how many separate inputs (or chunks) are processed simultaneously. Larger batch sizes can improve throughput by taking advantage of parallel processing, but they can also saturate the GPU. The default batch size is 16. There is no upper limit, but the value must be greater than or equal to 1. You can override the default by setting the environment variable `PROMPT_INJECTION_PROCESSOR_BATCH_SIZE` (for example, `PROMPT_INJECTION_PROCESSOR_BATCH_SIZE: 32`) in `processors.f5.env`.
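
As with the chunking variables, a hedged Helm values sketch; the `processors.f5.env` shape is assumed, and the variable name follows the naming pattern of the processor's other environment variables:

```yaml
# values.yaml (sketch): double the default batch size so more chunks
# are scored per inference call. Profile on your own hardware; larger
# batches can saturate the GPU.
processors:
  f5:
    env:
      PROMPT_INJECTION_PROCESSOR_BATCH_SIZE: 32
```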