F5 AI Gateway API Reference

Overview

F5 AI Gateway provides a Chat Completion API inspired by OpenAI’s Chat API.

A few notable OpenAI features that are not yet supported include the following:

  • streaming

  • tool calling

  • multi-modal input/output (text only is supported)

Configuration

To serve the V1 Chat Completions API, set the schema field on a route to the value v1/chat_completions.

routes:
  - path: /api/v1/chat
    schema: v1/chat_completions
    policy: example

To serve the V1 Models API, set the schema field on a route to the value v1/models.

routes:
  - path: /api/v1/models
    schema: v1/models
    policy: example

See the Configure Routes topic for additional details.

To obtain a list of models from the v1/models API, the profiles section must be configured with models.

A simple example is provided below.

See the Configure profiles and Selecting a service based on model topics for complete details.

profiles:
  - name: api-models
    models:
      - name: best
    inputStages:
      - name: protect
        steps:
          - name: prompt-injection
    services:
      - name: ollama/qwen
        selector:
          type: input.model
          values:
            - best
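
With a profile such as the one above, a client selects a service by naming one of the configured models in the request body. The sketch below is illustrative only: it reuses the /api/v1/chat route and listener address from the earlier examples, which may differ in your deployment.

curl -X POST 127.0.0.1:4141/api/v1/chat -H "Content-Type: application/json" -d '
{
  "model": "best",
  "messages": [
      {
        "role": "user",
        "content": "Summarize the rules of Uno in two sentences."
      }
  ]
}'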

POST v1/chat_completions

Generate a new Chat Completion.

curl -X POST 127.0.0.1:4141/api/v1/chat -H "Content-Type: application/json" -d '
{
  "messages": [
      {
        "role": "user",
        "content":"In the game of Uno is it a legal move to place a Draw 4 card on top of another Draw 4 card making the next player draw 8 cards?"
      }
  ]
}'

Chat Completion request parameters

model (string, required*)
    ID of the model to use.
    *It is considered an error to set this field when the selected profile has not been configured with models.

messages ([]object, required)
    A list of message objects comprising the conversation so far.

message.role (string, required)
    The role of the message's author. The applicable set of roles is determined by the selected model.

message.content (string, required)
    The contents of the message.

max_completion_tokens (number, optional)
    An upper bound for the number of tokens that can be generated for a completion.

temperature (number, optional)
    The sampling temperature. A higher value makes the output more random; a lower value makes it more deterministic. The applicable range of values is determined by the model.

top_p (number, optional)
    An alternative to sampling with temperature, called nucleus sampling. It is generally recommended to alter this or temperature, but not both.

stop ([]string, optional)
    Up to 4 sequences instructing the model to stop generating further tokens.
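
The optional parameters can be combined in a single request. The following sketch is illustrative only: the model name matches the profile example above, and the numeric values and stop sequence are arbitrary choices, not recommended settings.

curl -X POST 127.0.0.1:4141/api/v1/chat -H "Content-Type: application/json" -d '
{
  "model": "best",
  "max_completion_tokens": 200,
  "temperature": 0.2,
  "stop": ["END"],
  "messages": [
      {
        "role": "user",
        "content": "Briefly, is stacking Draw 4 cards a legal move in Uno?"
      }
  ]
}'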

200 OK

A Chat Completion object.
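
The response body has the following general shape. This is an illustrative sketch only; the identifier, model name, finish reason, message content, and token counts are made-up values.

{
  "id": "example-123",
  "object": "chat_completion",
  "model": "best",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "No. Under the official Uno rules a Draw 4 card may not be stacked on another Draw 4 card."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 24,
    "total_tokens": 66
  }
}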

Chat Completion object parameters

id (string)
    A unique identifier for the chat completion.

object (string)
    The object type, almost always “chat_completion”.

model (string)
    The model used to generate this chat completion.

choices ([]object)
    A list of chat completion choice objects.

choice.index (number)
    The index of the choice in the list of choices.

choice.finish_reason (string)
    The reason the model stopped generating tokens.

choice.message (object)
    The chat completion message object generated by the model.

choice.message.role (string)
    The role of the author of the choice message.

choice.message.content (string)
    The contents of the choice message.

usage (object)
    Basic usage statistics for the completion request.

usage.prompt_tokens (number)
    Number of tokens in the prompt.

usage.completion_tokens (number)
    Number of tokens in the generated completion.

usage.total_tokens (number)
    Total number of tokens used in the request (prompt + completion).

400 Bad Request

A Chat Completion error indicating that the request was malformed or invalid.

malformed

{
  "error": {
    "type": "decoding_error",
    "message": "request body must be valid JSON"
  }
}

invalid

{
  "error": {
    "type": "validation_error",
    "message": "request must include at least 1 message"
  }
}
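
For example, a request whose messages array is empty would be expected to produce the validation error shown above:

curl -X POST 127.0.0.1:4141/api/v1/chat -H "Content-Type: application/json" -d '
{
  "messages": []
}'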

Bad Request parameters

type (string, optional)
    The category of error which was raised.

message (string, optional)
    Details about the error.

code (string, optional)
    Application-specific error code for this error.

param (object, optional)
    Metadata that may be useful for debugging.

422 Unprocessable Content

A Chat Completion error indicating that a request, while valid, has been refused.

{
  "statusCode": 422,
  "type": "message_not_allowed",
  "message": "rejection_reason: Possible Prompt Injection detected"
}

Unprocessable Content parameters

statusCode (number)
    The HTTP status code associated with this error.

type (string)
    The category of refusal.

message (string)
    Details about why the message was refused.

GET v1/models

List the models configured for the selected profile.

curl -X GET -H "Accept: application/json" http://127.0.0.1:4141/api/v1/models

200 OK

{
  "object": "models",
  "data": [
    {
      "id": "best",
      "object": "model"
    }
  ]
}

Models object parameters

object (string)
    The object type, always “models”.

data ([]object)
    List of model objects.

data.id (string)
    The model identifier, which can be referenced in requests.

data.object (string)
    The object type, which should always be “model”.