Data security

Warning

You are using an EXPERIMENTAL processor! Experimental processors:

  • May have bugs or stability issues

  • May experience breaking API changes

  • May not produce the expected results

By using this experimental processor you acknowledge:

  • It should NOT be used in a production context

  • It is NOT covered under F5 support agreements

  • Some experiments are not successful - the functionality could be retired.

Before you begin

Follow the steps in the Install with Helm topic to run F5 AI Gateway.

Overview

The F5 data-security processor runs in the AI Gateway processors container. This processor detects and optionally redacts or blocks arbitrary sensitive data.

Processor details                Supported
Deterministic                    Yes
GPU acceleration support         No
Base Memory Requirement          100 MB
Input stage                      Yes
Response stage                   Yes
Recommended position in stage    Beginning
Supported language(s)            English

Configuration

processors:
  - name: data-security
    type: external
    config:
      endpoint: https://aigw-labs-data-security.ai-gateway.svc.cluster.local #TODO: replace with actual when helm is ready
      namespace: f5-processor-labs
      version: 1
    params:
      experimental: true
      modify: true
      matchers:
        - ssn
        - us_address
        - regex:
            name: image_filename
            value: "^\\w+\\.(gif|png|jpg|jpeg)$"
        - regex:
            name: date
            value: "\\d{4}-\\d{2}-\\d{2}"

Parameters

Common parameters

experimental
  Description: This flag acknowledges that you are using an experimental processor. The processor will not run unless it is set to true.
  Type: boolean
  Required: Yes
  Default: false
  Example: true

matchers
  Description: A list of data security matchers to run. The complete list is under Matcher Types. If the list is empty, all matchers are used.
  Type: list
  Required: No
  Default: []
  Example: [ssn]

When reject is set to true, this processor will reject the request when sensitive data is detected. When modify is set to true, this processor will replace the sensitive data with X’s. Regardless of mode, it will always add the matches to the sensitive-data tag.
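The interaction between reject, modify, and the sensitive-data tag can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not the processor's implementation: the `apply_matchers` function, the `Result` type, and the use of compiled regular expressions as stand-ins for matchers are all hypothetical, and replacing each matched character with one X is an assumption.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Result:
    rejected: bool
    text: str
    tags: dict = field(default_factory=dict)


def apply_matchers(text, matchers, modify=False, reject=False):
    """Run hypothetical matchers (name -> compiled regex) over the text."""
    found = []
    redacted = text
    for name, pattern in matchers.items():
        if pattern.search(text):
            found.append(name)
            if modify:
                # Assumption: one "X" per matched character.
                redacted = pattern.sub(lambda m: "X" * len(m.group(0)), redacted)
    # The tag is added in every mode, but only when something matched.
    tags = {"sensitive-data": found} if found else {}
    if reject and found:
        return Result(rejected=True, text=text, tags=tags)
    return Result(rejected=False, text=redacted if modify else text, tags=tags)


demo = apply_matchers(
    "Invoice dated 2024-01-31",
    {"date": re.compile(r"\d{4}-\d{2}-\d{2}")},
    modify=True,
)
print(demo.text)  # Invoice dated XXXXXXXXXX
print(demo.tags)  # {'sensitive-data': ['date']}
```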

Tags

sensitive-data
  Description: Added if sensitive data is detected. Contains the names of the matchers that found matches.
  Example values: [ssn, sql]

Matcher Structure

Matchers are defined like so:

- ssn
- credit_card

A matcher that takes additional configuration options requires a custom name:

- raw:
    name: hello_world_matcher
    value: "Hello World!"
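The two forms above (a bare string, or a single-key mapping that carries a name) can be normalized into one shape once the YAML has been parsed. The sketch below assumes the list has already been loaded into Python strings and dicts. `normalize_matcher` is a hypothetical helper, and treating a bare matcher's type as its name is an assumption.

```python
def normalize_matcher(entry):
    """Return (matcher_type, name, options) for one parsed `matchers` item."""
    if isinstance(entry, str):
        # Bare form such as "ssn": assume the type doubles as the name.
        return entry, entry, {}
    if isinstance(entry, dict) and len(entry) == 1:
        # Configured form such as {"raw": {"name": ..., "value": ...}}.
        (matcher_type, options), = entry.items()
        options = dict(options)
        name = options.pop("name", None)
        if not name:
            raise ValueError(f"matcher {matcher_type!r} needs a custom name")
        return matcher_type, name, options
    raise ValueError(f"unsupported matcher entry: {entry!r}")


print(normalize_matcher("ssn"))
print(normalize_matcher({"raw": {"name": "hello_world_matcher", "value": "Hello World!"}}))
```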

Matcher Types

STANDARD MATCHERS

These are the base customizable matchers from the data-security engine.

raw

A case-sensitive string matcher.

- raw:
    name: match_test_string
    value: test string

raw_insensitive

A case-insensitive string matcher.

- raw_insensitive:
    name: match_test_string
    value: Test String

regex

A regular expression (regex) matcher.

- regex:
    name: date
    value: "\\d{4}-\\d{2}-\\d{2}"
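To show how the three standard matchers differ, here is a minimal Python approximation. It is not the engine's code: the factory functions are hypothetical, and it assumes that regex matchers search anywhere in the text unless the pattern is anchored with ^ and $. In a double-quoted YAML string, "\\d" denotes the two characters \d, which is why the Python raw strings below use a single backslash.

```python
import re


def raw(value):
    # Case-sensitive substring match.
    return lambda text: value in text


def raw_insensitive(value):
    # Case-insensitive substring match.
    needle = value.casefold()
    return lambda text: needle in text.casefold()


def regex(value):
    # Assumption: unanchored search, as with re.search.
    pattern = re.compile(value)
    return lambda text: pattern.search(text) is not None


match_date = regex(r"\d{4}-\d{2}-\d{2}")
match_image = regex(r"^\w+\.(gif|png|jpg|jpeg)$")

print(raw("test string")("A Test String"))              # False
print(raw_insensitive("Test String")("a test string"))  # True
print(match_date("due 2024-01-31"))                     # True
print(match_image("see photo.png"))                     # False
```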

INTERNAL MATCHERS

These are dedicated matchers that offer better performance and more complex checks than standard matchers.

routing_number

Matches on bank routing numbers.

credit_card

Matches on credit/debit card numbers. Supports almost every major bank, and requires the number to have a valid Luhn checksum.
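For reference, the Luhn checksum mentioned above works as follows: starting from the rightmost digit, every second digit is doubled (subtracting 9 when the result exceeds 9), and the sum of all digits must be divisible by 10. The function below is a generic sketch of that algorithm. `luhn_valid` is a hypothetical name, and the matcher's other checks (such as issuer prefixes) are not modeled.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True (a widely used test number)
print(luhn_valid("4111 1111 1111 1112"))  # False
```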

int_phone

Matches on international phone numbers via Google’s libphonenumber library. Requires the number to start with its country code (for example, +1 or +33). Does not support IDD codes or the full RFC 3966 syntax (such as extensions).

national_phone

Matches on country-specific phone numbers and performs extra verification. Takes a name, a regex that matches the country’s number format, and a country code that selects which additional country-specific checks to perform.

- national_phone:
    name: us_number
    regex: "\\d{3}-\\d{3}-\\d{4}"
    country: US

ssn

Matches on US Social Security Numbers.

iban

Matches on International Bank Account Numbers.

sql

Matches on SQL statements. To avoid false positives, extremely simple or benign statements are not considered a match.

vin

Matches on Vehicle Identification Numbers.

eui48

Matches on 48-bit MAC addresses.

ipv4

Matches on IPv4 addresses.

- ipv4:
    name: ipv4
    value: []

Takes an optional list of sub-types that it can match against. If none are specified then matches against any address. Possible values are:

  • broadcast

  • documentation

  • link_local

  • loopback

  • multicast

  • private

  • unspecified

- ipv4:
    name: ipv4_broadcast
    value:
      - broadcast
      - private
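The sub-type filter can be approximated with Python's standard ipaddress module. This is a sketch under assumptions, not the engine's logic: it takes private to mean the RFC 1918 ranges, documentation to mean the three TEST-NET ranges, broadcast to mean 255.255.255.255, and it treats a list of sub-types as "match if the address belongs to any of them". The function names are hypothetical.

```python
import ipaddress

# Assumed definitions for sub-types that ipaddress has no exact predicate for.
PRIVATE_NETS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
DOCUMENTATION_NETS = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")]


def ipv4_subtypes(text):
    """Return the set of sub-type names that apply to an IPv4 address."""
    addr = ipaddress.IPv4Address(text)
    checks = {
        "broadcast": addr == ipaddress.IPv4Address("255.255.255.255"),
        "documentation": any(addr in net for net in DOCUMENTATION_NETS),
        "link_local": addr.is_link_local,
        "loopback": addr.is_loopback,
        "multicast": addr.is_multicast,
        "private": any(addr in net for net in PRIVATE_NETS),
        "unspecified": addr.is_unspecified,
    }
    return {name for name, hit in checks.items() if hit}


def ipv4_matches(text, wanted):
    """An empty `wanted` list matches any address; otherwise any listed sub-type matches."""
    return not wanted or bool(ipv4_subtypes(text) & set(wanted))


print(ipv4_subtypes("192.168.1.10"))                        # {'private'}
print(ipv4_matches("8.8.8.8", []))                          # True
print(ipv4_matches("8.8.8.8", ["broadcast", "private"]))    # False
print(ipv4_matches("10.0.0.1", ["broadcast", "private"]))   # True
```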

ipv6

Matches on IPv6 addresses.

- ipv6:
    name: ipv6
    value: []

Takes an optional list of sub-types that it can match against. If none are specified then matches against any address. Possible values are:

  • loopback

  • multicast

  • unspecified

- ipv6:
    name: ipv6_loopback
    value:
      - loopback

imei

Attempts to match against known International Mobile Equipment Identity values. It isn’t perfect, but should catch most devices.

name

Matches on common US names. Uses a list of the top 1000 most common first and last names.

us_address

Matches on US addresses. The more specific the address is, the more checks the parser is able to perform (for example, ensuring a ZIP code is within a state or a city is within a ZIP code).

PRE-CANNED REGEX MATCHERS

These are pre-written regular expressions to match on common data patterns.

ls_regex:Email

regex: (?-u:\b)[a-zA-Z0-9][a-zA-Z0-9_.+-]{0,}@[a-zA-Z0-9][a-zA-Z0-9-.]{0,}\.[a-zA-Z]{2,}(?-u:\b)
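The pattern above can be tried out in Python with one adaptation. Python's re module does not accept the `(?-u:\b)` syntax (an ASCII-only word boundary), so this sketch uses a plain `\b` together with the `re.ASCII` flag as an approximation. The behavior of the engine's own regex dialect may differ in edge cases.

```python
import re

# Adapted from the ls_regex:Email pattern; re.ASCII stands in for (?-u:\b).
EMAIL = re.compile(
    r"\b[a-zA-Z0-9][a-zA-Z0-9_.+-]{0,}@[a-zA-Z0-9][a-zA-Z0-9-.]{0,}\.[a-zA-Z]{2,}\b",
    re.ASCII,
)

text = "Contact alice@example.com or bob.smith+tag@mail.example.org."
print(EMAIL.findall(text))
# ['alice@example.com', 'bob.smith+tag@mail.example.org']
```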

HASH MATCHERS

These are pre-written regexes to match on common hash types.

ls_hash:Bcrypt

format: $2{X}${rounds}${salt}{checksum}

regex: \$2[axyb]?\$\d{2}\$[./A-Za-z0-9]{53}

example: $2b$12$GhvMmNVjRW29ulnudl.LbuAnUtN/LRfe1JsBm1Xu6LE3059z5Tr8m

ls_hash:Sha256Crypt

format: $5$rounds={rounds}${salt}${checksum}

regex: \$5\$(rounds=\d+\$)?[./0-9A-Za-z]{0,16}\$[./0-9A-Za-z]{43}

example: $5$rounds=80000$wnsT7Yr92oJoP28r$cKhJImk5mfuSKV9b3mumNzlbstFUplKtQXXMo4G6Ep5

ls_hash:Sha512Crypt

format: $6$rounds={rounds}${salt}${checksum}

regex: \$6\$(rounds=\d+\$)?[./0-9A-Za-z]{0,16}\$[./0-9A-Za-z]{86}

example: $6$rounds=80000$wnsT7Yr92oJoP28r$cKhJImk5mfuSKV9b3mumNzlbstFUplKtQXXMo4G6Ep5cKhJImk5mfuSKV9b3mumNzlbstFUplKtQXXMo4G6Ep5

ls_hash:Md5Crypt

format: $1${salt}${checksum}

regex: \$1\$[./A-Za-z0-9]{0,8}\$[./A-Za-z0-9]{22}

example: $1$5pZSV9va$azfrPr6af3Fc7dLblQXVa0

ls_hash:Sha1Crypt

format: $sha1${rounds}${salt}${checksum}

regex: \$sha1\$\d+\$[./0-9A-Za-z]{0,64}\$[./0-9A-Za-z]{28}

example: $sha1$40000$jtNX3nZ2$hBNaIXkt4wBI2o5rsi8KejSjNqIq

ls_hash:SunMd5Crypt

format: $md5,rounds={rounds}${salt}$${checksum} OR $md5${salt}$${checksum}

regex: \$md5(,rounds=\d+)?\$[./A-Za-z0-9]{0,8}\$\$[./A-Za-z0-9]{22}

example: $md5,rounds=5000$GUBv0xjJ$$mSwgIswdjlTY0YxV7HBVm0

example: $md5$GUBv0xjJ$$mSwgIswdjlTY0YxV7HBVm0

ls_hash:Argon2

format: $argon2{X}$v={version}$m={memory},t={time},p={parallelism}${salt}${digest}

regex: \$argon2[id]{1,2}\$v=\d+\$m=\d+,t=\d+,p=\d+\$[+/=A-Za-z0-9]+\$[+/=A-Za-z0-9]+

ls_hash:BcryptSha256

format: $bcrypt-sha256$v={version},t={type},r={rounds}${salt}${digest} OR $bcrypt-sha256${type},{rounds}${salt}${digest}

regex: \$bcrypt-sha256\$(v=2,t=2b,r=\d+|2[ab],\d+)\$[./A-Za-z0-9]{22}\$[./A-Za-z0-9]{31}

example: $bcrypt-sha256$v=2,t=2b,r=12$n79VH.0Q2TMWmt3Oqt9uku$Kq4Noyk3094Y2QlB8NdRT8SvGiI4ft2

example: $bcrypt-sha256$2b,12$n79VH.0Q2TMWmt3Oqt9uku$Kq4Noyk3094Y2QlB8NdRT8SvGiI4ft2

ls_hash:Phpass

format: $P${rounds}{salt}{checksum} OR $H${rounds}{salt}{checksum}

regex: \$[PH]\$[./A-Za-z0-9]{31}

example: $P$8ohUJ.1sdFw09/bMaAQPTGDNi2BIUt1

ls_hash:Pbkdf2Sha1

format: $pbkdf2${rounds}${salt}${checksum}

regex: \$pbkdf2\$\d+\$[./+A-Za-z0-9]+\$[./+A-Za-z0-9]{27}

example: $pbkdf2$6400$.6UI/S.nXIk8jcbdHx3Fhg$98jZicV16ODfEsEZeYPGHU3kbrU

ls_hash:Pbkdf2Sha256

format: $pbkdf2-sha256${rounds}${salt}${checksum}

regex: \$pbkdf2-sha256\$\d+\$[./+A-Za-z0-9]+\$[./+A-Za-z0-9]{43}

example: $pbkdf2-sha256$6400$.6UI/S.nXIk8jcbdHx3Fhg$98jZicV16ODfEsEZeYPGHU3kbrUrvUEXOPimVSQDD44

ls_hash:Pbkdf2Sha512

format: $pbkdf2-sha512${rounds}${salt}${checksum}

regex: \$pbkdf2-sha512\$\d+\$[./+A-Za-z0-9]+\$[./+A-Za-z0-9]{86}

example: $pbkdf2-sha512$6400$.6UI/S.nXIk8jcbdHx3Fhg$98jZicV16ODfEsEZeYPGHU3kbrUrvUEXOPimVSQDD4498jZicV16ODfEsEZeYPGHU3kbrUrvUEXOPimVSQDD44

ls_hash:Scram

format: $scram${rounds}${salt}${alg1}={digest1},{alg2}={digest2},...,

regex: \$scram\$\d+\$[./+A-Za-z0-9]+\$((md2|md5|sha-1|sha-224|sha-256|sha-384|sha-512|shake128|shake256)=[./+A-Za-z0-9]+,?)+

example: $scram$6400$.Z/znnNOKWUsBaCU$sha-1=cRseQyJpnuPGn3e6d6u6JdJWk.0,sha-256=5GcjEbRaUIIci1r6NAMdI9OPZbxl9S5CFR6la9CHXYc,sha-512=.DHbIm82ajXbFR196Y.9TtbsgzvGjbMeuWCtKve8TPjRMNoZK9EGyHQ6y0lW9OtWdHZrDZbBUhB9ou./VI2mlw

ls_hash:Scrypt

format: $scrypt$ln={logN},r={r},p={p}${salt}${checksum}

regex: \$scrypt\$ln=\d+,r=\d+,p=\d+\$[./+=A-Za-z0-9]+\$[./+=A-Za-z0-9]{43}

example: $scrypt$ln=16,r=8,p=1$aM15713r3Xsvxbi31lqr1Q$nFNh2CVHVjNldFVKDHDlm4CbdRSCdEBsjjJxD+iCs5E

ls_hash:AprMd5Crypt

format: $apr1${salt}${checksum}

regex: \$apr1\$[./A-Za-z0-9]{0,8}\$[./A-Za-z0-9]{22}

example: $apr1$5pZSV9va$azfrPr6af3Fc7dLblQXVa0

ls_hash:DlitzPbkdf2Sha1

format: $p5k2${rounds}${salt}${checksum}

regex: \$p5k2\$\d+\$[./A-Za-z0-9]+\$[./+A-Za-z0-9]{32}

example: $p5k2$2710$.pPqsEwHD7MiECU0$b8TQ5AMQemtlaSgegw5Je.JBE3QQhLbO

ls_hash:CtaPbkdf2Sha1

format: $p5k2${rounds}${salt}${checksum}

regex: \$p5k2\$\d+\$[./\-=_+A-Za-z0-9]+\$[./\-=_+A-Za-z0-9]{28}

example: $p5k2$2710$oX9ZZOcNgYoAsYL-8bqxKg==$AU2JLf2rNxWoZxWxRCluY0u6h6c=

ls_hash:Mssql2000

format: 0x0100{salt}{digest1}{digest2}

regex: 0x0100[A-F0-9]{88}

example: 0x0100200420C4988140FD3920894C3EDC188E94F428D57DAD5905F6CC1CBAF950CAD4C63F272B2C91E4DEEB5E6444

ls_hash:Mssql2005

format: 0x0100{salt}{digest1}

regex: 0x0100[A-F0-9]{48}

example: 0x01006ACDF9FF5D2E211B392EEF1175EFFE13B3A368CE2F94038B

ls_hash:Mysql41

format: *{checksum}

regex: \*[A-F0-9]{40}

example: *2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19

ls_hash:PostgresMd5

format: md5{checksum}

regex: md5[a-fA-F0-9]{32}

example: md5a5bfc9e07964f8dddeb95fc584cd9655

ls_hash:Oracle11

format: S:{checksum}{salt}

regex: S:[a-fA-F0-9]{60}

example: S:4143053633E59B4992A8EA17D2FF542C9EDEB335C886EED9C80450C1B4E6

ls_hash:BsdNthash

format: $3$${checksum}

regex: \$3\$\$[a-fA-F0-9]{32}

example: $3$$8846f7eaee8fb117ad06bdd830b7586c

ls_hash:DjangoPbkdf2Sha1

format: pbkdf2${rounds}${salt}${checksum}

regex: pbkdf2\$\d+\$[A-Za-z0-9]+\$[+/=A-Za-z0-9]+

example: pbkdf2$6400$6UISnXIk8jcbdHx3Fhg$98jZicV16ODfEsEZeYPGHU3kbrU

ls_hash:DjangoPbkdf2Sha256

format: pbkdf2_sha256${rounds}${salt}${checksum}

regex: pbkdf2_sha256\$\d+\$[A-Za-z0-9]+\$[+/=A-Za-z0-9]+

example: pbkdf2_sha256$10000$s1w0UXDd00XB$+4ORmyvVWAQvoAEWlDgN34vlaJx1ZTZpa1pCSRey2Yk=

ls_hash:DjangoSaltedSha1

format: sha1${salt}${checksum}

regex: sha1\$[a-f0-9]+\$[a-f0-9]+

example: sha1$f8793$c4cd18eb02375a037885706d414d68d521ca18c7

ls_hash:DjangoSaltedMd5

format: md5${salt}${checksum}

regex: md5\$[a-f0-9]+\$[a-f0-9]+

example: md5$f8793$c4cd18eb02375a037885706d414d68d521ca18c7

ls_hash:DjangoDesCrypt

format: crypt${salt}${checksum}

regex: crypt\$[a-f0-9]+\$[./A-Za-z0-9]{13}

example: crypt$cd1a4$cdlRbNJGImptk

ls_hash:GrubPbkdf2Sha512

format: grub.pbkdf2.sha512.{rounds}.{salt}.{checksum}

regex: grub.pbkdf2.sha512.\d+.[A-F0-9]+.[A-F0-9]{128}

example: grub.pbkdf2.sha512.10000.4483972AD2C52E1F590B3E2260795FDA9CA0B07B96FF492814CA9775F08C4B59CD1707F10B269E09B61B1E2D11729BCA8D62B7827B25B093EC58C4C1EAC23137.DF4FCB5DD91340D6D31E33423E4210AD47C7A4DF9FA16F401663BF288C20BF973530866178FE6D134256E4DBEFBD984B652332EED3ACAED834FEA7B73CAE851D
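A quick way to sanity-check a hash pattern against its documented example is to run both through a regex engine. The sketch below does this in Python for four of the entries above, copied verbatim. The `SAMPLES` table and `detect_hashes` helper are hypothetical, and Python's re module is only a stand-in for whatever regex dialect the engine uses.

```python
import re

# (pattern, example) pairs copied from the hash matcher entries above.
SAMPLES = {
    "ls_hash:Bcrypt": (
        r"\$2[axyb]?\$\d{2}\$[./A-Za-z0-9]{53}",
        "$2b$12$GhvMmNVjRW29ulnudl.LbuAnUtN/LRfe1JsBm1Xu6LE3059z5Tr8m",
    ),
    "ls_hash:Md5Crypt": (
        r"\$1\$[./A-Za-z0-9]{0,8}\$[./A-Za-z0-9]{22}",
        "$1$5pZSV9va$azfrPr6af3Fc7dLblQXVa0",
    ),
    "ls_hash:Mysql41": (
        r"\*[A-F0-9]{40}",
        "*2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19",
    ),
    "ls_hash:PostgresMd5": (
        r"md5[a-fA-F0-9]{32}",
        "md5a5bfc9e07964f8dddeb95fc584cd9655",
    ),
}


def detect_hashes(text):
    """Return the names of the sample hash matchers found anywhere in the text."""
    return [name for name, (pattern, _) in SAMPLES.items() if re.search(pattern, text)]


# Every documented example should match its own pattern in full.
for name, (pattern, example) in SAMPLES.items():
    print(name, re.fullmatch(pattern, example) is not None)

print(detect_hashes("password_hash=$1$5pZSV9va$azfrPr6af3Fc7dLblQXVa0"))
# ['ls_hash:Md5Crypt']
```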

ERROR MATCHERS

These are pre-written regexes to match on common error messages.

ls_error:TypeError

looks for the case sensitive string TypeError

ls_error:Uncaught

looks for the case insensitive string uncaught

ls_error:SocketError

looks for the case sensitive string SocketError

ls_error:OperationNotSupported

looks for the case insensitive string operation not supported

ls_error:Callback

looks for the case insensitive string callback

ls_error:Segfault

looks for segmentation faults

regex: (?i)(SIGSEGV|segmentation fault( \(core dumped\))?|segmentation violation|access violation|illegal instruction (core dumped))

ls_error:RuntimeError

looks for the case insensitive string RuntimeError

ls_error:OutOfMemory

looks for out of memory errors

regex: memory allocation of \d+ bytes failed

ls_error:PermissionDenied

looks for the case insensitive string permission denied

ls_error:CommandNotFound

looks for the case insensitive string command not found

ls_error:JsUnknownArgument

looks for unknown argument errors thrown by JS

regex: Unknown argument `.+`. Available options are marked with

example: Unknown argument `provider_providerAccountId`. Available options are marked with ?

ls_error:JsInvalidInvocation

looks for invalid invocation arguments

regex: Invalid `.+` invocation in.+/.+\.js:\d+:\d+

example: Invalid `p.account.findUnique()` invocation in /Users/ASUS/outsidetest4/node_modules/@next-auth/prisma-adapter/dist/index.js:211:45

ls_error:JsBugMessage

looks for the case sensitive string This is caused by either a bug in Node.js or incorrect usage of Node.js internals.

ls_error:JsError

looks for JavaScript tracebacks

regex: (at .+ \((.+:\d+:\d+|<anonymous>)\)(\\n' \+)?(\s|')*)+

example:

at IncomingMessage._read (node:_http_incoming:214:19)
at Readable.read (node:internal/streams/readable:547:12)
at resume_ (node:internal/streams/readable:1048:12)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

ls_error:PyError

looks for Python tracebacks

regex: Traceback \(most recent call last\):(\s+File ".+", line \d+, in .+\s*.*(\s+\^+)?)+(\s+.+)?

example:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/kombu/transport/virtual/base.py", line 925, in create_channel
    return self._avail_channels.pop()
IndexError: pop from empty list

ls_error:JavaError

looks for Java tracebacks

regex: (at .+\..+\(.+.(java|scala):\d+\)\s*)+

example:

at me.iwf.photopicker.adapter.PhotoGridAdapter.onBindViewHolder(PhotoGridAdapter.java:118)
at me.iwf.photopicker.adapter.PhotoGridAdapter.onBindViewHolder(PhotoGridAdapter.java:27)
at android.support.v7.widget.RecyclerView$Adapter.onBindViewHolder(RecyclerView.java:6673)

ls_error:RustError

looks for Rust panics

regex: thread '.+' (panicked at '.+'.*, .+\.rs(:\d+)+|has overflowed its stack)

example: thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', main.rs:2:47

ls_error:RubyError

looks for Ruby tracebacks

regex: \.rb:\d+:in (`|').+'\s*((from )?\/.+\.rb:\d+:in (`|').+'\s*)*

example:

from /var/deploy/example/shared/bundle/ruby/2.3.0/gems/eventmachine-1.2.3/lib/eventmachine.rb:677:in `connect_server'
from /var/deploy/example/shared/bundle/ruby/2.3.0/gems/eventmachine-1.2.3/lib/eventmachine.rb:677:in `bind_connect'
from /var/deploy/example/shared/bundle/ruby/2.3.0/gems/eventmachine-1.2.3/lib/eventmachine.rb:653:in `connect'

ls_error:GoError

looks for Go tracebacks

regex: goroutine \d+ \[.+\]:\s+(.+\s+.+\/.+.go:\d+( \+0x[\da-f]+.*)?\s*)+

example:

goroutine 5844 [running]:
k8s.io/kubernetes/pkg/controller/statefulset.getPersistentVolumeClaims(0xc003ee8500, 0x0?)
        pkg/controller/statefulset/stateful_set_utils.go:348 +0x2fd
k8s.io/kubernetes/pkg/controller/statefulset.(*StatefulPodControl).createPersistentVolumeClaims(0xc000b24560, 0x6ecea2?, 0xc000600000?)
        pkg/controller/statefulset/stateful_pod_control.go:341 +0x6a

ls_error:PhpError

looks for fatal errors in PHP

regex: Fatal error:.+ in .*[\/\\].*\.php(.*\s*){1,2}Stack trace:\s+(#\d+ .+\s*)+

example:

Fatal error: Uncaught Exception: Incorrect public key: error:04099079:rsa routines:RSA_padding_check_PKCS1_OAEP_mgf1:oaep decoding error in /home/apptestl/domains/apptestlab.pl/public_html/nextalk/zalogowano/crypto_library.php:13
Stack trace:
#0 home/apptestl/domains/apptestlab.pl/public_html/nextalk/zalogowano/send_message.php(33): encryptMessage('1', '-----BEGIN PUBL...')
#1 {main} thrown in /home/apptestl/domains/apptestlab.pl/public_html/nextalk/zalogowano/crypto_library.php on line 13

ls_error:EnvoySegfault

looks for the case sensitive string Caught Segmentation fault, suspect faulting address 0x

ls_error:PostgresError

looks for PostgreSQL errors

regex: (?i)(primary|panic|fatal|error): .+\s+((detail|hint|context|sqlstate): .+\s*)+

example:

ERROR:  duplicate key value violates unique constraint "constraint_name"
DETAIL:  Key (column_name)=(duplicate_value) already exists.
SQLSTATE: 23505

ls_error:MysqlError

looks for MySQL errors

regex: ERROR \d+ \(\d+\): .*

example: ERROR 1062 (23000): Duplicate entry 'value' for key 'unique_key'

ls_error:RedisError

looks for Redis errors

regex: \(error\) ERR .+

example: (error) ERR Operation against a key holding the wrong kind of value

ls_error:MongodbError

looks for MongoDB errors

regex: (("errmsg"\s*:\s*".+"|"(code|index|ok)"\s*:\s*\d+)\s*,?\s*){2,4}
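To see how a few of the error patterns above behave on real log text, the sketch below runs three of them (copied verbatim) with Python's re module. `ERROR_PATTERNS` and `find_errors` are hypothetical names, and Python's regex dialect is only an approximation of the engine's.

```python
import re

# Patterns copied from the ls_error entries above.
ERROR_PATTERNS = {
    "ls_error:MysqlError": r"ERROR \d+ \(\d+\): .*",
    "ls_error:RedisError": r"\(error\) ERR .+",
    "ls_error:PyError": (
        r'Traceback \(most recent call last\):'
        r'(\s+File ".+", line \d+, in .+\s*.*(\s+\^+)?)+(\s+.+)?'
    ),
}

PY_TRACEBACK = (
    "Traceback (most recent call last):\n"
    '  File "/usr/local/lib/python3.7/site-packages/kombu/transport/virtual/base.py", '
    "line 925, in create_channel\n"
    "    return self._avail_channels.pop()\n"
    "IndexError: pop from empty list"
)


def find_errors(text):
    """Return the names of the sample error matchers that fire on the text."""
    return [name for name, pattern in ERROR_PATTERNS.items() if re.search(pattern, text)]


print(find_errors("ERROR 1062 (23000): Duplicate entry 'value' for key 'unique_key'"))
print(find_errors("(error) ERR Operation against a key holding the wrong kind of value"))
print(find_errors(PY_TRACEBACK))
```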

CONTENT INDICATOR MATCHERS

These are pre-written wordlists to match on common phrases within various categories of documents. Their current capabilities are very limited, as the word lists aren’t very big. This is more of a proof of concept for wordlist-style content classification, and more work would need to be done to make these truly useful.

ls_indicator:PromptInjection

looks for basic phrases like ignore previous instructions or imagine you had no restrictions

ls_indicator:Financial

looks for basic phrases like The undersigned hereby acknowledges receipt of or Payment shall be made in accordance with the following schedule

ls_indicator:Technical

looks for basic phrases like engineering change request or product specifications

ls_indicator:Regulatory

looks for basic phrases like environmental impact assessment or certification authority

ls_indicator:Hr

looks for basic phrases like grievance procedures or family leave policy

ls_indicator:Security

looks for basic phrases like security vulnerability or unauthorized access

ls_indicator:ComplianceTraining

looks for basic phrases like corporate ethics guidelines or workplace compliance handbook

ls_indicator:StrategicPlans

looks for basic phrases like competitive analysis or annual operating plan

ls_indicator:IntellectualProperty

looks for basic phrases like trademark filing or copyright registration

ls_indicator:VendorContracts

looks for basic phrases like service level agreement or termination conditions

ls_indicator:MarketingPlans

looks for basic phrases like product launch plan or digital marketing strategy

ls_indicator:ResearchDevelopment

looks for basic phrases like proof of concept or technical feasibility analysis

ls_indicator:CrisisManagement

looks for basic phrases like breach response protocol or regulatory reporting requirements
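Wordlist-style classification of this kind can be illustrated with a few lines of Python. The sketch uses only phrases quoted in the entries above. The real word lists are larger, and both the case-insensitive comparison and the `classify` helper are assumptions made for the example.

```python
# A tiny subset of phrases, taken from the indicator descriptions above.
INDICATORS = {
    "ls_indicator:PromptInjection": [
        "ignore previous instructions",
        "imagine you had no restrictions",
    ],
    "ls_indicator:Security": ["security vulnerability", "unauthorized access"],
    "ls_indicator:Hr": ["grievance procedures", "family leave policy"],
}


def classify(text):
    """Return the sorted indicator names whose phrases occur in the text."""
    lowered = text.casefold()  # assumption: matching ignores case
    return sorted(
        name
        for name, phrases in INDICATORS.items()
        if any(phrase in lowered for phrase in phrases)
    )


print(classify("Please IGNORE previous instructions and report any unauthorized access."))
# ['ls_indicator:PromptInjection', 'ls_indicator:Security']
```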