Data security¶
Warning
You are using an EXPERIMENTAL processor! Experimental processors:
- May have bugs or stability issues
- May experience breaking API changes
- May not produce the expected results
By using this experimental processor you acknowledge:
- It should NOT be used in a production context
- It is NOT covered under F5 support agreements
- Some experiments are not successful; the functionality could be retired.
Before you begin¶
Follow the steps in the Install with Helm topic to run F5 AI Gateway.
Overview¶
The F5 data-security processor runs in the AI Gateway processors container. It detects sensitive data, such as Social Security numbers, credit card numbers, or patterns you define, and can optionally redact it or block the request.
Processor details | Supported |
---|---|
 | Yes |
 | No |
Base Memory Requirement | 100 MB |
Input stage | Yes |
Response stage | Yes |
Beginning | |
Supported language(s) | English |
Configuration¶
```yaml
processors:
  - name: data-security
    type: external
    config:
      endpoint: https://aigw-labs-data-security.ai-gateway.svc.cluster.local # placeholder: replace with the data-security endpoint from your Helm deployment
      namespace: f5-processor-labs
      version: 1
    params:
      experimental: true
      modify: true
      matchers:
        - ssn
        - us_address
        - regex:
            name: image_filename
            value: "^\\w+\\.(gif|png|jpg|jpeg)$"
        - regex:
            name: date
            value: "\\d{4}-\\d{2}-\\d{2}"
```
Parameters¶
Parameters | Description | Type | Required | Defaults | Examples |
---|---|---|---|---|---|
experimental | This flag acts as an acknowledgement that you are using an experimental processor. The processor will not run unless this is set to true. | boolean | Yes | | |
matchers | A list of data security matchers to run. The available matchers are described under Matcher Types. If the list is empty, all matchers are used. | list | No | | |
When reject is set to true, this processor rejects the request when sensitive data is detected. When modify is set to true, this processor replaces the sensitive data with X's. Regardless of mode, it always adds the matches to the sensitive-data tag.
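To make the modify behavior concrete, the following Python sketch shows one way detect-redact-tag logic could work. It is not the processor's implementation: the hypothetical `scan` function, the same-length X replacement, and the use of matcher names as the tag contents are all assumptions for illustration, and the SSN regex is a simplified stand-in for the real `ssn` matcher.

```python
import re


def scan(text: str, matchers: dict[str, str], modify: bool) -> tuple[str, list[str]]:
    """Hypothetical sketch: record matches and optionally mask them with X's.

    `matchers` maps a matcher name to a regex. The returned list of matcher
    names stands in for the sensitive-data tag (assumed shape).
    """
    tags: list[str] = []
    result = text
    for name, pattern in matchers.items():
        regex = re.compile(pattern)
        if regex.search(result):
            tags.append(name)  # matches are always recorded, regardless of mode
        if modify:
            # Assumption: each matched character becomes "X", keeping the length.
            result = regex.sub(lambda m: "X" * len(m.group(0)), result)
    return result, tags


redacted, tags = scan(
    "Call on 2024-05-01 about SSN 123-45-6789",
    {"date": r"\d{4}-\d{2}-\d{2}", "ssn": r"\d{3}-\d{2}-\d{4}"},
    modify=True,
)
print(redacted)  # Call on XXXXXXXXXX about SSN XXXXXXXXXXX
print(tags)      # ['date', 'ssn']
```

With `modify=False` the text passes through unchanged, but the tag list is still populated, matching the "regardless of mode" rule above.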
Matcher Structure¶
Matchers without options are listed by name:
```yaml
- ssn
- credit_card
```
If a matcher has additional configuration options, it is written as a mapping and requires a custom name:
```yaml
- raw:
    name: hello_world_matcher
    value: "Hello World!"
```
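The two entry shapes can be summarized with a small Python sketch that normalizes each entry of a loaded `matchers` list. The `normalize` helper and its output dictionary are hypothetical, and treating a bare matcher's type as its name is an assumption; the sketch only illustrates the rule that entries with options must carry a `name`.

```python
from typing import Any


def normalize(matcher: Any) -> dict[str, Any]:
    """Hypothetical helper: turn one matcher list entry into a uniform dict.

    A bare string selects a built-in matcher by type. A single-key mapping
    carries a type plus options and must include a custom `name`.
    """
    if isinstance(matcher, str):
        # Assumption: a bare matcher is identified by its type name.
        return {"type": matcher, "name": matcher, "options": {}}
    if isinstance(matcher, dict) and len(matcher) == 1:
        (mtype, options), = matcher.items()
        if not isinstance(options, dict) or "name" not in options:
            raise ValueError(f"matcher {mtype!r} needs a custom name")
        rest = {k: v for k, v in options.items() if k != "name"}
        return {"type": mtype, "name": options["name"], "options": rest}
    raise ValueError(f"unrecognized matcher entry: {matcher!r}")


# The YAML examples above, as a YAML parser would load them.
entries = ["ssn", "credit_card", {"raw": {"name": "hello_world_matcher", "value": "Hello World!"}}]
for entry in entries:
    print(normalize(entry))
```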
Matcher Types¶
STANDARD MATCHERS¶
These are the base customizable matchers from the data-security engine.
raw¶
A case-sensitive string matcher.
```yaml
- raw:
    name: match_test_string
    value: test string
```
raw_insensitive¶
A case-insensitive string matcher.
```yaml
- raw_insensitive:
    name: match_test_string
    value: Test String
```
regex¶
A regular expression (regex) matcher.
```yaml
- regex:
    name: date
    value: "\\d{4}-\\d{2}-\\d{2}"
```
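A rough Python approximation of the three standard matchers follows, assuming that raw and raw_insensitive perform plain substring searches (the engine's exact semantics may differ). It also shows why regex values in this document use doubled backslashes: inside a double-quoted YAML string, each `\\` is un-escaped to a single `\` before the regex engine sees it.

```python
import re

text = "This is a Test String inside a prompt, dated 2024-05-01."

# raw (sketch): case-sensitive substring search.
print("test string" in text)  # False
# raw_insensitive (sketch): compare after case folding.
print("test string".casefold() in text.casefold())  # True
# regex: the YAML value "\\d{4}-\\d{2}-\\d{2}" reaches the engine as
# \d{4}-\d{2}-\d{2}; a Python raw string expresses the same pattern.
date = re.compile(r"\d{4}-\d{2}-\d{2}")
print(date.findall(text))  # ['2024-05-01']
```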
INTERNAL MATCHERS¶
These are dedicated matchers that offer better performance and more complex checks than standard matchers.
routing_number¶
Matches on bank routing numbers.
credit_card¶
Matches on credit and debit card numbers. Supports almost every major bank, and requires the number to have a valid Luhn checksum.
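The Luhn checksum mentioned above is a public algorithm; the Python sketch below implements it to show what "valid checksum" means. It is not the processor's code, and the processor presumably applies further checks (such as issuer prefixes) that are not shown here.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the decimal digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if "0" <= c <= "9"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True (a well-known test number)
print(luhn_valid("4111 1111 1111 1112"))  # False
```

The check rejects roughly nine out of ten random digit strings, which is why it helps reduce false positives on ordinary numbers.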
int_phone¶
Matches on international phone numbers using Google's libphonenumber library. The number must include its country code (for example, +1 or +33). IDD prefixes and the full RFC 3966 syntax, such as extensions, are not supported.
national_phone¶
Matches on country-specific phone numbers and performs extra verification. It takes a name, a regex that matches the country's number format, and a country code that selects the additional country-specific checks to perform.
```yaml
- national_phone:
    name: us_number
    regex: "\\d{3}-\\d{3}-\\d{4}"
    country: US
```
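The two-step design (a regex for the format, then a country-specific check) can be sketched in Python as follows. The country checks the processor runs for US are not documented here; as a stand-in, this sketch applies the North American Numbering Plan rule that area codes and exchange codes start with a digit from 2 to 9. The `plausible_us_number` function is hypothetical.

```python
import re

US_NUMBER = re.compile(r"\d{3}-\d{3}-\d{4}")  # same pattern as the YAML example


def plausible_us_number(candidate: str) -> bool:
    """Sketch: format regex first, then an assumed NANP rule as the country check."""
    if not US_NUMBER.fullmatch(candidate):
        return False
    area, exchange, _line = candidate.split("-")
    # NANP: area code and exchange code both start with 2-9.
    return area[0] in "23456789" and exchange[0] in "23456789"


print(plausible_us_number("415-555-2671"))  # True
print(plausible_us_number("123-456-7890"))  # False: area code starts with 1
```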
ssn¶
Matches on US Social Security Numbers.
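The ssn matcher's exact rules are not documented here. As an illustration of the structural filtering that separates SSNs from other nine-digit strings, this hypothetical Python sketch applies rules published by the Social Security Administration: area numbers 000, 666, and 900 to 999, group number 00, and serial number 0000 are never issued. Passing these checks says nothing about whether a number was actually assigned.

```python
import re

SSN = re.compile(r"(\d{3})-(\d{2})-(\d{4})")


def plausible_ssn(candidate: str) -> bool:
    """Sketch of structural SSN checks (assumed, not the processor's code)."""
    m = SSN.fullmatch(candidate)
    if not m:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


print(plausible_ssn("123-45-6789"))  # True (structurally valid)
print(plausible_ssn("666-12-3456"))  # False
```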
iban¶
Matches on International Bank Account Numbers.
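This page does not say whether the iban matcher validates the checksum. For reference, IBANs carry two check digits verified by the ISO 13616 mod-97 procedure, sketched below in Python; treat it as an example of the kind of check such a matcher could apply, not as the processor's logic.

```python
def iban_valid(iban: str) -> bool:
    """Sketch of the ISO 13616 mod-97 check (not the processor's code)."""
    s = iban.replace(" ", "").upper()
    # Basic shape: 2-letter country code, 2 check digits, alphanumeric rest.
    if not (15 <= len(s) <= 34 and s.isascii() and s.isalnum()):
        return False
    if not (s[:2].isalpha() and s[2:4].isdigit()):
        return False
    # Move the first four characters to the end, then map A-Z to 10-35.
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


print(iban_valid("GB82 WEST 1234 5698 7654 32"))  # True
print(iban_valid("GB82 WEST 1234 5698 7654 33"))  # False
```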
sql¶
Matches on SQL statements. To avoid false positives, extremely simple or benign statements are not considered a match.
vin¶
Matches on Vehicle Identification Numbers.
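The vin matcher's checks are not documented here. North American VINs reserve position 9 for a check digit computed from the other 16 characters; the Python sketch below implements that public rule using the standard transliteration table and weights. Whether the processor applies it is an assumption, and VINs from other regions do not always carry a valid check digit.

```python
# Letter values from the North American VIN standard; I, O and Q never appear.
TRANSLIT = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def vin_check_digit_ok(vin: str) -> bool:
    """Return True if position 9 holds the correct check digit."""
    vin = vin.upper()
    if len(vin) != 17 or any(c not in TRANSLIT for c in vin):
        return False
    total = sum(TRANSLIT[c] * w for c, w in zip(vin, WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


print(vin_check_digit_ok("1M8GDM9AXKP042788"))  # True
```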
eui48¶
Matches on 48-bit MAC addresses.
ipv4¶
Matches on IPv4 addresses.
```yaml
- ipv4:
    name: ipv4
    value: []
```
It takes an optional list of sub-types to match against. If none are specified, it matches any IPv4 address. Possible values are:
broadcast
documentation
link_local
loopback
multicast
private
unspecified
```yaml
- ipv4:
    name: ipv4_broadcast
    value:
      - broadcast
      - private
```
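The sub-type filter can be sketched with Python's standard `ipaddress` module. The range definitions below are assumptions (RFC 1918 for private, the three RFC 5737 TEST-NET blocks for documentation, 255.255.255.255 for broadcast), as is the rule that an address matches when it belongs to any listed sub-type; the `ipv4_subtypes` and `ipv4_matches` helpers are hypothetical.

```python
import ipaddress

# Assumed definitions for sub-types that ipaddress has no direct predicate for.
PRIVATE = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
DOCUMENTATION = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")]


def ipv4_subtypes(text: str) -> set[str]:
    """Return the sub-type names that apply to an IPv4 address."""
    addr = ipaddress.IPv4Address(text)
    checks = {
        "broadcast": addr == ipaddress.IPv4Address("255.255.255.255"),
        "documentation": any(addr in net for net in DOCUMENTATION),
        "link_local": addr.is_link_local,
        "loopback": addr.is_loopback,
        "multicast": addr.is_multicast,
        "private": any(addr in net for net in PRIVATE),
        "unspecified": addr.is_unspecified,
    }
    return {name for name, hit in checks.items() if hit}


def ipv4_matches(text: str, wanted: list[str]) -> bool:
    """An empty `wanted` list accepts any valid address, mirroring `value: []`."""
    try:
        found = ipv4_subtypes(text)
    except ipaddress.AddressValueError:
        return False
    return not wanted or bool(found & set(wanted))


print(ipv4_matches("10.1.2.3", ["broadcast", "private"]))  # True
print(ipv4_matches("8.8.8.8", ["broadcast", "private"]))   # False
```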
ipv6¶
Matches on IPv6 addresses.
```yaml
- ipv6:
    name: ipv6
    value: []
```
It takes an optional list of sub-types to match against. If none are specified, it matches any IPv6 address. Possible values are:
loopback
multicast
unspecified
```yaml
- ipv6:
    name: ipv6_loopback
    value:
      - loopback
```
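For IPv6, all three sub-types map directly onto predicates in Python's `ipaddress` module, so a sketch of the filter is short. As with the IPv4 sketch, the hypothetical `ipv6_matches` helper assumes that any listed sub-type is sufficient for a match.

```python
import ipaddress


def ipv6_matches(text: str, wanted: list[str]) -> bool:
    """Sketch: an empty `wanted` list accepts any valid IPv6 address."""
    try:
        addr = ipaddress.IPv6Address(text)
    except ipaddress.AddressValueError:
        return False
    checks = {
        "loopback": addr.is_loopback,        # ::1
        "multicast": addr.is_multicast,      # ff00::/8
        "unspecified": addr.is_unspecified,  # ::
    }
    found = {name for name, hit in checks.items() if hit}
    return not wanted or bool(found & set(wanted))


print(ipv6_matches("::1", ["loopback"]))          # True
print(ipv6_matches("2001:db8::1", ["loopback"]))  # False
```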
imei¶
Attempts to match against known International Mobile Equipment Identity values. It isn’t perfect, but should catch most devices.
name¶
Matches on common US names. Uses a list of the top 1000 most common first and last names.
us_address¶
Matches on US addresses. The more specific the address, the more checks the parser can perform (for example, ensuring that a ZIP code is within a state or that a city is within a ZIP code).
PRE-CANNED REGEX MATCHERS¶
These are pre-written regular expressions to match on common data patterns.
ls_regex:Email¶
regex: (?-u:\b)[a-zA-Z0-9][a-zA-Z0-9_.+-]{0,}@[a-zA-Z0-9][a-zA-Z0-9-.]{0,}\.[a-zA-Z]{2,}(?-u:\b)
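Python's `re` module rejects the `(?-u:\b)` groups in this pattern; in Rust's regex crate that syntax denotes an ASCII-only word boundary. To preview what the matcher would catch in sample text, a rough Python port can swap in `\b` with the `re.ASCII` flag. This is a translation for experimentation, not the processor's own code.

```python
import re

# Python port (assumption): (?-u:\b) becomes \b under re.ASCII, {0,} is
# written as the equivalent *, and [a-zA-Z0-9-.] is reordered to the
# same character set [a-zA-Z0-9.-].
EMAIL = re.compile(
    r"\b[a-zA-Z0-9][a-zA-Z0-9_.+-]*@[a-zA-Z0-9][a-zA-Z0-9.-]*\.[a-zA-Z]{2,}\b",
    re.ASCII,
)

print(EMAIL.findall("Contact jane.doe+ai@example.co.uk or ops@f5.com today."))
# ['jane.doe+ai@example.co.uk', 'ops@f5.com']
```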
HASH MATCHERS¶
These are pre-written regexes to match on common hash types.
ls_hash:Bcrypt¶
format: $2{X}${rounds}${salt}{checksum}
regex: \$2[axyb]?\$\d{2}\$[./A-Za-z0-9]{53}
example: $2b$12$GhvMmNVjRW29ulnudl.LbuAnUtN/LRfe1JsBm1Xu6LE3059z5Tr8m
ls_hash:Sha256Crypt¶
format: $5$rounds={rounds}${salt}${checksum}
regex: \$5\$(rounds=\d+\$)?[./0-9A-Za-z]{0,16}\$[./0-9A-Za-z]{43}
example: $5$rounds=80000$wnsT7Yr92oJoP28r$cKhJImk5mfuSKV9b3mumNzlbstFUplKtQXXMo4G6Ep5
ls_hash:Sha512Crypt¶
format: $6$rounds={rounds}${salt}${checksum}
regex: \$6\$(rounds=\d+\$)?[./0-9A-Za-z]{0,16}\$[./0-9A-Za-z]{86}
example: $6$rounds=80000$wnsT7Yr92oJoP28r$cKhJImk5mfuSKV9b3mumNzlbstFUplKtQXXMo4G6Ep5cKhJImk5mfuSKV9b3mumNzlbstFUplKtQXXMo4G6Ep5
ls_hash:Md5Crypt¶
format: $1${salt}${checksum}
regex: \$1\$[./A-Za-z0-9]{0,8}\$[./A-Za-z0-9]{22}
example: $1$5pZSV9va$azfrPr6af3Fc7dLblQXVa0
ls_hash:Sha1Crypt¶
format: $sha1${rounds}${salt}${checksum}
regex: \$sha1\$\d+\$[./0-9A-Za-z]{0,64}\$[./0-9A-Za-z]{28}
example: $sha1$40000$jtNX3nZ2$hBNaIXkt4wBI2o5rsi8KejSjNqIq
ls_hash:SunMd5Crypt¶
format: $md5,rounds={rounds}${salt}$${checksum} OR $md5${salt}$${checksum}
regex: \$md5(,rounds=\d+)?\$[./A-Za-z0-9]{0,8}\$\$[./A-Za-z0-9]{22}
example: $md5,rounds=5000$GUBv0xjJ$$mSwgIswdjlTY0YxV7HBVm0
example: $md5$GUBv0xjJ$$mSwgIswdjlTY0YxV7HBVm0
ls_hash:Argon2¶
format: $argon2{X}$v={version}$m={memory},t={time},p={parallelism}${salt}${digest}
regex: \$argon2[id]{1,2}\$v=\d+\$m=\d+,t=\d+,p=\d+\$[+/=A-Za-z0-9]+\$[+/=A-Za-z0-9]+
ls_hash:BcryptSha256¶
format: $bcrypt-sha256$v={version},t={type},r={rounds}${salt}${digest} OR $bcrypt-sha256${type},{rounds}${salt}${digest}
regex: \$bcrypt-sha256\$(v=2,t=2b,r=\d+|2[ab],\d+)\$[./A-Za-z0-9]{22}\$[./A-Za-z0-9]{31}
example: $bcrypt-sha256$v=2,t=2b,r=12$n79VH.0Q2TMWmt3Oqt9uku$Kq4Noyk3094Y2QlB8NdRT8SvGiI4ft2
example: $bcrypt-sha256$2b,12$n79VH.0Q2TMWmt3Oqt9uku$Kq4Noyk3094Y2QlB8NdRT8SvGiI4ft2
ls_hash:Phpass¶
format: $P${rounds}{salt}{checksum} OR $H${rounds}{salt}{checksum}
regex: \$[PH]\$[./A-Za-z0-9]{31}
example: $P$8ohUJ.1sdFw09/bMaAQPTGDNi2BIUt1
ls_hash:Pbkdf2Sha1¶
format: $pbkdf2${rounds}${salt}${checksum}
regex: \$pbkdf2\$\d+\$[./+A-Za-z0-9]+\$[./+A-Za-z0-9]{27}
example: $pbkdf2$6400$.6UI/S.nXIk8jcbdHx3Fhg$98jZicV16ODfEsEZeYPGHU3kbrU
ls_hash:Pbkdf2Sha256¶
format: $pbkdf2-sha256${rounds}${salt}${checksum}
regex: \$pbkdf2-sha256\$\d+\$[./+A-Za-z0-9]+\$[./+A-Za-z0-9]{43}
example: $pbkdf2-sha256$6400$.6UI/S.nXIk8jcbdHx3Fhg$98jZicV16ODfEsEZeYPGHU3kbrUrvUEXOPimVSQDD44
ls_hash:Pbkdf2Sha512¶
format: $pbkdf2-sha512${rounds}${salt}${checksum}
regex: \$pbkdf2-sha512\$\d+\$[./+A-Za-z0-9]+\$[./+A-Za-z0-9]{86}
example: $pbkdf2-sha512$6400$.6UI/S.nXIk8jcbdHx3Fhg$98jZicV16ODfEsEZeYPGHU3kbrUrvUEXOPimVSQDD4498jZicV16ODfEsEZeYPGHU3kbrUrvUEXOPimVSQDD44
ls_hash:Scram¶
format: $scram${rounds}${salt}${alg1}={digest1},{alg2}={digest2},...,
regex: \$scram\$\d+\$[./+A-Za-z0-9]+\$((md2|md5|sha-1|sha-224|sha-256|sha-384|sha-512|shake128|shake256)=[./+A-Za-z0-9]+,?)+
example: $scram$6400$.Z/znnNOKWUsBaCU$sha-1=cRseQyJpnuPGn3e6d6u6JdJWk.0,sha-256=5GcjEbRaUIIci1r6NAMdI9OPZbxl9S5CFR6la9CHXYc,sha-512=.DHbIm82ajXbFR196Y.9TtbsgzvGjbMeuWCtKve8TPjRMNoZK9EGyHQ6y0lW9OtWdHZrDZbBUhB9ou./VI2mlw
ls_hash:Scrypt¶
format: $scrypt$ln={logN},r={r},p={p}${salt}${checksum}
regex: \$scrypt\$ln=\d+,r=\d+,p=\d+\$[./+=A-Za-z0-9]+\$[./+=A-Za-z0-9]{43}
example: $scrypt$ln=16,r=8,p=1$aM15713r3Xsvxbi31lqr1Q$nFNh2CVHVjNldFVKDHDlm4CbdRSCdEBsjjJxD+iCs5E
ls_hash:AprMd5Crypt¶
format: $apr1${salt}${checksum}
regex: \$apr1\$[./A-Za-z0-9]{0,8}\$[./A-Za-z0-9]{22}
example: $apr1$5pZSV9va$azfrPr6af3Fc7dLblQXVa0
ls_hash:DlitzPbkdf2Sha1¶
format: $p5k2${rounds}${salt}${checksum}
regex: \$p5k2\$\d+\$[./A-Za-z0-9]+\$[./+A-Za-z0-9]{32}
example: $p5k2$2710$.pPqsEwHD7MiECU0$b8TQ5AMQemtlaSgegw5Je.JBE3QQhLbO
ls_hash:CtaPbkdf2Sha1¶
format: $p5k2${rounds}${salt}${checksum}
regex: \$p5k2\$\d+\$[./\-=_+A-Za-z0-9]+\$[./\-=_+A-Za-z0-9]{28}
example: $p5k2$2710$oX9ZZOcNgYoAsYL-8bqxKg==$AU2JLf2rNxWoZxWxRCluY0u6h6c=
ls_hash:Mssql2000¶
format: 0x0100{salt}{digest1}{digest2}
regex: 0x0100[A-F0-9]{88}
example: 0x0100200420C4988140FD3920894C3EDC188E94F428D57DAD5905F6CC1CBAF950CAD4C63F272B2C91E4DEEB5E6444
ls_hash:Mssql2005¶
format: 0x0100{salt}{digest1}
regex: 0x0100[A-F0-9]{48}
example: 0x01006ACDF9FF5D2E211B392EEF1175EFFE13B3A368CE2F94038B
ls_hash:Mysql41¶
format: *{checksum}
regex: \*[A-F0-9]{40}
example: *2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19
ls_hash:PostgresMd5¶
format: md5{checksum}
regex: md5[a-fA-F0-9]{32}
example: md5a5bfc9e07964f8dddeb95fc584cd9655
ls_hash:Oracle11¶
format: S:{checksum}{salt}
regex: S:[a-fA-F0-9]{60}
example: S:4143053633E59B4992A8EA17D2FF542C9EDEB335C886EED9C80450C1B4E6
ls_hash:BsdNthash¶
format: $3$${checksum}
regex: \$3\$\$[a-fA-F0-9]{32}
example: $3$$8846f7eaee8fb117ad06bdd830b7586c
ls_hash:DjangoPbkdf2Sha1¶
format: pbkdf2${rounds}${salt}${checksum}
regex: pbkdf2\$\d+\$[A-Za-z0-9]+\$[+/=A-Za-z0-9]+
example: pbkdf2$6400$6UISnXIk8jcbdHx3Fhg$98jZicV16ODfEsEZeYPGHU3kbrU
ls_hash:DjangoPbkdf2Sha256¶
format: pbkdf2_sha256${rounds}${salt}${checksum}
regex: pbkdf2_sha256\$\d+\$[A-Za-z0-9]+\$[+/=A-Za-z0-9]+
example: pbkdf2_sha256$10000$s1w0UXDd00XB$+4ORmyvVWAQvoAEWlDgN34vlaJx1ZTZpa1pCSRey2Yk=
ls_hash:DjangoSaltedSha1¶
format: sha1${salt}${checksum}
regex: sha1\$[a-f0-9]+\$[a-f0-9]+
example: sha1$f8793$c4cd18eb02375a037885706d414d68d521ca18c7
ls_hash:DjangoSaltedMd5¶
format: md5${salt}${checksum}
regex: md5\$[a-f0-9]+\$[a-f0-9]+
example: md5$f8793$c4cd18eb02375a037885706d414d68d521ca18c7
ls_hash:DjangoDesCrypt¶
format: crypt${salt}${checksum}
regex: crypt\$[a-f0-9]+\$[./A-Za-z0-9]{13}
example: crypt$cd1a4$cdlRbNJGImptk
ls_hash:GrubPbkdf2Sha512¶
format: grub.pbkdf2.sha512.{rounds}.{salt}.{checksum}
regex: grub.pbkdf2.sha512.\d+.[A-F0-9]+.[A-F0-9]{128}
example: grub.pbkdf2.sha512.10000.4483972AD2C52E1F590B3E2260795FDA9CA0B07B96FF492814CA9775F08C4B59CD1707F10B269E09B61B1E2D11729BCA8D62B7827B25B093EC58C4C1EAC23137.DF4FCB5DD91340D6D31E33423E4210AD47C7A4DF9FA16F401663BF288C20BF973530866178FE6D134256E4DBEFBD984B652332EED3ACAED834FEA7B73CAE851D
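Before relying on a hash matcher, or when writing a custom regex matcher modeled on one, it can help to test the pattern against known samples. The Python harness below copies four patterns and their documented examples verbatim from the list above; these particular patterns use only syntax that Python's `re` shares with the data-security engine, which is not itself written in Python.

```python
import re

# Pattern and documented example for a few hash matchers listed above.
HASHES = {
    "Bcrypt": (r"\$2[axyb]?\$\d{2}\$[./A-Za-z0-9]{53}",
               "$2b$12$GhvMmNVjRW29ulnudl.LbuAnUtN/LRfe1JsBm1Xu6LE3059z5Tr8m"),
    "Md5Crypt": (r"\$1\$[./A-Za-z0-9]{0,8}\$[./A-Za-z0-9]{22}",
                 "$1$5pZSV9va$azfrPr6af3Fc7dLblQXVa0"),
    "Mysql41": (r"\*[A-F0-9]{40}", "*2470C0C06DEE42FD1618BB99005ADCA2EC9D1E19"),
    "PostgresMd5": (r"md5[a-fA-F0-9]{32}", "md5a5bfc9e07964f8dddeb95fc584cd9655"),
}

for name, (pattern, example) in HASHES.items():
    ok = re.fullmatch(pattern, example) is not None
    print(f"{name}: {'matches' if ok else 'NO MATCH'}")
```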
ERROR MATCHERS¶
These are pre-written regexes to match on common error messages.
ls_error:TypeError¶
Looks for the case-sensitive string "TypeError".
ls_error:Uncaught¶
Looks for the case-insensitive string "uncaught".
ls_error:SocketError¶
Looks for the case-sensitive string "SocketError".
ls_error:OperationNotSupported¶
Looks for the case-insensitive string "operation not supported".
ls_error:Callback¶
Looks for the case-insensitive string "callback".
ls_error:Segfault¶
Looks for segmentation faults.
regex: (?i)(SIGSEGV|segmentation fault( \(core dumped\))?|segmentation violation|access violation|illegal instruction (core dumped))
ls_error:RuntimeError¶
Looks for the case-insensitive string "RuntimeError".
ls_error:OutOfMemory¶
Looks for out-of-memory errors.
regex: memory allocation of \d+ bytes failed
ls_error:PermissionDenied¶
Looks for the case-insensitive string "permission denied".
ls_error:CommandNotFound¶
Looks for the case-insensitive string "command not found".
ls_error:JsUnknownArgument¶
Looks for unknown argument errors thrown by JavaScript code.
regex: Unknown argument `.+`. Available options are marked with
example: Unknown argument `provider_providerAccountId`. Available options are marked with ?
ls_error:JsInvalidInvocation¶
Looks for invalid invocation errors thrown by JavaScript code.
regex: Invalid `.+` invocation in.+/.+\.js:\d+:\d+
example: Invalid `p.account.findUnique()` invocation in /Users/ASUS/outsidetest4/node_modules/@next-auth/prisma-adapter/dist/index.js:211:45
ls_error:JsBugMessage¶
Looks for the case-sensitive string "This is caused by either a bug in Node.js or incorrect usage of Node.js internals."
ls_error:JsError¶
Looks for JavaScript stack traces.
regex: (at .+ \((.+:\d+:\d+|<anonymous>)\)(\\n' \+)?(\s|')*)+
example:
at IncomingMessage._read (node:_http_incoming:214:19)
at Readable.read (node:internal/streams/readable:547:12)
at resume_ (node:internal/streams/readable:1048:12)
at process.processTicksAndRejections (node:internal/process/task_queues:82:21)
ls_error:PyError¶
Looks for Python tracebacks.
regex: Traceback \(most recent call last\):(\s+File ".+", line \d+, in .+\s*.*(\s+\^+)?)+(\s+.+)?
example:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/kombu/transport/virtual/base.py", line 925, in create_channel
return self._avail_channels.pop()
IndexError: pop from empty list
ls_error:JavaError¶
Looks for Java and Scala stack traces.
regex: (at .+\..+\(.+.(java|scala):\d+\)\s*)+
example:
at me.iwf.photopicker.adapter.PhotoGridAdapter.onBindViewHolder(PhotoGridAdapter.java:118)
at me.iwf.photopicker.adapter.PhotoGridAdapter.onBindViewHolder(PhotoGridAdapter.java:27)
at android.support.v7.widget.RecyclerView$Adapter.onBindViewHolder(RecyclerView.java:6673)
ls_error:RustError¶
Looks for Rust panics.
regex: thread '.+' (panicked at '.+'.*, .+\.rs(:\d+)+|has overflowed its stack)
example: thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', main.rs:2:47
ls_error:RubyError¶
Looks for Ruby tracebacks.
regex: \.rb:\d+:in (`|').+'\s*((from )?\/.+\.rb:\d+:in (`|').+'\s*)*
example:
from /var/deploy/example/shared/bundle/ruby/2.3.0/gems/eventmachine-1.2.3/lib/eventmachine.rb:677:in `connect_server'
from /var/deploy/example/shared/bundle/ruby/2.3.0/gems/eventmachine-1.2.3/lib/eventmachine.rb:677:in `bind_connect'
from /var/deploy/example/shared/bundle/ruby/2.3.0/gems/eventmachine-1.2.3/lib/eventmachine.rb:653:in `connect'
ls_error:GoError¶
Looks for Go tracebacks.
regex: goroutine \d+ \[.+\]:\s+(.+\s+.+\/.+.go:\d+( \+0x[\da-f]+.*)?\s*)+
example:
goroutine 5844 [running]:
k8s.io/kubernetes/pkg/controller/statefulset.getPersistentVolumeClaims(0xc003ee8500, 0x0?)
pkg/controller/statefulset/stateful_set_utils.go:348 +0x2fd
k8s.io/kubernetes/pkg/controller/statefulset.(*StatefulPodControl).createPersistentVolumeClaims(0xc000b24560, 0x6ecea2?, 0xc000600000?)
pkg/controller/statefulset/stateful_pod_control.go:341 +0x6a
ls_error:PhpError¶
Looks for fatal errors in PHP.
regex: Fatal error:.+ in .*[\/\\].*\.php(.*\s*){1,2}Stack trace:\s+(#\d+ .+\s*)+
example:
Fatal error: Uncaught Exception: Incorrect public key: error:04099079:rsa routines:RSA_padding_check_PKCS1_OAEP_mgf1:oaep decoding error in /home/apptestl/domains/apptestlab.pl/public_html/nextalk/zalogowano/crypto_library.php:13
Stack trace:
#0 home/apptestl/domains/apptestlab.pl/public_html/nextalk/zalogowano/send_message.php(33): encryptMessage('1', '-----BEGIN PUBL...')
#1 {main} thrown in /home/apptestl/domains/apptestlab.pl/public_html/nextalk/zalogowano/crypto_library.php on line 13
ls_error:EnvoySegfault¶
Looks for the case-sensitive string "Caught Segmentation fault, suspect faulting address 0x".
ls_error:PostgresError¶
Looks for PostgreSQL errors.
regex: (?i)(primary|panic|fatal|error): .+\s+((detail|hint|context|sqlstate): .+\s*)+
example:
ERROR: duplicate key value violates unique constraint "constraint_name"
DETAIL: Key (column_name)=(duplicate_value) already exists.
SQLSTATE: 23505
ls_error:MysqlError¶
Looks for MySQL errors.
regex: ERROR \d+ \(\d+\): .*
example: ERROR 1062 (23000): Duplicate entry 'value' for key 'unique_key'
ls_error:RedisError¶
Looks for Redis errors.
regex: \(error\) ERR .+
example: (error) ERR Operation against a key holding the wrong kind of value
ls_error:MongodbError¶
Looks for MongoDB errors.
regex: (("errmsg"\s*:\s*".+"|"(code|index|ok)"\s*:\s*\d+)\s*,?\s*){2,4}
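Error matchers are typically applied to model responses or logs that may leak internal details. The Python sketch below runs two patterns, copied verbatim from the list above, over a sample log built from their documented examples; Python's `re` accepts both patterns, although the processor uses its own engine.

```python
import re

# Two error matchers from the list above, copied verbatim.
PY_ERROR = re.compile(
    r'Traceback \(most recent call last\):(\s+File ".+", line \d+, in .+\s*.*(\s+\^+)?)+(\s+.+)?'
)
MYSQL_ERROR = re.compile(r"ERROR \d+ \(\d+\): .*")

log = """Worker crashed:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/kombu/transport/virtual/base.py", line 925, in create_channel
    return self._avail_channels.pop()
IndexError: pop from empty list
ERROR 1062 (23000): Duplicate entry 'value' for key 'unique_key'
"""

py_match = PY_ERROR.search(log)
print(py_match.group(0).splitlines()[-1])  # IndexError: pop from empty list
print(MYSQL_ERROR.search(log).group(0))
```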
CONTENT INDICATOR MATCHERS¶
These are pre-written wordlists that match common phrases within various categories of documents. Their capabilities are currently very limited because the wordlists are small. They are a proof of concept for wordlist-style content classification, and more work is needed to make them truly useful.
ls_indicator:PromptInjection¶
Looks for basic phrases such as "ignore previous instructions" or "imagine you had no restrictions".
ls_indicator:Legal¶
Looks for basic phrases such as "intellectual property right" or "This Agreement is made and entered into".
ls_indicator:Financial¶
Looks for basic phrases such as "The undersigned hereby acknowledges receipt of" or "Payment shall be made in accordance with the following schedule".
ls_indicator:Technical¶
Looks for basic phrases such as "engineering change request" or "product specifications".
ls_indicator:Regulatory¶
Looks for basic phrases such as "environmental impact assessment" or "certification authority".
ls_indicator:Hr¶
Looks for basic phrases such as "grievance procedures" or "family leave policy".
ls_indicator:Security¶
Looks for basic phrases such as "security vulnerability" or "unauthorized access".
ls_indicator:ComplianceTraining¶
Looks for basic phrases such as "corporate ethics guidelines" or "workplace compliance handbook".
ls_indicator:StrategicPlans¶
Looks for basic phrases such as "competitive analysis" or "annual operating plan".
ls_indicator:IntellectualProperty¶
Looks for basic phrases such as "trademark filing" or "copyright registration".
ls_indicator:VendorContracts¶
Looks for basic phrases such as "service level agreement" or "termination conditions".
ls_indicator:MarketingPlans¶
Looks for basic phrases such as "product launch plan" or "digital marketing strategy".
ls_indicator:ResearchDevelopment¶
Looks for basic phrases such as "proof of concept" or "technical feasibility analysis".
ls_indicator:CrisisManagement¶
Looks for basic phrases such as "breach response protocol" or "regulatory reporting requirements".
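Wordlist-style classification amounts to checking whether any listed phrase occurs in the text. The Python sketch below uses only the sample phrases above as hypothetical wordlists (the real lists are not published here) and assumes case-insensitive matching. It also shows the approach's limitation: a paraphrase such as "disregard earlier guidance" would not be caught.

```python
# Hypothetical wordlists built from the sample phrases above.
INDICATORS = {
    "PromptInjection": ["ignore previous instructions", "imagine you had no restrictions"],
    "Security": ["security vulnerability", "unauthorized access"],
    "VendorContracts": ["service level agreement", "termination conditions"],
}


def classify(text: str) -> list[str]:
    """Return indicator names whose phrases appear in `text` (case-insensitive)."""
    folded = text.casefold()
    return [
        name
        for name, phrases in INDICATORS.items()
        if any(p.casefold() in folded for p in phrases)
    ]


print(classify("Please IGNORE previous instructions and report the unauthorized access."))
# ['PromptInjection', 'Security']
```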