Description
We have at least one report of auditbeat OOMing a machine with the add_session_metadata processor.
After a bit of tinkering, I can reproduce this with the following config:
- module: auditd
  # Load audit rules from separate files. Same format as audit.rules(7).
  audit_rule_files: [ '${path.config}/audit.rules.d/*.conf' ]
  audit_rules: |
    -a exit,always -F arch=b64 -F euid=0 -S execve -k rootact
    -a exit,always -F arch=b32 -F euid=0 -S execve -k rootact
    -a always,exit -F arch=b64 -S connect -F a2=16 -F success=1 -F key=network_connect_4
    -a always,exit -F arch=b64 -F exe=/bin/bash -F success=1 -S connect -k "remote_shell"
    -a always,exit -F arch=b64 -F exe=/usr/bin/bash -F success=1 -S connect -k "remote_shell"
    -a always,exit -F arch=b64 -S exit_group
    -a always,exit -F arch=b64 -S setsid
    -a always,exit -F arch=b64 -S execve,execveat -k exec
processors:
  - add_session_metadata:
      backend: "procfs"
I instrumented the processor to dump the entire process DB used by the hostfs provider. Just running some SSH commands in a loop is enough to get the DB up to 30k+ entries in a few minutes, before the reaper would clean them up. Even so, 12k+ processes are still sitting in the DB after a few minutes. On high-load systems, the real count is probably much higher.
I'm not entirely sure what's going on here, but there's a massive amount of log spam suggesting that there's something up with the PID values coming from auditd:
10:41:54 alexk@motmot auditbeat-8.17.1-linux-x86_64 ±|8.17 ✗|→ sudo grep -rn "get process info from proc" logs/ | wc -l
23433
10:49:03 alexk@motmot auditbeat-8.17.1-linux-x86_64 ±|8.17 ✗|→ sudo grep -rn "could not insert exit" logs/ | wc -l
4576
The majority of the processes in the database are also missing metadata, suggesting they're processes that failed a PID lookup:
10:53:57 alexk@motmot auditbeat-8.17.1-linux-x86_64 ±|8.17 ✗|→ cat /tmp/procdb.json | jq -c '.[] | .Argv' | wc -l
12051
10:57:15 alexk@motmot auditbeat-8.17.1-linux-x86_64 ±|8.17 ✗|→ cat /tmp/procdb.json | jq -c '.[] | .Argv' | grep -v null | wc -l
262
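For anyone repeating the measurement, the two jq counts above can be reproduced in one pass. This is only a rough sketch: it assumes the dump is either a JSON array of entries or an object keyed by PID, and that each entry carries an `Argv` field as the jq filter suggests. The function names are made up for illustration.

```python
import json
from typing import Any, Dict, List


def entries_of(dump: Any) -> List[dict]:
    """Return the entries whether the dump is a JSON array or an object of entries."""
    if isinstance(dump, dict):
        return list(dump.values())
    return list(dump)


def summarize(dump: Any) -> Dict[str, int]:
    """Count entries with and without Argv, mirroring `jq '.[] | .Argv'` plus `grep -v null`."""
    entries = entries_of(dump)
    # A missing key and an explicit null both print as "null" in jq, so treat them alike.
    with_argv = sum(1 for e in entries if e.get("Argv") is not None)
    return {
        "total": len(entries),
        "with_argv": with_argv,
        "missing_argv": len(entries) - with_argv,
    }


def summarize_file(path: str) -> Dict[str, int]:
    """Load a dump such as /tmp/procdb.json and summarize it."""
    with open(path) as f:
        return summarize(json.load(f))
```

Calling `summarize_file("/tmp/procdb.json")` on the dump above should report the same total and the same count of entries with a non-null `Argv`, provided the dump really has the assumed shape.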
I wonder if the values we expect to be PIDs/TGIDs at various points are just TIDs instead?
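One way to test the TID theory is to take an ID from the audit events and compare the `Tgid:` and `Pid:` fields of `/proc/<id>/status`. On Linux, `Pid:` in that file is the thread ID of the task, so the two differ when the ID belongs to a non-leader thread. A rough sketch follows; the helper names are hypothetical and this is not how the processor itself does its lookups.

```python
from pathlib import Path
from typing import Optional, Tuple


def parse_ids(status_text: str) -> Tuple[int, int]:
    """Extract (Tgid, Pid) from the contents of a /proc/<id>/status file."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return int(fields["Tgid"]), int(fields["Pid"])


def classify(status_text: str) -> str:
    """Say whether the task is a thread-group leader or a thread of another TGID."""
    tgid, tid = parse_ids(status_text)
    if tgid == tid:
        return "thread-group leader"
    return f"thread of TGID {tgid}"


def classify_id(task_id: int, proc_root: str = "/proc") -> Optional[str]:
    """Classify a live task by ID; returns None if it has already exited."""
    try:
        text = (Path(proc_root) / str(task_id) / "status").read_text()
    except (FileNotFoundError, ProcessLookupError):
        return None
    return classify(text)
```

The check is racy for short-lived processes, since the task may be gone before the file is read, so a `None` result says nothing either way.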