
add_session_metadata process DB can grow to 20k+ entries, OOMing machine #42317

@fearful-symmetry

Description


We have at least one report of Auditbeat OOMing a machine with the add_session_metadata processor enabled.

After a bit of tinkering, I can reproduce this with the following config:

auditbeat.modules:
- module: auditd
  # Load audit rules from separate files. Same format as audit.rules(7).
  audit_rule_files: [ '${path.config}/audit.rules.d/*.conf' ]
  audit_rules: |
    -a exit,always -F arch=b64 -F euid=0 -S execve -k rootact
    -a exit,always -F arch=b32 -F euid=0 -S execve -k rootact
    -a always,exit -F arch=b64 -S connect -F a2=16 -F success=1 -F key=network_connect_4
    -a always,exit -F arch=b64 -F exe=/bin/bash -F success=1 -S connect -k "remote_shell"
    -a always,exit -F arch=b64 -F exe=/usr/bin/bash -F success=1 -S connect -k "remote_shell"
    -a always,exit -F arch=b64 -S exit_group
    -a always,exit -F arch=b64 -S setsid
    -a always,exit -F arch=b64 -S execve,execveat -k exec

processors:
  - add_session_metadata:
      backend: "procfs"

I instrumented the processor to dump the entire process DB used by the hostfs provider. Just running some SSH commands in a loop is enough to push the DB to 30k+ entries within a few minutes, before the reaper cleans them up. Even after the reaper runs, 12k+ processes are still sitting in the DB a few minutes later. On high-load systems, the real count is probably much higher.

I'm not entirely sure what's going on here, but there's a massive amount of log spam suggesting that something is wrong with the PID values coming from auditd:

10:41:54 alexk@motmot auditbeat-8.17.1-linux-x86_64 ±|8.17 ✗|→ sudo grep -rn "get process info from proc" logs/ | wc -l
23433
10:49:03 alexk@motmot auditbeat-8.17.1-linux-x86_64 ±|8.17 ✗|→ sudo grep -rn "could not insert exit" logs/ | wc -l
4576

The majority of the processes in the database are also missing metadata, suggesting they're processes that failed a PID lookup:

10:53:57 alexk@motmot auditbeat-8.17.1-linux-x86_64 ±|8.17 ✗|→ cat /tmp/procdb.json | jq -c '.[] | .Argv' | wc -l
12051
10:57:15 alexk@motmot auditbeat-8.17.1-linux-x86_64 ±|8.17 ✗|→ cat /tmp/procdb.json | jq -c '.[] | .Argv' | grep -v null | wc -l
262

I wonder if the values we expect to be PIDs/TGIDs at various points are just TIDs instead?
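One way to test this hypothesis outside Auditbeat: on Linux, /proc/<id>/status also resolves for thread IDs, and its Tgid field names the thread group (the userspace PID), while its Pid field is the thread's own ID. The sketch below flags an ID as a non-leader thread when its Tgid differs from the ID itself. The helper names parseTgid and isThreadID are hypothetical; feeding it the PIDs that failed lookup would have to be wired up separately.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseTgid extracts the Tgid field from the contents of /proc/<id>/status.
func parseTgid(status string) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "Tgid:") {
			return strconv.Atoi(strings.TrimSpace(strings.TrimPrefix(line, "Tgid:")))
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no Tgid field in status")
}

// isThreadID reports whether id is a non-leader thread ID, i.e. its
// thread group ID (the PID userspace sees) differs from id.
func isThreadID(id int) (bool, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", id))
	if err != nil {
		return false, err // e.g. the task already exited
	}
	tgid, err := parseTgid(string(data))
	if err != nil {
		return false, err
	}
	return tgid != id, nil
}

func main() {
	// Usage: pass the IDs to check as arguments.
	for _, arg := range os.Args[1:] {
		id, err := strconv.Atoi(arg)
		if err != nil {
			fmt.Fprintf(os.Stderr, "bad id %q: %v\n", arg, err)
			continue
		}
		thread, err := isThreadID(id)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%d: %v\n", id, err)
			continue
		}
		fmt.Printf("%d: thread=%v\n", id, thread)
	}
}
```

A lookup that fails with a missing /proc entry is inconclusive (the task may simply have exited), so only successful lookups say anything about the TID hypothesis.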

Metadata


Assignees: No one assigned

Labels: Auditbeat, bug, needs_team
