Skip to content

[libbeat/logging] fix a breaking bug #44131

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 30, 2025

Conversation

VihasMakwana
Copy link
Contributor

@VihasMakwana VihasMakwana commented Apr 30, 2025

Proposed commit message

I think this is a breaking change that we need to address ASAP, hence tagging this PR as impact:critical.

This PR moves logging initialisation before we move forward with rest of the flow.

Issues:

  1. We initialise logger at

    b.Info.Logger, err = configure.LoggingWithTypedOutputsLocal(b.Info.Beat, b.Config.Logging, b.Config.EventLogging, logp.TypeKey, logp.EventType)
    if err != nil {
    return fmt.Errorf("error initializing logging: %w", err)
    }
    // extracting here for ease of use
    logger := b.Info.Logger

    But, we make use of logger in promoteOutputQueueSettings, which called is before logger's initialisation
    if err := promoteOutputQueueSettings(b); err != nil {
    return fmt.Errorf("could not promote output queue settings: %w", err)
    }

    b.Info.Logger.Info("global queue settings replaced with output queue settings")

    Logger doesn't exist at this point and it leads to a nil dereference panic.

  2. We don't really see the panic message. The command just exits without any meaningful info. We log the panic here, we make use of logp.NewLogger(...). It is a no-op logger if haven't configured the logger in the first place.

How to reproduce

Just run filebeat in normal mode with following config and you'll see nothing. No log files created, no panics printed:

filebeat.inputs:
  - type: filestream
    id: filestream-input-id
    enabled: true
    paths:
      - ./logs/file.log

output:
  elasticsearch:
    hosts: ["host"]
    username: elastic
    password: REDACTED
    bulk_max_size: 5000
    queue:
        mem:
          events: 3200
          flush:
            min_events: 1600
            timeout: 1s

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

  • it fixes a disruptive user impact

@VihasMakwana VihasMakwana added bug impact:critical Immediate priority; high value or cost to the product. labels Apr 30, 2025
@VihasMakwana VihasMakwana self-assigned this Apr 30, 2025
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Apr 30, 2025
@VihasMakwana VihasMakwana marked this pull request as ready for review April 30, 2025 09:15
@VihasMakwana VihasMakwana requested a review from a team as a code owner April 30, 2025 09:15
Copy link
Contributor

mergify bot commented Apr 30, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @VihasMakwana? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@VihasMakwana VihasMakwana requested review from rdner and removed request for khushijain21 April 30, 2025 09:16
@VihasMakwana VihasMakwana added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Apr 30, 2025
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Apr 30, 2025
@elasticmachine
Copy link
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@VihasMakwana
Copy link
Contributor Author

VihasMakwana commented Apr 30, 2025

This looks like a side effect of #43356. We haven't backported it to any of our branches, so we're good.
But the main branch fails.

Copy link
Member

@rdner rdner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we need a test that catches this to track the regression. But it can be introduced in a follow up PR.

Since it was never released, I don't think we need a changelog entry.

@AndersonQ
Copy link
Member

Also it'd good a test to ensure

if r := recover(); r != nil {
logp.NewLogger(settings.Name).Fatalw("Failed due to panic.",
"panic", r, zap.Stack("stack"))
}
works, or to remove it

@VihasMakwana
Copy link
Contributor Author

Since it was never released, I don't think we need a changelog entry.

Yeah, my biggest fear was if this change was baported to 8.x or 9.0. But, we're good.

@VihasMakwana
Copy link
Contributor Author

Also it'd good a test to ensure

if r := recover(); r != nil {
logp.NewLogger(settings.Name).Fatalw("Failed due to panic.",
"panic", r, zap.Stack("stack"))
}

works, or to remove it

@AndersonQ it works now. We see a panic message in log file.

@pierrehilbert
Copy link
Collaborator


=== FAIL: libbeat/tests/integration TestExportConfigEnvVar (10.00s)
--
  | framework.go:740:
  | Error Trace:	/opt/buildkite-agent/builds/bk-agent-prod-gcp-1746008355609022145/elastic/beats-libbeat/libbeat/tests/integration/framework.go:740
  | /opt/buildkite-agent/builds/bk-agent-prod-gcp-1746008355609022145/elastic/beats-libbeat/libbeat/tests/integration/framework.go:754
  | /opt/buildkite-agent/builds/bk-agent-prod-gcp-1746008355609022145/elastic/beats-libbeat/libbeat/tests/integration/cmd_export_config_test.go:60
  | Error:      	Condition never satisfied
  | Test:       	TestExportConfigEnvVar
  | Messages:   	match string 'name: ${GOOS}' not found in /opt/buildkite-agent/builds/bk-agent-prod-gcp-1746008355609022145/elastic/beats-libbeat/libbeat/build/integration-tests/TestExportConfigEnvVar643642089/stdout
  | framework.go:330: [WARN] mockbeat did not stopped successfully: exit status 1

@khushijain21
Copy link
Contributor

khushijain21 commented Apr 30, 2025

@VihasMakwana , i think there is a catch here

cloudid.OverwriteSettings overrides default config. It is required that we extract raw config “after” calling this method so that we store the right config in b.RawConfig. Please correct me if I am wrong

@VihasMakwana
Copy link
Contributor Author

@VihasMakwana , i think there is a catch here

cloudid.OverwriteSettings overrides default config. It is required that we extract raw config “after” calling this method so that we store the right config in b.RawConfig. Please correct me if I am wrong

I see. Fixed it.

@VihasMakwana VihasMakwana merged commit 3604ffa into elastic:main Apr 30, 2025
194 checks passed
@khushijain21 khushijain21 added the backport-8.19 Automated backport to the 8.19 branch label Jun 6, 2025
mergify bot pushed a commit that referenced this pull request Jun 6, 2025
* fix a breaking bug

* remove panic

* fix log config

* changelog

* fix test

* no changelog

* fix OverwriteSettings

(cherry picked from commit 3604ffa)

# Conflicts:
#	libbeat/cmd/instance/beat.go
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.19 Automated backport to the 8.19 branch bug impact:critical Immediate priority; high value or cost to the product. Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants