engine: fix busy chunks on reload. #10720

Open
wants to merge 1 commit into
base: master

CharlieR-o-o-t
Contributor

@CharlieR-o-o-t CharlieR-o-o-t commented Aug 10, 2025

This change contains:

  1. Simplify the shutdown logic by removing code in the shutdown event handler that was never called. It is no longer needed, since the shutdown event is created once and exists until shutdown finishes.
  2. Make shutdown event creation safe.
  3. Fix stale busy chunks, which could appear because the engine's FLB_ENGINE_STOP block could be called twice.

Will completely fix: #10670.

Testing
Start Fluent Bit with Hot_Reload and Ensure_Thread_Safety set to true, or with Retry_Limit set to no_limits.

🔽 Fluent-bit config
[Service]
    Hot_Reload.Ensure_Thread_Safety true
    Shutdown_Grace 5
    Log_Level trace 
    Http_Server    true
    Parsers_File    /home/raskin/kind/fluent-test/fluent/parsers.conf
    Parsers_File    /home/raskin/kind/fluent-test/fluent/parsers_multiline.conf
    storage.path    /home/raskin/kind/fluent-test/fluent/buffer/
    storage.sync    normal
    storage.checksum    off
    storage.backlog.mem_limit   5MB
    storage.metrics    on
    storage.max_chunks_up   5 
    storage.delete_irrecoverable_chunks    on
    storage.backlog.flush_on_shutdown Off

[Input]
    Name    tail
    Path    /home/raskin/kind/fluent-test/fluent/log/*.log
    Read_from_Head    true
    Refresh_Interval    60
    Skip_Long_Lines    true
    DB    /home/raskin/kind/fluent-test/fluent/tail/pos.db
    DB.Sync    Normal
    Mem_Buf_Limit    32MB
    Parser    cri
    Tag    kube.*
    storage.type    filesystem
    storage.pause_on_chunks_overlimit    off
    Log_Level trace

[Output]
    Name    loki
    Log_Level    trace
    Match_Regex    .*
    Retry_Limit   no_limits 
    host   localhost 
    port    3100
    auto_kubernetes_labels    off
    tls    Off

Tested on ubuntu:
debug.log

Documentation

  • [ N/A] Documentation required for this feature

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Refactor

    • Simplified and consolidated the shutdown/start-stop flow for smoother service termination and removed legacy exit logic.
    • Made shutdown timer creation conditional to avoid duplicate timers and added early-fail handling.
  • Bug Fix

    • Improved shutdown sequencing to reliably stop ingestion and flush tasks without premature exits.
  • Logging

    • Clarified and moved shutdown and retry logs to surface pending chunks and running tasks earlier.

coderabbitai bot commented Aug 10, 2025

Walkthrough

Refactors shutdown/reload flow in src/flb_engine.c: removes local exiting flag, makes shutdown timer creation conditional on shutdown_fd, replaces explicit ingestion-stop toggles with flb_engine_stop_ingestion, and reorders logging for pending chunks/tasks and shutdown events.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Engine shutdown & control-flow (src/flb_engine.c) | Removed local exiting flag and old exit logic; replaced explicit is_ingestion_active/is_shutting_down toggles and input pausing with flb_engine_stop_ingestion(config); made shutdown timer creation conditional on config->shutdown_fd and added error handling for timer creation; adjusted placement of event->priority. |
| Event handling & timers (src/flb_engine.c) | Switched shutdown fd consumption from flb_utils_pipe_byte_consume(fd) to flb_utils_timer_consume(fd); ensured shutdown event priority is set; retained conditional timer cleanup and placement changes. |
| Logging & reporting (src/flb_engine.c) | Reorganized and moved logs: concise retry log message in flb_engine_reschedule_retries, earlier logging of pending memory/filesystem chunks and running tasks during FLB_ENGINE_SHUTDOWN; removed stray config = NULL and adjusted other log emission points. |

Sequence Diagram(s)

sequenceDiagram
    participant Engine as Engine Main Loop
    participant Config as Engine Config
    participant Timer as Shutdown Timer

    Engine->>Engine: Receive FLB_ENGINE_STOP
    alt config->shutdown_fd <= 0
        Engine->>Timer: Create shutdown timer (with error handling)
        Timer-->>Engine: Timer created / error
    end
    Engine->>Engine: Continue event loop

    Engine->>Engine: Receive FLB_ENGINE_SHUTDOWN
    Engine->>Engine: Log pending memory/filesystem chunks (if any)
    Engine->>Engine: Log running tasks (if any)
    Engine->>Engine: Call flb_engine_stop_ingestion(config)
    Engine->>Engine: Proceed with shutdown/flush logic

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Assessment against linked issues

Objective Addressed Explanation
Prevent endless loop on reload and restore proper SIGHUP/SIGTERM handling (#10670)
Ensure Fluent Bit resumes log processing and responds to signals after reload (#10670)
Simplify and correct shutdown grace period logic to avoid hangs (#10670)

Possibly related PRs

Suggested labels

backport to v4.0.x

Suggested reviewers

  • edsiper
  • koleini
  • leonardo-albertovich

Poem

"I hopped through code to mend the flow,
Timers tamed where shutdowns go.
Retries whisper short and sweet,
Tasks logged tidy, no repeat.
Rabbit cheers — the engine's whole!" 🐇

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (2)
src/flb_engine.c (2)

137-142: Bug: logging uses uninitialized ‘retry’ pointer

In flb_engine_reschedule_retries(), the debug message prints retry before it is assigned (it is set later, in the inner loop). Reading the uninitialized pointer is undefined behavior, so the logged value can be garbage.

-            if (task->users > 0) {
-                flb_debug("[engine] retry=%p for task %i already scheduled to run, "
-                          "not re-scheduling it.",
-                          retry, task->id);
-                continue;
-            }
+            if (task->users > 0) {
+                flb_debug("[engine] task_id=%i already scheduled to run, not re-scheduling it.",
+                          task->id);
+                continue;
+            }

584-586: Ensure correct timer fd consumption for shutdown events

The shutdown_fd is created with mk_event_timeout_create() (i.e. it’s a timerfd), so it should be drained with flb_utils_timer_consume(), not flb_utils_pipe_byte_consume(). Update in src/flb_engine.c:

  • At lines 583–586, replace:
        else if (config->shutdown_fd == fd) {
            flb_utils_pipe_byte_consume(fd);
            return FLB_ENGINE_SHUTDOWN;
        }
    with:
        else if (config->shutdown_fd == fd) {
            flb_utils_timer_consume(fd);
            return FLB_ENGINE_SHUTDOWN;
        }

This ensures we consume the timer event correctly (avoiding EINVAL on short reads).

🧹 Nitpick comments (3)
src/flb_engine.c (3)

1003-1008: Comment wording nit: tighten phrasing

Minor wording cleanups to improve clarity.

-                 * Will reschedule task to 1 sec. retry.
-                 * Also timer with shutdown event will be created.
+                 * Will reschedule tasks to a 1-second retry.
+                 * A shutdown timer event will also be created.

1072-1080: Throttle INFO logs for pending chunks/tasks to avoid log spam

With these moved earlier, they’ll print every second until shutdown completes. Consider throttling or lowering verbosity after the first tick.

-                    if ((mem_chunks + fs_chunks) > 0) {
-                        flb_info("[engine] pending chunk count: memory=%d, filesystem=%d; grace_timer=%d",
-                                 mem_chunks, fs_chunks, config->grace_count);
-                    }
-
-                    if (tasks > 0) {
-                        flb_task_running_print(config);
-                    }
+                    /* Log once per 10 seconds to reduce noise; still log immediately at first tick */
+                    if ((config->grace_count == 1) ||
+                        (config->grace_count % 10) == 0) {
+                        if ((mem_chunks + fs_chunks) > 0) {
+                            flb_info("[engine] pending chunk count: memory=%d, filesystem=%d; grace_timer=%d",
+                                     mem_chunks, fs_chunks, config->grace_count);
+                        }
+                        if (tasks > 0) {
+                            flb_task_running_print(config);
+                        }
+                    }
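
The throttling condition in the suggestion above can be isolated into a small predicate. This is only a sketch: demo_should_log_pending is a hypothetical name, and it assumes the grace counter is a non-negative integer that increases by one per shutdown tick.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate mirroring the suggested condition:
 * log on the first tick, then on every 10th tick.
 * Note that a counter value of 0 also satisfies the modulo test.
 */
bool demo_should_log_pending(int grace_count)
{
    assert(grace_count >= 0);
    return grace_count == 1 || (grace_count % 10) == 0;
}
```

Keeping the condition in one place like this would make the logging cadence easy to adjust, though the PR itself does not introduce such a helper.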

1111-1112: Superfluous local reset of config

Setting a local pointer to NULL right before return has no effect and can confuse readers.

-                        config = NULL;
-                        return ret;
+                        return ret;
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4e62817 and a1e6030.

📒 Files selected for processing (1)
  • src/flb_engine.c (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: PR - fuzzing test

@CharlieR-o-o-t
Contributor Author

@edsiper, could you please take a look?

@edsiper
Member

edsiper commented Aug 13, 2025

Some code-logic review notes:

1. Uninitialized Pointer Bug (Lines 137-142)

  • Code references retry variable before it's assigned
  • Can cause crashes due to garbage pointer dereferencing
  • Fix: Replace retry pointer reference with task ID in debug message

2. Incorrect Timer Consumption (Lines 584-586)

  • Uses flb_utils_pipe_byte_consume() on a timer fd
  • Should use flb_utils_timer_consume() for timer events
  • Can cause EINVAL errors on short reads

3. Missing Error Handling (Lines 1046-1051)

  • No validation when mk_event_timeout_create() fails
  • Silent failure could skip shutdown scheduling entirely
  • Fix: Add error checking and logging

Medium Priority Issues

4. Missing Event Priority

  • Shutdown timer lacks explicit priority setting
  • Could affect event scheduling order
  • Fix: Set event->priority = FLB_ENGINE_PRIORITY_SHUTDOWN (to be validated)

5. Minor Cleanup: Superfluous Code (Lines 1111-1112)

  • Setting local pointer to NULL before return has no effect
  • Fix: Remove unnecessary assignment

Recommendations

Before Merge:

  • Fix the uninitialized pointer bug (critical)
  • Correct timer fd consumption method (critical)
  • Add error handling for timer creation (critical)
  • Set proper event priority for shutdown timer

…logic on shutdown

Signed-off-by: Siarhei Rasiukevich <s.rasiukevich@gmail.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/flb_engine.c (2)

1029-1054: Guard + error handling for shutdown timer: great; set priority after create for consistency

The guarded creation and error handling for mk_event_timeout_create() look solid and prevent the double FLB_ENGINE_STOP issue. For consistency with the flush timer (set after creation at Line 899) and to avoid any chance of the event system overwriting fields during registration, set event->priority after a successful creation.

Apply this small relocation:

-    event->priority = FLB_ENGINE_PRIORITY_SHUTDOWN;
@@
-    if (config->shutdown_fd <= 0) {
-        config->shutdown_fd = mk_event_timeout_create(evl,
-                                                      1,
-                                                      0,
-                                                      event);
-
-        if (config->shutdown_fd == -1) {
-            flb_error("[engine] could not create shutdown timer");
-            /* fail early so we don't silently skip scheduled shutdown */
-            return -1;
-        }
-    }
+    if (config->shutdown_fd <= 0) {
+        config->shutdown_fd = mk_event_timeout_create(evl, 1, 0, event);
+        if (config->shutdown_fd == -1) {
+            flb_error("[engine] could not create shutdown timer");
+            /* fail early so we don't silently skip scheduled shutdown */
+            return -1;
+        }
+        event->priority = FLB_ENGINE_PRIORITY_SHUTDOWN;
+    }

1109-1114: Set shutdown_fd to an invalid value after destroying the timer

After mk_event_timeout_destroy(), reset shutdown_fd to an invalid sentinel to avoid accidental reuse on subsequent cycles or reloads.

     if (config->shutdown_fd > 0) {
         mk_event_timeout_destroy(config->evl,
                                  &config->event_shutdown);
+        config->shutdown_fd = -1;
     }
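
The two nitpicks above (create the timer only when none is registered, and reset the fd after destroying it) can be sketched together. This is a simplified model, not Fluent Bit code: struct demo_engine, demo_timer_open, demo_engine_stop and demo_engine_cleanup are hypothetical stand-ins, and the fake timer simply returns a positive number instead of a real file descriptor.

```c
#include <assert.h>
#include <stddef.h>

struct demo_engine {
    int shutdown_fd;      /* <= 0 means "no shutdown timer registered" */
    int timers_created;   /* how many times the stand-in timer was created */
};

/* Stand-in for the real timer creation call; returns a fake positive fd. */
static int demo_timer_open(struct demo_engine *engine)
{
    engine->timers_created++;
    return 100 + engine->timers_created;
}

/* Models the STOP handler: safe to call more than once. */
int demo_engine_stop(struct demo_engine *engine)
{
    assert(engine != NULL);
    if (engine->shutdown_fd <= 0) {
        int fd = demo_timer_open(engine);

        if (fd == -1) {
            return -1;    /* fail early instead of skipping shutdown */
        }
        engine->shutdown_fd = fd;
    }
    return 0;
}

/* Models cleanup: the real code would destroy the timer before the reset. */
void demo_engine_cleanup(struct demo_engine *engine)
{
    assert(engine != NULL);
    if (engine->shutdown_fd > 0) {
        engine->shutdown_fd = -1;   /* sentinel so a later cycle recreates it */
    }
}
```

In this model, a second demo_engine_stop() call does not create a second timer, and after demo_engine_cleanup() a later stop creates a fresh one. Without the reset, the guard would see the stale positive value and skip creation on the next cycle.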
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b82e8a5 and e14de4f.

📒 Files selected for processing (1)
  • src/flb_engine.c (7 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/flb_engine.c (3)
src/flb_utils.c (1)
  • flb_utils_timer_consume (492-511)
lib/monkey/mk_core/mk_event.c (1)
  • mk_event_timeout_create (156-163)
src/flb_task.c (1)
  • flb_task_running_print (299-345)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
  • GitHub Check: pr-compile-centos-7
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
  • GitHub Check: PR - fuzzing test
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
🔇 Additional comments (4)
src/flb_engine.c (4)

136-142: Good fix: avoid referencing uninitialized retry pointer

The new debug log only references task->id when task->users > 0, addressing the prior crash risk from logging an uninitialized retry pointer.


584-586: Correct API to consume timer events

Switching to flb_utils_timer_consume(fd) for shutdown_fd is the right fix; consuming a timerfd with pipe_byte_consume would lead to EINVAL on short reads.


1022-1025: Consolidated ingestion-stop logic

Using flb_engine_stop_ingestion(config) centralizes state toggling and input pausing, eliminating duplicated logic and reducing drift.


1075-1083: Helpful shutdown visibility

Emitting pending chunk counts and printing running tasks during shutdown provides actionable visibility for operators debugging grace periods and backlog behavior.

@CharlieR-o-o-t
Contributor Author

CharlieR-o-o-t commented Aug 13, 2025

@edsiper, this code was here before I joined; I only made minimal changes to fix the reload hang.
I hope to bring more value to the project in the future. Thanks a lot for your help!

I've tested my case with a reload while the output is down, and everything works like a charm. Could we also backport it to 4.0.x?

@edsiper
Member

edsiper commented Aug 14, 2025

@coderabbitai check the new changes and run a new analysis


coderabbitai bot commented Aug 14, 2025

🧩 Analysis chain

I'll analyze the current changes in the PR to check if the previously identified issues have been addressed and provide a fresh analysis.


🏁 Script executed:

#!/bin/bash
# Get the current diff for the PR to see what changes are present
git diff HEAD~1 HEAD -- src/flb_engine.c

# Also get the full content of the main file being changed
echo "=== Full content of src/flb_engine.c ==="
cat src/flb_engine.c

Length of output: 42247


Let me examine the specific changes made in this PR and check if the issues identified by edsiper have been addressed.


🏁 Script executed:

#!/bin/bash
# Let's see what changes were made in this branch compared to main/master
git log --oneline -10

# Check current branch
git branch -a

# Let's see the diff from the base branch
git diff HEAD^ HEAD --name-only

# Get the actual diff for the engine file
git diff HEAD^ HEAD -- src/flb_engine.c || git diff origin/master HEAD -- src/flb_engine.c || git diff master HEAD -- src/flb_engine.c

Length of output: 8039


Now let me check the specific lines mentioned by edsiper to see if the issues have been addressed:


🏁 Script executed:

#!/bin/bash
# Check the specific lines mentioned by edsiper

echo "=== Issue 1: Lines around 137-142 (Uninitialized pointer bug) ==="
sed -n '130,150p' src/flb_engine.c

echo -e "\n=== Issue 2: Lines around 584-586 (Timer consumption) ==="
sed -n '580,590p' src/flb_engine.c

echo -e "\n=== Issue 3: Lines around 1046-1051 (Error handling for mk_event_timeout_create) ==="
sed -n '1040,1055p' src/flb_engine.c

echo -e "\n=== Issue 4: Check for event priority setting ==="
grep -n -A5 -B5 "FLB_ENGINE_PRIORITY_SHUTDOWN" src/flb_engine.c

echo -e "\n=== Issue 5: Lines around 1111-1112 (Superfluous code) ==="
sed -n '1108,1118p' src/flb_engine.c

Length of output: 4167


@edsiper I've analyzed the current changes in the PR and I'm pleased to report that all 5 issues you identified have been properly addressed:

✅ Critical Issues Fixed

1. Uninitialized Pointer Bug (Lines 137-142)

  • Status: FIXED
  • Change: Removed reference to uninitialized retry pointer in debug message
  • Before: flb_debug("[engine] retry=%p for task %i already scheduled...", retry, task->id);
  • After: flb_debug("[engine] task %i already scheduled to run, not re-scheduling it.", task->id);

2. Incorrect Timer Consumption (Line 584)

  • Status: FIXED
  • Change: Correctly uses timer consumption function
  • Before: flb_utils_pipe_byte_consume(fd);
  • After: flb_utils_timer_consume(fd);

3. Missing Error Handling (Lines 1043-1053)

  • Status: FIXED
  • Changes: Added comprehensive error handling
    • Conditional timer creation: if (config->shutdown_fd <= 0)
    • Error check: if (config->shutdown_fd == -1) with proper logging
    • Fail-early approach with return -1 to prevent silent failures

✅ Medium Priority Issues Fixed

4. Missing Event Priority (Line 1032)

  • Status: FIXED
  • Change: event->priority = FLB_ENGINE_PRIORITY_SHUTDOWN; is now properly set

5. Superfluous Code Cleanup (Line 1114)

  • Status: FIXED
  • Change: Removed unnecessary config = NULL; assignment before return

Code Quality Assessment

The fixes demonstrate excellent engineering practices:

  • Minimal, targeted changes that address specific issues without over-engineering
  • Proper error handling with meaningful log messages and fail-early patterns
  • Improved debugging by removing confusing uninitialized pointer references
  • Correct API usage for timer file descriptor handling

The changes maintain the original intent of fixing busy chunks on reload while addressing all the safety and correctness issues you identified. This is exactly the kind of focused, high-quality fix that should be merged.

Recommendation: The code is now ready for merge. All critical and medium priority issues have been resolved appropriately.

@CharlieR-o-o-t
Contributor Author

@edsiper , hello, could we proceed with PR?

Development

Successfully merging this pull request may close these issues.

🐛 Endless loop on reload: Fluent Bit stops log processing and handling of SIGHUP/SIGTERM
2 participants