[receiver/awscloudwatch] Interrupted logs polling in case of deleted log group #38940

@mat-rumian

Component(s)

receiver/awscloudwatch

What happened?

Description

The AWS CloudWatch Logs receiver (awscloudwatchreceiver) silently stops polling all auto-discovered log groups when one of the previously discovered log groups has been deleted from AWS. The deletion results in an AWS API error (ResourceNotFoundException) being returned. After this error occurs, the receiver stops fetching logs from all log groups without any additional error logging, despite the receiver continuing to run.

This appears to be due to the interaction between the polling logic and auto-discovery.

Steps to Reproduce

  1. Configure awscloudwatchreceiver with auto-discovery enabled:
receivers:
  awscloudwatch:
    logs:
      poll_interval: 1m
      max_events_per_request: 1000
      groups:
        autodiscover:
          limit: 1000
  2. Allow the receiver to auto-discover multiple log groups and successfully poll logs.
  3. Delete one of the previously auto-discovered log groups directly from AWS CloudWatch.
  4. Observe the collector logs.

Expected Result

  • An error should be logged for the deleted log group (ResourceNotFoundException).
  • The receiver continues polling logs from all other existing log groups unaffected.

Actual Result

The receiver logs an error for the deleted log group:

unable to retrieve logs from cloudwatch log group="<deleted-log-group-name>" error="ResourceNotFoundException: The specified log group does not exist."

After this error, polling completely stops for all auto-discovered log groups.
No further logs or errors are emitted, so the failure is silent and no logs are ingested thereafter.
This behavior leads to a critical loss of log ingestion across all log groups without explicit notification or error.

Collector version

v0.108.0

Environment information

Environment

OpenTelemetry Collector configuration

receivers:
    awscloudwatch:
      logs:
        poll_interval: 1m
        max_events_per_request: 1000
        groups:
          autodiscover:
            limit: 1000

Log output

{
    "level": "error",
    "ts": 1738945823.892559,
    "caller": "awscloudwatchreceiver@v0.108.0/logs.go:213",
    "msg": "unable to retrieve logs from cloudwatch",
    "kind": "receiver",
    "name": "awscloudwatch",
    "data_type": "logs",
    "log group": "/aws/sumologic/kinesis-logs20230825103521949300000002",
    "error": "ResourceNotFoundException: The specified log group does not exist.",
    "stacktrace": "github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscloudwatchreceiver.(*logsReceiver).pollForLogs\n\tgithub.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscloudwatchreceiver@v0.108.0/logs.go:213\ngithub.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscloudwatchreceiver.(*logsReceiver).poll\n\tgithub.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscloudwatchreceiver@v0.108.0/logs.go:187\ngithub.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscloudwatchreceiver.(*logsReceiver).startPolling\n\tgithub.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscloudwatchreceiver@v0.108.0/logs.go:174"
}

Additional context

No response
