examples : add stereo to mono conversion in read_audio_data #3266

danbev · 2025-06-18T13:48:30Z

This commit adds a conversion from stereo to mono in the read_audio_data function of common-whisper.cpp.

The motivation for this change is that prior to commit 7d3da68 ("examples : use miniaudio for direct decoding flac, mp3, ogg and wav (#2759)"), there was a step that read stereo int16 data into pcm16 (448512 samples), converted it to mono (224256 samples), and also converted it to stereo in pcmf32s.

The middle step here seems to have been missed when the code was rewritten to use miniaudio, which caused issues when transcribing stereo audio files.
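As a rough illustration of the missing step, here is a minimal sketch of splitting interleaved stereo samples into a mono buffer and two per-channel buffers. The names `split_audio` and `split_interleaved_stereo` are hypothetical and this is not the code from the patch; it also assumes float samples and a plain average of left and right for the downmix.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container for illustration only (not a type from whisper.cpp).
struct split_audio {
    std::vector<float>              mono;   // one sample per frame
    std::vector<std::vector<float>> stereo; // stereo[0] = left, stereo[1] = right
};

// Hypothetical helper: takes interleaved stereo samples (L0 R0 L1 R1 ...)
// and returns a mono downmix plus the two de-interleaved channels.
// Assumption: the downmix is the average of the left and right samples.
static split_audio split_interleaved_stereo(const std::vector<float> & interleaved) {
    // Two samples per frame, so the frame count is half the sample count.
    const size_t n_frames = interleaved.size() / 2;

    split_audio out;
    out.mono.resize(n_frames);
    out.stereo.assign(2, std::vector<float>(n_frames));

    for (size_t i = 0; i < n_frames; ++i) {
        const float left  = interleaved[2*i];
        const float right = interleaved[2*i + 1];

        out.stereo[0][i] = left;
        out.stereo[1][i] = right;
        out.mono[i]      = 0.5f * (left + right);
    }

    return out;
}
```

With the sample counts mentioned above, an interleaved buffer of 448512 samples would yield 224256 mono samples and 224256 samples in each of the two channel buffers.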

For example, with the audio sample from the linked issue, the current output is:

[00:00:00.000 --> 00:00:03.000]  (speaker 1) Sous-titres réalisés para la communauté d'Amara.org

And with the change in this commit the output is:

[00:00:00.000 --> 00:00:01.500]  (speaker 1) *sonnerie de téléphone*
[00:00:01.500 --> 00:00:07.000]  (speaker 1) Salut jeune homme !
[00:00:07.000 --> 00:00:08.500]  (speaker 0) C'est vrai que je te dérange ?
[00:00:08.500 --> 00:00:10.500]  (speaker 1) Ah pas du tout, pas du tout, pas du tout !
[00:00:10.500 --> 00:00:12.500]  (speaker 1) J'étais en train de...
[00:00:12.500 --> 00:00:14.500]  (speaker 1) de préparer un courrier

Resolves: #3092

Notes/writeup: diarize-issue

danbev · 2025-06-18T14:23:03Z

This might also be related to #3263 though I've not looked into it specifically.

danbev requested a review from ggerganov June 18, 2025 14:20

ggerganov approved these changes Jun 18, 2025

danbev merged commit ecb8f3c into ggml-org:master Jun 18, 2025
52 of 54 checks passed

danbev mentioned this pull request Jun 19, 2025

Only recognizing one channel in stereo recordings #3263

Open
