🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Updated Jul 29, 2025 - Python
GUI for a Vocal Remover that uses Deep Neural Networks.
A PyTorch-based Speech Toolkit
Speech recognition module for Python, supporting several engines and APIs, online and offline.
Code for the paper "Jukebox: A Generative Model for Music"
Automagically synchronize subtitles with video.
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
Picard is a cross-platform music tagger powered by the MusicBrainz database
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
Auto-Editor: Efficient media analysis and rendering
Noise suppression using deep filtering
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Data manipulation and transformation for audio signal processing, powered by PyTorch
Cast macOS and Linux Audio/Video to your Google Cast and Sonos Devices