The motivation for this project was twofold: (1) I'm trying to learn Spanish, so this is a fun way to translate and learn from any video, and (2) I wanted to get back into Python development with a focus on some form of AI/machine learning.
The project's initial scope has been deliberately limited to reach a quick MVP: a working app that can be self-hosted and used as a tool right away.
Expect bugs & beware of gremlins!
Btw: the name "open whisperer" is a play on the main open source project that drives this one (OpenAI's Whisper). I just chose the name to get started and have kept it until now; it is not intended to infringe on any copyrights or trademarks held by OpenAI.
For the devs learning to code (I mean, we're all learning, but...), this is a mono-repo; if you're not familiar with this type of repository, I came across a nice resource that explains the motivation behind it. It's the most extensive and helpful explanation I've found.
Check it out here: https://monorepo.tools/
To get a local copy up and running, follow these simple steps.
Note: you will need about 14 GB of free disk space for all the AI language models.
- Docker (recommended)
or
- node@^22.14.0
- python@^3.11
- yarn@3.8.7
- Install Docker: https://www.docker.com/get-started
- Create .env files (use the example files in both apps/python-server and apps/web-ui)
cp apps/web-ui/.example.env.production.local apps/web-ui/.env.production.local # used by docker
cp apps/web-ui/.example.env.production.local apps/web-ui/.env.development.local # used when running locally in dev (i.e.: npm run dev)
cp apps/python-server/.example.env apps/python-server/.env # used by docker
- Run the containers
docker compose up --build -d
- Verify the containers are running
docker ps
- Access the Web UI at http://localhost:3567
- Enjoy
A great deal of effort went into making sure it runs without issue on Docker; just try it and report back if you run into any problems.
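If you'd rather script that check than open a browser, here's a minimal sanity-check sketch in Python; the only assumption is the default port 3567 used above.

```python
# Quick check that the Web UI answers on the default port (3567).
from urllib.request import urlopen

with urlopen("http://localhost:3567", timeout=5) as resp:
    print("Web UI reachable, HTTP status:", resp.status)
```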
// TODO: write manual installation instructions. In the meantime, do use a virtual environment (venv)
- Activate the virtual environment
# Windows (cmd/PowerShell)
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
The app features a simple, easy-to-use interface that lets you choose which languages to translate from and to.
Of course, no AI model is 100% accurate, so don't rely on this program where perfectly accurate transcripts or translations are required.
Usage is self-explanatory: each button highlights green when you're ready to move to the next step.
The demo version has upload limits and may include other restrictions and/or scanning of media to comply with some local laws; if you need higher limits, it is recommended to use the self-hosted option via Docker.
This is a very rough outline of how I may go about adding new features; the roadmap roughly follows the current order of importance for my use case.
Please do not depend on this project; it is not stable and development may be sporadic.
Feel free to clone the project and make your own changes.
PRs are always welcome, and I'll be happy to merge any that make sense with the general direction of the project.
- Setup pipeline (extract audio, transcriptions, translation, muxing) and working app
- Docker image for self-hosted option
- Easily maintainable and well-structured mono repo
- Ability to edit the transcript before applying it to video
- Allow applying source language subtitles to video
- Show list of previously generated .srt files to quickly reuse and/or download
- Sync video & transcript (when the user clicks a point in the video, both should stay in sync)
- Add wavesurfer audio visualizer to show events & subtitle timeline
- Speaker diarization (recognize how many speakers there are, when each one speaks, and each speaker's gender); see the sketch after this list
- Event-based status reporting with background tasks
- Add voice cloning to overdub videos in translated language (support different accents, gender)
- Support different style "templates" for subtitle styles
- Edit placement of subtitles
- Detect duplicate video sources
- Support uploading audio only and generating transcript w/ option to output karaoke style blank video
- Take advantage of hardware acceleration
- More advanced cropping/slicing and basic video editing (to cut dead space)
- Support concurrent uploads (multiple videos/audio at the same time w/ status reporting)
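Speaker diarization isn't implemented yet, but the sketch below shows roughly what that step could look like with pyannote-audio. The pretrained pipeline name, the Hugging Face token, and the audio path are assumptions, and it only covers who spoke when (not gender).

```python
# Hypothetical diarization step using pyannote-audio (not part of the app yet).
from pyannote.audio import Pipeline

# Assumes a Hugging Face access token that can download the pretrained pipeline.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="YOUR_HF_TOKEN",  # placeholder
)

diarization = pipeline("audio.wav")  # placeholder audio path

# Each turn tells you which speaker talked and when.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")
```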
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feat/amazing-feature-i-want-to-add)
- Commit your Changes (git commit -m 'Add some amazing-feature-i-want-to-add')
- Push to the Branch (git push origin feat/amazing-feature-i-want-to-add)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Othneil Drew - LinkedIn @othneildrew - codeguydrew@gmail.com
Project Link: https://github.com/othneildrew/open-whisperer
Website: https://othneildrew.com
After much research, I've come across this amazing set of tools/projects that made this one possible.
Big shout-out to these amazing resources! Some aren't used yet, but they are more than likely what I will use to implement other features on the roadmap.
| Task | Tool | Notes |
|---|---|---|
| Audio Extraction | ffmpeg | Industry standard |
| Language Detection | Whisper | Detects and transcribes; use faster-whisper for speed |
| Multi-Speaker Diarization | pyannote-audio | Best diarization tool (offline support with Hugging Face model download) |
| Translation | argos-translate | Offline translation; install language pairs |
| Voice Synthesis (TTS) | Tortoise TTS, Coqui TTS | High quality, supports speaker cloning too |
| Subtitle Handling | ffmpeg, srt, autosub, or custom logic | SRT file generation and muxing |
| Muxing | ffmpeg | Add subtitles or TTS audio back to the video |
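To make the table concrete, here's a minimal end-to-end sketch of how these tools can chain together. This is not the project's actual implementation: it assumes openai-whisper, argostranslate, and srt are installed and the ffmpeg binary is on PATH, and the file names, model size, and target language are placeholders.

```python
import subprocess
from datetime import timedelta

import argostranslate.package
import argostranslate.translate
import srt
import whisper

VIDEO_IN = "input.mp4"       # placeholder paths
AUDIO_WAV = "audio.wav"
SRT_OUT = "subtitles.srt"
VIDEO_OUT = "output.mp4"
TARGET_LANG = "es"           # placeholder target language

# 1. Audio extraction (ffmpeg): mono 16 kHz WAV
subprocess.run(
    ["ffmpeg", "-y", "-i", VIDEO_IN, "-vn", "-ac", "1", "-ar", "16000", AUDIO_WAV],
    check=True,
)

# 2. Language detection + transcription (Whisper)
model = whisper.load_model("small")
result = model.transcribe(AUDIO_WAV)
source_lang = result["language"]

# 3. Translation (argos-translate): install the language pair once, then translate offline
#    (assumes the detected source -> target pair exists in the package index)
argostranslate.package.update_package_index()
packages = argostranslate.package.get_available_packages()
pair = next(p for p in packages if p.from_code == source_lang and p.to_code == TARGET_LANG)
argostranslate.package.install_from_path(pair.download())

# 4. Subtitle handling (srt): one cue per Whisper segment
subtitles = []
for i, seg in enumerate(result["segments"], start=1):
    translated = argostranslate.translate.translate(seg["text"].strip(), source_lang, TARGET_LANG)
    subtitles.append(
        srt.Subtitle(
            index=i,
            start=timedelta(seconds=seg["start"]),
            end=timedelta(seconds=seg["end"]),
            content=translated,
        )
    )
with open(SRT_OUT, "w", encoding="utf-8") as f:
    f.write(srt.compose(subtitles))

# 5. Muxing (ffmpeg): add the subtitles back as a soft subtitle track
subprocess.run(
    ["ffmpeg", "-y", "-i", VIDEO_IN, "-i", SRT_OUT,
     "-map", "0", "-map", "1", "-c", "copy", "-c:s", "mov_text", VIDEO_OUT],
    check=True,
)
```

The roadmap items above layer extra features (editing the transcript, reusing generated .srt files, diarization, TTS overdubbing) on top of this basic flow.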