A GPU-accelerated speech-to-text service that types what you say, powered by OpenAI's Whisper AI.
- **Hands-free typing**: Speak naturally and watch your words appear
- **GPU acceleration**: Leverages NVIDIA GPUs for fast transcription
- **99+ languages**: Automatic language detection or manual selection
- **Smart noise filtering**: Works well even with background noise
- **Visual indicators**: See when the microphone is active
- **Low latency**: Optimized for real-time dictation
- **Automatic punctuation**: Intelligently adds punctuation to your speech
- **Multiple model support**: Choose between OpenAI Whisper and Faster Whisper
1. Clone the repository:

   ```bash
   git clone https://github.com/sanastasiou/dictation-service.git
   cd dictation-service
   ```

2. Run the installer:

   ```bash
   chmod +x install.sh
   ./install.sh
   ```

3. Follow the interactive prompts to:
   - Select your microphone
   - Choose a Whisper model
   - Configure language settings
   - Set up the services
- Operating System: Ubuntu/Debian-based Linux (tested on Ubuntu 20.04+)
- Python: 3.8 or higher
- Audio: PulseAudio (standard on most Linux desktops)
- GPU (optional): NVIDIA GPU with CUDA support for faster processing
- Disk Space: ~4GB for models
- RAM: 4GB minimum (8GB+ recommended for larger models)
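A quick way to sanity-check these prerequisites from a terminal (standard commands only; `nvidia-smi` is present only when NVIDIA drivers are installed):

```bash
python3 --version       # needs 3.8 or higher
pactl info | head -n 3  # confirms PulseAudio is reachable
nvidia-smi              # optional: lists the GPU if CUDA drivers are installed
df -h ~                 # check for ~4GB of free space for models
```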
```bash
# Start dictation service
dictation start

# Stop dictation service
dictation stop

# Toggle on/off
dictation toggle

# Check status
dictation status

# View logs
dictation logs
dictation logs -f   # Follow mode
```
1. **Start the service**: Run `dictation start`
2. **Position your cursor**: Click where you want text to appear
3. **Speak naturally**: The service detects when you start speaking
4. **Pause to finish**: Stop speaking for 1 second to end transcription
5. **Text appears**: Your words are typed where your cursor is
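How the text is injected is an implementation detail of the service; as an illustration only, here is a minimal sketch of simulated typing on X11 using the external `xdotool` utility (a hypothetical stand-in, not necessarily what dictation-service itself uses):

```python
# Hypothetical sketch: simulate keystrokes so text lands at the focused cursor.
# Requires an X11 session with the `xdotool` package installed; the real
# service's typing backend may differ.
import shutil
import subprocess

def type_text(text: str) -> None:
    """Type `text` into whichever window currently has keyboard focus."""
    if shutil.which("xdotool") is None:
        raise RuntimeError("xdotool not found on PATH")
    # `--` ends option parsing so transcripts beginning with '-' are typed literally
    subprocess.run(["xdotool", "type", "--", text], check=True)

if __name__ == "__main__":
    type_text("Hello from the dictation service. ")
```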
The mic-monitor service shows a microphone icon in your system tray:

- 🟢 Green: Service is running and ready
- 🔴 Red: Actively recording your speech
The main configuration file is located at `~/.config/dictation-service/config.json`:

```jsonc
{
  "whisper_model": "large-v3-turbo",  // Model to use
  "language": null,                   // null for auto-detect
  "silence_threshold": 0.02,          // Voice detection sensitivity
  "silence_duration": 1.0,            // Seconds of silence to stop
  "use_gpu": true,                    // Enable GPU acceleration
  "use_faster_whisper": false         // Use Faster Whisper implementation
}
```

The `//` comments above are for illustration only; remove them in the actual file, since strict JSON parsers reject comments.
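To make `silence_threshold` and `silence_duration` concrete: a common approach (assumed here; the service's actual voice-activity logic may differ) is to compare each audio chunk's RMS energy against the threshold and stop once enough consecutive chunks are silent:

```python
# Sketch of threshold-based voice activity detection (assumed behavior).
import numpy as np

SILENCE_THRESHOLD = 0.02  # config "silence_threshold": RMS below this counts as silence
SILENCE_DURATION = 1.0    # config "silence_duration": seconds of silence that end an utterance
SAMPLE_RATE = 16000       # Whisper models expect 16 kHz audio
CHUNK = 1024              # samples per analysis window

def is_silent(chunk: np.ndarray) -> bool:
    """A chunk counts as silent when its RMS energy is below the threshold."""
    return float(np.sqrt(np.mean(chunk.astype(np.float32) ** 2))) < SILENCE_THRESHOLD

def record_utterance(chunk_stream) -> np.ndarray:
    """Accumulate chunks until SILENCE_DURATION seconds of consecutive silence."""
    chunks_to_stop = int(SILENCE_DURATION * SAMPLE_RATE / CHUNK)
    recorded, silent_run = [], 0
    for chunk in chunk_stream:
        recorded.append(chunk)
        silent_run = silent_run + 1 if is_silent(chunk) else 0
        if silent_run >= chunks_to_stop:
            break
    return np.concatenate(recorded) if recorded else np.zeros(0, dtype=np.float32)

# Demo with synthetic audio: 0.5 s of "speech" (noise) followed by silence.
speech = np.random.uniform(-0.5, 0.5, SAMPLE_RATE // 2).astype(np.float32)
silence = np.zeros(2 * SAMPLE_RATE, dtype=np.float32)
stream = np.concatenate([speech, silence])
chunks = (stream[i:i + CHUNK] for i in range(0, len(stream), CHUNK))
print(f"captured {len(record_utterance(chunks)) / SAMPLE_RATE:.2f} s of audio")
```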
| Model | Parameters | Speed | Accuracy | Best For |
|---|---|---|---|---|
| tiny | 39M | ⚡⚡⚡⚡⚡ | ⭐ | Quick notes, low-end systems |
| base | 74M | ⚡⚡⚡⚡ | ⭐⭐ | Basic transcription |
| small | 244M | ⚡⚡⚡ | ⭐⭐⭐ | Good balance |
| medium | 769M | ⚡⚡ | ⭐⭐⭐⭐ | Better accuracy |
| large-v3 | 1550M | ⚡ | ⭐⭐⭐⭐⭐ | Best accuracy |
| large-v3-turbo | 809M | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | Recommended |
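For reference, the two backends toggled by `use_faster_whisper` correspond to the `openai-whisper` and `faster-whisper` Python packages. A minimal sketch using each package's public API (`speech.wav` is a placeholder file; the turbo model name requires a recent release of either package):

```python
# Backend used when "use_faster_whisper" is false: openai-whisper
import whisper

model = whisper.load_model("large-v3-turbo")            # the "whisper_model" value
result = model.transcribe("speech.wav", language=None)  # language=None → auto-detect
print(result["text"])

# Backend used when "use_faster_whisper" is true: faster-whisper
from faster_whisper import WhisperModel

fw_model = WhisperModel("large-v3-turbo", device="cuda")  # "use_gpu": true → device="cuda"
segments, _info = fw_model.transcribe("speech.wav", beam_size=5)
print("".join(segment.text for segment in segments))
```

Raising `beam_size` trades speed for accuracy, the same knob exposed in the performance tuning section below.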
Set "language": null
for automatic detection, or use language codes:
"en"
- English"es"
- Spanish"fr"
- French"de"
- German"it"
- Italian"pt"
- Portuguese"zh"
- Chinese"ja"
- Japanese- Full list of language codes
1. Check service status:

   ```bash
   dictation status
   ```

2. View logs for errors:

   ```bash
   dictation logs
   ```

3. Test your microphone:

   ```bash
   # Replace with your device from 'pactl list sources'
   parecord --device=your_device -v | aplay
   ```
**No GPU detected:**

- Check NVIDIA drivers:

  ```bash
  nvidia-smi
  ```

- The service will fall back to CPU (slower but functional)
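Since Whisper runs on PyTorch, you can also confirm GPU visibility from inside the service's conda environment (the environment name `whisper` matches the one used elsewhere in this README):

```python
# Run inside the `whisper` conda environment.
import torch

if torch.cuda.is_available():
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device visible; transcription will fall back to CPU")
```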
**Poor transcription quality:**

- Try a larger model: edit config.json and set `"whisper_model": "large-v3"`
- Adjust sensitivity: lower `"silence_threshold"` for quiet environments
- Check microphone quality and positioning
**Service won't start:**

- Ensure the conda environment is activated: `conda activate whisper`
- Check Python dependencies: `pip list | grep whisper`
- Verify the audio device exists: `pactl list sources`
**Text appears in the wrong location:**

- Click where you want text before speaking
- Some applications may not support simulated typing
**For faster transcription:**

- Use the `"large-v3-turbo"` model (best balance)
- Enable GPU: `"use_gpu": true`
- Try Faster Whisper: `"use_faster_whisper": true`
- Reduce beam size: `"beam_size": 1`
**For better accuracy:**

- Use the `"large-v3"` model
- Increase beam size: `"beam_size": 10`
- Set a specific language: `"language": "en"`
Edit `~/.config/mic-monitor/config.json`:

```json
{
  "monitor_device": "Your Device Name",
  "monitor_all_devices": false
}
```
Edit `~/.config/dictation-service/config.json`:

```json
{
  "model_base_path": "/path/to/your/models",
  "openai_model_path": "/custom/path/model.pt"
}
```
See docs/docker.md for containerized deployment.
```
dictation-service/
├── src/
│   ├── dictation-service.py         # Main transcription service
│   └── mic-monitor.py               # Microphone activity monitor
├── bin/
│   ├── dictation                    # Service control script
│   ├── mic-monitor                  # Monitor control script
│   └── arcrecord                    # Audio recording wrapper
├── config/
│   ├── systemd/                     # SystemD service files
│   └── *.json.default               # Default configurations
├── scripts/
│   └── download_whisper_models.sh   # Model download utility
└── docs/
    └── CONFIGURATION.md             # Detailed configuration guide
```
```bash
# Clone repository
git clone https://github.com/sanastasiou/dictation-service.git
cd dictation-service

# Create conda environment
conda create -n whisper python=3.10
conda activate whisper

# Install dependencies
pip install -r requirements.txt

# Run tests
python -m pytest tests/
```
1. Fork the repository
2. Create a feature branch: `git checkout -b feature-name`
3. Make your changes
4. Run tests: `python -m pytest`
5. Submit a pull request
To completely remove the dictation service:

```bash
dictation-uninstall
```
This will remove:
- Service files and configurations
- Desktop shortcuts
- Log files
The following are preserved and must be removed manually:
- Whisper models (in `~/whisper-models` or your custom path)
- The conda environment: `conda env remove -n whisper`
- System packages (if you want to remove them)
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI Whisper for the amazing speech recognition models
- Faster Whisper for the optimized implementation
- The open-source community for various tools and libraries used
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Wiki: Project Wiki
Made with ❤️ for the Linux community