
Speech-to-Code

Speech-to-Code is a web application that leverages Large Language Models (LLMs) to convert spoken language into executable code. This project aims to streamline the code generation process by allowing developers to express their ideas verbally and have them translated into functional code.

πŸ“Έ Application Screenshots

🎯 Compose Your Prompt

Combine speech, repository files, and manual text input in our intuitive prompt composer:

Prompt Composer

πŸ’¬ Use LLM APIs Directly

Sidestep rate limits and outages by using LLM APIs directly:

Prompt UI

πŸ“‹ Review and Copy Code

Easily review and copy the generated code:

Copy Code

βš™οΈ Manage System Prompts

Customize and organize your system prompts:

System Prompts

✨ Key Features

🎯 Advanced Prompt Composer

  • Combine speech, repository files, and manual text input
  • Real-time audio visualization for voice input
  • Smart file suggestions based on context
  • Preview and edit prompts before submission

πŸ€– Multi-Model Support

  • Integration with multiple LLM providers (OpenAI, Anthropic)
  • Customizable model parameters
  • Cost tracking and display
  • Model-specific optimizations
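
As a rough illustration of how multi-provider support can be wired up, the sketch below routes a prompt to either OpenAI or Anthropic with per-call parameters. It is not the repository's actual llm_interaction.py; the function, model names, and parameters are placeholders, and it assumes the official openai and anthropic Python SDKs.

    # Hypothetical provider-dispatch sketch; not the project's actual implementation.
    from openai import OpenAI
    import anthropic

    openai_client = OpenAI()                  # reads OPENAI_API_KEY from the environment
    anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    def complete(provider, model, prompt, temperature=0.2):
        """Route a prompt to the selected provider with adjustable parameters."""
        if provider == "openai":
            resp = openai_client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                temperature=temperature,
            )
            return resp.choices[0].message.content
        if provider == "anthropic":
            resp = anthropic_client.messages.create(
                model=model,
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
                temperature=temperature,
            )
            return resp.content[0].text
        raise ValueError(f"Unknown provider: {provider}")

Both SDKs return token usage on the response object, which is the natural hook for the cost tracking and display feature.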

πŸ“ Repository Integration

  • Interactive file viewer for repository navigation
  • Smart file combinations for context
  • File-based suggestions
  • Repository structure visualization
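
Conceptually, the file viewer and suggestions are built on an enumeration of the configured repository. The snippet below is only a minimal sketch, assuming REPO_PATH is set; the backend's real traversal, filtering, and ranking logic may differ.

    # Minimal sketch: list files under REPO_PATH for a tree-style viewer (illustrative only).
    import os
    from pathlib import Path

    def list_repo_files(root, skip=(".git", "node_modules", "venv")):
        """Yield file paths relative to the repository root, skipping common noise."""
        for path in sorted(root.rglob("*")):
            if any(part in skip for part in path.parts):
                continue
            if path.is_file():
                yield path.relative_to(root)

    repo_root = Path(os.environ["REPO_PATH"])  # configured via backend/.env or Settings
    for rel_path in list_repo_files(repo_root):
        print(rel_path)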

πŸ’¬ Chat Sessions Management

  • Persistent chat history with automatic saving
  • Session organization and management
  • Soft delete functionality for chat sessions
  • Automatic chat naming based on context
  • Local timestamp display for all messages
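
Persisted sessions live under logs/chat_sessions/ (see the project structure below). The sketch that follows shows what a saved session might look like, assuming a simple JSON-per-session layout; the actual on-disk format may differ.

    # Illustrative only: persist a chat session as JSON with a soft-delete flag.
    import json
    from datetime import datetime, timezone
    from pathlib import Path

    SESSIONS_DIR = Path("logs/chat_sessions")

    def save_session(session_id, messages, deleted=False):
        """Write a session to logs/chat_sessions/ with a UTC timestamp and soft-delete flag."""
        SESSIONS_DIR.mkdir(parents=True, exist_ok=True)
        record = {
            "id": session_id,
            "updated_at": datetime.now(timezone.utc).isoformat(),
            "deleted": deleted,    # soft delete: hidden in the UI but kept on disk
            "messages": messages,  # e.g. [{"role": "user", "content": "...", "timestamp": "..."}]
        }
        path = SESSIONS_DIR / f"{session_id}.json"
        path.write_text(json.dumps(record, indent=2))
        return path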

🎀 Transcription Management

  • Real-time speech-to-text conversion
  • Transcription editing and refinement
  • Voice input visualization
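
A minimal sketch of the speech-to-text step, assuming OpenAI's Whisper transcription endpoint is the provider (the project's actual pipeline and model choice may differ):

    # Assumed transcription flow via OpenAI's Whisper API; illustrative, not the project's code.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def transcribe(audio_path):
        """Upload a recorded audio file and return the transcribed text."""
        with open(audio_path, "rb") as audio_file:
            result = client.audio.transcriptions.create(model="whisper-1", file=audio_file)
        return result.text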

πŸ’‘ System Prompt Management

  • Create and edit system prompts
  • Organize prompts by category
  • Quick prompt selection
  • Version control for prompts
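
System prompts are stored in backend/system_prompts.json (see the project structure below). The entry here is purely illustrative; the actual schema used by the application may differ.

    {
        "name": "Code Reviewer",
        "category": "Review",
        "content": "You are a senior engineer. Review the provided code for bugs and style issues.",
        "version": 1
    }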

🎨 User Experience

  • Dark/Light mode toggle
  • Two-column layout for better workflow
  • Responsive design
  • Copy-to-clipboard functionality

βš™οΈ Advanced Settings

  • Environment variable management
  • Repository path configuration
  • API key management
  • Port configuration

πŸ“ Logging and History

  • Comprehensive session logging
  • Chat history persistence
  • Automatic log directory management
  • Easy access to past conversations

πŸš€ Getting Started

πŸ“‹ Prerequisites

Before you begin, ensure you have installed:

  • Node.js and npm (latest version)
  • Python (version 3.7 or later)
  • A Windows/Linux/Mac machine with command line access

πŸ”§ Installation

  1. Clone the repository

    git clone https://github.com/dharllc/speech-to-code.git
    cd speech-to-code
  2. Make the build script executable

    chmod +x build.sh
  3. Run the build script

    ./build.sh

    The script will:

    • Install necessary dependencies
    • Set up a Python virtual environment
    • Create .env files with placeholders
    • Create required directories for logs and chat sessions
    • Set appropriate permissions
  4. Configure Environment Variables. Navigate to the Settings page to configure (an illustrative backend/.env is shown below):

    • OpenAI API Key
    • Google API Key
    • Anthropic API Key
    • Repository Path
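
The build script creates the .env files with placeholders, and the Settings page fills in the values. For reference, a backend/.env might end up looking like this; REPO_PATH is required, while the exact names of the API-key variables are assumptions here:

    # backend/.env (illustrative values)
    OPENAI_API_KEY=sk-...
    GOOGLE_API_KEY=...
    ANTHROPIC_API_KEY=sk-ant-...
    REPO_PATH=/Users/username/Documents/GitHub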

πŸš€ Running the Application

  1. Configure ports (optional). Edit config.json in the root directory:

    {
        "frontend": {
            "port": 3000 
        },
        "backend": {
            "port": 8000 
        }
    }
  2. Start the frontend

    cd frontend
    npm start
  3. Start the backend (in a separate terminal)

    cd backend
    source venv/bin/activate
    uvicorn main:app --reload --log-level debug

Access the application at http://localhost:3000 🌐

πŸ“ Project Structure

speech-to-code/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ .env
β”‚   β”œβ”€β”€ main.py
β”‚   β”œβ”€β”€ llm_interaction.py
β”‚   β”œβ”€β”€ model_config.py
β”‚   β”œβ”€β”€ system_prompts.json
β”‚   β”œβ”€β”€ context_maps/
β”‚   └── utils/
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ public/
β”‚   └── src/
β”‚       β”œβ”€β”€ components/
β”‚       β”œβ”€β”€ services/
β”‚       └── config/
β”œβ”€β”€ logs/
β”‚   └── chat_sessions/      # Persistent chat history storage
└── README.md

πŸ”§ Troubleshooting

Common Installation Issues

Backend dependencies failing to install: If you encounter build errors with packages like grpcio, tiktoken, or tokenizers on macOS:

  1. Upgrade pip and build tools:

    cd backend
    source venv/bin/activate
    pip install --upgrade pip setuptools wheel
  2. Install problematic packages with precompiled wheels:

    pip install --only-binary=:all: grpcio tiktoken tokenizers
  3. Install remaining dependencies:

    pip install uvicorn fastapi python-dotenv openai anthropic google-generativeai

"uvicorn not found" error:

  • Ensure you've activated the virtual environment: source venv/bin/activate
  • Install uvicorn directly: pip install uvicorn

"REPO_PATH environment variable is not set" error:

  • Create a .env file in the backend/ directory
  • Add: REPO_PATH=/path/to/your/repositories
  • Example: REPO_PATH=/Users/username/Documents/GitHub

General Issues

If you encounter other issues:

  1. Verify API keys in Settings
  2. Check dependencies
  3. Ensure both servers are running
  4. Check the logs directory for detailed session logs
  5. Verify proper permissions on the logs directory

For detailed logs, check:

  • Console output of both servers
  • Session logs in logs/sessions/
  • Application logs for debugging

🀝 Contributing

We welcome contributions! Check our issues page for current tasks or suggest new features.

πŸ“¬ Feedback

Have suggestions? Email me at sachin@dharllc.com

πŸ“„ License

This project is licensed under the MIT License πŸ“œ
