hguerrero/ai-voice-to-image-demo

AI Image Generation Demo

This is a React application that generates images from text or spoken prompts using OpenAI's DALL-E 3 API, with OpenAI's Whisper API converting speech to text. Built with a Kong-inspired design system, the app provides an enterprise-grade, intuitive interface for creating AI-generated images with customizable size and quality options.

Features

  • Text-to-Image Generation: Generate images from descriptive text prompts using OpenAI's DALL-E 3
  • Voice Input: Record your voice and convert speech to text using OpenAI's Whisper API
  • Kong-Inspired Design: Professional, enterprise-grade UI with modern design patterns
  • Customizable Options: Choose image dimensions and quality settings
  • Responsive Design: Optimized for desktop, tablet, and mobile devices
  • Real-time Feedback: Elegant loading states and comprehensive error handling
  • AI-Enhanced Prompts: Displays how OpenAI interpreted and refined your original prompt
  • Professional Styling: Clean, modern interface with subtle animations and hover effects
  • Cross-Platform Voice: Works on desktop and mobile browsers with microphone support

Setup Instructions

1. Install Dependencies

npm install

2. Configure Environment Variables

  1. Get your OpenAI API key from the OpenAI Platform
  2. Copy the .env.example file to .env:
    cp .env.example .env
  3. Edit the .env file and configure the following variables:
    # Required: Your OpenAI API key
    REACT_APP_OPENAI_API_KEY=sk-your-actual-api-key-here
    
    # Optional: Custom OpenAI API base URL (leave empty for default OpenAI endpoint)
    # REACT_APP_OPENAI_BASE_URL=http://localhost:8000/ai/images

Environment Variables

  • REACT_APP_OPENAI_API_KEY (Required): Your OpenAI API key for authentication
  • REACT_APP_OPENAI_BASE_URL (Optional): Custom base URL for OpenAI API calls. If not set, uses the default OpenAI endpoint. Useful for:
    • Local development with proxy servers
    • Custom API gateways or middleware
    • Testing with mock servers
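As a rough sketch of how these two variables might be read, the helper below (a hypothetical `resolveOpenAIConfig`, not code from this repository) rejects a missing key and treats an empty base URL as unset, so the default OpenAI endpoint is used. It assumes the values arrive as a plain string map, which is how Create React App exposes `process.env` after inlining `REACT_APP_*` variables at build time.

```typescript
// Hypothetical helper for illustration; not taken from this repository.
type Env = Record<string, string | undefined>;

interface OpenAIClientConfig {
  apiKey: string;
  baseURL?: string; // omitted => the SDK uses its default OpenAI endpoint
}

function resolveOpenAIConfig(env: Env): OpenAIClientConfig {
  const apiKey = env["REACT_APP_OPENAI_API_KEY"]?.trim();
  if (!apiKey) {
    throw new Error(
      "REACT_APP_OPENAI_API_KEY is not set. Copy .env.example to .env and add your key."
    );
  }

  // An empty or whitespace-only value counts as "not set", matching the
  // "leave empty for default OpenAI endpoint" convention in .env.example.
  const baseURL = env["REACT_APP_OPENAI_BASE_URL"]?.trim();
  return baseURL ? { apiKey, baseURL } : { apiKey };
}
```

In the app, the result could be spread into the OpenAI SDK constructor, for example `new OpenAI({ ...resolveOpenAIConfig(process.env), dangerouslyAllowBrowser: true })`. The SDK refuses to run in a browser unless `dangerouslyAllowBrowser` is set. Because Create React App reads `.env` only when the development server starts, restart `npm start` after editing it.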

3. Start the Development Server

npm start

The app will open in your browser at http://localhost:3000 (or another port if 3000 is busy).

Usage

Text Input

  1. Enter a Prompt: Describe the image you want to generate in the text area
  2. Choose Options: Select your preferred image dimensions and quality
  3. Generate: Click the "Generate Image" button
  4. View Results: The generated image will appear below with the AI-enhanced prompt
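The text-input flow maps onto a single Images API call. The sketch below is a hypothetical illustration (the helper names are not from the repository) of how the selected options might be validated and how the AI-enhanced prompt could be read back. It relies on OpenAI's Images API reference as documented at the time of writing: DALL-E 3 accepts the sizes `1024x1024`, `1792x1024`, and `1024x1792`, the quality values `standard` and `hd`, one image per request, and prompts of up to 4,000 characters, and it returns the prompt it actually used as `revised_prompt`. The response type models only the fields used here.

```typescript
// Hypothetical helpers for illustration; not taken from this repository.
type Dalle3Size = "1024x1024" | "1792x1024" | "1024x1792";
type Dalle3Quality = "standard" | "hd";

interface ImageGenerateParams {
  model: "dall-e-3";
  prompt: string;
  n: 1; // DALL-E 3 generates one image per request
  size: Dalle3Size;
  quality: Dalle3Quality;
}

// Only the parts of the Images API response this sketch reads.
interface ImagesResponse {
  data: Array<{ url?: string; b64_json?: string; revised_prompt?: string }>;
}

const MAX_DALLE3_PROMPT_CHARS = 4000;

function buildImageRequest(
  prompt: string,
  size: Dalle3Size = "1024x1024",
  quality: Dalle3Quality = "standard"
): ImageGenerateParams {
  const trimmed = prompt.trim();
  if (trimmed.length === 0) throw new Error("Please enter a prompt.");
  if (trimmed.length > MAX_DALLE3_PROMPT_CHARS) {
    throw new Error(`Prompt exceeds ${MAX_DALLE3_PROMPT_CHARS} characters.`);
  }
  return { model: "dall-e-3", prompt: trimmed, n: 1, size, quality };
}

function extractResult(
  response: ImagesResponse,
  originalPrompt: string
): { imageUrl: string; revisedPrompt: string } {
  const first = response.data[0];
  const url = first?.url;
  if (!url) throw new Error("The API response did not include an image URL.");
  // Show the prompt DALL-E 3 actually used; fall back to the user's text if absent.
  return { imageUrl: url, revisedPrompt: first?.revised_prompt ?? originalPrompt };
}
```

The params object could be passed to the SDK's `openai.images.generate(...)`, and its result to `extractResult` to populate the image and the AI-enhanced prompt. Exact response typings vary between SDK versions, so the interface may need adapting.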

Voice Input

  1. Start Recording: Click the "Voice Input" (microphone) button in the text area
  2. Grant Permission: Allow microphone access if your browser prompts for it
  3. Speak Clearly: Describe your image idea clearly and naturally
  4. Stop Recording: Click "Stop Recording" when finished
  5. Auto-Transcription: Your speech is automatically converted to text
  6. Generate: Review the transcribed text in the prompt field, then click "Generate Image"
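One practical detail behind the recording and transcription steps is the audio format. Browsers differ in what `MediaRecorder` can produce, and the Whisper endpoint accepts only certain formats (including webm, ogg, mp4, mp3, and wav) and appears to identify them by file extension. The sketch below is a hypothetical helper, not the repository's code: it picks the first supported recording format from a preference list and derives a matching file name. The browser support check is passed in as a function so the logic can run outside a browser.

```typescript
// Hypothetical helpers for illustration; not taken from this repository.

// Preferred recording formats, in order. Chromium and Firefox generally
// support WebM/Opus; Safari has historically recorded MP4 audio.
const CANDIDATE_MIME_TYPES = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/ogg;codecs=opus",
  "audio/mp4",
];

// File extensions for base MIME types that Whisper accepts.
const EXTENSION_BY_MIME: Record<string, string> = {
  "audio/webm": "webm",
  "audio/ogg": "ogg",
  "audio/mp4": "mp4",
  "audio/mpeg": "mp3",
  "audio/wav": "wav",
};

function pickRecordingMimeType(
  isTypeSupported: (type: string) => boolean
): string | undefined {
  return CANDIDATE_MIME_TYPES.find((type) => isTypeSupported(type));
}

function transcriptionFileName(mimeType: string, baseName = "recording"): string {
  // Drop codec parameters such as ";codecs=opus" before the lookup.
  const [rawBase = ""] = mimeType.split(";");
  const ext = EXTENSION_BY_MIME[rawBase.trim().toLowerCase()];
  if (!ext) throw new Error(`Unsupported audio format for transcription: ${mimeType}`);
  return `${baseName}.${ext}`;
}
```

In a browser this might be wired up as `pickRecordingMimeType((t) => MediaRecorder.isTypeSupported(t))`, passing the result as the `mimeType` option of `new MediaRecorder(stream, { mimeType })`. After recording stops, the collected chunks could be wrapped in `new File(chunks, transcriptionFileName(recorder.mimeType), { type: recorder.mimeType })` and sent to `openai.audio.transcriptions.create({ file, model: "whisper-1" })`.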

Voice Input Tips

  • Speak clearly and at a normal pace
  • Use descriptive language for better image results
  • Minimize background noise for better transcription accuracy
  • Try again if the transcription isn't accurate

Example Prompts

  • "A futuristic cityscape with neon lights reflecting on wet streets, cyberpunk aesthetic, high contrast lighting"
  • "A serene mountain landscape at golden hour with misty valleys and dramatic cloud formations"
  • "An elegant minimalist workspace with clean lines, natural lighting, and modern technology"
  • "A vibrant abstract composition with flowing geometric patterns in blues and purples"
  • "A cozy coffee shop interior with warm lighting, exposed brick walls, and vintage furniture"

Important Notes

  • API Costs: Each image generation and voice transcription request costs credits from your OpenAI account
  • Rate Limits: OpenAI has rate limits on API usage for both DALL-E and Whisper
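  • Browser Usage: For simplicity, this demo calls the OpenAI API directly from the browser. Because Create React App inlines REACT_APP_* variables into the JavaScript bundle at build time, your API key is visible to anyone who loads the app. In production, make API calls from a secure backend server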
  • Environment Variables: Never commit your actual API key to version control
  • Microphone Permission: Voice input requires microphone access permission from your browser
  • Browser Compatibility: Voice input requires a modern browser that supports the MediaRecorder API
  • Privacy: Audio recordings are sent to OpenAI for transcription and are not stored locally
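Rate limits surface as HTTP 429 responses, and a common mitigation is to retry with exponential backoff. The sketch below assumes errors expose a numeric `status` field (as the OpenAI SDK's API errors do) and uses hypothetical helper names. The OpenAI SDK already retries some failures on its own (configurable through its `maxRetries` client option), so a wrapper like this is mainly useful for calls that bypass the SDK, such as requests through a custom gateway, provided the caller throws an error carrying the status code.

```typescript
// Hypothetical helpers for illustration; not part of this repository.

// Minimal error shape: the OpenAI SDK's API errors expose an HTTP `status`.
interface HttpError {
  status?: number;
}

// Retry only on 429 (rate limited) and 5xx (transient server errors).
function isRetryable(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const status = (error as HttpError).status;
  return status === 429 || (typeof status === "number" && status >= 500);
}

// Exponential backoff: 500 ms, 1 s, 2 s, 4 s, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Calls fn, retrying retryable failures up to maxRetries additional times.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = defaultSleep
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

Production code often adds random jitter to each delay so that many clients do not retry in lockstep; the sleep function is injectable here mainly to keep the sketch testable.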

Design System

This application features a Kong-inspired design system with:

  • Modern Color Palette: Professional blues and purples with clean grays
  • Typography: Clean, readable fonts with proper hierarchy
  • Components: Card-based layouts with subtle shadows and rounded corners
  • Interactions: Smooth animations and hover effects
  • Responsive: Mobile-first design that scales beautifully across devices

Technologies Used

  • React 19 with TypeScript
  • OpenAI SDK for DALL-E 3 and Whisper integration
  • Web Audio API for voice recording
  • MediaRecorder API for audio capture
  • Kong-inspired CSS design system
  • Modern CSS3 with CSS custom properties
  • Create React App for development tooling

Available Scripts

In the project directory, you can run:

npm start

Runs the app in development mode.
Open http://localhost:3000 to view it in the browser.

The page will reload if you make edits.
You will also see any lint errors in the console.

npm test

Launches the test runner in the interactive watch mode.
See the Create React App documentation on running tests for more information.

npm run build

Builds the app for production to the build folder.
It correctly bundles React in production mode and optimizes the build for the best performance.

The build is minified and the filenames include the hashes.
Your app is ready to be deployed!

See the Create React App documentation on deployment for more information.

npm run eject

Note: this is a one-way operation. Once you eject, you can’t go back!

If you aren’t satisfied with the build tool and configuration choices, you can eject at any time. This command will remove the single build dependency from your project.

Instead, it will copy all the configuration files and the transitive dependencies (webpack, Babel, ESLint, etc) right into your project so you have full control over them. All of the commands except eject will still work, but they will point to the copied scripts so you can tweak them. At this point you’re on your own.

You don’t ever have to use eject. The curated feature set is suitable for small and medium-sized deployments, and you shouldn’t feel obligated to use this feature. However, we understand that this tool wouldn’t be useful if you couldn’t customize it when you are ready for it.

Learn More

You can learn more in the Create React App documentation.

To learn React, check out the React documentation.

About

AI-powered image generator with voice input. Built with React, OpenAI DALL-E 3, and Whisper APIs.
