AI Image Generation Demo

This is a React application that generates images from typed or spoken prompts using OpenAI's DALL-E 3 and Whisper APIs. Built with a Kong-inspired design system, the app provides an enterprise-grade interface with customizable image dimensions and quality settings.

Features

- Text-to-Image Generation: Generate images from descriptive text prompts using OpenAI's DALL-E 3
- Voice Input: Record your voice and convert speech to text using OpenAI's Whisper API
- Kong-Inspired Design: Professional, enterprise-grade UI with modern design patterns
- Customizable Options: Choose image dimensions and quality settings
- Responsive Design: Optimized for desktop, tablet, and mobile devices
- Real-time Feedback: Elegant loading states and comprehensive error handling
- AI-Enhanced Prompts: Displays how OpenAI interpreted and refined your original prompt
- Professional Styling: Clean, modern interface with subtle animations and hover effects
- Cross-Platform Voice: Works on desktop and mobile browsers with microphone support

Setup Instructions

1. Install Dependencies

```
npm install
```

2. Configure Environment Variables

Get your OpenAI API key from the OpenAI Platform.
Copy the .env.example file to .env:
```
cp .env.example .env
```

Edit the .env file and configure the following variables:

```
# Required: Your OpenAI API key
REACT_APP_OPENAI_API_KEY=sk-your-actual-api-key-here

# Optional: Custom OpenAI API base URL (leave empty for default OpenAI endpoint)
# REACT_APP_OPENAI_BASE_URL=http://localhost:8000/ai/images
```

Environment Variables

- REACT_APP_OPENAI_API_KEY (Required): Your OpenAI API key for authentication.
- REACT_APP_OPENAI_BASE_URL (Optional): Custom base URL for OpenAI API calls. If not set, the default OpenAI endpoint is used. Useful for:
  - Local development with proxy servers
  - Custom API gateways or middleware
  - Testing with mock servers
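
The way these two variables combine can be sketched as a small helper. This is a minimal illustration, not the app's actual source: the function name `resolveOpenAIConfig` and the `OpenAIConfig` shape are hypothetical, and treating an empty base URL as "not set" is an assumption based on the comment in .env.example.

```typescript
// Hypothetical config shape; not taken from the app's source.
interface OpenAIConfig {
  apiKey: string;
  baseURL?: string; // undefined means "use the default OpenAI endpoint"
}

// Reads the two variables described above from an env-like object.
function resolveOpenAIConfig(
  env: Record<string, string | undefined>
): OpenAIConfig {
  const apiKey = env["REACT_APP_OPENAI_API_KEY"]?.trim();
  if (!apiKey) {
    throw new Error("REACT_APP_OPENAI_API_KEY is required");
  }
  const baseURL = env["REACT_APP_OPENAI_BASE_URL"]?.trim();
  // Assumption: an empty value is treated the same as an unset one.
  return baseURL ? { apiKey, baseURL } : { apiKey };
}
```

In a Create React App project, the helper would typically be called with `process.env`. Note that Create React App inlines `REACT_APP_*` values into the bundle at build time, so the dev server needs a restart after .env changes.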

3. Start the Development Server

```
npm start
```

The app will open in your browser at http://localhost:3000 (or another port if 3000 is busy).

Usage

Text Input

1. Enter a Prompt: Describe the image you want to generate in the text area
2. Choose Options: Select your preferred image dimensions and quality
3. Generate: Click the "Generate Image" button
4. View Results: The generated image will appear below with the AI-enhanced prompt
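
The steps above map roughly onto a DALL-E 3 image request and its response. The sketch below is an assumption about how such a request could be assembled, not the app's actual code: `buildImageRequest` and `enhancedPrompt` are hypothetical names, and the listed sizes and quality values are the ones DALL-E 3 accepts, which may differ from the options the app exposes.

```typescript
// Sizes and quality levels accepted by DALL-E 3.
type ImageSize = "1024x1024" | "1792x1024" | "1024x1792";
type ImageQuality = "standard" | "hd";

interface ImageRequest {
  model: "dall-e-3";
  prompt: string;
  size: ImageSize;
  quality: ImageQuality;
  n: 1; // DALL-E 3 generates one image per request
}

// Hypothetical helper: validates the prompt and fills in the chosen options.
function buildImageRequest(
  prompt: string,
  size: ImageSize = "1024x1024",
  quality: ImageQuality = "standard"
): ImageRequest {
  const trimmed = prompt.trim();
  if (trimmed.length === 0) {
    throw new Error("Prompt must not be empty");
  }
  return { model: "dall-e-3", prompt: trimmed, size, quality, n: 1 };
}

// The part of one response item that this sketch reads.
interface ImageResult {
  url?: string;
  revised_prompt?: string;
}

// Returns the AI-enhanced prompt, falling back to the original if none is present.
function enhancedPrompt(original: string, result: ImageResult): string {
  return result.revised_prompt ?? original;
}
```

The request object matches the body of the images generation endpoint, and `revised_prompt` is the field in which DALL-E 3 returns its rewritten version of the prompt.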

Voice Input

1. Click Voice Input: Click the microphone button in the text area
2. Grant Permission: Allow microphone access when prompted by your browser
3. Start Recording: Click "Voice Input" to begin recording your description
4. Speak Clearly: Describe your image idea clearly and naturally
5. Stop Recording: Click "Stop Recording" when finished
6. Auto-Transcription: Your speech will be automatically converted to text
7. Generate: Once the transcribed text populates the prompt field, click "Generate Image"
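
Browsers differ in which audio container MediaRecorder can produce, so a recording flow like the one above usually has to choose a format before it starts. The sketch below shows one way to do that; it is an illustration under stated assumptions rather than the app's implementation. The helper name `pickRecordingFormat`, the candidate list, and the `recording.*` file names are all hypothetical.

```typescript
// Candidate formats in order of preference (an assumption, not the app's list).
// Both WebM and MP4 are among the containers the Whisper API accepts.
const CANDIDATE_TYPES = ["audio/webm;codecs=opus", "audio/webm", "audio/mp4"];

interface RecordingFormat {
  mimeType: string;
  fileName: string; // name to attach when uploading the audio for transcription
}

// The support check is passed in so the logic can run outside a browser;
// in a browser it would be (t) => MediaRecorder.isTypeSupported(t).
function pickRecordingFormat(
  isTypeSupported: (mimeType: string) => boolean
): RecordingFormat | undefined {
  for (const mimeType of CANDIDATE_TYPES) {
    if (isTypeSupported(mimeType)) {
      const extension = mimeType.startsWith("audio/mp4") ? "mp4" : "webm";
      return { mimeType, fileName: `recording.${extension}` };
    }
  }
  return undefined; // no candidate is supported; voice input should be disabled
}
```

In the browser, the chosen `mimeType` would be passed as an option when constructing the `MediaRecorder` for the stream obtained from `navigator.mediaDevices.getUserMedia({ audio: true })`.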

Voice Input Tips

- Speak clearly and at a normal pace
- Use descriptive language for better image results
- Minimize background noise for better transcription accuracy
- Try again if the transcription isn't accurate

Example Prompts

- "A futuristic cityscape with neon lights reflecting on wet streets, cyberpunk aesthetic, high contrast lighting"
- "A serene mountain landscape at golden hour with misty valleys and dramatic cloud formations"
- "An elegant minimalist workspace with clean lines, natural lighting, and modern technology"
- "A vibrant abstract composition with flowing geometric patterns in blues and purples"
- "A cozy coffee shop interior with warm lighting, exposed brick walls, and vintage furniture"

Important Notes

- API Costs: Each image generation and voice transcription request is billed to your OpenAI account
- Rate Limits: OpenAI applies rate limits to API usage for both DALL-E and Whisper
- Browser Usage: This demo calls the OpenAI API directly from the browser for simplicity, which exposes the API key to anyone who can load the app. In production, API calls should be made from a secure backend server
- Environment Variables: Never commit your actual API key to version control
- Microphone Permission: Voice input requires microphone access permission from your browser
- Browser Compatibility: Voice input works on modern browsers that support the MediaRecorder API
- Privacy: Audio recordings are sent to OpenAI for transcription and are not stored locally
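
Regarding the rate limits noted above, a client can wait and retry when the API responds with HTTP 429. The following is a minimal sketch of exponential backoff, not something this demo is known to implement; the function names and the default delays are illustrative assumptions.

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based),
// doubling each time and capped at maxMs.
function retryDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry on rate limiting (429) and on server-side errors (5xx);
// other client errors such as 400 or 401 will not succeed on retry.
function isRetryable(status: number): boolean {
  return status === 429 || status >= 500;
}
```

With the defaults, the delays are 1 s, 2 s, 4 s, and so on, up to a 30 s ceiling.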

Design System

This application features a Kong-inspired design system with:

- Modern Color Palette: Professional blues and purples with clean grays
- Typography: Clean, readable fonts with proper hierarchy
- Components: Card-based layouts with subtle shadows and rounded corners
- Interactions: Smooth animations and hover effects
- Responsive: Mobile-first design that scales beautifully across devices

Technologies Used

- React 19 with TypeScript
- OpenAI SDK for DALL-E 3 and Whisper integration
- Web Audio API for voice recording
- MediaRecorder API for audio capture
- Kong-inspired CSS design system
- Modern CSS3 with CSS custom properties
- Create React App for development tooling

Available Scripts

In the project directory, you can run:

`npm start`

Runs the app in the development mode.
Open http://localhost:3000 to view it in the browser.

The page will reload if you make edits.
You will also see any lint errors in the console.

`npm test`

Launches the test runner in the interactive watch mode.
See the section about running tests for more information.

`npm run build`

Builds the app for production to the build folder.
It correctly bundles React in production mode and optimizes the build for the best performance.

The build is minified and the filenames include the hashes.
Your app is ready to be deployed!

See the section about deployment for more information.

`npm run eject`

Note: this is a one-way operation. Once you eject, you can’t go back!

If you aren’t satisfied with the build tool and configuration choices, you can eject at any time. This command will remove the single build dependency from your project.

Instead, it will copy all the configuration files and the transitive dependencies (webpack, Babel, ESLint, etc) right into your project so you have full control over them. All of the commands except eject will still work, but they will point to the copied scripts so you can tweak them. At this point you’re on your own.

You don’t have to ever use eject. The curated feature set is suitable for small and middle deployments, and you shouldn’t feel obligated to use this feature. However we understand that this tool wouldn’t be useful if you couldn’t customize it when you are ready for it.

Learn More

You can learn more in the Create React App documentation.

To learn React, check out the React documentation.

hguerrero/ai-voice-to-image-demo
