Esperanto is a powerful Python library that provides a unified interface for interacting with various Large Language Model (LLM) providers. It simplifies working with different AI model APIs (LLMs, embedders, transcribers, and text-to-speech) by offering a consistent interface while preserving provider-specific optimizations.
### 🪶 Ultra-Lightweight Architecture
- Direct HTTP Communication: All providers communicate directly via HTTP APIs using `httpx` - no bulky vendor SDKs required
- Minimal Dependencies: Unlike LangChain and similar frameworks, Esperanto has a tiny footprint with zero overhead layers
- Production-Ready Performance: Direct API calls mean faster response times and lower memory usage
### 🔄 True Provider Flexibility
- Standardized Responses: Switch between any provider (OpenAI → Anthropic → Google → etc.) without changing a single line of code
- Consistent Interface: Same methods, same response objects, same patterns across all 15+ providers
- Future-Proof: Add new providers or change existing ones without refactoring your application
### ⚡ Perfect for Production
- Prototyping to Production: Start experimenting and deploy the same code to production
- No Vendor Lock-in: Test different providers, optimize costs, and maintain flexibility
- Enterprise-Ready: Direct HTTP calls, standardized error handling, and comprehensive async support
Whether you're building a quick prototype or a production application serving millions of requests, Esperanto gives you the performance of direct API calls with the convenience of a unified interface.
- Unified Interface: Work with multiple LLM providers using a consistent API
- Provider Support:
  - OpenAI (GPT-4o, o1, o3, o4, Whisper, TTS)
  - OpenAI-Compatible (LM Studio, Ollama, vLLM, custom endpoints)
  - Anthropic (Claude models)
  - OpenRouter (Access to multiple models)
  - xAI (Grok)
  - Perplexity (Sonar models)
  - Groq (Mixtral, Llama, Whisper)
  - Google GenAI (Gemini LLM, Text-to-Speech, Embedding with native task optimization)
  - Vertex AI (Google Cloud: LLM, Embedding, TTS)
  - Ollama (Local deployment, multiple models)
  - Transformers (Universal local models - Qwen, CrossEncoder, BAAI, Jina, Mixedbread)
  - ElevenLabs (Text-to-Speech, Speech-to-Text)
  - Azure OpenAI (Chat, Embedding)
  - Mistral (Mistral Large, Small, Embedding, etc.)
  - DeepSeek (deepseek-chat)
  - Voyage (Embeddings, Reranking)
  - Jina (Advanced embedding models with task optimization, Reranking)
- Embedding Support: Multiple embedding providers for vector representations
- Reranking Support: Universal reranking interface for improving search relevance
- Speech-to-Text Support: Transcribe audio using multiple providers
- Text-to-Speech Support: Generate speech using multiple providers
- Async Support: Both synchronous and asynchronous API calls
- Streaming: Support for streaming responses
- Structured Output: JSON output formatting (where supported)
- LangChain Integration: Easy conversion to LangChain chat models
For detailed information about our providers, check out:
- LLM Providers Documentation
- Embedding Providers Documentation
- Reranking Providers Documentation
- Speech-to-Text Providers Documentation
- Text-to-Speech Providers Documentation
Install Esperanto using pip:
```bash
pip install esperanto
```
### Transformers Provider
If you plan to use the transformers provider, install with the transformers extra:
pip install "esperanto[transformers]"
This installs:
- `transformers` - Core Hugging Face library
- `torch` - PyTorch framework
- `tokenizers` - Fast tokenization
- `sentence-transformers` - CrossEncoder support
- `scikit-learn` - Advanced embedding features
- `numpy` - Numerical computations
### LangChain Integration
If you plan to use any of the `.to_langchain()` methods, you need to install the appropriate LangChain packages manually:
```bash
# Core LangChain dependencies (required)
pip install "langchain>=0.3.8,<0.4.0" "langchain-core>=0.3.29,<0.4.0"

# Provider-specific LangChain packages (install only what you need)
pip install "langchain-openai>=0.2.9"
pip install "langchain-anthropic>=0.3.0"
pip install "langchain-google-genai>=2.1.2"
pip install "langchain-ollama>=0.2.0"
pip install "langchain-groq>=0.2.1"
pip install "langchain-mistralai>=0.2.1"
pip install "langchain-deepseek>=0.1.3"
pip install "langchain-google-vertexai>=2.0.24"
```
| Provider | LLM Support | Embedding Support | Reranking Support | Speech-to-Text | Text-to-Speech | JSON Mode |
|---|---|---|---|---|---|---|
| OpenAI | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ |
| OpenAI-Compatible | ✅ | ❌ | ❌ | ❌ | ✅ | ⚠️* |
| Anthropic | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Groq | ✅ | ❌ | ❌ | ✅ | ❌ | ✅ |
| Google (GenAI) | ✅ | ✅ | ❌ | ❌ | ✅ | ✅ |
| Vertex AI | ✅ | ✅ | ❌ | ❌ | ✅ | ❌ |
| Ollama | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Perplexity | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Transformers | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ |
| ElevenLabs | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ |
| Azure OpenAI | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| Mistral | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| DeepSeek | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| Voyage | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ |
| Jina | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ |
| xAI | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |
| OpenRouter | ✅ | ❌ | ❌ | ❌ | ❌ | ✅ |

\* JSON mode for OpenAI-Compatible endpoints depends on the specific endpoint implementation.
You can use Esperanto in two ways: directly with provider-specific classes or through the AI Factory.
The AI Factory provides a convenient way to create model instances and discover available providers:
```python
from esperanto.factory import AIFactory

# Get available providers for each model type
providers = AIFactory.get_available_providers()
print(providers)
# Output:
# {
#     'language': ['openai', 'openai-compatible', 'anthropic', 'google', 'groq', 'ollama', 'openrouter', 'xai', 'perplexity', 'azure', 'mistral', 'deepseek'],
#     'embedding': ['openai', 'google', 'ollama', 'vertex', 'transformers', 'voyage', 'mistral', 'azure', 'jina'],
#     'reranker': ['jina', 'voyage', 'transformers'],
#     'speech_to_text': ['openai', 'groq', 'elevenlabs'],
#     'text_to_speech': ['openai', 'elevenlabs', 'google', 'vertex', 'openai-compatible']
# }

# Create model instances
model = AIFactory.create_language(
    "openai",
    "gpt-3.5-turbo",
    config={"structured": {"type": "json"}}
)  # Language model
embedder = AIFactory.create_embedding("openai", "text-embedding-3-small")  # Embedding model
reranker = AIFactory.create_reranker("transformers", "cross-encoder/ms-marco-MiniLM-L-6-v2")  # Universal reranker model
transcriber = AIFactory.create_speech_to_text("openai", "whisper-1")  # Speech-to-text model
speaker = AIFactory.create_text_to_speech("openai", "tts-1")  # Text-to-speech model

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the capital of France?"},
]
response = model.chat_complete(messages)

# Generate embeddings
texts = ["Hello, world!", "Another text"]

# Synchronous usage
embeddings = embedder.embed(texts)

# Async usage
embeddings = await embedder.aembed(texts)
```
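The speech handles follow the same pattern. Here is a minimal sketch of speech-to-text and text-to-speech usage; the `audio.mp3` path is a placeholder, and the method names `transcribe` and `generate_speech` and the `audio_data` attribute are assumptions for illustration - check the Speech-to-Text and Text-to-Speech provider docs for the exact signatures:

```python
# Speech-to-text: transcribe a local audio file (hypothetical path)
with open("audio.mp3", "rb") as audio_file:
    transcription = transcriber.transcribe(audio_file)  # assumed method name
print(transcription.text)

# Text-to-speech: synthesize speech and save it to disk
audio = speaker.generate_speech("Hello, world!")  # assumed method name
with open("output.mp3", "wb") as f:
    f.write(audio.audio_data)  # assumed attribute holding raw audio bytes
```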
Here's a simple example to get you started:
```python
from esperanto.providers.llm.openai import OpenAILanguageModel
from esperanto.providers.llm.anthropic import AnthropicLanguageModel

# Initialize a provider with structured output
model = OpenAILanguageModel(
    api_key="your-api-key",
    model_name="gpt-4",  # Optional, defaults to gpt-4
    structured={"type": "json"}  # Optional, for JSON output
)

# Simple chat completion
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "List three colors in JSON format"}
]

# Synchronous call
response = model.chat_complete(messages)
print(response.choices[0].message.content)  # Will be in JSON format

# Async call
async def get_response():
    response = await model.achat_complete(messages)
    print(response.choices[0].message.content)  # Will be in JSON format
```
All providers in Esperanto return standardized response objects, making it easy to work with different models without changing your code.
```python
from esperanto.factory import AIFactory

model = AIFactory.create_language(
    "openai",
    "gpt-3.5-turbo",
    config={"structured": {"type": "json"}}
)
messages = [{"role": "user", "content": "Hello!"}]

# All LLM responses follow this structure
response = model.chat_complete(messages)
print(response.choices[0].message.content)  # The actual response text
print(response.choices[0].message.role)     # 'assistant'
print(response.model)                       # The model used
print(response.usage.total_tokens)          # Token usage information
print(response.content)                     # Shortcut for response.choices[0].message.content

# For streaming responses (streaming must be enabled - see Streaming Responses below)
for chunk in model.chat_complete(messages):
    print(chunk.choices[0].delta.content, end="", flush=True)

# Async streaming
async for chunk in model.achat_complete(messages):
    print(chunk.choices[0].delta.content, end="", flush=True)
```
```python
from esperanto.factory import AIFactory

model = AIFactory.create_embedding("openai", "text-embedding-3-small")
texts = ["Hello, world!", "Another text"]

# All embedding responses follow this structure
response = model.embed(texts)
print(response.data[0].embedding)   # Vector for first text
print(response.data[0].index)       # Index of the text (0)
print(response.model)               # The model used
print(response.usage.total_tokens)  # Token usage information
```
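Because the returned vectors are plain lists of floats, downstream math is provider-agnostic. For example, cosine similarity between the two texts above - a sketch using numpy, which is not a core Esperanto dependency and would need to be installed separately:

```python
import numpy as np

# Cosine similarity between the two embedding vectors
a = np.array(response.data[0].embedding)
b = np.array(response.data[1].embedding)
similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(similarity)  # Closer to 1.0 means more semantically similar
```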
```python
from esperanto.factory import AIFactory

reranker = AIFactory.create_reranker("transformers", "BAAI/bge-reranker-base")

query = "What is machine learning?"
documents = [
    "Machine learning is a subset of artificial intelligence.",
    "The weather is nice today.",
    "Python is a programming language used in ML."
]

# All reranking responses follow this structure
response = reranker.rerank(query, documents, top_k=2)
print(response.results[0].document)         # Highest ranked document
print(response.results[0].relevance_score)  # Normalized 0-1 relevance score
print(response.results[0].index)            # Original document index
print(response.model)                       # The model used
```
Esperanto supports advanced task-aware embeddings that optimize vector representations for specific use cases. This works across all embedding providers through a universal interface:
```python
from esperanto.factory import AIFactory
from esperanto.common_types.task_type import EmbeddingTaskType

# Task-optimized embeddings work with ANY provider
model = AIFactory.create_embedding(
    provider="jina",  # Also works with: "openai", "google", "transformers", etc.
    model_name="jina-embeddings-v3",
    config={
        "task_type": EmbeddingTaskType.RETRIEVAL_QUERY,  # Optimize for search queries
        "late_chunking": True,    # Better long-context handling
        "output_dimensions": 512  # Control vector size
    }
)

# Generate optimized embeddings
query = "What is machine learning?"
embeddings = model.embed([query])
```
Universal Task Types:
- `RETRIEVAL_QUERY` - Optimize for search queries
- `RETRIEVAL_DOCUMENT` - Optimize for document storage
- `SIMILARITY` - General text similarity
- `CLASSIFICATION` - Text classification tasks
- `CLUSTERING` - Document clustering
- `CODE_RETRIEVAL` - Code search optimization
- `QUESTION_ANSWERING` - Optimize for Q&A tasks
- `FACT_VERIFICATION` - Optimize for fact checking
Provider Support:
- Jina: Native API support for all features
- Google: Native task type translation to Gemini API
- OpenAI: Task optimization via intelligent text prefixes
- Transformers: Local emulation with task-specific processing
- Others: Graceful degradation with a consistent interface (see the sketch below)
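To illustrate, the same task-aware config can be passed to a provider without a native task parameter and the calling code stays unchanged. A minimal sketch (the model name comes from the Quick Start; everything else mirrors the Jina example above):

```python
from esperanto.factory import AIFactory
from esperanto.common_types.task_type import EmbeddingTaskType

# OpenAI has no native task-type parameter, so Esperanto applies the
# optimization via intelligent text prefixes instead.
model = AIFactory.create_embedding(
    provider="openai",
    model_name="text-embedding-3-small",
    config={"task_type": EmbeddingTaskType.RETRIEVAL_QUERY}
)
embeddings = model.embed(["What is machine learning?"])
```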
The standardized response objects ensure consistency across different providers, making it easy to:
- Switch between providers without changing your application code (see the example below)
- Handle responses in a uniform way
- Access common attributes like token usage and model information
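For example, a single code path can fan out across providers and read the same attributes everywhere. A sketch; the model names are illustrative and each provider needs its API key set:

```python
from esperanto.factory import AIFactory

for provider, model_name in [("openai", "gpt-4o"), ("anthropic", "claude-3-5-sonnet-latest")]:
    model = AIFactory.create_language(provider, model_name)
    response = model.chat_complete([{"role": "user", "content": "Hello!"}])
    # Identical attribute access regardless of provider
    print(provider, response.model, response.usage.total_tokens, response.content)
```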
```python
from esperanto.providers.llm.openai import OpenAILanguageModel

model = OpenAILanguageModel(
    api_key="your-api-key",       # Or set OPENAI_API_KEY env var
    model_name="gpt-4",           # Optional
    temperature=0.7,              # Optional
    max_tokens=850,               # Optional
    streaming=False,              # Optional
    top_p=0.9,                    # Optional
    structured={"type": "json"},  # Optional, for JSON output
    base_url=None,                # Optional, for custom endpoint
    organization=None             # Optional, for org-specific API
)
```
Use any OpenAI-compatible endpoint (LM Studio, Ollama, vLLM, custom deployments) with the same interface:
```python
from esperanto.factory import AIFactory

# Using factory config
model = AIFactory.create_language(
    "openai-compatible",
    "your-model-name",  # Use any model name supported by your endpoint
    config={
        "base_url": "http://localhost:1234/v1",  # Your endpoint URL (required)
        "api_key": "your-api-key"                # Your API key (optional)
    }
)

# Or set environment variables:
# OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
# OPENAI_COMPATIBLE_API_KEY=your-api-key  # Optional for endpoints that don't require auth
model = AIFactory.create_language("openai-compatible", "your-model-name")

# Works with any OpenAI-compatible endpoint
messages = [{"role": "user", "content": "Hello!"}]
response = model.chat_complete(messages)
print(response.content)

# Streaming support
for chunk in model.chat_complete(messages, stream=True):
    print(chunk.choices[0].delta.content, end="", flush=True)
```
Common Use Cases:
- LM Studio: Local model serving with a GUI
- Ollama: `ollama serve` with OpenAI compatibility
- vLLM: High-performance inference server
- Custom Deployments: Any server implementing the OpenAI chat completions API
Features:
- ✅ Streaming: Real-time response streaming
- ✅ Pass-through Model Names: Use any model name your endpoint supports
- ✅ Graceful Degradation: Automatically handles varying feature support
- ✅ Error Handling: Clear error messages for troubleshooting
- ⚠️ JSON Mode: Depends on endpoint implementation
Perplexity uses an OpenAI-compatible API but includes additional parameters for controlling search behavior.
```python
from esperanto.providers.llm.perplexity import PerplexityLanguageModel

model = PerplexityLanguageModel(
    api_key="your-api-key",  # Or set PERPLEXITY_API_KEY env var
    model_name="llama-3-sonar-large-32k-online",  # Recommended default
    temperature=0.7,              # Optional
    max_tokens=850,               # Optional
    streaming=False,              # Optional
    top_p=0.9,                    # Optional
    structured={"type": "json"},  # Optional, for JSON output
    # Perplexity-specific parameters
    search_domain_filter=["example.com", "-excluded.com"],  # Optional, limit search domains
    return_images=False,            # Optional, include images in search results
    return_related_questions=True,  # Optional, return related questions
    search_recency_filter="week",   # Optional, filter search by time ('day', 'week', 'month', 'year')
    web_search_options={"search_context_size": "high"}  # Optional, search context ('low', 'medium', 'high')
)
```
Enable streaming to receive responses token by token:
```python
# Enable streaming
model = OpenAILanguageModel(api_key="your-api-key", streaming=True)

# Synchronous streaming
for chunk in model.chat_complete(messages):
    print(chunk.choices[0].delta.content, end="", flush=True)

# Async streaming
async for chunk in model.achat_complete(messages):
    print(chunk.choices[0].delta.content, end="", flush=True)
```
Request JSON-formatted responses where the provider supports JSON mode (see the capabilities table above):
```python
model = OpenAILanguageModel(
    api_key="your-api-key",  # or use the OPENAI_API_KEY env var
    structured={"type": "json"}
)

messages = [
    {"role": "user", "content": "List three European capitals as JSON"}
]

response = model.chat_complete(messages)
# Response will be in JSON format
```
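Since the content arrives as a JSON-formatted string, it can be parsed straight into Python objects:

```python
import json

data = json.loads(response.content)  # response.content is the JSON string
print(data)
```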
Convert any provider to a LangChain chat model:
```python
model = OpenAILanguageModel(api_key="your-api-key")
langchain_model = model.to_langchain()

# Use with LangChain
from langchain.chains import ConversationChain
chain = ConversationChain(llm=langchain_model)
```
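The converted object is a standard LangChain chat model, so it can also be invoked directly with LangChain's own `invoke` API:

```python
# Returns a LangChain AIMessage
result = langchain_model.invoke("What's the capital of France?")
print(result.content)
```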
You can find the documentation for Esperanto in the docs directory.
There is also a cool beginner's tutorial in the tutorial directory.
We welcome contributions! Please see our Contributing Guidelines for details on how to get started.
This project is licensed under the MIT License - see the LICENSE file for details.
- Clone the repository:

```bash
git clone https://github.com/lfnovo/esperanto.git
cd esperanto
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Run tests:

```bash
pytest
```