EragAPI provides a unified FastAPI-based interface to multiple AI providers, with built-in support for Groq, Gemini, Cohere, DeepSeek, and Ollama. It features streaming responses, a simple CLI, and a system tray icon for server management.
- Multi-Provider Support: Connect to Groq, Gemini, Cohere, DeepSeek, and Ollama.
- Streaming Responses: Server-Sent Events (SSE) for real-time outputs.
- Simple CLI: Start the server and list available models from the terminal.
- System Tray Integration: Manage the background server via a tray icon.
- Ollama Compatibility: Works with locally hosted Ollama models.
- Python 3.9+
- Ollama (only required for locally hosted Ollama models)
Install the dependencies:

```bash
pip install fastapi uvicorn pydantic python-dotenv google-generativeai cohere requests pystray pillow groq openai
```
Create a `.env` file with your API keys:
```env
GROQ_API_KEY="your_groq_key"
GEMINI_API_KEY="your_gemini_key"
CO_API_KEY="your_cohere_key"
DEEPSEEK_API_KEY="your_deepseek_key"
```
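Since python-dotenv is among the installed dependencies, the server can load these keys at startup. A minimal sketch of how such loading typically works (illustrative; eragAPI.py's actual code may differ):

```python
import os

from dotenv import load_dotenv

# Load variables from the .env file into the process environment.
load_dotenv()

# Keys are then readable like any other environment variable.
groq_key = os.getenv("GROQ_API_KEY")
if groq_key is None:
    print("GROQ_API_KEY is not set; Groq requests will fail.")
```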
Start the server:

```bash
python eragAPI.py serve
```
Add `--tray` to enable the system tray icon.
```bash
# List all providers' models
python eragAPI.py model list

# Filter by provider
python eragAPI.py model list --api groq
```
`POST /api/chat`

```json
{
  "model": "groq-mixtral-8x7b-32768",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": false,
  "temperature": 0.7
}
```
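The same payload can be sent from Python for a quick non-streaming call. The response schema is not documented here, so the sketch below prints the raw JSON rather than assuming field names:

```python
import requests

# Non-streaming chat completion against a locally running EragAPI server.
response = requests.post(
    "http://localhost:11436/api/chat",
    json={
        "model": "groq-mixtral-8x7b-32768",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,
        "temperature": 0.7,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())  # print the raw response rather than assuming its shape
```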
`POST /api/generate`

```json
{
  "model": "gemini-gemini-pro",
  "prompt": "Write a poem about AI",
  "stream": true,
  "temperature": 0.5
}
```
Example with curl:

```bash
curl -X POST http://localhost:11436/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "groq-mixtral-8x7b-32768",
    "messages": [{"role": "user", "content": "Explain quantum computing"}]
  }'
```
Streaming from Python:

```python
import requests

response = requests.post(
    "http://localhost:11436/api/generate",
    json={
        "model": "ollama-llama2",
        "prompt": "Tell me a joke",
        "stream": True
    },
    stream=True
)

# Each Server-Sent Events line arrives prefixed with "data: ".
for line in response.iter_lines():
    if line:
        print(line.decode('utf-8').replace('data: ', ''))
```
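If each `data:` payload is a JSON object, as is common for Ollama-compatible streaming APIs (an assumption; this README does not specify the chunk schema), decoding per line gives structured access to the partial output:

```python
import json

import requests

response = requests.post(
    "http://localhost:11436/api/generate",
    json={"model": "ollama-llama2", "prompt": "Tell me a joke", "stream": True},
    stream=True,
)

for line in response.iter_lines():
    if not line:
        continue
    payload = line.decode("utf-8").removeprefix("data: ")
    try:
        # Assumes each chunk is a JSON object; the exact field names are
        # not documented here, so print the whole decoded chunk.
        print(json.loads(payload))
    except json.JSONDecodeError:
        print(payload)  # fall back to raw text if the chunk is not JSON
```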
Format: `{provider}-{model_name}`

Examples:
- `groq-mixtral-8x7b-32768`
- `ollama-llama2-uncensored`
- `deepseek-deepseek-chat`
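Because model names themselves may contain hyphens (e.g., `mixtral-8x7b-32768`), only the first hyphen separates the provider from the model. A small illustrative helper (not part of EragAPI) that splits IDs accordingly:

```python
def split_model_id(model_id: str) -> tuple[str, str]:
    """Split '{provider}-{model_name}' on the first hyphen only,
    since model names themselves often contain hyphens."""
    provider, _, model_name = model_id.partition("-")
    return provider, model_name

assert split_model_id("groq-mixtral-8x7b-32768") == ("groq", "mixtral-8x7b-32768")
assert split_model_id("deepseek-deepseek-chat") == ("deepseek", "deepseek-chat")
```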
- Ollama models must be pulled first (e.g., `ollama pull llama2`).
- Default models are used if no specific model is provided.
- Temperature ranges from `0.0` (deterministic) to `1.0` (creative).
- Stream responses for long-generation tasks to avoid timeouts.
MIT License.