A flexible, open-source framework for building real-time conversational AI with Whisper, Groq, and ElevenLabs. The toolkit lets developers build voice-driven chatbots with a focus on simplicity and extensibility, moving beyond the constraints of text-based interfaces to natural, spoken conversation. Its straightforward, modular architecture allows developers of all skill levels to quickly create and customize their own conversational AI applications. By combining current speech and language technologies with a clear, well-documented structure, the project aims to make voice-based AI accessible to a wider audience of developers.
- 🎙️ Real-time voice input and output for truly immersive conversational experiences
- 🧩 Modular architecture for easy integration of STT, LLM, and TTS components
- 🔧 Highly configurable and adaptable to your unique requirements
- 🌐 Includes a sample web interface for rapid deployment and testing
- 🎭 Customizable persona and conversational style through editable prompt templates
- 🌈 Compatible with various cutting-edge AI models and libraries (Whisper, Faster Whisper, Groq, Anthropic, ElevenLabs, Azure TTS, and more)
- Clone the repository:
git clone https://github.com/avonx/talk-to-ai.git
- Navigate to the project directory:
cd talk-to-ai
- Create a virtual environment:
python -m venv venv
- Install the required dependencies:
pip install -r requirements.txt
- Configure your API keys in the config.yaml file (refer to config_sample.yaml for the required keys).
- Customize your chatbot's persona and conversational style by editing the prompt template in characters/.
- Start the application:
bash run.sh
- Open your web browser and navigate to http://localhost:5000 to converse with your real-time conversational AI!
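Before starting the application, it can help to confirm that config.yaml defines every key listed in config_sample.yaml. The following is a minimal sketch and not part of the project: the function name `missing_keys` and the example key names are hypothetical, and the real layout of the config files may differ. It compares two already-parsed mappings and reports the key paths that are absent from the user's config.

```python
from collections.abc import Mapping
from typing import Any


def missing_keys(sample: Mapping[str, Any], config: Mapping[str, Any], prefix: str = "") -> list[str]:
    """Return dotted paths of keys present in `sample` but absent from `config`."""
    missing: list[str] = []
    for key, sample_value in sample.items():
        path = f"{prefix}{key}"
        if key not in config:
            missing.append(path)
        elif isinstance(sample_value, Mapping):
            # Recurse into nested sections; a non-mapping value (e.g. an empty
            # YAML section parsed as None) is treated as an empty section.
            nested = config[key] if isinstance(config[key], Mapping) else {}
            missing.extend(missing_keys(sample_value, nested, prefix=f"{path}."))
    return missing


if __name__ == "__main__":
    # Hypothetical key names, for illustration only; see config_sample.yaml for the real ones.
    sample = {"stt": {"model": ""}, "llm": {"api_key": ""}, "tts": {"api_key": ""}}
    config = {"stt": {"model": "base"}, "llm": {}}
    print(missing_keys(sample, config))
```

Assuming both files are plain YAML mappings and PyYAML is installed, the two arguments could be obtained with `yaml.safe_load` on each file. An empty result means every key from the sample file is present.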
This Real-time Conversational AI framework boasts a highly modular architecture, allowing you to mix and match components to suit your needs:
- app.py: The main Flask application that orchestrates the STT, LLM, and TTS components
- stt/: Contains the speech-to-text module, supporting Whisper and Faster Whisper
- llm/: Houses the language modeling module, accommodating Groq and Anthropic
- tts/: Includes the text-to-speech module, offering ElevenLabs, ElevenLabs Streaming, and Azure TTS
- templates/: Contains the sample HTML template for the web interface
- prompts/: Stores the prompt templates for the LLM, defining your chatbot's persona and conversational style
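The way these modules fit together can be sketched as a three-stage pipeline: audio is transcribed, the transcript is sent to the language model together with the persona prompt, and the reply is synthesized back to audio. The sketch below is an illustration under assumptions, not the project's actual code: the `SpeechToText`, `LanguageModel`, and `TextToSpeech` protocols, the `VoicePipeline` class, and the stub backends are all hypothetical names. Real backends (Whisper, Groq, ElevenLabs, and so on) would take the place of the stubs.

```python
from dataclasses import dataclass
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def reply(self, system_prompt: str, user_text: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoicePipeline:
    """Chains STT -> LLM -> TTS; any object with the matching method can be plugged in."""

    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech
    system_prompt: str

    def respond(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)
        answer = self.llm.reply(self.system_prompt, text)
        return self.tts.synthesize(answer)


# Stub backends so the sketch runs without API keys or audio hardware.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")  # pretend the audio bytes are already text


class CannedLLM:
    def reply(self, system_prompt: str, user_text: str) -> str:
        return f"[{system_prompt}] You said: {user_text}"


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")  # pretend the encoded text is audio


if __name__ == "__main__":
    pipeline = VoicePipeline(EchoSTT(), CannedLLM(), BytesTTS(), system_prompt="friendly guide")
    print(pipeline.respond(b"hello"))
```

Because the interfaces are structural (`typing.Protocol`), swapping a backend in this sketch only requires passing a different object with the same method; no shared base class is needed.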
We welcome contributions from the developer community to help expand the capabilities of this Real-time Conversational AI framework. Submit a pull request or open an issue on the GitHub repository to join the project and shape the future of conversational AI!
This project is licensed under the MIT License. See the LICENSE file for more information.
This Real-time Conversational AI framework is built upon the remarkable achievements of developers and researchers in the fields of AI, speech recognition, and speech synthesis. We express our deepest gratitude to the creators of the various AI models and libraries that make this project possible, including Whisper, Groq, ElevenLabs, and more.
Embrace the power of voice and revolutionize the way we interact with machines using this Real-time Conversational AI framework! 🎙️✨