MediBot is an AI-driven virtual medical assistant that combines a vision-language model (VLM), speech recognition, and text-to-speech synthesis to provide real-time medical insights. By integrating multimodal AI capabilities, MediBot enhances patient-doctor interactions, offering intelligent medical assistance through voice- and image-based diagnostics.
This project aims to provide an accessible, AI-powered healthcare assistant that can analyze medical images, understand spoken symptoms, and generate human-like medical responses to assist patients and doctors in preliminary diagnostics.
- Multimodal AI Analysis: Uses a vision-language model (VLM), LLaMA-3.2-11B-Vision, to process both medical images and text-based queries.
- Speech Recognition: Converts spoken patient queries into text using Whisper-large-v3.
- Text-to-Speech Synthesis: Generates lifelike spoken responses using ElevenLabs or gTTS (both halves of the voice pipeline are sketched after this list).
- Advanced Natural Language Understanding: LLaMA models are optimized to generate contextually aware medical responses.
- Interactive User Interface: Built with Gradio for an intuitive, seamless user experience.
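As a concrete illustration of the voice features, here is a minimal sketch of one plausible implementation, assuming the Groq Python client (`groq`) for Whisper-large-v3 transcription and the `gtts` package for synthesis; the project's actual code may be organized differently:

```python
import os

from groq import Groq
from gtts import gTTS

client = Groq(api_key=os.getenv("GROQ_API_KEY"))

def transcribe_symptoms(audio_path: str) -> str:
    """Convert a recorded patient query to text with Whisper-large-v3."""
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            file=(audio_path, f.read()),
            model="whisper-large-v3",
        )
    return result.text

def speak_response(text: str, out_path: str = "response.mp3") -> str:
    """Voice the model's answer with gTTS (ElevenLabs is a drop-in alternative)."""
    gTTS(text=text, lang="en").save(out_path)
    return out_path
```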
- Machine Learning Framework: vLLM for low-latency inference.
- Vision-Language Model (VLM): LLaMA-3.2-11B-Vision.
- Speech-to-Text (STT): OpenAI Whisper-large-v3.
- Text-to-Speech (TTS): ElevenLabs API, gTTS.
- Image Processing: Base64 encoding for efficient transmission (see the sketch after this list).
- Frontend: Gradio-based Web UI.
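As referenced above, images are base64-encoded before being sent to the model. Here is a minimal sketch of how an image query to LLaMA-3.2-11B-Vision might look through Groq's OpenAI-compatible chat API; the model ID shown is an assumption and may differ on your provider:

```python
import base64
import os

from groq import Groq

client = Groq(api_key=os.getenv("GROQ_API_KEY"))

def ask_about_image(image_path: str, question: str) -> str:
    # Encode the image as base64 so it can travel inside a JSON request
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    completion = client.chat.completions.create(
        model="llama-3.2-11b-vision-preview",  # assumed model ID; check your provider
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return completion.choices[0].message.content
```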
Below is a visual representation of how MediBot processes user inputs and generates medical insights:
```bash
git clone https://github.com/rakibnsajib/MediBot-AI-Doctor-with-Vision-and-Voice.git
cd MediBot-AI-Doctor-with-Vision-and-Voice
pip install -r requirements.txt
```
Create a `.env` file in the root directory and add your API keys:

```env
GROQ_API_KEY=your-groq-api-key
ELEVENLABS_API_KEY=your-elevenlabs-api-key
```
Then, load the environment variables in your code using:

```python
import os

from dotenv import load_dotenv

# Read the .env file and populate the process environment
load_dotenv()

GROQ_API_KEY = os.getenv("GROQ_API_KEY")
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
```
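A missing key typically surfaces later as an opaque authentication error, so it can help to fail fast right after loading:

```python
# Fail fast if either key was not found in the environment
missing = [name for name in ("GROQ_API_KEY", "ELEVENLABS_API_KEY")
           if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing API keys in .env: {', '.join(missing)}")
```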
- Install Homebrew (if not already installed):

  ```bash
  /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  ```

- Install FFmpeg and PortAudio:

  ```bash
  brew install ffmpeg portaudio
  ```
For Debian-based distributions (e.g., Ubuntu):

- Update the package list:

  ```bash
  sudo apt update
  ```

- Install FFmpeg and PortAudio:

  ```bash
  sudo apt install ffmpeg portaudio19-dev
  ```
- Visit the official FFmpeg download page: FFmpeg Downloads
- Navigate to the Windows builds section and download the latest static build.
- Extract the downloaded ZIP file to a folder (e.g., `C:\ffmpeg`).
- Add the `bin` directory to your system's PATH:
  - Search for "Environment Variables" in the Start menu.
  - Click "Edit the system environment variables."
  - In the System Properties window, click "Environment Variables."
  - Under "System variables," select the "Path" variable and click "Edit."
  - Click "New" and add the path to the `bin` directory (e.g., `C:\ffmpeg\bin`).
  - Click "OK" to apply the changes.
- Download the PortAudio binaries from the official website: PortAudio Downloads
- Follow the installation instructions provided on the website.
```bash
python main.py
```
- Upload an image of the affected area.
- Record your voice describing the symptoms.
- Receive AI-generated medical insights in text and voice format.
- Interact with the AI Doctor via natural-language queries powered by LLaMA-3.2-11B-Vision; a sketch of how these pieces can be wired into the Gradio app follows below.
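Putting the pieces together, the interface can be wired along these lines, reusing the `transcribe_symptoms`, `ask_about_image`, and `speak_response` helpers sketched earlier (illustrative names, not the project's actual functions; assumes Gradio 4.x):

```python
import gradio as gr

def consult(image_path, audio_path):
    question = transcribe_symptoms(audio_path)      # Whisper-large-v3 STT
    answer = ask_about_image(image_path, question)  # LLaMA-3.2-11B-Vision
    return answer, speak_response(answer)           # gTTS / ElevenLabs TTS

demo = gr.Interface(
    fn=consult,
    inputs=[
        gr.Image(type="filepath", label="Affected area"),
        gr.Audio(sources=["microphone"], type="filepath",
                 label="Describe your symptoms"),
    ],
    outputs=[
        gr.Textbox(label="AI Doctor's response"),
        gr.Audio(label="Spoken response"),
    ],
    title="MediBot - AI Doctor with Vision and Voice",
)

demo.launch()
```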
- Expanded Medical Condition Coverage: Train models on broader datasets for more accurate diagnostics.
- Integration with EHR Systems: Enable seamless data exchange with electronic health records.
- Support for Multilingual Consultation: Improve accessibility for non-English speakers.
- Edge AI Optimization: Optimize performance for low-power devices to enable remote diagnostics.
- Personalized AI Assistance: Adapt responses based on patient history and past interactions.
- This project is licensed under the MIT License.