
MediBot: AI Doctor with Vision and Voice


Overview

MediBot is an AI-driven virtual medical assistant that combines a vision-language model (VLM), speech recognition, and text-to-speech synthesis to provide real-time medical insights. By integrating these multimodal AI capabilities, MediBot enhances patient-doctor interactions, offering intelligent assistance through voice- and image-based diagnostics.

This project aims to provide an accessible, AI-powered healthcare assistant that can analyze medical images, understand spoken symptoms, and generate human-like medical responses to assist patients and doctors in preliminary diagnostics.

Key Features

  • Multimodal AI Analysis: Uses a vision-language model (VLM), LLaMA-3.2-11B-Vision, to process medical images together with text-based queries.
  • Speech Recognition: Converts spoken patient queries into text using Whisper-large-v3 (see the sketch after this list).
  • Text-to-Speech Synthesis: Generates lifelike spoken responses using ElevenLabs or gTTS.
  • Advanced Natural Language Understanding: The LLaMA model generates contextually aware medical responses.
  • Interactive User Interface: Built with Gradio for an intuitive, seamless user experience.
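
To make the speech-recognition step concrete, here is a minimal sketch, assuming Whisper-large-v3 is reached through the Groq audio-transcription endpoint (consistent with the GROQ_API_KEY required in the setup below); the function name and file handling are illustrative, not the repository's exact code.

import os
from groq import Groq

client = Groq(api_key=os.getenv("GROQ_API_KEY"))

def transcribe(audio_path: str) -> str:
    # Send the recorded audio file to Whisper-large-v3 and return plain text.
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(
            model="whisper-large-v3",
            file=f,
        )
    return result.text

Calling transcribe("patient_audio.mp3") would return the spoken symptoms as text, ready to be combined with an image-based query.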

Technologies Used

  • Model Inference: Groq API for low-latency inference (the setup below requires a GROQ_API_KEY).
  • Vision-Language Model (VLM): LLaMA-3.2-11B-Vision.
  • Speech-to-Text (STT): OpenAI Whisper-large-v3.
  • Text-to-Speech (TTS): ElevenLabs API or gTTS.
  • Image Processing: Base64 encoding for embedding images in API requests (see the sketch after this list).
  • Frontend: Gradio-based web UI.
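
The Base64 bullet can be made concrete. Below is a minimal sketch of encoding an image and attaching it to a multimodal chat request; it assumes the Groq chat-completions endpoint, and the model ID llama-3.2-11b-vision-preview is an illustrative assumption that may differ from the repository's actual code.

import base64
import os
from groq import Groq

client = Groq(api_key=os.getenv("GROQ_API_KEY"))

def encode_image(image_path: str) -> str:
    # Base64 turns raw image bytes into text that fits inside a JSON request body.
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

def analyze_image(image_path: str, question: str) -> str:
    # Send the question and the Base64-encoded image in one multimodal message.
    response = client.chat.completions.create(
        model="llama-3.2-11b-vision-preview",  # illustrative model ID
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{encode_image(image_path)}"}},
            ],
        }],
    )
    return response.choices[0].message.content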

Workflow Diagram

Below is a visual representation of how MediBot processes user inputs and generates medical insights:

[MediBot Workflow diagram]

Installation & Setup

1. Clone the repository

git clone https://github.com/rakibnsajib/MediBot-AI-Doctor-with-Vision-and-Voice.git
cd MediBot-AI-Doctor-with-Vision-and-Voice

2. Install Dependencies

pip install -r requirements.txt

3. Set Up API Keys

Create a .env file in the root directory and add your API keys:

GROQ_API_KEY=your-groq-api-key
ELEVENLABS_API_KEY=your-elevenlabs-api-key

Then, load the environment variables in your code using:

from dotenv import load_dotenv
import os

# Read key-value pairs from .env into the process environment.
load_dotenv()

# Fetch the keys; os.getenv returns None if a variable is missing.
GROQ_API_KEY = os.getenv("GROQ_API_KEY")
ELEVENLABS_API_KEY = os.getenv("ELEVENLABS_API_KEY")
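
With the keys loaded, the text-to-speech step can be exercised directly. Here is a minimal sketch using gTTS, which needs no API key; the ElevenLabs path would use ELEVENLABS_API_KEY instead, and the function name and output file name are illustrative.

from gtts import gTTS

def speak(text: str, out_path: str = "doctor_response.mp3") -> str:
    # Render the model's reply to an MP3 file using Google's TTS service.
    gTTS(text=text, lang="en").save(out_path)
    return out_path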

4. Install FFmpeg and PortAudio

macOS

  1. Install Homebrew (if not already installed):
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  2. Install FFmpeg and PortAudio:
    brew install ffmpeg portaudio

Linux

For Debian-based distributions (e.g., Ubuntu):

  1. Update the package list:
    sudo apt update
  2. Install FFmpeg and PortAudio:
    sudo apt install ffmpeg portaudio19-dev

Windows

Download FFmpeg:
  1. Visit the official FFmpeg download page: https://ffmpeg.org/download.html
  2. Navigate to the Windows builds section and download the latest static build.
Extract and Set Up FFmpeg:
  1. Extract the downloaded ZIP file to a folder (e.g., C:\ffmpeg).
  2. Add the bin directory to your system's PATH:
    • Search for "Environment Variables" in the Start menu.
    • Click on "Edit the system environment variables."
    • In the System Properties window, click on "Environment Variables."
    • Under "System variables," select the "Path" variable and click "Edit."
    • Click "New" and add the path to the bin directory (e.g., C:\ffmpeg\bin).
    • Click "OK" to apply the changes.
Install PortAudio:
  1. Download the PortAudio binaries from the official PortAudio website: www.portaudio.com
  2. Follow the installation instructions provided on the website.
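
Regardless of platform, a quick check that both native dependencies are visible from Python can save debugging time. This snippet is a convenience sketch; it assumes audio capture goes through PyAudio, one common Python binding to PortAudio, which may not be the binding this repository uses.

import shutil

# FFmpeg must be on PATH for audio recording and conversion to work.
print("ffmpeg:", shutil.which("ffmpeg") or "NOT FOUND")

# PortAudio is loaded indirectly: importing pyaudio fails if it is missing.
try:
    import pyaudio  # assumption: audio is recorded via PyAudio
    print("portaudio: OK,", pyaudio.PyAudio().get_device_count(), "audio devices")
except Exception as exc:
    print("portaudio: NOT FOUND ->", exc)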

5. Run the Application

python main.py

Usage Guide

  1. Upload an image of the affected area.
  2. Record your voice describing the symptoms.
  3. Receive AI-generated medical insights in text and voice format.
  4. Interact with the AI Doctor via natural language queries powered by LLaMA-3.2-11B-Vision (a minimal wiring sketch follows this list).
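
For orientation, here is a minimal sketch of how a Gradio front end can wire these steps together. The helpers transcribe, analyze_image, and speak are the illustrative functions sketched earlier in this README, not guaranteed to match main.py.

import gradio as gr

def consult(image_path, audio_path):
    # Speech -> text, then image + question -> insight, then text -> speech.
    question = transcribe(audio_path)
    answer = analyze_image(image_path, question)
    return answer, speak(answer)

demo = gr.Interface(
    fn=consult,
    inputs=[
        gr.Image(type="filepath", label="Affected area"),
        gr.Audio(sources=["microphone"], type="filepath", label="Describe your symptoms"),
    ],
    outputs=[
        gr.Textbox(label="AI Doctor's insight"),
        gr.Audio(label="Spoken response"),
    ],
    title="MediBot: AI Doctor with Vision and Voice",
)

demo.launch()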

Future Enhancements

  • Expanded Medical Condition Coverage: Train models on broader datasets for more accurate diagnostics.
  • Integration with EHR Systems: Enable seamless data exchange with electronic health records.
  • Support for Multilingual Consultation: Improve accessibility for non-English speakers.
  • Edge AI Optimization: Optimize performance for low-power devices to enable remote diagnostics.
  • Personalized AI Assistance: Adapt responses based on patient history and past interactions.

Licensing & Compliance

  • This project is licensed under the MIT License.
