Welcome to our project where we build a comprehensive Image Analysis API using Microsoft Florence-2-large, Chainlit, and Docker. This API allows you to perform various image analysis tasks such as captioning, object detection, expression segmentation, and OCR.
We utilized the pre-trained Florence-2-large model from Hugging Face Transformers for image analysis. This powerful model has been trained on a vast dataset of images and can perform multiple tasks such as image captioning, object detection, and OCR.
- chainlit_app.py: This is the heart of our Chainlit application. It defines the message handler that processes uploaded images and generates responses using the Florence model.
- app/model.py: Contains the
ModelManager
class, responsible for loading and managing the Florence-2-large model. - app/utils.py: Contains utility functions for image drawing, plot boxes, polygons, and OCR boxes.
- logging_config.py: Manages detailed logging for the project.
- Dockerfile: Defines how our application is containerized, ensuring all dependencies are properly installed and the environment is consistent.
We defined several task prompts to leverage the Florence-2-large model's capabilities. These prompts include:
<CAPTION>
: Generates a simple caption for the image.<DETAILED_CAPTION>
: Provides a detailed description of the image.<OD>
: Detects and locates objects within the image.<OCR>
: Performs Optical Character Recognition.<CAPTION_TO_PHRASE_GROUNDING>
: Locates specific phrases or objects mentioned in the caption within the image.<DENSE_REGION_CAPTION>
: Generates captions for specific regions within the image.<REGION_PROPOSAL>
: Suggests regions of interest within the image without labeling them.<MORE_DETAILED_CAPTION>
: Generates a very comprehensive description of the image.<REFERRING_EXPRESSION_SEGMENTATION>
: Segments the image based on a textual description of a specific object or region.<REGION_TO_SEGMENTATION>
: Generates a segmentation mask for a specified region in the image.<OPEN_VOCABULARY_DETECTION>
: Detects objects in the image based on user-specified categories.<REGION_TO_CATEGORY>
: Classifies a specific region of the image into a category.<REGION_TO_DESCRIPTION>
: Generates a detailed description of a specific region in the image.<OCR_WITH_REGION>
: OCR on specific regions of the image.
-
Clone the repository:
git clone https://github.com/yourusername/MS-FLORENCE2.git cd MS-FLORENCE2
-
(Optional) - File now included in the repo - Create a .env file with the following content: env Copy code MODEL_ID=microsoft/Florence-2-large RATE_LIMIT=5
-
Build the Docker container: docker build -t florence2-image-analysis .
-
Run the Docker container: docker run --gpus all -p 8010:8010 florence2-image-analysis
Once the container is running, access the Chainlit interface at http://localhost:8010. Follow the instructions in the chat to perform various image analysis tasks.
Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.