Skip to content

knil-sama/itm

Folders and files

NameName
Last commit message
Last commit date
Aug 2, 2023
Aug 2, 2023
Jul 24, 2023
Aug 4, 2023
Aug 2, 2023
Aug 4, 2023
Aug 4, 2023
Jun 23, 2019
Jul 21, 2023
Aug 4, 2023
Aug 4, 2023
Jul 24, 2023

Repository files navigation

itm

Image to metadata

Purpose

Create a workflow of image processing in python using various storage (cloud & database) and orchestrator (Ariflow)

Dags run every 5 minutes:

  • Generate random number of url
  • Download image (from an online random generator)
  • Compute MD5 of image (to use it as id)
  • Compute grayscale
  • Load result into mongodb (using MD5 to avoid duplicate)
  • Allow download of image by a REST API http://localhost:8000/image/<MD5>
  • Display number of image processed (fail/success) http://localhost:8000/monitoring

Usage

docker-compose up or docker compose up

then go to http://0.0.0.0:8080

Use default admin user with test to connect

Click on "ON" of "main_dag" to start the workflow

once the workflow complete you can use endpoint

http://localhost:8000/image/ and http://localhost:8000/monitoring

Process

Generate will generate a number of image ranging from 1 to 1000, Download will load locally all url generated, then 2 parallel jobs will process this batch, the result of both will update an "event" that will be converted into the final "image" model. and a last job will update monitoring collection.

Loading
  graph TD;
      generate_urls-->download;
      load_image-->update_monitoring
      download-->grayscale;
      download-->hash;
      download-->load_image;
      grayscale-->load_image
      hash-->load_image
      download-->update_monitoring;

Input parameter

  • dags/main_dag.schedule_interval => can lower frequency

Debug

http://localhost:8081/ for admin GUI of mongodb

http://localhost:5000/images for a list of existing md5

Resources

https://dzone.com/articles/running-apache-airflow-dag-with-docker

https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html

Releases

No releases published

Packages

No packages published