MLOps
Machine learning (ML) workflows include steps to prepare and analyze data, train and evaluate models, deploy trained models to production, track ML artifacts and understand their dependencies, etc. Managing these steps in an ad-hoc manner can be difficult and time-consuming.
MLOps applies DevOps practices to automate, manage, and audit ML workflows. AI Platform Pipelines helps you implement MLOps by providing a platform where you can orchestrate the steps in your workflow as a pipeline. ML pipelines are portable and reproducible definitions of ML workflows. Kubeflow is one example of an MLOps tool.
AI Platform Pipelines makes it easier to get started with MLOps by saving you the difficulty of setting up Kubeflow Pipelines with TensorFlow Extended (TFX). Kubeflow Pipelines is an open-source platform for running, monitoring, auditing, and managing ML pipelines on Kubernetes. TFX is an open-source project for building ML pipelines that orchestrate end-to-end ML workflows.
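To make the idea of "a workflow as a pipeline" concrete, here is a minimal sketch in plain Python. It does not use the Kubeflow Pipelines or TFX SDKs; the step names, the shared `ctx` dictionary, and the `run_pipeline` helper are all hypothetical and only illustrate declaring steps with dependencies and running them in a valid order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step functions: each reads from and writes to a shared dict.
def prepare(ctx):
    ctx["data"] = [(x, 2.0 * x) for x in range(1, 6)]

def train(ctx):
    # Least-squares slope for y = w * x (a line through the origin).
    num = sum(x * y for x, y in ctx["data"])
    den = sum(x * x for x, _ in ctx["data"])
    ctx["weight"] = num / den

def evaluate(ctx):
    w = ctx["weight"]
    errors = [(y - w * x) ** 2 for x, y in ctx["data"]]
    ctx["mse"] = sum(errors) / len(errors)

def deploy(ctx):
    # "Deploy" only if evaluation passed an (arbitrary) quality threshold.
    ctx["deployed"] = ctx["mse"] < 0.01

STEPS = {"prepare": prepare, "train": train, "evaluate": evaluate, "deploy": deploy}

# Each step maps to the set of steps it depends on.
DEPENDENCIES = {
    "prepare": set(),
    "train": {"prepare"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

def run_pipeline(steps, dependencies):
    """Run every step once, in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        steps[name](ctx)
    return order, ctx

order, result = run_pipeline(STEPS, DEPENDENCIES)
print(order, result["mse"], result["deployed"])
```

Because the definition is just data (steps plus dependencies), the same pipeline can be rerun or moved to another environment, which is the portability and reproducibility the text refers to. A real orchestrator additionally runs each step in its own container and records its artifacts.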
https://cloud.google.com/architecture/mlops-intelligent-products-essentials
Kubeflow is the ML toolkit for Kubernetes.
Using the Kubeflow configuration interfaces you can specify the ML tools required for your workflow. Then you can deploy the workflow to various clouds.
MLflow is an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry.
MLflow Tracking is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code, and for visualizing the results later.
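The tracking idea can be sketched without any dependency. MLflow's own API exposes calls such as `mlflow.log_param` and `mlflow.log_metric`; the `RunLogger` class below is not that API, only a hypothetical stand-in that shows what gets recorded per run (parameters, a metric history) and that the record is persisted for later comparison.

```python
import json
import tempfile
import uuid
from pathlib import Path

class RunLogger:
    """Hypothetical minimal run tracker; not the MLflow API."""

    def __init__(self, root):
        self.run_id = uuid.uuid4().hex
        self.path = Path(root) / self.run_id
        self.path.mkdir(parents=True)
        self.record = {"run_id": self.run_id, "params": {}, "metrics": {}}

    def log_param(self, key, value):
        self.record["params"][key] = value

    def log_metric(self, key, value, step=0):
        # Metrics keep their full history so they can be plotted later.
        history = self.record["metrics"].setdefault(key, [])
        history.append({"step": step, "value": value})

    def finish(self):
        # Persist the run as JSON so other tools can read and compare runs.
        out = self.path / "run.json"
        out.write_text(json.dumps(self.record, indent=2))
        return out

root = tempfile.mkdtemp()
run = RunLogger(root)
run.log_param("learning_rate", 0.1)
for step, loss in enumerate([0.9, 0.5, 0.3]):
    run.log_metric("loss", loss, step=step)

saved = json.loads(run.finish().read_text())
print(saved["params"], [m["value"] for m in saved["metrics"]["loss"]])
```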
MLflow Projects provide a standard format for packaging reusable data science code. Each project is a directory with code or a Git repository.
MLflow Models is a convention for packaging machine learning models in multiple formats called “flavors”. MLflow offers a variety of tools to help you deploy different flavors of models.
The MLflow Model Registry component is a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow Model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, stage transitions (for example from staging to production), and annotations.
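As a rough illustration of versioning, lineage, and stage transitions, the sketch below keeps a registry in memory. The `ModelRegistry` class, its method names, the stage names, and the rule that promoting a version archives the previous production version are all assumptions made for this example; they are not MLflow's implementation.

```python
from dataclasses import dataclass

# Illustrative stage names, assumed for this sketch.
STAGES = ("None", "Staging", "Production", "Archived")

@dataclass
class ModelVersion:
    name: str
    version: int
    run_id: str          # lineage: which run produced this version
    stage: str = "None"
    description: str = ""  # free-form annotation

class ModelRegistry:
    """Hypothetical in-memory registry; not the MLflow API."""

    def __init__(self):
        self._versions = {}  # model name -> list of ModelVersion

    def register(self, name, run_id, description=""):
        versions = self._versions.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, run_id, description=description)
        versions.append(mv)
        return mv

    def transition(self, name, version, stage):
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        target = self._versions[name][version - 1]
        if stage == "Production":
            # Design choice in this sketch: at most one production version.
            for mv in self._versions[name]:
                if mv.stage == "Production":
                    mv.stage = "Archived"
        target.stage = stage
        return target

    def latest(self, name, stage):
        matches = [mv for mv in self._versions.get(name, []) if mv.stage == stage]
        return matches[-1] if matches else None

registry = ModelRegistry()
v1 = registry.register("churn", run_id="run-a")
v2 = registry.register("churn", run_id="run-b", description="more features")
registry.transition("churn", 1, "Production")
registry.transition("churn", 2, "Staging")
registry.transition("churn", 2, "Production")

current = registry.latest("churn", "Production")
print(current.version, current.run_id, v1.stage)
```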
https://cloud.google.com/vertex-ai/docs/pipelines/introduction
https://ivannardini.medium.com/sparkling-vertex-ai-pipeline-cfe6e19334f7
TFX is an open source project that you can use to define your ML workflow as a pipeline. Currently, TFX components can only train TensorFlow-based models. TFX provides components that you can use to ingest and transform data, train and evaluate a model, deploy a trained model for inference, and so on. By using the TFX SDK, you can compose a pipeline for your ML process from TFX components.
Neptune is a metadata store for MLOps, built for research and production teams that run a lot of experiments.
It gives you a central place to log, store, display, organize, compare, and query all metadata generated during the machine learning lifecycle.
https://neptune.ai/blog/mlflow-vs-kubeflow-vs-neptune-differences
Pachyderm is a data science platform that combines data lineage with end-to-end pipelines on Kubernetes, engineered for the enterprise. Similar tools exist for managing the end-to-end machine learning lifecycle.
DVC is built to make ML models shareable and reproducible. It is designed to handle large files, data sets, machine learning models, and metrics as well as code.
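One common way to version large files alongside code is to store the file contents in a cache keyed by their hash and commit only a small pointer file. The sketch below mimics that general idea in plain Python. It is not DVC's file format or command set; the `.ptr` suffix, the cache layout, and the `add_to_cache` and `checkout` helpers are hypothetical.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path

def cache_path(cache_dir, digest):
    # Spread files over subdirectories named by the first two hex characters.
    return cache_dir / digest[:2] / digest[2:]

def add_to_cache(data_file, cache_dir):
    """Copy a file into the cache and write a small pointer file next to it."""
    digest = hashlib.sha256(data_file.read_bytes()).hexdigest()
    cached = cache_path(cache_dir, digest)
    cached.parent.mkdir(parents=True, exist_ok=True)
    if not cached.exists():
        shutil.copyfile(data_file, cached)
    pointer = data_file.parent / (data_file.name + ".ptr")
    pointer.write_text(json.dumps({"sha256": digest, "path": data_file.name}))
    return pointer

def checkout(pointer, cache_dir):
    """Restore the data file that a pointer file refers to."""
    meta = json.loads(pointer.read_text())
    target = pointer.parent / meta["path"]
    shutil.copyfile(cache_path(cache_dir, meta["sha256"]), target)
    return target

workspace = Path(tempfile.mkdtemp())
cache = workspace / ".cache"
data = workspace / "train.csv"

data.write_text("x,y\n1,2\n")
pointer = add_to_cache(data, cache)
pointer_v1 = pointer.read_text()   # the small file you would commit for version 1

data.write_text("x,y\n1,2\n3,6\n")
add_to_cache(data, cache)          # the pointer now refers to version 2
pointer_v2 = pointer.read_text()

pointer.write_text(pointer_v1)     # e.g. after checking out an older commit
restored = checkout(pointer, cache)
print(restored.read_text())
```

Only the pointer file changes in version control, so the repository stays small while any earlier version of the data remains recoverable from the cache.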
MLRun is an open-source MLOps framework that offers an integrative approach to managing your machine-learning pipelines from early development through model development to full pipeline deployment in production. MLRun offers a convenient abstraction layer to a wide variety of technology stacks while empowering data engineers and data scientists to define features and models.
MLOps also covers automating continuous integration (CI), continuous delivery (CD), and continuous training (CT) for machine learning (ML) systems.
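Continuous training usually comes down to two automated decisions: when to retrain, and whether the retrained model should replace the one in production. The sketch below shows one possible form of those gates. The function names, the accuracy metric, and the thresholds are hypothetical, and a real system would trigger this from a scheduler or CI/CD job rather than a direct call.

```python
def should_retrain(live_accuracy, baseline_accuracy, tolerance=0.05):
    """Retrain when live quality drops more than `tolerance` below the baseline."""
    return live_accuracy < baseline_accuracy - tolerance

def should_promote(candidate_accuracy, production_accuracy, min_gain=0.0):
    """Promote only a candidate that beats the current production model."""
    return candidate_accuracy > production_accuracy + min_gain

def continuous_training_step(live_accuracy, baseline_accuracy, train_fn):
    """One CT cycle: returns 'skip', 'promote', or 'reject'.

    `train_fn` is a hypothetical callable that retrains the model and
    returns the candidate's accuracy on a held-out evaluation set.
    """
    if not should_retrain(live_accuracy, baseline_accuracy):
        return "skip"
    candidate_accuracy = train_fn()
    if should_promote(candidate_accuracy, live_accuracy):
        return "promote"
    return "reject"

# Quality is still close to the baseline, so nothing happens.
healthy = continuous_training_step(0.90, 0.92, train_fn=lambda: 0.95)
# Quality dropped and the retrained model is better, so it is promoted.
improved = continuous_training_step(0.80, 0.92, train_fn=lambda: 0.91)
# Quality dropped but the retrained model is worse, so it is rejected.
regressed = continuous_training_step(0.80, 0.92, train_fn=lambda: 0.75)
print(healthy, improved, regressed)
```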
https://cloud.google.com/blog/topics/developers-practitioners/model-training-cicd-system-part-i
https://medium.com/google-cloud/how-to-run-vertexai-custom-jobs-in-gitlab-ci-b986e6ebed89
https://medium.com/google-cloud/how-to-implement-ci-cd-for-your-vertex-ai-pipeline-27963bead8bd