Understanding MLOps and the MLOps Pipeline

Shabarish PILKUN RAVI
9 min read · Aug 20, 2022


In today's world, we have all the essential ingredients to build an efficient ML model:

  • Large datasets.
  • Inexpensive on-demand compute resources.
  • New research papers published in Machine Learning (Computer Vision, Natural Language Processing, time-series prediction, etc.).
  • The ability to accelerate ML models by deploying them on various cloud platforms.

With these ingredients available, companies are heavily investing in ML and data science teams to productionize their ML applications and add value to their businesses.

What is MLOps?

MLOps is a compound term consisting of Machine Learning (ML) and Operations (Ops). The main objective of MLOps is to give data scientists and operations teams a way to productionize their ML applications. Following MLOps practices helps users continuously build, train, test, deploy and monitor their ML applications.

DevOps vs. MLOps

DevOps is a practice for developing and operating large-scale software systems. Following DevOps practices helps teams develop high-quality, tested code, shortens development cycles, avoids software integration errors and produces dependable releases. To achieve these benefits, one must introduce two concepts:

  • Continuous Integration (CI)
  • Continuous Delivery (CD)

To learn more about DevOps, please refer to my article below.

Understanding DevOps and DevOps pipeline | by Shabarish PILKUN RAVI | Jul, 2022 | Medium

One of the major differences between a general software system and an ML system is that an ML system is data dependent: the quality of the results produced depends on the quality of the input data, and this good-quality data must be continuously fed to the ML system to continuously train, evaluate and validate the ML models.
When it comes to Continuous Integration, it is not just about building, testing and validating the code, but also about testing and validating the data, data schemas and models (a sketch of such a data check follows below).
When it comes to Continuous Delivery, one does not just release, deploy and monitor a single software application; one deploys an ML training pipeline that should automatically choose the best-performing ML model and deploy it.
A property that is unique to ML systems is Continuous Training (CT), which consists of automatically retraining, evaluating and serving the ML models to be deployed.
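For instance, a CI job for an ML system might validate each new batch of data against the schema the model expects before any training code runs. Below is a minimal sketch in Python, assuming pandas; the column names and value ranges are hypothetical, chosen only for illustration.

```python
# A minimal sketch of a data-validation check run as part of CI.
# EXPECTED_SCHEMA and the value-range checks are hypothetical examples.
import pandas as pd

EXPECTED_SCHEMA = {
    "age": "int64",
    "income": "float64",
    "churned": "int64",
}

def validate_batch(df: pd.DataFrame) -> None:
    # Fail fast if columns or dtypes drift from what the model expects.
    for column, dtype in EXPECTED_SCHEMA.items():
        assert column in df.columns, f"missing column: {column}"
        assert str(df[column].dtype) == dtype, f"bad dtype for {column}"
    # Simple statistical sanity checks on the new batch.
    assert df["age"].between(0, 120).all(), "age out of range"
    assert df["churned"].isin([0, 1]).all(), "label must be binary"
```

A check like this runs alongside the usual code tests, so a bad data drop fails the build just as a bad commit would.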
Having understood MLOps and how it differs from DevOps, let us now delve into the MLOps pipeline.

MLOps Pipeline

The MLOps pipeline can be defined at three levels of maturity (following Google Cloud's levels, reference 3 below):
1 — Level 0: Manual process
2 — Level 1: Machine learning pipeline automation
3 — Level 2: CI/CD pipeline automation

Level 0: Manual Process

The Level 0 MLOps pipeline answers the following question:
How do I productionize my ML application? I do not care about automation; I just need to deploy my ML model as a prediction service.
Let us understand the workflow of developing a machine learning application at this level.
The team consists of data engineers, data scientists and operations engineers.
The first step in this manual process is data extraction, after which we clean the data, perform exploratory data analysis, and build, train and evaluate the models.
Once a suitable model is selected, it is handed over to the operations team by storing it in a particular folder/registry; this model is used to perform the prediction operations, and with new incoming data the whole process is repeated. The description above is represented by the flow chart below.

Figure: MLOps Level 0 pipeline
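To make the manual workflow concrete, here is a minimal sketch of Level 0 as a single hand-run script, assuming scikit-learn; the file names, model choice and registry folder are illustrative, not prescribed by the article.

```python
# A minimal sketch of the Level 0 workflow as a single manual script,
# assuming scikit-learn and a shared folder acting as the model registry.
import os
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("training_data.csv")                   # data extraction (hypothetical file)
df = df.dropna()                                        # data cleaning
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier().fit(X_train, y_train)  # build and train
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))  # evaluate

# Manual hand-off: the operations team picks the artifact up from here.
os.makedirs("model_registry", exist_ok=True)
joblib.dump(model, "model_registry/model_v1.joblib")
```

Every box in the flow chart corresponds to a line someone runs by hand, which is exactly what the characteristics below describe.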

Characteristics of the Level 0 MLOps pipeline:

  • Manual, interactive and script-driven process: every step, from data preparation to the prediction service, is performed manually.
  • Disconnection between the data scientists and operations engineers: the trained model is uploaded to a model registry, from which the operations engineers deploy it behind APIs.
  • No CI/CD: since the steps are manual, testing lives in the notebooks used for model development; at best a version-controlled notebook exists, so there is no automation to achieve CI/CD.
  • No process monitoring: the process has no continuous monitoring system to collect feedback and further improve the model.

The Level 0 MLOps pipeline is good when you start experimenting or build a POC of your machine learning application; however, there is a risk of the model breaking when dealing with real-world data that continuously changes with the environment.

With the Level 0 MLOps pipeline it is a challenge to continuously retrain the model with new data, or even to try out new models as more research papers are released. Having understood the Level 0 pipeline, let us look at the MLOps Level 1 pipeline.

Level 1: Machine Learning Pipeline automation

The Level 1 MLOps pipeline answers the following question:

I now have an ML model in production, but I want to be able to continuously retrain it and make it adaptable to changes in the business environment or data. What tools are needed, and how do I continuously train my ML model?

The main objective of this pipeline is to achieve continuous training of the ML model: one must be able to continuously retrain the models using new data. This type of pipeline overcomes the Level 0 disadvantage of the model failing in production.

Figure: MLOps Level 1 pipeline

The different stages in the pipeline are numbered and explained below:

1 — [Orchestrated Experiment] The steps from data validation to model validation are orchestrated in an ML pipeline, so one can perform rapid iterations of experiments; this also ensures better readiness to move to production.

2 — [Source Code] At the end of the orchestrated experiment, we have the source code for the pipeline; this source code is versioned and stored in a repository. This allows us to roll back to a previous version of the application in case something unexpected happens.

3 — [Pipeline Deployment] While exploratory data analysis is performed in notebooks, the source code is modularized/containerized and stored in a version-control repository. These modules are reproducible and are served in production.

4 — [Dev/Prod Environment Parity] One of the characteristics of this pipeline is that the dev and production environments are kept as identical as possible, so the model will not fail in production simply because the environments differ.

5 — [Automated Pipeline] In Level 0, a trained model is deployed as a prediction service to production. In Level 1, you deploy the whole training pipeline, which automatically and recurrently runs to serve the trained model as the prediction service, as sketched below.
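Here is a minimal sketch of that idea in Python, assuming scikit-learn: every run re-extracts the data, re-trains the model and only publishes it to the registry if it passes a validation gate. The helper logic and the 0.9 threshold are illustrative assumptions, not a prescribed design.

```python
# A minimal sketch of a recurring training pipeline with a validation gate.
# Paths, the null check and the accuracy threshold are illustrative.
import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def run_training_pipeline(data_path: str, registry_path: str) -> None:
    df = pd.read_csv(data_path)                          # data extraction
    assert not df.isnull().any().any(), "nulls in data"  # data validation
    X, y = df.drop(columns=["label"]), df["label"]       # data preparation
    model = RandomForestClassifier()
    score = cross_val_score(model, X, y, cv=5).mean()    # model evaluation
    if score < 0.9:                                      # model validation gate
        raise RuntimeError(f"model rejected, score={score:.3f}")
    model.fit(X, y)                                      # train on all data
    joblib.dump(model, registry_path)                    # publish to registry
```

An orchestrator (at its simplest, a cron job) calls run_training_pipeline on a schedule, or whenever the monitoring trigger described below fires.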

6 — [Model Serving] At the end of the automated pipeline we have the machine learning model stored in a model registry; this trained model is used for the prediction service.

7 — [Performance Monitoring] Another important characteristic of this MLOps pipeline is the ability to continuously monitor the performance of the model. Any unexpected behavior or change in the model's performance activates a trigger, and this trigger activates both the orchestrated experiment and the automated pipeline; thus there is continuous monitoring and correction of the ML model.
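A trigger of this kind can be as simple as comparing a live metric against a floor, as in the sketch below; the threshold and the retraining hook are assumptions for illustration.

```python
# A minimal sketch of a performance-monitoring trigger: score recent live
# predictions against their ground truth and start retraining on degradation.
from sklearn.metrics import accuracy_score

ACCURACY_FLOOR = 0.85   # illustrative alert threshold

def check_and_trigger(y_true, y_pred, retrain_fn) -> bool:
    live_accuracy = accuracy_score(y_true, y_pred)
    if live_accuracy < ACCURACY_FLOOR:
        retrain_fn()    # e.g. the run_training_pipeline sketch above
        return True
    return False
```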

In addition to the above stages, this MLOps pipeline also contains the following:

Feature Store — This is an optional additional component of the MLOps pipeline: a centralized repository containing the feature sets used by the ML models.

A feature store has the characteristics noted below (a sketch of the lookup pattern follows this list):

  • It helps data scientists discover and reuse feature sets instead of re-creating them.
  • It avoids maintaining similar feature sets with different definitions, which keeps the features used for training and serving consistent.
  • It serves the latest feature values for a given prediction service.
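The core pattern is that training and serving read the same named feature set through one interface. The stand-in class below is a minimal in-memory sketch of that pattern, not the API of any particular feature-store product.

```python
# A minimal in-memory sketch of the feature-store pattern: training and
# serving both read the same named feature set, so definitions cannot drift.
from typing import Dict, List
import pandas as pd

class FeatureStore:
    def __init__(self) -> None:
        self._feature_sets: Dict[str, pd.DataFrame] = {}

    def register(self, name: str, features: pd.DataFrame) -> None:
        # One shared definition, reused instead of re-created per team.
        self._feature_sets[name] = features

    def get_features(self, name: str, entity_ids: List[int]) -> pd.DataFrame:
        # Serve the latest stored values for the requested entities;
        # the DataFrame is assumed to be indexed by entity id.
        return self._feature_sets[name].loc[entity_ids]

store = FeatureStore()
store.register("customer", pd.DataFrame({"age": [34, 58]}, index=[101, 102]))
print(store.get_features("customer", [101]))  # same call at train and serve time
```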

ML Metadata Store — Metadata is information about the data; in our case this relates to the pipelines, datasets and models (a minimal logging sketch follows the list below).

  • The metadata store records when a given pipeline was executed, the parameters used and which model was selected.
  • Having a metadata store greatly helps in debugging when a pipeline behaves unexpectedly.
  • It also helps in rolling back to a pipeline execution whose behavior was acceptable.
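As one concrete option, MLflow's tracking API (MLflow also appears in reference 1 below) records exactly this kind of run metadata; the parameter, metric and tag values in this sketch are illustrative.

```python
# A minimal sketch of recording pipeline-run metadata with MLflow tracking.
import mlflow

with mlflow.start_run(run_name="daily-retrain"):
    mlflow.log_param("model_type", "RandomForestClassifier")
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("cv_accuracy", 0.93)
    mlflow.set_tag("pipeline_version", "v1.4")  # aids debugging and rollback
```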

Having tools such as the feature store, ML metadata store, triggers and performance monitoring helps us retrain the models efficiently. This continuous retraining makes the models adaptable to changes in data and business environments.

Having understood the MLOps Level 1 pipeline, let us now look at the MLOps Level 2 pipeline for CI/CD automation.

Level 2: CI/CD automation

The Level 2 MLOps pipeline answers the following question:

I now have an ML model in production, and I am able to continuously retrain it and adapt it to changes in the business environment. However, I now want to try out new ML models and continuously deploy them to production. How do I achieve this?

The main objective of this type of pipeline is to bring reliable updates to production as quickly as possible. Thanks to automated code integration and deployment, this pipeline gives the data scientist the opportunity to focus on newer models, feature engineering and hyperparameter tuning. A schematic representation of the pipeline is given below.

Figure: MLOps Level 2 pipeline

The different stages of the MLOps Level 2 pipeline are numbered and explained below:

1 — [Orchestrated Experiment] The whole ML process is automated as part of the ML pipeline; this allows the data scientists to try out, test and validate new models.

2 — [Source Code] The output of the orchestrated experiment is the MLOps pipeline code; this code is stored in a version-controlled source repository.

3 — [Continuous Integration] The MLOps pipeline is built continuously, every time new code is pushed to the source repository. The data scientists run unit tests on the MLOps pipeline code; once these tests pass, the code is packaged into modules and made ready for deployment.
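Such a unit test typically exercises a single pipeline step in isolation, as in this pytest-style sketch; the prepare() function is a hypothetical stand-in for a real pipeline module.

```python
# A minimal sketch of a CI unit test for pipeline code, runnable with pytest.
import pandas as pd

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical pipeline step under test: fill gaps, scale a column.
    out = df.fillna(0)
    out["income"] = out["income"] / out["income"].max()
    return out

def test_prepare_handles_missing_values():
    df = pd.DataFrame({"income": [100.0, None, 50.0]})
    result = prepare(df)
    assert result.isnull().sum().sum() == 0   # no nulls survive
    assert result["income"].max() == 1.0      # scaled to [0, 1]
```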

4 — [Continuous Delivery] The module packages are now continuously delivered to the target environment. Note that, as in the MLOps Level 1 pipeline, the dev and staging environments are kept similar.

5 — [Automated Pipeline] The packages from the CD stage contain the code of the new model that the data scientist wishes to deploy. This model is passed into the automated ML pipeline of the staging/pre-production/production environments (one can further include tests in this pipeline to measure the performance of the model). The output of this stage is a trained model, which is stored in the model registry.
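Inside that automated pipeline, the validation step often compares the candidate against the model currently serving predictions. Here is a minimal champion/challenger sketch; the helper names and the accuracy criterion are assumptions, not the article's prescribed method.

```python
# A minimal sketch of gating a new model on beating the production model.
from sklearn.base import ClassifierMixin
from sklearn.metrics import accuracy_score

def promote_if_better(candidate: ClassifierMixin,
                      production: ClassifierMixin,
                      X_test, y_test) -> bool:
    cand_score = accuracy_score(y_test, candidate.predict(X_test))
    prod_score = accuracy_score(y_test, production.predict(X_test))
    if cand_score > prod_score:
        # register(candidate)  # push to the model registry (stub)
        return True
    return False               # keep the current production model
```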

6 — [Continuous Deployment] The trained model is now deployed to provide the prediction service.

7 — [Performance Monitoring] The performance of the newly deployed model is continuously monitored against the new live data in production; any change immediately fires a trigger, which starts the ML pipelines for data analysis to identify anomalies, depending on the rules set.

Thus in the MLOps Level 2 pipeline, with CI and CD, a new model can be quickly productionized. With continuous performance monitoring, if a model's performance is not satisfactory it can also be easily rolled back and retrained.

To summarize, productionizing an ML application does not just mean integrating an ML model into an API; rather, it means deploying complete ML pipelines that can automatically retrain the ML model, or quickly integrate new models into production. Having an MLOps pipeline helps you adapt your ML models to changes in data and the business environment.

One cannot move an MLOps pipeline from a lower level to a higher level overnight; this happens gradually, by automating each stage of the pipeline step by step.

References

1 — MLOps Pipeline with MLFlow, Seldon Core and Kubeflow | Ubuntu

2 — The Big Book of MLOps — Databricks

3 — MLOps: Continuous delivery and automation pipelines in machine learning | Cloud Architecture Center | Google Cloud

Other Articles:

If you are interested in learning about AWS services, please have a look at my other articles:

1 — Cloud Computing: An Overview. In this section, I will try to explain… | by Shabarish PILKUN RAVI | Towards AWS

2 — Amazon Web Services [AWS]: An Overview | by Shabarish PILKUN RAVI | Towards AWS

3 — Amazon Simple Storage Service S3, an Introduction | by Shabarish PILKUN RAVI | Towards AWS

4 — Route 53 — The highly available and scalable cloud Domain Name System (DNS) web service. | by Shabarish PILKUN RAVI | Towards AWS

5 — AWS architecture for processing real-time and batch processing data and dashboards | by Shabarish PILKUN RAVI | Jul, 2022 | Medium

DevOps:

1 — Understanding DevOps and DevOps pipeline | by Shabarish PILKUN RAVI | Jul, 2022 | Medium

If you are interested in deep learning and computer vision, refer to the articles below:

1 — OpenCV Background Subtraction and Music in Background of Video | by Shabarish PILKUN RAVI | Medium

2 — Artificial Neural Networks Simplified: From Perceptrons to BackPropagation | by Shabarish PILKUN RAVI | The Startup | Medium

3 — Traditional Recurrent Neural Networks — Reinforcement Learning Part 1/3 | by Shabarish PILKUN RAVI | Towards AI
