Centrico LiveLab — End-to-End MLOps

A complete machine learning operations pipeline that takes a model from initial data collection through training, deployment, and ongoing monitoring in production.

Data Engineering · MLOps Pipeline · Model Serving · Prometheus + Grafana · CI Quality Gates

Overview — For Recruiters and Non-Technical Stakeholders

This project demonstrates how to take machine learning models from development into production systems that can run reliably at scale.

What it does

Centrico LiveLab is an end-to-end machine learning operations (MLOps) pipeline. It covers everything from ingesting raw data and training predictive models to deploying those models as prediction-serving APIs and continuously monitoring system health and model performance.

What problem it solves

Most machine learning projects create models that work in isolated notebooks but never make it to production. Moving from a prototype to a reliable production system requires solving problems around data consistency, model versioning, deployment automation, performance monitoring, and quality control. This project addresses all of these concerns in a single integrated system.

What it demonstrates

This system shows proficiency in building production-grade machine learning infrastructure. It demonstrates the ability to design and implement complete data pipelines, orchestrate model training workflows, containerize and deploy services, set up monitoring systems, and automate quality checks. These are the core skills needed to operate machine learning systems in real-world production environments.

The complete system runs locally using Docker, making it verifiable without cloud infrastructure. All components are instrumented for observability, tested automatically, and documented for maintainability.

Technical Deep-Dive — For Engineers

Detailed architecture, implementation decisions, and technology choices for each layer of the system.

Purpose and Scope

This project implements a production-grade MLOps reference architecture demonstrating the complete machine learning lifecycle. The scope covers data ingestion through deployment and monitoring, with emphasis on reproducibility, observability, and automated quality control. The system is designed for local execution using Docker Compose, eliminating cloud dependencies while maintaining production-equivalent patterns.

Architecture Overview

The architecture follows a layered approach with clear separation of concerns. Each layer communicates through well-defined interfaces, enabling independent development and testing. The data layer handles ingestion and feature engineering. The training layer orchestrates model development with experiment tracking. The serving layer exposes trained models via REST API. The observability layer collects metrics and provides visualization. CI/CD pipelines enforce quality gates across all layers.

Data Flow and Execution Model

Data flows unidirectionally from sources through transformation pipelines into feature stores. Training processes read from feature stores to ensure consistency between training and inference. Trained models are serialized and versioned in a model registry. The inference service loads models from the registry and serves predictions via REST endpoints. Metrics flow from the API through Prometheus to Grafana dashboards. This design ensures reproducibility and enables rollback to any previous model version.

Implementation Details and Technology Stack

Component-by-component breakdown with technology choices and rationale.

1. Data Engineering Pipeline

Implementation: Python-based ETL pipelines with Pandas for data manipulation and Pydantic for schema validation. Data is ingested from multiple sources (CSV files, databases, REST APIs) and transformed into feature vectors. Schema enforcement occurs at ingestion boundaries to catch data quality issues early.

Technology choices: Pandas was chosen for its mature ecosystem and broad support for data formats. Pydantic provides runtime type checking and automatic validation, catching schema violations before they propagate through the pipeline. Feature stores are implemented using Parquet files for efficient columnar storage and fast read performance during training and inference.

Trade-offs: File-based feature stores sacrifice horizontal scalability for simplicity and zero-cost local operation. For production systems handling high throughput, this would be replaced with Feast or a similar dedicated feature store.

Python · Pandas · Pydantic · Parquet
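To illustrate the schema-enforcement idea, here is a minimal sketch of validation at an ingestion boundary. The model name and fields (`CustomerRecord`, `monthly_spend`, etc.) are hypothetical, not the project's actual feature set:

```python
from pydantic import BaseModel, Field, ValidationError

# Hypothetical ingestion schema; field names are illustrative only.
class CustomerRecord(BaseModel):
    customer_id: int = Field(ge=0)
    monthly_spend: float = Field(ge=0.0)
    tenure_months: int = Field(ge=0)

def validate_rows(rows: list[dict]) -> tuple[list[CustomerRecord], list[dict]]:
    """Split raw rows into validated records and rejects at the boundary."""
    valid, rejected = [], []
    for row in rows:
        try:
            valid.append(CustomerRecord(**row))
        except ValidationError:
            rejected.append(row)
    return valid, rejected

raw = [
    {"customer_id": 1, "monthly_spend": 42.5, "tenure_months": 12},
    {"customer_id": 2, "monthly_spend": -3.0, "tenure_months": 5},  # violates ge=0
]
valid, rejected = validate_rows(raw)
```

Rejecting bad rows at ingestion, rather than letting them flow downstream, is what lets later stages assume clean inputs.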

2. Training Pipeline and Model Management

Implementation: Scikit-learn for model training with MLflow for experiment tracking and model registry. Each training run logs hyperparameters, metrics, and model artifacts. Models are versioned and tagged (dev, staging, production) to control promotion through environments.

Technology choices: MLflow provides a complete experiment tracking solution with minimal configuration overhead. It integrates with scikit-learn's model serialization and provides a REST API for model retrieval. Scikit-learn was chosen for its simplicity and extensive documentation, though the architecture supports PyTorch or TensorFlow models through MLflow's model flavor abstraction.

Trade-offs: MLflow's file-based backend limits concurrent write performance but enables simple local deployment. Database-backed tracking would be necessary for teams with high training volumes.

Python · Scikit-learn · MLflow · Model Registry
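A condensed sketch of one training run follows, using synthetic data as a stand-in for the feature store; the hyperparameters are illustrative, and the MLflow logging calls are shown in comments to keep the snippet self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature store; the real pipeline reads Parquet features.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

params = {"C": 1.0, "max_iter": 200}
model = LogisticRegression(**params).fit(X_train, y_train)
holdout_accuracy = accuracy_score(y_test, model.predict(X_test))

# In the real pipeline each run is wrapped in an MLflow run, roughly:
#   with mlflow.start_run():
#       mlflow.log_params(params)
#       mlflow.log_metric("holdout_accuracy", holdout_accuracy)
#       mlflow.sklearn.log_model(model, "model")
```

Logging parameters, metrics, and the serialized model together is what makes any run reproducible and comparable later.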

3. Inference API and Serving

Implementation: FastAPI serves predictions via REST endpoints with automatic OpenAPI documentation. The API loads models from MLflow's model registry at startup and caches them in memory. Request/response schemas are validated using Pydantic models. Health and readiness probes enable Kubernetes-style orchestration.

Technology choices: FastAPI provides automatic request validation, serialization, and interactive API documentation. Its async support enables high-throughput serving. Docker containerization ensures consistent deployment across environments. Gunicorn with Uvicorn workers provides production-grade process management and graceful shutdown handling.

Trade-offs: In-memory model caching requires sufficient RAM for large models. For production systems with memory constraints, models could be loaded on-demand or served through dedicated model servers like TensorFlow Serving or Triton.

Python · FastAPI · Pydantic · Docker · Gunicorn · Uvicorn

4. Observability and Monitoring

Implementation: Prometheus client library instruments the FastAPI application with custom metrics (request count, latency histograms, error rates, prediction distributions). Prometheus scrapes these metrics at configured intervals. Grafana reads from Prometheus and renders dashboards showing system health, API performance, and model behavior over time.

Technology choices: Prometheus was chosen for its pull-based architecture, which simplifies network configuration and service discovery. Its time-series database stores and queries metrics efficiently, provided label cardinality is kept bounded. Grafana provides rich visualization capabilities and alerting integration. Both tools are industry standards with extensive community support.

Trade-offs: Prometheus's local storage limits retention periods. For long-term metric storage, integration with remote storage systems like Thanos or Cortex would be required.

Prometheus · Grafana · Prometheus Client · Docker Compose
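A minimal sketch of the instrumentation side, assuming the `prometheus_client` library; the metric names are illustrative, and a local registry is used here only to keep the example self-contained:

```python
from prometheus_client import CollectorRegistry, Counter, Histogram

# Local registry for the example; the app normally uses the default registry.
registry = CollectorRegistry()
REQUESTS = Counter(
    "api_requests_total", "Total prediction requests",
    ["endpoint", "status"], registry=registry,
)
LATENCY = Histogram(
    "api_request_latency_seconds", "Request latency in seconds",
    ["endpoint"], registry=registry,
)

def record_request(endpoint: str, status: str, seconds: float) -> None:
    """Instrument one request; in the app this wraps each FastAPI handler."""
    REQUESTS.labels(endpoint=endpoint, status=status).inc()
    LATENCY.labels(endpoint=endpoint).observe(seconds)

record_request("/predict", "200", 0.012)
```

Prometheus then scrapes these counters and histograms from the app's `/metrics` endpoint at its configured interval.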

5. CI/CD and Quality Gates

Implementation: GitHub Actions workflows run on each commit, executing linters (flake8, black), type checkers (mypy), unit tests (pytest), and model evaluation on hold-out datasets. Quality gates fail the build if code quality metrics or model performance fall below thresholds. Successful builds can trigger automated deployments.

Technology choices: GitHub Actions provides tight integration with the repository and free compute for public repos. Pytest offers parametrized testing and an extensive plugin ecosystem. Black and flake8 enforce consistent code style. Mypy catches type errors at CI time rather than runtime.

Trade-offs: GitHub Actions has limited compute resources for free tiers. Resource-intensive training jobs would require self-hosted runners or integration with cloud-based training platforms.

GitHub Actions · Pytest · Flake8 · Black · Mypy
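The model-evaluation gate reduces to a simple threshold check; the numbers below are hypothetical, since the actual thresholds live in the project's CI configuration:

```python
# Hypothetical thresholds mirroring the CI model-evaluation gate.
THRESHOLDS = {"accuracy": 0.85, "precision": 0.80, "recall": 0.80}

def passes_quality_gate(metrics: dict[str, float],
                        thresholds: dict[str, float] = THRESHOLDS) -> bool:
    """Return True only if every tracked metric meets its minimum."""
    return all(metrics.get(name, 0.0) >= minimum
               for name, minimum in thresholds.items())

# In CI this becomes a pytest test: assert passes_quality_gate(evaluate(model))
good_run = {"accuracy": 0.91, "precision": 0.88, "recall": 0.84}
bad_run = {"accuracy": 0.91, "precision": 0.88, "recall": 0.61}
```

Failing the build on a sub-threshold metric is what keeps a regressed model from ever reaching the registry's production stage.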

6. Infrastructure and Orchestration

Implementation: Docker Compose orchestrates all services (API, MLflow, Prometheus, Grafana) with defined networking and volume mounts. Services communicate via Docker's internal DNS. Configuration is externalized through environment files.

Technology choices: Docker Compose provides declarative multi-container orchestration suitable for development and small-scale deployments. It requires no external dependencies beyond Docker itself. For production, this configuration could be translated to Kubernetes manifests or Helm charts.

Trade-offs: Docker Compose lacks advanced orchestration features (auto-scaling, rolling updates, service mesh). It is appropriate for development environments but would be replaced with Kubernetes for production deployments requiring high availability and horizontal scaling.

Docker · Docker Compose · Linux
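An illustrative compose fragment of the kind described above; service names, images, and ports here are assumptions and may differ from the repository's actual configuration:

```yaml
# Illustrative docker-compose.yml fragment (not the repo's actual file).
services:
  api:
    build: .
    ports: ["8000:8000"]
    env_file: .env            # externalized configuration
    depends_on: [mlflow, prometheus]
  mlflow:
    image: ghcr.io/mlflow/mlflow
    command: mlflow server --host 0.0.0.0
  prometheus:
    image: prom/prometheus
    ports: ["9090:9090"]
  grafana:
    image: grafana/grafana
    ports: ["3000:3000"]
```

Services resolve each other by name (`http://mlflow:5000`, `http://prometheus:9090`) via Docker's internal DNS, so no host networking is needed.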

Architecture diagram

High-level system flow showing data movement from sources through training and serving layers, with observability and CI/CD as cross-cutting concerns.

┌─────────────────────┐
│  Data Sources       │
│  (CSV, DB, APIs)    │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  Data Pipeline      │
│  • Ingestion        │
│  • Validation       │
│  • Feature Eng      │
└──────────┬──────────┘
           │
           ├──────────────────────┐
           │                      │
           ▼                      ▼
┌─────────────────────┐  ┌─────────────────────┐
│  Training Pipeline  │  │  Feature Store      │
│  • MLflow           │  │  (Inference Ready)  │
│  • Model Registry   │  └──────────┬──────────┘
└──────────┬──────────┘             │
           │                        │
           ▼                        ▼
┌─────────────────────┐  ┌─────────────────────┐
│  Model Artefacts    │  │  Inference API      │
│  (Versioned)        │─▶│  • FastAPI          │
└─────────────────────┘  │  • Health Checks    │
                         └──────────┬──────────┘
                                    │
                                    ▼
                         ┌─────────────────────┐
                         │  Monitoring Stack   │
                         │  • Prometheus       │
                         │  • Grafana          │
                         │  • Alerting         │
                         └─────────────────────┘

Cross-Cutting: CI/CD pipeline with quality gates at each stage
        

Technical Challenges and Solutions

Key engineering problems encountered and solutions implemented.

Data Consistency

Challenge: Ensuring training and inference use identical feature transformations.

Solution: Centralized feature engineering logic with Pydantic schemas enforced at pipeline boundaries. Feature generation code is versioned alongside models to maintain consistency.
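The core of this solution is a single transform function imported by both paths. A minimal sketch, with hypothetical field names and version tag:

```python
import math

# Hypothetical version tag; recorded alongside the model so a feature
# change forces retraining and redeployment together.
FEATURE_VERSION = "v3"

def engineer_features(raw: dict) -> dict:
    """Single source of truth for feature engineering, imported by both
    the training pipeline and the inference API."""
    return {
        "log_spend": math.log1p(raw["monthly_spend"]),
        "tenure_years": raw["tenure_months"] / 12.0,
    }

# Both paths call the same function, so they cannot drift apart.
train_row = engineer_features({"monthly_spend": 99.0, "tenure_months": 24})
serve_row = engineer_features({"monthly_spend": 99.0, "tenure_months": 24})
```

Duplicating this logic in two places is the classic source of training/serving skew; centralizing it makes the skew structurally impossible.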

Model Versioning

Challenge: Tracking which model version is deployed and enabling rollbacks.

Solution: MLflow model registry with explicit stage transitions (dev → staging → production). Each model artifact includes training metadata and lineage information.

Observability

Challenge: Detecting model performance degradation in production.

Solution: Comprehensive instrumentation capturing request patterns, latency distributions, and prediction statistics. Grafana dashboards provide real-time visibility into model behavior.

Quality Control

Challenge: Preventing poorly performing models from reaching production.

Solution: Automated CI gates evaluating model performance on hold-out data. Builds fail if accuracy, precision, or recall fall below defined thresholds.

Local Setup and Verification

The system can be run entirely locally using Docker Compose, enabling developers to test and verify the complete stack without cloud dependencies.

Quick start

  1. Clone the repository: git clone https://github.com/nepryoon/centrico-livelab-mlops
  2. Run setup: docker-compose up
  3. Access services:
    • API docs: http://localhost:8000/docs
    • Grafana: http://localhost:3000
    • Prometheus: http://localhost:9090

Verification checklist

  • ✓ Data pipeline runs without errors
  • ✓ Training produces versioned model artefacts
  • ✓ Inference API responds to health checks
  • ✓ Prometheus collects metrics from API
  • ✓ Grafana dashboards display system metrics
  • ✓ CI pipeline passes all quality gates