
Machine Learning Experimentation with Airflow

Airflow celery deployment with custom packages for ML experimentation


Problem Statement

While experimenting with machine learning models — tuning hyperparameters with Bayesian methods, running cross-validations, and optimizing trials using Optuna with MLflow tracking — I found myself constantly fighting the same problem: tangled dependencies and fragile code.

Every new experiment required changing code, updating configurations, and risking breakages. Switching datasets or parameters meant tweaking multiple scripts, rerunning environment setups, and occasionally debugging things that weren’t even related to the experiment.

It became clear that I needed a stable, self-contained environment that could:

  • Handle ETL and ML pipeline orchestration automatically.

  • Let me experiment safely without touching core code.

  • Log and version results in MLflow.

  • Be reproducible across systems — from my laptop to a CI/CD server.

That realization drove the motivation behind this framework — an Airflow-based, Jenkins-triggered, Docker-deployed system designed to give me a ready-to-use experimentation environment.


Motivation

The idea wasn’t just to automate; it was to decouple experimentation from infrastructure. I wanted a repeatable, isolated environment where I could:

  • Spin up an ETL + training pipeline without touching the core code.

  • Experiment safely using parameters or configurations, not manual edits.

  • Automatically log, register, and version models via MLflow.

  • Keep the entire system portable and rebuildable — same behavior on any machine or CI runner.

  • Move experimentation from code edits to config- and parameter-based triggers.

Essentially, I wanted the luxury of iteration speed without the anxiety of setup.

That’s when I started designing a self-contained Airflow framework: one that self-deploys via Jenkins, uses Docker Compose for orchestration, and pulls dependencies from a private Nexus repository under controlled versioning. It is designed to:

  • Launch via Jenkins CI/CD with a single trigger.

  • Use Docker Compose to orchestrate an Airflow Celery cluster for distributed tasks.

  • Dynamically install custom Python dependencies from a private Nexus repo, without editing the Docker image manually.

  • Provide a ready-to-run sandbox for model training, ETL pipelines, and experiment tracking.


Implementation Overview

For those interested in the complete technical setup — including Dockerfile, Jenkins pipeline, and Docker Compose configurations — I’ve documented everything in detail here:
🔗 Airflow Celery Framework Documentation

Below, I’ll focus on how the system works internally — the practical workflow, integration points, and reasoning behind some of the implementation choices.

1. The Jenkins CI/CD Pipeline — The Automation Backbone

Jenkins is the central automation layer.
It eliminates the need for manual Docker builds or direct command-line work.
Instead, a user (or a scheduled trigger) starts a build with configurable parameters such as:

  • NEXUS_URL → private PyPI/Nexus repository URL

  • NEXUS_CREDS_ID → Jenkins credentials ID for Nexus authentication

  • DEV_DIR → target build directory for staging

  • REQUIREMENTS & CUSTOM_REQUIREMENTS → dependency lists to pull at runtime

Once triggered, the pipeline:

  1. Cleans the workspace and checks out the repo.

  2. Prepares the environment, creating the build directory and injecting Nexus credentials as Docker BuildKit secrets.

  3. Fetches dependency files dynamically (curling them from URLs provided in the parameters).

  4. Builds the Airflow image using those dependencies and secrets.

  5. Runs docker compose up -d to bring up all Airflow services.

  6. Cleans sensitive files (Nexus creds, .env, Dockerfile, compose YAML) to keep the environment safe.

This approach ensures every build is fresh, reproducible, and isolated, without needing to manually rebuild or edit Dockerfiles.
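The six stages above can be sketched as a declarative Jenkinsfile. This is an illustrative outline, not the exact pipeline: the parameter names match those listed earlier, but the stage bodies, image tag (custom-airflow:latest), and file names are assumptions.

```groovy
pipeline {
    agent any

    parameters {
        string(name: 'NEXUS_URL', description: 'Private PyPI/Nexus repository URL')
        string(name: 'NEXUS_CREDS_ID', description: 'Jenkins credentials ID for Nexus')
        string(name: 'DEV_DIR', defaultValue: 'build', description: 'Staging build directory')
        string(name: 'REQUIREMENTS', description: 'URL of the public requirements file')
        string(name: 'CUSTOM_REQUIREMENTS', description: 'URL of the private requirements file')
    }

    stages {
        stage('Checkout') {
            steps {
                cleanWs()
                checkout scm
            }
        }
        stage('Prepare') {
            steps {
                withCredentials([usernamePassword(credentialsId: params.NEXUS_CREDS_ID,
                                                  usernameVariable: 'NEXUS_USER',
                                                  passwordVariable: 'NEXUS_PASS')]) {
                    sh '''
                        mkdir -p "$DEV_DIR"
                        # Stage credentials as files for BuildKit --secret mounts
                        printf '%s' "$NEXUS_USER" > "$DEV_DIR/nexus_user"
                        printf '%s' "$NEXUS_PASS" > "$DEV_DIR/nexus_pass"
                        # Fetch dependency lists from the URLs passed as parameters
                        curl -fsSL "$REQUIREMENTS" -o "$DEV_DIR/requirements.txt"
                        curl -fsSL "$CUSTOM_REQUIREMENTS" -o "$DEV_DIR/custom-requirements.txt"
                    '''
                }
            }
        }
        stage('Build & Deploy') {
            steps {
                sh '''
                    DOCKER_BUILDKIT=1 docker build \
                        --secret id=nexus_user,src="$DEV_DIR/nexus_user" \
                        --secret id=nexus_pass,src="$DEV_DIR/nexus_pass" \
                        --build-arg INDEX_URL="$NEXUS_URL" \
                        -t custom-airflow:latest .
                    docker compose up -d
                '''
            }
        }
    }

    post {
        always {
            // Scrub credentials and temp files regardless of build outcome
            sh 'rm -f "$DEV_DIR"/nexus_user "$DEV_DIR"/nexus_pass .env'
        }
    }
}
```

The post-always block is what implements step 6: cleanup runs even when the build fails, so credentials never linger in the workspace.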

2. Docker Image Design — Parameterized and Secure

The Dockerfile extends the official Airflow image (apache/airflow:3.1.0) and is designed to be parameterized rather than static.

It introduces:

  • ARG INDEX_URL and ENV PYPI_URL for flexible dependency sources.

  • BuildKit secret mounts (nexus_user, nexus_pass) to inject credentials securely.

  • Multi-layer installs for clean separation between public and private dependencies.

This design means you can:

  • Swap dependency sets without modifying the Dockerfile.

  • Point to different Nexus repositories across environments (dev, staging, prod).

  • Rebuild instantly from Jenkins with zero code edits.

It’s a true “define once, reuse everywhere” model.
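A minimal sketch of such a Dockerfile, assuming pip-style requirements files staged next to it; file names and the exact layer order are illustrative, but the ARG/ENV names and secret IDs follow the ones described above.

```dockerfile
FROM apache/airflow:3.1.0

# Dependency source is injected at build time, never hard-coded
ARG INDEX_URL
ENV PYPI_URL=${INDEX_URL}

COPY requirements.txt custom-requirements.txt /tmp/

# Layer 1: public dependencies from the default index
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Layer 2: private dependencies, authenticated via BuildKit secrets.
# mode=0444 lets the non-root airflow user read them; the secrets exist
# only for the duration of this single RUN instruction.
RUN --mount=type=secret,id=nexus_user,mode=0444 \
    --mount=type=secret,id=nexus_pass,mode=0444 \
    NEXUS_USER="$(cat /run/secrets/nexus_user)" && \
    NEXUS_PASS="$(cat /run/secrets/nexus_pass)" && \
    pip install --no-cache-dir \
        --index-url "https://${NEXUS_USER}:${NEXUS_PASS}@${PYPI_URL#https://}" \
        -r /tmp/custom-requirements.txt
```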

3. Dependency Management via Nexus

Instead of pushing all dependencies to PyPI or including them in the repo, private packages are hosted in Nexus.

Here’s how the flow works:

  1. Jenkins reads NEXUS_CREDS_ID and exposes username/password as Docker secrets.

  2. During the build, Docker mounts these credentials temporarily at /run/secrets.

  3. Pip installs private dependencies using the provided INDEX_URL (from Nexus).

  4. The credentials vanish after build completion — never written to image layers or logs.

This method is both secure and scalable, enabling enterprise-style dependency control with zero manual interference.
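One way to sanity-check step 4 — that the credential values never land in image layers — is to scan the build history after the fact. This assumes the image tag from the build and the password available in $NEXUS_PASS; it is a spot check, not part of the pipeline itself.

```shell
# Print every layer-creating instruction in the final image and count
# occurrences of the actual secret value; 0 means it was never baked in.
docker history --no-trunc custom-airflow:latest | grep -c "$NEXUS_PASS"
```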

4. Execution Lifecycle Summary

Here’s what a full run looks like:

[1] Jenkins Job Triggered (manual or scheduled)
     ↓
[2] Parameters read → environment prepared
     ↓
[3] Nexus credentials injected securely
     ↓
[4] Docker build starts with secrets + dependency files
     ↓
[5] Custom Airflow image built dynamically
     ↓
[6] docker compose up -d (Airflow + Redis + Postgres + Flower)
     ↓
[7] Secrets & temp files cleaned
     ↓
[8] Airflow UI accessible → ready for DAGs and experiments

5. Design Priorities

The system was built with three guiding principles:

  • Isolation — every environment is self-contained and disposable.

  • Reproducibility — build once, deploy anywhere, get the same behavior.

  • Security — credentials never persist, even in intermediate Docker layers.

These principles make it flexible enough for both individual experiments and team-scale deployments.


How I Used the Framework

After building the Airflow–Jenkins–Docker setup, I wanted to validate it with an actual end-to-end ML project. The goal was to see if this framework could handle real experimentation, versioning, and deployment workflows — not just spin up containers.

1. Building and Versioning the Project Package

I started locally with a project that handled ETL and model training, fully integrated with MLflow for experiment tracking.
Instead of running it as loose scripts, I packaged the entire project into a Python wheel (.whl) using setuptools.

To automate this:

  • I created a CI/CD pipeline in Jenkins dedicated to building, versioning, and publishing this wheel.

  • Every run of the pipeline created a new, versioned artifact (e.g., project_name-0.1.4-py3-none-any.whl).

  • The wheel, along with all its dependencies, was uploaded to my private Nexus repository, making it accessible like any other PyPI package.

For Airflow to access it during runtime, I exposed the Nexus repository securely using ngrok, which allowed local or private-network access from the containerized environment.
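The build-and-publish steps in that Jenkins pipeline roughly amount to the following commands; the repository URL is a placeholder, and this assumes the build and twine packages are installed and credentials are available as environment variables.

```shell
# Build a versioned wheel from the project's setup.py / pyproject.toml
python -m build --wheel    # e.g. produces dist/project_name-0.1.4-py3-none-any.whl

# Publish it to the private Nexus PyPI-hosted repository
twine upload \
    --repository-url https://nexus.example.com/repository/pypi-internal/ \
    -u "$NEXUS_USER" -p "$NEXUS_PASS" \
    dist/*.whl
```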

2. Integrating with the Airflow Deployment Framework

Once the package was available in Nexus, I used another CI/CD pipeline — the one based on my Airflow Celery framework — to automatically:

  • Pull the wheel from Nexus along with any additional dependencies,

  • Build the custom Airflow image through the Dockerfile that installs those dependencies dynamically, and

  • Bring up the entire Airflow environment via docker compose up -d.

With this, I now had a fully operational and reproducible Airflow setup — built, configured, and ready with just a few clicks.

3. Running Training and Deployment Pipelines

Next, I wrote two DAGs to test the framework’s integration capabilities.

  • Training DAG:

    • Loads configuration files specifying hyperparameters.

    • Runs ETL and training steps using the packaged project wheel.

    • Tracks experiments, metrics, and models using MLflow.

    • Results and artifacts (models, metrics, plots, etc.) appear automatically in the MLflow UI.

  • Deployment DAG:

    • Fetches the required model artifact from a provided MLflow URI.

    • Handles deployment logic: pushing the model to the cloud, a serving endpoint, or a designated inference environment.

These two DAGs validated the entire pipeline: from data processing → experiment tracking → artifact management → deployment handoff.
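A stripped-down sketch of what the training DAG might look like, assuming the project wheel (here called my_project) exposes ETL and training entry points and that the MLflow tracking URI is configured via environment variables; all module, function, and config-key names are illustrative.

```python
import json
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def training_pipeline():
    @task
    def load_config(path: str = "/opt/airflow/config/experiment.json") -> dict:
        # Hyperparameters live in a config file, not in code
        with open(path) as f:
            return json.load(f)

    @task
    def run_etl(config: dict) -> str:
        # my_project is the wheel installed from Nexus at image build time
        from my_project.etl import run_etl
        return run_etl(config["dataset"])

    @task
    def train(config: dict, data_path: str) -> None:
        import mlflow
        from my_project.training import train_model

        mlflow.set_experiment(config["experiment_name"])
        with mlflow.start_run():
            mlflow.log_params(config["hyperparameters"])
            metrics, model = train_model(data_path, **config["hyperparameters"])
            mlflow.log_metrics(metrics)
            mlflow.sklearn.log_model(model, "model")

    config = load_config()
    train(config, run_etl(config))


training_pipeline()
```

Because every knob sits in experiment.json, a new experiment is a config change plus a DAG trigger, which is exactly the "no code edits" property the framework was built for.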

4. The Outcome

By combining these pipelines, I ended up with a completely automated, reproducible ML experimentation and deployment framework:

  • Jenkins handles build, versioning, and deployment triggers.

  • Docker + Airflow provide a consistent execution environment.

  • Nexus acts as the private dependency registry.

  • MLflow manages experiment tracking and model artifacts.

The best part?
It’s modular — I can plug in any ML project following the same structure and get a working Airflow environment with versioned dependencies and clean experiment tracking in minutes.


What’s Next

The next part of this series will focus on the actual project and deployment architecture — how the training pipeline was structured, how model promotion and validation were handled, and how deployment was automated in the cloud.

For now, this post covers the framework, setup, and environment that made all of that possible.