Ship Your ML Model From Local To Production

May 20, 2026

You’ve trained a model. Accuracy looks good. You save it. And then… nothing.

The notebook sits there. Nobody else can use it. The model never leaves your laptop.

Under 50 lines of Python. A trained scikit-learn model, a FastAPI server, a Dockerfile, a Python client, and a requirements file. That’s the whole thing.

What’s inside the box

Clone it and you get a directory that looks like this:

The model is a pre-trained classifier on the classic Iris dataset, saved with joblib. The server is a FastAPI app that loads the model at startup and exposes a single POST /predict endpoint. The Dockerfile packages it all into a portable, reproducible container. Let’s go layer by layer.

The system at a glance

server.py - deceptively simple

Here is the entire server, annotated:

from fastapi import FastAPI
import joblib
import numpy as np

# Model loaded ONCE at import time — no per-request overhead
model = joblib.load(’app/model.joblib’)

# Map integer predictions back to human-readable labels
class_names = np.array([’setosa’, ‘versicolor’, ‘virginica’])

app = FastAPI()

@app.get(’/’)
def read_root():
    return {’message’: ‘Iris model API’}

@app.post(’/predict’)
def predict(data: dict):
    “”“
    Accepts: {”features”: [5.1, 3.5, 1.4, 0.2]}
    Returns: {”predicted_class”: “setosa”}
    “”“
    features = np.array(data[’features’]).reshape(1, -1)
    prediction = model.predict(features)
    class_name = class_names[prediction][0]
    return {’predicted_class’: class_name}

Three patterns worth noting here:

1. Global model loading

The joblib.load() call happens at module import time — not inside the endpoint function. This means deserialization cost is paid once when the container starts, not on every request. For larger models, this is critical.

2. Untyped `dict` body

The request body is typed as a plain dict. It works, but a Pydantic model would give you automatic validation and a nicer OpenAPI schema — a natural next step when adapting this template.

3. Auto-generated docs

Because FastAPI is used, you get a Swagger UI at /docs for free. Zero extra code needed — just open the browser and test your endpoint interactively.

Dockerfile - what each line actually does

# Start from official Python 3.11 base — slim might be better for prod
FROM python:3.11

# All commands run from /code inside the container
WORKDIR /code

# Copy deps first — Docker caches this layer if requirements.txt unchanged
COPY ./requirements.txt /code/requirements.txt
RUN pip install --no-cache-dir -r /code/requirements.txt

# Copy the app directory (model + server)
COPY ./app /code/app

# Document the port (doesn’t actually publish it — that’s -p at runtime)
EXPOSE 8000

# Start Uvicorn serving the FastAPI app on all interfaces
CMD [”uvicorn”, “app.server:app”, “--host”, “0.0.0.0”, “--port”, “8000”]

Three commands from zero to inference

1. Build the image

Docker reads the Dockerfile, pulls the Python base, installs deps, and bakes the model in.

docker build -t iris-api

2. Run the container

Map your host port 8000 to the container’s 8000. Uvicorn starts automatically.

docker run --name iris-container -p 8000:8000 iris-api

3. Hit the endpoint

Send four Iris features (sepal length, sepal width, petal length, petal width). Get a class back.

curl -X POST “http://0.0.0.0:8000/predict” \
  -H “Content-Type: application/json” \
  -d ‘{”features”: [5.1, 3.5, 1.4, 0.2]}’

# Response:
{”predicted_class”: “setosa”}

The Python client - because curl gets old fast

The repo includes client.py - a script that loops over a batch of 12 flower measurements and collects predictions:

import json, requests

data = [[4.3, 3.0, 1.1, 0.1], [5.8, 4.0, 1.2, 0.2], ...]
url  = ‘http://0.0.0.0:8000/predict/’

predictions = []
for record in data:
    resp = requests.post(url, data=json.dumps({’features’: record}))
    predictions.append(resp.json()[’predicted_class’])

print(predictions)
# [’setosa’, ‘setosa’, ‘setosa’, ...]

Why this template matters

The ML community has a deployment gap problem. Models get trained, celebrated, and shelved. This repo offers a concrete, working pattern that you can clone, swap in your own model.joblib, and have a running API in minutes.

The technology choices (FastAPI over Flask, Uvicorn, joblib, Docker) are all industry-standard. The file count is ruthlessly small. There’s nothing to get lost in.

If you’re an ML practitioner who has never deployed a model, this is your entry point. If you’re an experienced engineer, it’s a clean boilerplate to modify rather than build from scratch.

Access the whole GitHub repo here:

https://github.com/DanilZherebtsov/ml-docker-flask-api.git

Share Growtechie

Join Growtechie’s subscriber chat

Available in the Substack app and on web

Growtechie

Discussion about this post

Ready for more?