Machine learning pipeline: Model deployment

Deployment is not a pure DevOps task, it requires some knowledge of the model architecture and it’s hardware requirement.
Currently there are 3 ways to deploy a machine elarning model:

  • Model Server: The most common way, client requests a prediction by submitting input data to model server
  • User browser: When the input data is sensitive or you don’t want to submit the input, we can deploy the ML model to the user’s browser
  • Edge devices: Some situation don’t allow us to connect to a model server: remote sensors or IOT, the number of this type of deployment is increasing

A simple model server

A simple deployment usually follow the same path:

  • Create a web app with python (Flask or Django)
  • Create an API endpoint in the web APP
  • Load the model structure and it’s weights
  • Call the predict method on the loaded model
  • Return the prediction results as an HTTP request
import json
from flask import Flask
from keras.models import load_model
from utils import preprocess 1

model = load_model('model.h5') 2
app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def classify():
    input = request.form['input']
    preprocessed_input = preprocess(input)
    prediction = model.predict_classes([preprocessed_input])[0]
    return json.dumps({"score": int(prediction)}) 
Why isn’t it recommended to use it this way:
  • Separation of concerns: Don’t mix the api code with the machine learning model
  • Lack of version control: if adding a new model a new endpoint is required
  • Batch inference: a model server can gather request (through time or by quantity and send them to the GPU)


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s