Intro

When building a Machine Learning model, you’re probably using some of the popular frameworks like TensorFlow/PyTorch/sklearn.

You run experiments, play with different models and architectures, fine-tune hyperparameters.

Once you’re happy with your results, you need to make that model run in your Production system. This is the domain of MLOps, which deals with productionizing (operationalizing) ML models.

There are many offerings by the top Cloud providers (Azure, AWS, GCP) that greatly simplify this task, providing the tools to build awesome ML pipelines and manage your Production models at scale.

However, in this article, I’d like to describe another way: distributing your ML model in a portable format that can be shipped, for example, inside a NuGet package.

Isn’t it cool to just install a NuGet package and call some C# method to run predictions via a model that an army of data scientists has been building for months?

That’s the main idea behind ONNX (Open Neural Network Exchange). It’s an open format that enables interoperability between different languages/platforms. In this article, I will use C# to load the pre-trained model, but there is support for many other programming languages/frameworks.

You can read more on the theory here, but I imagine you’d prefer to see a demo of how this works.

In the upcoming sections, you’ll explore the following:

Train a simple Neural Network.
Export it to ONNX format.
Preview the exported model in Netron.
Load it into a C# project and run predictions.

If that sounds exciting, let’s jump straight into it.

Model Training and Export to ONNX

I’ll build and train a simple neural network to recognize handwritten digits (between 0 and 9) using the classic MNIST dataset.

Here’s the complete TensorFlow source code that you can also inspect in Google Colab here:

# Install and import required packages.
!pip install tensorflow==2.3.0
!pip install onnx==1.9.0
!pip install keras2onnx==1.7.0

import tensorflow as tf
import onnx
import keras2onnx

# Build, train and evaluate the model.
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10, activation=tf.keras.activations.softmax)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)

model.evaluate(x_test,  y_test, verbose=2)

# Output the model to ONNX format.
onnx_model = keras2onnx.convert_keras(model, model.name)
with open("mnist-model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

The model takes grayscale images of 28 × 28 pixels as input. Each raw pixel is a single number between 0 and 255, which the code scales to the [0, 1] range before training. So the overall input shape is (n, 28, 28), where n is the number of images.

For an input image, the model outputs ten probabilities, one for every number between 0 and 9. The number with the highest probability is the final prediction.
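To make the selection step concrete, here is a minimal Python sketch of how raw scores become probabilities via softmax and how the final prediction is picked. The `logits` values and the helper names `softmax` and `predict_digit` are made up for illustration; they do not come from the trained model.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    # Subtracting the max improves numerical stability without changing the result.
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_digit(probabilities):
    """Return the index of the largest probability, i.e. the predicted digit."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

# Hypothetical scores for the digits 0..9 (illustrative values only).
logits = [0.1, -1.2, 0.3, 0.0, -0.5, 0.2, -0.3, 0.4, 3.5, 0.6]
probabilities = softmax(logits)
print(predict_digit(probabilities))  # prints 8
```

The C# code later in the article performs the same "index of the maximum" step on the array returned by the model.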

The exact architecture of the model is not essential for our purposes. It’s a standard fully connected NN with a 128-neuron ReLU hidden layer, dropout, and a 10-way softmax output layer.
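For a sense of the model's size, the number of trainable parameters can be worked out by hand from the layer sizes in the training code. The sketch below does that arithmetic; `dense_params` is a hypothetical helper introduced only for this example.

```python
def dense_params(inputs, units):
    """A dense layer has inputs * units weights plus one bias per unit."""
    return inputs * units + units

flattened_inputs = 28 * 28                    # Flatten turns a 28 x 28 image into 784 values
hidden = dense_params(flattened_inputs, 128)  # hidden Dense(128) layer
output = dense_params(128, 10)                # output Dense(10) layer
total = hidden + output                       # Flatten and Dropout add no trainable parameters

print(hidden, output, total)  # prints 100480 1290 101770
```

A model of roughly a hundred thousand float parameters is small, which is consistent with the exported file being easy to ship alongside an application.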

Let’s stay focused on the ONNX part.

Pay attention to the pip install and import statements in the beginning. The onnx and keras2onnx modules are needed for the ONNX export.

I’ve deliberately set the exact versions of the libraries I’m using so that you can easily replicate the example in your own environment.

The last part of the code snippet outputs a file with the .onnx extension, which contains the model in ONNX format.

Here’s the file ready for download in case you’d like to skip the training phase and directly load it in your C# app:

mnist-model Download

Before making some predictions in C#, let’s detour a little and see a nice way to visualize the ONNX model we just produced.

Previewing the Model in Netron

If you go to netron.app and upload our ONNX model, you’ll see something like this:

Looks neat, right?

Let’s take the hidden layer, for example. You see the matrix multiplication with the hidden layer weights (the MatMul operator), the addition of the bias (the Add operator), and the ReLU activation.

Clicking on one of the boxes will open a side panel with its properties, like so:
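The three operators in that chain are easy to reproduce by hand. Below is a minimal pure-Python sketch of one dense layer computed as MatMul, then Add, then ReLU. The tiny 2 × 2 weight matrix and the function name `dense_relu` are illustrative assumptions, not values taken from the exported model.

```python
def dense_relu(x, weights, bias):
    """Compute relu(x @ weights + bias) for a single input vector x."""
    outputs = []
    for j in range(len(bias)):
        # MatMul: dot product of the input with column j of the weight matrix.
        z = sum(x[i] * weights[i][j] for i in range(len(x)))
        # Add: the bias term of unit j.
        z += bias[j]
        # ReLU: negative values are clamped to zero.
        outputs.append(max(0.0, z))
    return outputs

x = [1.0, 2.0]
weights = [[0.5, -1.0],
           [0.25, 0.5]]
bias = [0.0, -1.0]
print(dense_relu(x, weights, bias))  # prints [1.0, 0.0]
```

In the real model the same computation runs with 784 inputs and 128 units.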

I think Netron is a great way to visualize your ONNX models, so I encourage you to give it a go.

Now, it’s time for the fun part!

Using the Model in C#

I’ll be testing with the following image (part of the MNIST dataset):

I will give a piece-by-piece description later on, but I encourage you to have a look at the full source code below to get a sense of what’s going on:

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

namespace MnistOnnx
{
    class Program
    {
        static void Main(string[] args)
        {
            const string imagePath = @"mnist_test_eight.png";
            float[][] image = PreprocessTestImage(imagePath);
            
            const string modelPath = @"mnist-model.onnx";
            float[] probabilities = Predict(modelPath, image);

            // The predicted number is the index of the largest value (probability) in the array.
            int prediction = Array.IndexOf(probabilities, probabilities.Max());
            
            Console.WriteLine($"Predicted number: {prediction}");
        }
        
        private static float[][] PreprocessTestImage(string path)
        {
            using var img = new Bitmap(path);
            // result[row][column], so flattening yields the row-major order the model was trained on.
            var result = new float[img.Height][];
            
            for (int row = 0; row < img.Height; row++)
            {
                result[row] = new float[img.Width];
                for (int col = 0; col < img.Width; col++)
                {
                    // GetPixel takes (x, y), i.e. (column, row).
                    var pixel = img.GetPixel(col, row);
                    
                    var gray = RgbToGray(pixel);
                    
                    // Normalize the gray value to the 0-1 range.
                    var normalized = gray / 255f;
                    
                    result[row][col] = normalized;
                }
            }

            return result;
        }

        private static float RgbToGray(Color pixel) => 0.299f * pixel.R + 0.587f * pixel.G + 0.114f * pixel.B;

        private static float[] Predict(string modelPath, float[][] image)
        {
            using var session = new InferenceSession(modelPath);

            var modelInputLayerName = session.InputMetadata.Keys.Single();
            
            var imageFlattened = image.SelectMany(x => x).ToArray();
            int[] dimensions = {1, 28, 28};
            var inputTensor = new DenseTensor<float>(imageFlattened, dimensions);

            var modelInput = new List<NamedOnnxValue>
            {
                NamedOnnxValue.CreateFromTensor(modelInputLayerName, inputTensor)
            };
            
            using var result = session.Run(modelInput);

            return ((DenseTensor<float>) result.Single().Value).ToArray();
        }
    }
}

What you’ve seen above is a .NET 5 Console App that outputs the number 8 as expected:

Predicted number: 8

To run the example on your own, you’ll need the following NuGet packages:

Microsoft.ML.OnnxRuntime 1.8.0
System.Drawing.Common – this is needed for the image manipulation bits, like extracting the pixel values, etc. It’s not related to the ML part.

You will also need the ONNX file itself and the testing image I’ve provided above.

Now, let’s take a closer look at the code.

Let’s start with the Main method:

static void Main(string[] args)
{
    const string imagePath = @"mnist_test_eight.png";
    float[][] image = PreprocessTestImage(imagePath);
    
    const string modelPath = @"mnist-model.onnx";
    float[] probabilities = Predict(modelPath, image);

    // The predicted number is the index of the largest value (probability) in the array.
    int prediction = Array.IndexOf(probabilities, probabilities.Max());
    
    Console.WriteLine($"Predicted number: {prediction}");
}

It first preprocesses the image into a suitable format (explained below) and then calls the Predict method, which returns a float array with ten elements (indexed 0 to 9). The index of the highest probability (max element in the array) is the predicted number, which is printed to the console.

The PreprocessTestImage method converts each pixel from RGB to a single grayscale value and normalizes it to fit within the [0, 1] range:

private static float[][] PreprocessTestImage(string path)
{
    using var img = new Bitmap(path);
    // result[row][column], so flattening yields the row-major order the model was trained on.
    var result = new float[img.Height][];
    
    for (int row = 0; row < img.Height; row++)
    {
        result[row] = new float[img.Width];
        for (int col = 0; col < img.Width; col++)
        {
            // GetPixel takes (x, y), i.e. (column, row).
            var pixel = img.GetPixel(col, row);
            
            var gray = RgbToGray(pixel);
            
            // Normalize the gray value to the 0-1 range.
            var normalized = gray / 255f;
            
            result[row][col] = normalized;
        }
    }

    return result;
}

private static float RgbToGray(Color pixel) => 0.299f * pixel.R + 0.587f * pixel.G + 0.114f * pixel.B;

The RgbToGray method applies a standard weighted-sum formula for transforming RGB to grayscale (the weights are the ITU-R BT.601 luma coefficients). You can read about it here.
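As a quick sanity check of the formula, the same conversion and normalization can be written in a few lines of Python. This is only a sketch that mirrors the C# weights; the function names `rgb_to_gray` and `normalize` are introduced here for illustration.

```python
def rgb_to_gray(r, g, b):
    """Weighted sum of the channels, using the same weights as the C# RgbToGray method."""
    return 0.299 * r + 0.587 * g + 0.114 * b

def normalize(gray):
    """Scale a 0-255 gray value to the 0-1 range used during training."""
    return gray / 255.0

white = normalize(rgb_to_gray(255, 255, 255))
black = normalize(rgb_to_gray(0, 0, 0))
pure_green = normalize(rgb_to_gray(0, 255, 0))
print(white, black, pure_green)
```

Because the three weights add up to 1, a white pixel maps to (approximately) 1 and a black pixel to exactly 0, which matches the range of the training data.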

Now, let’s discuss the Predict method:

private static float[] Predict(string modelPath, float[][] image)
{
    using var session = new InferenceSession(modelPath);

    var modelInputLayerName = session.InputMetadata.Keys.Single();
    
    var imageFlattened = image.SelectMany(x => x).ToArray();
    int[] dimensions = {1, 28, 28};
    var inputTensor = new DenseTensor<float>(imageFlattened, dimensions);

    var modelInput = new List<NamedOnnxValue>
    {
        NamedOnnxValue.CreateFromTensor(modelInputLayerName, inputTensor)
    };
    
    using var result = session.Run(modelInput);

    return ((DenseTensor<float>) result.Single().Value).ToArray();
}

It’s best to experiment with the code on your own and review the data types, but let me summarize from a high-level perspective:

Line 3 – load the model and prepare the InferenceSession object. This is the main object that deals with predictions (inference).
Lines 5 to 14 – prepare the model input.
Line 16 – run the prediction.
Line 18 – extract the response and return the float array that contains the probability for each number between 0 and 9.
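The input preparation step hinges on how a 2D image maps onto a flat buffer. The Python sketch below mirrors what `image.SelectMany(x => x).ToArray()` does and shows where element [0, row, col] of a (1, height, width) tensor ends up, assuming the row-major layout that, to my understanding, DenseTensor uses by default. The helper names and the tiny 2 × 3 image are illustrative only.

```python
def flatten_row_major(image):
    """Concatenate the rows of a 2D image into one flat list."""
    return [value for row in image for value in row]

def flat_index(row, col, width):
    """Offset of element [0, row, col] in a row-major buffer of shape (1, height, width)."""
    return row * width + col

# A tiny 2 x 3 "image" stands in for the 28 x 28 input (illustrative values).
image = [[0.0, 0.1, 0.2],
         [0.3, 0.4, 0.5]]
flat = flatten_row_major(image)

print(flat)                             # prints [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
print(flat[flat_index(1, 2, width=3)])  # prints 0.5, the same value as image[1][2]
```

The practical consequence is that the outer array passed to Predict should hold image rows; otherwise the model receives a transposed digit.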

Summary

In this article, I presented how to train a model in TensorFlow, export it to ONNX format, load it and run predictions in C#.

I hope this was insightful!

Thanks for reading, and see you next time!

Making Predictions in C# with a Pre-Trained TensorFlow Model via ONNX

Vasil Kosturski
