TensorFlow Serving gRPC Endpoint in Docker with a .NET 5 Client

Intro

In this article, you will learn how to:

  1. Deploy and serve a TensorFlow 2 model via TensorFlow Serving in a Docker container.
  2. Expose the serving endpoint using gRPC.
  3. Copy the required `.proto` files into the .NET client app and generate the gRPC stub classes.
  4. Run inferences from the .NET client app.

You can find the complete source code, with detailed setup instructions, on GitHub.

Why this Article?

A while ago, I presented how to consume an ML model from a .NET app via ONNX.

A more popular approach is to expose the model via a public endpoint using a RESTful or gRPC-based API. TensorFlow Serving supports both of these mechanisms.

Most developers are already familiar with RESTful APIs. gRPC, meanwhile, is gaining popularity, especially for interservice calls, so I decided to use it as the communication protocol for this tutorial.

There are already some great resources on Coursera and on GitHub. Although those are good references, the first one is built solely in Python, while the second one hides some of the gRPC setup, like compiling the .proto files.

So I decided to create an easy-to-run step-by-step guide with clean and reproducible steps.

Bear in mind that the goal here is not to make you an expert in any of the related areas like Docker, TensorFlow Serving, or gRPC, but to give you a quick, practical overview of how to make them work together.

Let’s first go through a few high-level definitions.

TensorFlow Serving

In short, TensorFlow Serving simplifies the process of publishing and serving your trained models:

“TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments while keeping the same server architecture and APIs. TensorFlow Serving provides out of the box integration with TensorFlow models but can be easily extended to serve other types of models.” Source

gRPC and Protocol Buffers

“gRPC is a modern open source high performance Remote Procedure Call (RPC) framework that can run in any environment. It can efficiently connect services in and across data centers with pluggable support for load balancing, tracing, health checking and authentication. It is also applicable in last mile of distributed computing to connect devices, mobile applications and browsers to backend services.” Source

Another topic that often goes hand in hand with gRPC is Protocol Buffers:

“Protocol buffers are Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.” Source

“With protocol buffers, you write a .proto description of the data structure you wish to store. From that, the protocol buffer compiler creates a class that implements automatic encoding and parsing of the protocol buffer data with an efficient binary format. The generated class provides getters and setters for the fields that make up a protocol buffer and takes care of the details of reading and writing the protocol buffer as a unit. Importantly, the protocol buffer format supports the idea of extending the format over time in such a way that the code can still read data encoded with the old format.” Source

In this article, you will get some hands-on experience with gRPC and Protocol Buffers, but if you’d like to dig deeper into the topic, I’d recommend the following Pluralsight course.

The Model

For the purposes of this post, the exact ML model architecture is not that important. That’s why I’ve reused the simple MNIST model from my ONNX article, which is built with a densely connected neural network. You can explore the Jupyter Notebook here.

Host the Model in Docker Over gRPC

Serving the model via gRPC with TF Serving in Docker is relatively straightforward, and the deployment script is quite simple. I’ll explain some of the main points below.

If you’re new to Docker, there are plenty of online courses you can take, like this one.

First, make sure you have Docker installed on your machine.

Then, you need to pull the TensorFlow Serving Docker image:

docker pull tensorflow/serving:2.6.0

Note that I’m pinning a specific version, 2.6.0, instead of just latest. This is generally a good idea, as it ensures your setup won’t break when a new image is published.

It’s time to run the container:

docker run -t --rm --name=tf-serving-mnist -p 8500:8500 -v "%cd%/mnist-model:/models/model" tensorflow/serving --model_name=mnist-model

This command starts a Docker container called tf-serving-mnist, sets the model’s name to mnist-model, and exposes the default gRPC port (8500) to the host environment.

We also mount the directory containing the model, the local mnist-model subdirectory, into the container at /models/model, which is where the serving image expects to find it by default. Note that the %cd% syntax is specific to the Windows command prompt; on Linux or macOS, use $(pwd) instead.
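
For TensorFlow Serving to pick up the model, the mounted directory needs to contain a numeric version subdirectory holding the exported SavedModel. The layout typically looks something like this (exact file names may vary):

mnist-model/
└── 1/
    ├── saved_model.pb
    └── variables/
        ├── variables.data-00000-of-00001
        └── variables.index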

Once the container is running, the TensorFlow Serving logs should show that the model was loaded successfully and that the gRPC endpoint is listening on port 8500.

.NET Client

For consuming the gRPC endpoint, I’ve built a simple .NET 5 console app.

Pretty much all of the code is in the Program.cs file.

Let’s take a look at some of the main pieces.

The .proto Files

In order to consume the TensorFlow Serving gRPC API we just deployed, we need to include the required .proto files in the .NET project.

I’ve taken the .proto files from the TensorFlow GitHub repo. For example, here’s a link to tensor.proto.

As an example, let’s inspect prediction_service.proto.

syntax = "proto3";

package tensorflow.serving;
option cc_enable_arenas = true;

import "predict.proto";

// PredictionService provides access to machine-learned models loaded by
// model_servers.
service PredictionService {
  // Predict -- provides access to loaded TensorFlow model.
  rpc Predict(PredictRequest) returns (PredictResponse);
}

This service definition declares that PredictionService exposes a single Predict method, which takes a PredictRequest and returns a PredictResponse.

The PredictRequest and PredictResponse types are defined in the predict.proto file (see the import statement), which in turn looks like so:

syntax = "proto3";

package tensorflow.serving;
option cc_enable_arenas = true;

import "tensor.proto";
import "model.proto";

// PredictRequest specifies which TensorFlow model to run, as well as
// how inputs are mapped to tensors and how outputs are filtered before
// returning to user.
message PredictRequest {
  // Model Specification.
  ModelSpec model_spec = 1;

  // Input tensors.
  // Names of input tensor are alias names. The mapping from aliases to real
  // input tensor names is expected to be stored as named generic signature
  // under the key "inputs" in the model export.
  // Each alias listed in a generic signature named "inputs" should be provided
  // exactly once in order to run the prediction.
  map<string, TensorProto> inputs = 2;

  // Output filter.
  // Names specified are alias names. The mapping from aliases to real output
  // tensor names is expected to be stored as named generic signature under
  // the key "outputs" in the model export.
  // Only tensors specified here will be run/fetched and returned, with the
  // exception that when none is specified, all tensors specified in the
  // named signature will be run/fetched and returned.
  repeated string output_filter = 3;
}

// Response for PredictRequest on successful run.
message PredictResponse {
  // Output tensors.
  map<string, TensorProto> outputs = 1;
}

This .proto file also contains further imports for its own dependencies.

Compilation to .NET Classes

So, how do the .proto files get translated/compiled to C# classes? That’s the job of the protocol buffer compiler.

All we need to do is add a reference to the Protos dir in the project file.
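
As a rough sketch, and assuming the Grpc.Tools, Grpc.Core, and Google.Protobuf NuGet packages are installed and the .proto files live in a Protos folder, the relevant part of the .csproj could look like this:

<!-- Hypothetical example: Grpc.Tools compiles the .proto files and generates client-side stubs at build time. -->
<ItemGroup>
  <Protobuf Include="Protos\*.proto" ProtoRoot="Protos" GrpcServices="Client" />
</ItemGroup>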

Once you build the project, you’ll find the generated stub classes under the obj directory.

These classes provide a convenient API to invoke the prediction service and hide all the implementation complexities of the gRPC communication channel.

The Client Code

Let’s quickly take a look at the C# client:

static async Task Main(string[] args)
{
    const string imagePath = @"mnist_test_eight.png";
    var input = PreprocessTestImage(imagePath); // #A

    var request = new PredictRequest
    {
        ModelSpec = new ModelSpec { Name = "mnist-model" }
    };
    
    request.Inputs.Add("flatten_input", input);
    
    var channel = new Channel("localhost:8500", ChannelCredentials.Insecure); // #B
    var client = new PredictionService.PredictionServiceClient(channel);
    
    var predictResponse = await client.PredictAsync(request); // #C
    
    var maxValue = predictResponse.Outputs["dense_1"].FloatVal.Max();
    var predictedValue = predictResponse.Outputs["dense_1"].FloatVal.IndexOf(maxValue);

    Console.WriteLine($"Predicted value: {predictedValue}"); // #D
}

I won’t go into too much detail, as I believe the code is simple enough to inspect on your own, but here are a few notes (the code comments reference the points below).

#A – We call the PreprocessTestImage method, which contains the logic to convert the input image into a format the prediction service can consume. Concretely, it converts the image to grayscale and creates a TensorProto object (a sketch of what such a method might look like follows after these notes).

#B – We create the gRPC channel that will be used to send messages to the gRPC endpoint.

#C – We call the prediction endpoint.

#D – The prediction response contains a probability for every digit from 0 to 9. We take the maximum probability, store the corresponding digit in the predictedValue variable, and print it to the console.
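
To give a concrete idea of what #A might involve, here is a hypothetical sketch of a PreprocessTestImage method. It is not the repository’s actual implementation: it assumes a 28x28 image, an input shape of [1, 28, 28] matching the flatten_input signature, pixel values scaled to [0, 1], and the System.Drawing.Common package.

// Hypothetical sketch: convert a 28x28 image to a TensorProto with shape [1, 28, 28].
// The real preprocessing must match how the model was trained (polarity, scaling, shape).
private static TensorProto PreprocessTestImage(string imagePath)
{
    using var bitmap = new System.Drawing.Bitmap(imagePath);

    var tensor = new TensorProto
    {
        Dtype = DataType.DtFloat,
        TensorShape = new TensorShapeProto
        {
            Dim =
            {
                new TensorShapeProto.Types.Dim { Size = 1 },  // batch dimension
                new TensorShapeProto.Types.Dim { Size = 28 }, // height
                new TensorShapeProto.Types.Dim { Size = 28 }  // width
            }
        }
    };

    for (var y = 0; y < bitmap.Height; y++)
    {
        for (var x = 0; x < bitmap.Width; x++)
        {
            var pixel = bitmap.GetPixel(x, y);
            // Average the RGB channels to get a grayscale value and scale it to [0, 1].
            tensor.FloatVal.Add((pixel.R + pixel.G + pixel.B) / 3f / 255f);
        }
    }

    return tensor;
}

The key point is that the input tensor’s dtype and shape must match the model’s serving signature, which you can inspect with the saved_model_cli tool.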

After running the program, you should see a result like this:

Predicted value: 8

Summary

In this article, you learned how to host a model via TensorFlow Serving in Docker over gRPC and create a .NET 5 client for it.

I hope that was helpful. See you next time!

Resources

  1. TensorFlow Serving with Docker for Model Deployment
  2. Tensorflow 2.0 serving .Net Core 5 React/Redux client (GRPC/Rest)
  3. Docker Deep Dive
