
Building Your Own Computer Vision Cloud Service

Build a simple machine learning service with ASP.NET Core, TensorFlow and Azure Cloud


Introduction

The top cloud platform vendors like Amazon, Google and Microsoft provide powerful and simple-to-use REST APIs for image processing, video analysis, speech recognition and other advanced algorithms. These APIs allow developers to make their applications more intelligent without deep knowledge of data science.

However, those APIs are intended for general-purpose use and can't be customized to your own needs, so I wondered how to build my own simple computer vision service that is still publicly available but gives me more flexibility and control over how it works.

This article doesn't attempt to explain how computer vision algorithms work; instead, it covers how a developer can build and deploy their own intelligent services using modern technologies.

What the Computer Vision Service Does

The idea of the computer vision service is quite simple: it's a publicly available REST service that receives a picture and can localize and identify one or more objects in it.

Later in the article, we will look deeply into how the service works and how to train the underlying machine learning model. You can also publish the computer vision service to your Azure account by clicking the following button:

Image 1

How Does It Work?

Let's briefly discuss the stack of technologies we need and how they will be used.

The heart of the computer vision service is an object detection TensorFlow model represented by the frozen_inference_graph.pb file. That file includes the graph definition and metadata of the model. The graph definition represents your computation in terms of the dependencies between individual operations, while the metadata contains the information required to continue training, perform evaluation, or run inference on a previously trained graph.

The object detection graph has one input tensor, image_tensor (the image on which objects will be detected), and four output tensors:

  • detection_boxes - borders of detected objects
  • detection_scores - confidence scores for the detected objects
  • detection_classes - classes of detected objects (cars, people, animals, etc.)
  • num_detections - the number of valid boxes per image in the batch

The object detection graph can be loaded into the TensorFlow framework (this is covered in detail in the "Choosing a TensorFlow Model" section). Once TensorFlow is set up and the model is trained, we can run the execution graph and process input images.
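
To make the output format concrete, here is a small sketch (not part of the service code) of how one entry of detection_boxes, which these models emit as normalized [ymin, xmin, ymax, xmax] coordinates, can be mapped back to a pixel rectangle of the source image:

C#
// Converts one normalized box [ymin, xmin, ymax, xmax] produced by
// the model into pixel coordinates of the source image.
// The coordinates are fractions of the image size in the 0..1 range.
static (int X, int Y, int Width, int Height) ToPixelRect(
    float ymin, float xmin, float ymax, float xmax,
    int imageWidth, int imageHeight)
{
    int left   = (int)(xmin * imageWidth);
    int top    = (int)(ymin * imageHeight);
    int right  = (int)(xmax * imageWidth);
    int bottom = (int)(ymax * imageHeight);
    return (left, top, right - left, bottom - top);
}

This is essentially the mapping that any box-drawing helper (such as the DrawBoxes() method used later in the article) has to perform before highlighting detections.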

I want to expose computer vision as a REST service and build it with my favorite programming language, C#, so I've chosen ASP.NET Core 2.0 as the web framework. This allows me to run TensorFlow and the ASP.NET Core service on Linux, which is important because many TensorFlow models are better documented for Linux.

Now, I need to think about a bridge between managed .NET code and the unmanaged TensorFlow API to get the service working. For this purpose, I will use the TensorFlowSharp library - a managed TensorFlow API for C# (see the "Using TensorFlow Model from C#" section). After compiling TensorFlowSharp for .NET Standard 2.0, I will be able to use it in ASP.NET Core 2.0 on Linux.

Finally, I will publish the computer vision service on an Azure Linux VM, making the REST API publicly available.

Computer Vision REST Service

Choosing a TensorFlow Model

As mentioned before, the core of the service is a TensorFlow model trained specifically for detecting objects in images. I will use the object detection model from TensorFlow research. I will train the model on custom datasets further in the article, but you can use one of the pre-trained models from the TensorFlow detection model zoo as well.

Using TensorFlow Model from C#

The following code uses the TensorFlowSharp binding to import the model into TensorFlow and detect objects in an image:

C#
public static void Main<T> (string [] args, ILogger<T> logger)
{
    // 'options' and the '_...' fields used below are defined
    // elsewhere in the Program class and populated from the
    // command-line arguments
    options.Parse (args);

    if (_catalogPath == null) {
        _catalogPath = DownloadDefaultTexts (_currentDir);
    }

    if (_modelPath == null) {
        _modelPath = DownloadDefaultModel (_currentDir);
    }

    _catalog = CatalogUtil.ReadCatalogItems (_catalogPath);
    var fileTuples = new List<(string input, string output)> ()
                     { (_input, _output) };
    string modelFile = _modelPath;

    using (var graph = new TFGraph ())
    {
        // imports the model from disk into the TensorFlow framework
        var model = File.ReadAllBytes (modelFile);
        graph.Import (new TFBuffer (model));

        using (var session = new TFSession (graph))
        {
            foreach (var tuple in fileTuples)
            {
                // converts the input image to a tensor
                var tensor = ImageUtil.CreateTensorFromImageFile
                             (tuple.input, TFDataType.UInt8);
                var runner = session.GetRunner ();

                // specifies input and output tensors
                runner
                    .AddInput (graph ["image_tensor"] [0], tensor)
                    .Fetch (graph ["detection_boxes"] [0],
                            graph ["detection_scores"] [0],
                            graph ["detection_classes"] [0],
                            graph ["num_detections"] [0]);

                var output = runner.Run ();

                // fetches the graph execution results;
                // they can be used for highlighting
                // the detected objects on the picture
                var boxes = (float [,,])output [0].GetValue (jagged: false);
                var scores = (float [,])output [1].GetValue (jagged: false);
                var classes = (float [,])output [2].GetValue (jagged: false);
                var num = (float [])output [3].GetValue (jagged: false);

                DrawBoxes (boxes,
                           scores,
                           classes,
                           tuple.input,
                           tuple.output,
                           MIN_SCORE_FOR_OBJECT_HIGHLIGHTING);
            }
        }
    }
}

Exposing REST API

Here's what the ASP.NET Core 2.0 controller looks like:

C#
[Route("api/[controller]")]
public class ObjectDetectionController : Controller
{
    private const string TrainedModelFileName = "frozen_inference_graph.pb";
    private const string CatalogFileName = "mscoco_label_map.pbtxt";

    private readonly ILogger<ObjectDetectionController> _logger;
    private readonly IHostingEnvironment _hostingEnvironment;

    public ObjectDetectionController(ILogger<ObjectDetectionController> logger,
                                     IHostingEnvironment hostingEnvironment)
    {
        _logger = logger;
        _hostingEnvironment = hostingEnvironment;
    }

    // returns a previously processed image by its identifier
    [HttpGet("{id}")]
    public IActionResult GetProcessedImageById(string id)
    {
        if (id == null) throw new ArgumentNullException(nameof(id));
        var image = System.IO.File.OpenRead($"test_images/{id}_detected.jpg");
        return File(image, "image/jpeg");
    }

    // accepts a base64-encoded image and returns a preview URL
    // together with the processed, base64-encoded image
    [HttpPost]
    public (string, string) DetectObjects([FromBody]string imageAsString)
    {
        if (imageAsString == null)
            throw new ArgumentNullException(nameof(imageAsString));

        // generates input and processed file names
        string id = Guid.NewGuid().ToString("N");
        string inputImage = GetSafeFilename($"{id}.jpg");
        string outputImage = GetSafeFilename($"{id}_detected.jpg");

        string inputImagePath = Path.Combine(
            _hostingEnvironment.ContentRootPath, "test_images", inputImage);
        string outputImagePath = Path.Combine(
            _hostingEnvironment.ContentRootPath, "test_images", outputImage);

        // saves the input image on the disk
        SaveImage(imageAsString, inputImagePath);

        // runs TensorFlow and detects objects on the image
        string catalogPath = Path.Combine(
            _hostingEnvironment.ContentRootPath, CatalogFileName);
        string modelPath = Path.Combine(
            _hostingEnvironment.ContentRootPath, TrainedModelFileName);
        ExampleObjectDetection.Program.Main(new string[] {
            $"--input_image={inputImagePath}",
            $"--output_image={outputImagePath}",
            $"--catalog={catalogPath}",
            $"--model={modelPath}",
        },
        _logger);

        // returns the preview URL and the processed image
        string detectedImage = Convert.ToBase64String(
            System.IO.File.ReadAllBytes(outputImagePath));
        var previewUrl = UriHelper.BuildAbsolute(
            Request.Scheme, Request.Host,
            path: new Microsoft.AspNetCore.Http.PathString(
                $"/api/objectdetection/{id}"));
        return (previewUrl, detectedImage);
    }

    private static void SaveImage(string imgStr, string imgPath)
    {
        byte[] imageBytes = Convert.FromBase64String(imgStr);
        System.IO.File.WriteAllBytes(imgPath, imageBytes);
    }

    private string GetSafeFilename(string filename)
    {
        return string.Join("_",
            filename.Split(Path.GetInvalidFileNameChars()));
    }
}

The controller exposes two REST methods:

  • DetectObjects() and
  • GetProcessedImageById()

The DetectObjects() API accepts a base64-encoded image and returns a preview URL along with the base64-encoded processed image, while GetProcessedImageById() allows previewing the processed image.
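
To illustrate the contract, here is a minimal console client - a sketch, not part of the service code, with the host name and file name as placeholders - that encodes a picture, posts it to DetectObjects() and prints the raw response (the returned tuple is typically serialized as a JSON object with item1/item2 fields):

C#
using System;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class ObjectDetectionClient
{
    static async Task Main()
    {
        // encodes the input picture as a base64 string
        string imageAsString = Convert.ToBase64String(
            File.ReadAllBytes("bus.jpg"));

        using (var client = new HttpClient())
        {
            // the service expects the base64 payload as plain text
            var content = new StringContent(
                imageAsString, Encoding.UTF8, "text/plain");

            // "your-vm-address" is a placeholder for the deployed host
            var response = await client.PostAsync(
                "http://your-vm-address/api/objectdetection", content);
            response.EnsureSuccessStatusCode();

            // the body contains the preview URL and the processed image
            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
    }
}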

To the Cloud

At this stage, I'm able to run the service on my local machine, and now it's time to publish it to Azure. I've prepared an ARM template with a couple of scripts which:

  • deploy a Linux VM
  • install TensorFlow and .NET Core
  • pull the source code from GitHub, build it, and expose the REST API using nginx and ASP.NET Core

Image 2

Once the VM is created, it becomes available via RDP or SSH (see How to use SSH on Linux VM). Connect and check that the service is deployed successfully and that the 'objectdetection' directory contains the following files:

Image 3

Let's Try It Out

The computer vision service is published, and I want to try it out by sending some pictures to process. First, I encode the picture to a base64 string. It can be done with a couple of lines of C# code (see the sketch below) or even simply by using some online service, for instance, https://www.base64-image.de.
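
Here is what those couple of lines look like (a sketch; the file path is a placeholder):

C#
// reads the picture from disk and encodes it as a base64 string
string imageAsString = Convert.ToBase64String(
    System.IO.File.ReadAllBytes(@"C:\images\bus.jpg"));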

Next, I send an HTTP request to the computer vision service using Postman (or any other utility). The HTTP request should include the following headers:

Accept:text/plain
Content-Type:text/plain

and the base64-encoded image in the body.

Image 4

In response, I receive a URL to the processed image and the base64-encoded processed image itself. By clicking on the URL, I can load it:

Image 5
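
Alternatively, the processed image can be decoded and saved locally. A small sketch, assuming detectedImage holds the base64 payload from the response:

C#
// decodes the processed image returned by DetectObjects()
// and writes it to disk for inspection
byte[] imageBytes = Convert.FromBase64String(detectedImage);
System.IO.File.WriteAllBytes("bus_detected.jpg", imageBytes);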

Troubleshooting

Check the logs for troubleshooting. They are located in the /var/log/dotnettest.err.log and /var/log/dotnettest.out.log files.

Training Your Own Model

In this section, we will talk about how to train the object detection model.

Deploy VM for Training

In order to train an object detection model, we need to deploy a special virtual machine, since training requires intensive computation which can take hours. To speed up the training process, we will use GPU-based VMs in Azure (see the list of available sizes here) and parallelize the computation across a few GPU units. I've prepared an ARM template for deploying the object detection training VM.

STEP 1 - Deploy GPU VM on Azure

Click the "Deploy" and fill the form to deploy training Linux VM on your Azure account:

Image 6

Image 7

STEP 2 - Download CUDA and cuDNN

CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on graphical processing units (GPUs).

Ok, so what is cuDNN? The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks.

Downloading them requires registration on the NVIDIA website:

Image 8

STEP 3 - Upload CUDA and cuDNN to the Training VM

You need to execute the following commands from the Windows command line to upload the CUDA and cuDNN installation files from your local machine (I assume they are located in the "Downloads" folder) to the training VM:

BAT
pscp %UserProfile%\Downloads\libcudnn5_5.1.10-1+cuda8.0_amd64.deb testadmin@52.174.127.204:/home/testadmin/
pscp %UserProfile%\Downloads\cudnn-8.0-linux-x64-v6.0.solitairetheme8 testadmin@52.174.127.204:/home/testadmin/

STEP 4 - Install NVIDIA Driver

Now, we can install the NVIDIA driver. I've prepared an "install_driver.sh" shell script which is uploaded to the training VM. You need to connect to your training VM via SSH or RDP and launch the installation using the following command:

Shell
sh install_driver.sh

Be patient, it will take some time. Your VM will reboot when the driver is installed, so you will need to connect via SSH or RDP again.

STEP 5 - Install CUDA and cuDNN

After the reboot, we need to install CUDA and cuDNN, which were uploaded earlier. For that purpose, execute the following script:

Shell
sudo dpkg -i libcudnn5_5.1.10-1+cuda8.0_amd64.deb

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1604_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda-8-0

tar -zxvf cudnn-8.0-linux-x64-v6.0.solitairetheme8
cd cuda
sudo cp include/cudnn.h /usr/local/cuda/include
sudo cp lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

sudo apt-get -y install python3-tk

Initialize environment variables:

Shell
vim ~/.bashrc

This command opens the vim editor, where you need to scroll down to the very bottom and insert the following lines:

Shell
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$CUDA_HOME/lib"
export PATH="$CUDA_HOME/bin:$PATH"

Your command line should look like this:

Image 9

press "Esc" button and type ":wq" - this will write changes to the .bashrc file and quit from vim. Reload .bashrc:

Shell
source ~/.bashrc

STEP 6 - Check That the Setup Succeeded

Once all steps are done, you can check that your machine is ready for training the neural network. Run the following commands:

Shell
nvcc --version
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

These commands print the installed CUDA and cuDNN versions. Ensure you have CUDA 8.0 and cuDNN 6 installed on your training machine, as in this picture:

Image 10

STEP 7 - Start Training Object Detection

In this step, we will train our object detection neural network. I suggest connecting to the training Ubuntu VM via RDP because you'll need to launch three terminals: one for TensorBoard, one for the training process and one for the evaluation process. Below, we will talk about each of them.

First of all, let's launch TensorBoard - a powerful tool which comes with TensorFlow and visualizes the TensorFlow graph and the learning process. The following command starts TensorBoard on your training VM:

Shell
sh start_tensorboard.sh

Start the training process:

Shell
. export_variables.sh
. train.sh

and finally, the evaluation:

Shell
. export_variables.sh
. eval.sh

You should be able to see a picture like this (left - TensorBoard, middle - the training process, right - the evaluation process):

Image 11

TensorBoard

TensorBoard is a powerful tool that helps you understand, optimize and debug the training of a neural network. TensorBoard:

  • visualizes the neural network graph. I used it to find which places can be parallelized and to get a general understanding of the underlying neural network structure
  • visualizes how the distribution of a tensor changes over time by showing many histogram visualizations of the tensor at different points in time
  • visualizes learning progress. This is helpful when you want to understand when to stop the training process (we will see this in the next section)

Image 12

Training

The training process requires training data - a set of images to pass through the machine learning algorithm. The training data must contain the correct answers (labels) so that the training process can compare the actual output of the algorithm with the expected one. Every few steps, the training process creates a checkpoint; since training can take a long time, checkpoints allow resuming from the last saved state in case the process fails.

It's useful to watch the TotalLoss chart on TensorBoard to keep track of how the training process is going. The closer the TotalLoss value is to zero, the better your model's predictions.

I ran 16k steps of training and got the following TotalLoss chart. As you can see, I had a pretty good prediction result after ~2k steps of training, and subsequent iterations didn't change the predictions dramatically:

Image 13

Evaluation

In parallel with the training process, you can run the evaluation process. The evaluation can be performed from time to time, for instance, every 500 steps, for testing and validation purposes.

Here is the result of evaluating the bus picture. You can see how the object detection accuracy changed during the training:

Image 14

Conclusion

Let's summarize what we've learned from this article.

  • This article demonstrates how to deploy and customize your own computer vision service, but the described steps can be used for building any other intelligent service based on TensorFlow models. The full list of available TensorFlow models is here.
  • .NET developers can use TensorFlow on Linux via the TensorFlowSharp bindings, which wrap the native TensorFlow API.
  • If you're exposing intensive TensorFlow operations as an ASP.NET Core REST API, consider increasing the default timeouts on your reverse proxy. I used Nginx as a reverse proxy, and here you can find the configuration I used.
  • Consider using different hardware configurations for training and evaluation of TensorFlow models. Machines with powerful GPUs cope very well with graphical tasks and image processing, so they allow you to train a model very fast, but they are costly. Once you've trained your model, you can run it on a cheaper CPU-based machine or a cluster of machines.
  • You can skip model training and download pre-trained models when building a PoC project or demo. You can also use pre-trained model checkpoints as a starting point for training against your own datasets; it will save you time.
  • Training TensorFlow models is a compute-intensive and time-consuming operation, so consider running it in the cloud and parallelizing it across different machines or units. I ran training on an NC12 VM in Azure, powered by an NVIDIA Tesla K80 card with 2 GPUs, and parallelized the training and evaluation processes on different GPUs.
  • Using Azure Resource Manager, I was able to automate most of the VM deployment operations for training the TensorFlow model and hosting the REST service. However, upgrading the video card driver and installing CUDA still have to be done manually via SSH or RDP.
  • TensorBoard is a powerful tool for visualizing, optimizing and debugging the TensorFlow training process.

Enjoy making your apps more intelligent!


History

  • 17th December, 2017: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)