Image recognition, translation and speech synthesis - 3in1 Web API

Jalle

4 Dec 2016 | CPOL | 2 min read

WebAPI, Azure, Android

Introduction

In this tutorial I will demonstrate how to use Google's Vision and Translate APIs to build a useful application. The tutorial is divided into two parts:
Part 1. Building a Web API service that handles image labeling and translation into different languages.
Part 2. Consuming this RESTful service from an Android application.

Using the code

We will start by creating a new Web API project. Start Visual Studio and choose New Project -> C# -> Web -> ASP.NET Web Application -> Empty. Check Web API and Host in the cloud so that you can publish the project later.

packages.config will contain all the libraries we need for this project.

XML

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="BouncyCastle" version="1.7.0" targetFramework="net45" />
  <package id="Google.Apis" version="1.19.0" targetFramework="net45" />
  <package id="Google.Apis.Auth" version="1.19.0" targetFramework="net45" />
  <package id="Google.Apis.Core" version="1.19.0" targetFramework="net45" />
  <package id="Google.Apis.Translate.v2" version="1.19.0.543" targetFramework="net45" />
  <package id="Google.Apis.Vision.v1" version="1.19.0.683" targetFramework="net45" />
  <package id="GoogleApi" version="2.0.13" targetFramework="net45" />
  <package id="log4net" version="2.0.3" targetFramework="net45" />
  <package id="Microsoft.AspNet.WebApi" version="5.2.3" targetFramework="net45" />
  <package id="Microsoft.AspNet.WebApi.Client" version="5.2.3" targetFramework="net45" />
  <package id="Microsoft.AspNet.WebApi.Core" version="5.2.3" targetFramework="net45" />
  <package id="Microsoft.AspNet.WebApi.WebHost" version="5.2.3" targetFramework="net45" />
  <package id="Microsoft.CodeDom.Providers.DotNetCompilerPlatform" version="1.0.0" targetFramework="net45" />
  <package id="Microsoft.Net.Compilers" version="1.0.0" targetFramework="net45" developmentDependency="true" />
  <package id="Newtonsoft.Json" version="7.0.1" targetFramework="net45" />
  <package id="Zlib.Portable.Signed" version="1.11.0" targetFramework="net45" />
</packages>

Setting up API Keys

Since we will be using Google APIs, we need to set up a Google Cloud Vision API project first.

1. For the Google Vision API, download the VisionAPI-xxxxxx.json credentials file and save it in your project root directory.
2. For the Translation API, get the API key from the same page.

Back in the code, we first initialize the API keys in the Global application class. Replace the placeholder values with the keys acquired above.

using System;
using System.Configuration;
using System.Diagnostics;
using System.IO;
using System.Web.Http;

namespace ThingTranslatorAPI2 {
  public class Global : System.Web.HttpApplication {

    // Google Translate API key, read by TranslatorController
    public static String apiKey;

    protected void Application_Start() {
      GlobalConfiguration.Configure(WebApiConfig.Register);

      // Replace with the Translation API key acquired above
      apiKey = "API-KEY";

      createEnvVar();
    }

    // Points GOOGLE_APPLICATION_CREDENTIALS at the Vision API JSON key file.
    // If the variable is not set, the file is created from the "VisionApiKey" app setting.
    private static void createEnvVar() {
      var GAC = Environment.GetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS");

      if (GAC == null) {
        var VisionApiKey = ConfigurationManager.AppSettings["VisionApiKey"];
        if (VisionApiKey != null) {
          var path = System.Web.Hosting.HostingEnvironment.MapPath("~/") + "YOUR-API-KEY.json";

          // Informational only: log where the key file is written
          Trace.TraceInformation("path: " + path);

          File.WriteAllText(path, VisionApiKey);
          Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", path);
        }
      }
    }
  }
}
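
Hard-coding the Translation key in Application_Start works for a demo, but it is easy to commit the key by accident. Here is a minimal sketch of one alternative, assuming the key is stored in an app setting or an environment variable. The class ApiKeys and the names "TranslateApiKey" and "TRANSLATE_API_KEY" are hypothetical and not part of the original project.

```csharp
using System;
using System.Collections.Specialized;

namespace ThingTranslatorAPI2 {
  // Hypothetical helper, not part of the original project.
  public static class ApiKeys {
    // Assumed names; pick whatever fits your configuration.
    public const string SettingName = "TranslateApiKey";
    public const string EnvVarName = "TRANSLATE_API_KEY";

    // Returns the key from the given app settings, falling back to the
    // environment variable, and fails loudly if neither is set.
    public static string Resolve(NameValueCollection appSettings) {
      var key = appSettings?[SettingName];
      if (string.IsNullOrWhiteSpace(key))
        key = Environment.GetEnvironmentVariable(EnvVarName);
      if (string.IsNullOrWhiteSpace(key))
        throw new InvalidOperationException(
          "Translation API key not found in app settings or environment.");
      return key;
    }
  }
}
```

With this in place, Application_Start could assign apiKey = ApiKeys.Resolve(ConfigurationManager.AppSettings); instead of a string literal.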

WebApiConfig, located in the App_Start folder, contains the following. We tell the server to handle routes using attribute routing rather than the default convention-based route configuration.

using System.Web.Http;
namespace ThingTranslatorAPI2
{
    public static class WebApiConfig
    {
        public static void Register(HttpConfiguration config)
        {
            // Web API routes
            config.MapHttpAttributeRoutes();
        }
    }
}

API Controller

We need an API controller that will handle requests and process them. Each request should contain an image file and the code of the language we want the label translated into. Images are processed in memory, so there is no need to save them to disk.

TranslatorController.cs

using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web.Http;
using System.Web.Http.Results;
using GoogleApi;
using GoogleApi.Entities.Translate.Translate.Request;
using TranslationsResource = Google.Apis.Translate.v2.Data.TranslationsResource;

namespace ThingTranslatorAPI2.Controllers {
   
  [RoutePrefix("api")]
  public class TranslatorController : ApiController
  {

    [Route("upload")]
    [HttpPost]
    public async Task<JsonResult<Response>> Upload() {
      if (!Request.Content.IsMimeMultipartContent())
        throw new HttpResponseException(HttpStatusCode.UnsupportedMediaType);

      String langCode = string.Empty;
      var response = new Response();
      byte[] buffer = null;

      var provider = new MultipartMemoryStreamProvider();
      await Request.Content.ReadAsMultipartAsync(provider);

      foreach (var content in provider.Contents)
      {
        if (content.Headers.ContentType !=null && content.Headers.ContentType.MediaType.Contains("image"))
           buffer = await content.ReadAsByteArrayAsync();
        else
          langCode = await content.ReadAsStringAsync();
      }

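      // Reject requests that carry no image part
      if (buffer == null)
        throw new HttpResponseException(HttpStatusCode.BadRequest);

      var labels = LabelDetectior.GetLabels(buffer);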
      
      try {
        //Take the first label  that has the best match
        var bestMatch = labels[0].LabelAnnotations.FirstOrDefault()?.Description;
        String translateText;
        if (langCode == "en")
          translateText = bestMatch;
        else
          translateText = TranslateText(bestMatch, "en", langCode);

        //original is our text in English
        response.Original = bestMatch;
        response.Translation = translateText;

      } catch (Exception ex) {
        response.Error = ex.Message;
        return Json(response);
      }

      return Json(response);
    }

    //Translate text from source to target language
    private String TranslateText(String text, String source, String target) {

      var _request = new TranslateRequest {
        Source = source,
        Target = target,
        Qs = new[] { text },
        Key = Global.apiKey
      };

      try {
        var _result = GoogleTranslate.Translate.Query(_request);
        return _result.Data.Translations.First().TranslatedText;
      } catch (Exception ex) {
        return ex.Message;
      }
    }
  }
}

For image labeling we need this class:

LabelDetectior.cs

using Google.Apis.Auth.OAuth2;
using Google.Apis.Services;
using Google.Apis.Vision.v1;
using Google.Apis.Vision.v1.Data;
using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace ThingTranslatorAPI2 {
  public class LabelDetectior {
    
    // Get labels from image in memory
    public static IList<AnnotateImageResponse> GetLabels(byte[] imageArray) {
      try
      {
        VisionService vision = CreateAuthorizedClient();
        // Convert image to Base64 encoded for JSON ASCII text based request   
        string imageContent = Convert.ToBase64String(imageArray);
        // Post label detection request to the Vision API
        var responses = vision.Images.Annotate(
            new BatchAnnotateImagesRequest() {
              Requests = new[] {
                    new AnnotateImageRequest() {
                        Features = new [] { new Feature() { Type = "LABEL_DETECTION"}},
                        Image = new Image() { Content = imageContent }
                    }
              }
            }).Execute();
        return responses.Responses;
      }
      catch (Exception ex)
      {
        Trace.TraceError(ex.StackTrace);
      }
      return null;
    }

    // returns an authorized Cloud Vision client. 
    public static VisionService CreateAuthorizedClient() {
      try {
        GoogleCredential credential = GoogleCredential.GetApplicationDefaultAsync().Result;
        // Inject the Cloud Vision scopes
        if (credential.IsCreateScopedRequired) {
          credential = credential.CreateScoped(new[]
          {
                    VisionService.Scope.CloudPlatform
                });
        }
        return new VisionService(new BaseClientService.Initializer {
          HttpClientInitializer = credential,
          GZipEnabled = false
        });
      } catch (Exception ex) {
        Trace.TraceError("CreateAuthorizedClient: " + ex.StackTrace);
      }
      return null;
    }
  }
}

Response.cs will look like this:

namespace ThingTranslatorAPI2.Controllers
{
  public class Response
  {
    public string Original { get; set; }
    public string Translation { get; set; }
    public string Error { get; set; }
  }
}

If you have any problems compiling the code, check the source code attached here.

Now let's publish this to Azure. Go to Build -> Publish and fill in all four input boxes to match your Azure settings.

Now we can use Postman to test it.

Response
{"Original":"mouse","Translation":"miš","Error":null}

We received a response that contains the image label in English and its translation into the language we specified with the langCode parameter.
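
The same request can also be sent from code. Below is a minimal sketch of a C# client, assuming the service is reachable at a base URL you supply. The class TranslatorClient, the part names "file" and "langCode", and the file name "photo.jpg" are illustrative assumptions. The controller above only looks at each part's content type, so the image part must carry an image/* media type.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

// Hypothetical client helper, not part of the original project.
public static class TranslatorClient {
  // Builds the multipart body: one image part and one plain-text part
  // holding the target language code.
  public static MultipartFormDataContent BuildRequest(byte[] imageBytes, string langCode) {
    var form = new MultipartFormDataContent();

    var image = new ByteArrayContent(imageBytes);
    // Assumed JPEG; any image/* type satisfies the controller's check
    image.Headers.ContentType = new MediaTypeHeaderValue("image/jpeg");
    form.Add(image, "file", "photo.jpg");

    // StringContent is sent as text/plain, so the controller reads it as langCode
    form.Add(new StringContent(langCode), "langCode");
    return form;
  }

  // Posts the image to {baseUrl}/api/upload and returns the raw JSON reply.
  public static async Task<string> UploadAsync(
      HttpClient http, string baseUrl, byte[] imageBytes, string langCode) {
    using (var form = BuildRequest(imageBytes, langCode))
    using (var reply = await http.PostAsync(baseUrl.TrimEnd('/') + "/api/upload", form)) {
      reply.EnsureSuccessStatusCode();
      return await reply.Content.ReadAsStringAsync();
    }
  }
}
```

A call such as await TranslatorClient.UploadAsync(new HttpClient(), baseUrl, File.ReadAllBytes(path), "hr") should return a JSON string shaped like the response shown above, which can then be deserialized into the Response class with Newtonsoft.Json.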

Points of Interest

There are a few APIs available today that can handle image labeling. One of them, which I found quite unusual, is called CloudSight. Although it is more accurate than the others, it relies on human tagging. The downside is that it takes more time than a machine to do the job; a response usually arrives after 10-30 seconds.
Imagine running our app and hitting a timeout. What should we call it: a connection timeout, or maybe a coffee break timeout ;-) ?

That's all for this tutorial. In the next article we will build the Thing Translator app that consumes this API.

Happy coding!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)