Some time ago, I used the Bing Translator API to help create localizations for some of our products. As Microsoft recently retired the Data Market used to provide this service, it was high time to migrate to the replacement Cognitive Services API hosted on Azure. This article covers the basics of using Azure Cognitive Services to translate text with simple HTTP requests.
Getting Started
I'm going to assume you've already signed up for the Text Translation Cognitive Services API. If you haven't, you can find a step-by-step guide on the API documentation site. Just as with the original version, there's a free tier that lets you translate 2 million characters per month.
Once you have created your API service, display the Keys page and copy one of the keys for use in your application (it doesn't matter which one you choose).
Remember that these keys should be kept secret. Don't paste them in screenshots as I have above (unless you regenerated the key after taking the screenshot!), don't commit them to public code repositories - treat them as any other password. "Keep it secret, keep it safe".
Creating a Login Token
The first thing we need to do is generate an authentication token. We do this by sending a POST request to Microsoft's authentication API along with a custom Ocp-Apim-Subscription-Key header that contains the API key we copied earlier.
Note: When using the HttpWebRequest object, you must set ContentLength to zero even though we're not sending any body content. If the Content-Length header isn't present, the authentication server responds with a 411 (Length Required) status, which HttpWebRequest surfaces as a WebException.
Assuming we have passed a valid API key, the response body will contain a token we can use with subsequent requests.
Tokens are only valid for 10 minutes and it is recommended you renew them after 8 or so minutes. For this reason, I store the time at which the token should be renewed, so that future requests can compare it against the current time and automatically renew the token if required.
private string _authorizationKey;
private string _authorizationToken;
private DateTime _timestampWhenTokenExpires;

private void RefreshToken()
{
    HttpWebRequest request;

    if (string.IsNullOrEmpty(_authorizationKey))
    {
        throw new InvalidOperationException("Authorization key not set.");
    }

    request = WebRequest.CreateHttp("https://api.cognitive.microsoft.com/sts/v1.0/issueToken");
    request.Method = WebRequestMethods.Http.Post;
    request.Headers.Add("Ocp-Apim-Subscription-Key", _authorizationKey);
    request.ContentLength = 0; // required, otherwise the server responds with 411

    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        // GetResponseString (a helper not shown here) returns the response body as text
        _authorizationToken = this.GetResponseString(response);

        // renew after 8 minutes, before the 10 minute lifetime runs out
        _timestampWhenTokenExpires = DateTime.UtcNow.AddMinutes(8);
    }
}
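The RefreshToken method above is only half of the story, as something has to decide when to call it. The following is a minimal sketch of that check, written as a small stand-alone class so the expiry logic can be exercised without calling the real authentication service. The TokenCache class, its member names and the injected fetchToken and utcNow delegates are all hypothetical and are not taken from the sample project; the 8 minute renewal window is the one described above.

```csharp
using System;

// Hypothetical helper, not part of the article's sample code: caches a token
// and fetches a new one once the renewal window has passed.
public sealed class TokenCache
{
    private readonly Func<string> _fetchToken; // e.g. wraps the issueToken POST
    private readonly Func<DateTime> _utcNow;   // injectable clock, for testing
    private readonly TimeSpan _renewAfter;

    private string _token;
    private DateTime _renewAtUtc = DateTime.MinValue;

    public TokenCache(Func<string> fetchToken, Func<DateTime> utcNow, TimeSpan renewAfter)
    {
        _fetchToken = fetchToken ?? throw new ArgumentNullException(nameof(fetchToken));
        _utcNow = utcNow ?? throw new ArgumentNullException(nameof(utcNow));
        _renewAfter = renewAfter;
    }

    // Returns the cached token, fetching a new one if nothing is cached yet
    // or the renewal time has been reached.
    public string GetToken()
    {
        DateTime now = _utcNow();

        if (_token == null || now >= _renewAtUtc)
        {
            _token = _fetchToken();
            _renewAtUtc = now + _renewAfter;
        }

        return _token;
    }
}
```

In real use, you might construct it as new TokenCache(FetchTokenFromService, () => DateTime.UtcNow, TimeSpan.FromMinutes(8)), where FetchTokenFromService stands in for whatever method performs the POST request shown earlier.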
Using the Token
For all subsequent requests in this article, we'll be sending the token with the request. This is done via the Authorization header, which needs to be set to the string Bearer <TOKEN>.
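As every call needs the same header, it can be convenient to set it in one place. The following is a rough sketch of such a helper; the RequestFactory class and CreateAuthorizedRequest method are names I've made up for illustration. Creating the request doesn't contact the server, so nothing is sent until GetResponse is called.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the article's sample code.
public static class RequestFactory
{
    // Creates a request for the given URI with the bearer token attached.
    public static HttpWebRequest CreateAuthorizedRequest(string uri, string token)
    {
        if (string.IsNullOrEmpty(token))
        {
            throw new ArgumentException("A token is required.", nameof(token));
        }

        HttpWebRequest request = WebRequest.CreateHttp(uri);

        // Authorization can be added directly to the header collection
        request.Headers.Add("Authorization", "Bearer " + token);

        // Accept is a restricted header and must be set via its property
        request.Accept = "application/xml";

        return request;
    }
}
```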
Getting Available Languages
The translation API can translate a reasonable range of languages (including for some reason Klingon), but it can't translate all languages. Therefore, if you're building a solution that uses the translation API, it's probably a good idea to find out what languages are available. This can be done by calling the GetLanguagesForTranslate service method.
Rather annoyingly, the translation API doesn't return straightforward JSON objects but instead uses an older XML serialization dialect (it appears to be a WCF service rather than a newer Web API), which seems an odd choice in this day and age of easily consumed JSON services. Still, at least it means I can create a self-contained example project without needing external packages.
First, we create the HttpWebRequest object and assign our Authorization header. Next, we set the value of the Accept header to application/xml. The API call actually seems to ignore this header and always returns XML regardless, but if it changes in future to support multiple outputs, our existing code is at least explicit about what it wants.
The body of the response contains an XML document similar to the following:
<ArrayOfstring xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays"
               xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <string>af</string>
  <string>ar</string>
  <string>bn</string>
  <string>ur</string>
  <string>vi</string>
  <string>cy</string>
</ArrayOfstring>
You could parse it yourself, but I usually don't like the overhead of having to work with namespaced XML documents. Fortunately, I can just use the DataContractSerializer to parse it for me.
In order to use the DataContractSerializer class, you need to have a reference to System.Runtime.Serialization in your project.
public string[] GetLanguages()
{
    HttpWebRequest request;
    string[] results;

    this.CheckToken();

    request = WebRequest.CreateHttp("https://api.microsofttranslator.com/v2/http.svc/GetLanguagesForTranslate");
    request.Headers.Add("Authorization", "Bearer " + _authorizationToken);
    request.Accept = "application/xml";

    using (WebResponse response = request.GetResponse())
    {
        using (Stream stream = response.GetResponseStream())
        {
            results = ((List<string>)new DataContractSerializer(typeof(List<string>)).ReadObject(stream)).ToArray();
        }
    }

    return results;
}
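If you want to experiment with the deserialization step without making a request, the same serializer can be pointed at a string containing the XML. This is only a sketch; the LanguageListParser class is a hypothetical name and isn't part of the sample project.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical helper, not part of the article's sample code.
public static class LanguageListParser
{
    // Deserializes an <ArrayOfstring> document, such as the one returned by
    // GetLanguagesForTranslate, into a string array.
    public static string[] Parse(string xml)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(string[]));

        using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return (string[])serializer.ReadObject(stream);
        }
    }
}
```

The serializer takes care of the namespace for us, so the calling code only ever sees plain language codes.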
Getting Language Names
The previous section obtains a list of ISO language codes, but generally you would probably want to present something more friendly to end-users. We can obtain localized language names via the GetLanguageNames method.
This time, we need to perform a POST and include a custom body containing the language codes we wish to retrieve friendly names for, along with a query string argument that specifies which language to use for the friendly names.
The body should be XML similar to the following. This is identical to the output of the GetLanguagesForTranslate call above.
<ArrayOfstring xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays"
               xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <string>af</string>
  <string>ar</string>
  <string>bn</string>
  <string>ur</string>
  <string>vi</string>
  <string>cy</string>
</ArrayOfstring>
The response body will be a string array where each element contains the friendly language name of the matching element from the request body. The following example is a sample of the output when German (de) friendly names are requested.
<ArrayOfstring xmlns="http://schemas.microsoft.com/2003/10/Serialization/Arrays"
               xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
  <string>Afrikaans</string>
  <string>Arabisch</string>
  <string>Bangla</string>
  <string>Urdu</string>
  <string>Vietnamesisch</string>
  <string>Walisisch</string>
</ArrayOfstring>
Previously, we used the DataContractSerializer to deserialize the response body, and we can use the same class to serialize the request body too. We also have to specify the Content-Type of the data we're transmitting and, of course, make sure we include the locale query string argument in the posted URI.
If you forget to set the Content-Type header then, going by the documentation, you'd probably expect a 400 (Bad Request) response. Somewhat curiously, the service returns 200 (OK) with a 500-esque HTML error message in the body. So don't forget to set the content type!
public string[] GetLocalizedLanguageNames(string locale, string[] languages)
{
    HttpWebRequest request;
    string[] results;
    DataContractSerializer serializer;

    this.CheckToken();

    serializer = new DataContractSerializer(typeof(string[]));

    request = WebRequest.CreateHttp("https://api.microsofttranslator.com/v2/http.svc/GetLanguageNames?locale=" + locale);
    request.Headers.Add("Authorization", "Bearer " + _authorizationToken);
    request.Accept = "application/xml";
    request.ContentType = "application/xml";
    request.Method = WebRequestMethods.Http.Post;

    using (Stream stream = request.GetRequestStream())
    {
        serializer.WriteObject(stream, languages);
    }

    using (WebResponse response = request.GetResponse())
    {
        using (Stream stream = response.GetResponseStream())
        {
            results = (string[])serializer.ReadObject(stream);
        }
    }

    return results;
}
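To see what the serializer actually writes for the request body, and to make use of the fact that the response elements line up with the request elements, something like the following sketch may help. The LanguageNameHelper class and its two methods are hypothetical names introduced purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical helper, not part of the article's sample code.
public static class LanguageNameHelper
{
    // Serializes the language codes into the <ArrayOfstring> XML
    // that GetLanguageNames expects as its request body.
    public static string BuildRequestBody(string[] languages)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(string[]));

        using (MemoryStream stream = new MemoryStream())
        {
            serializer.WriteObject(stream, languages);

            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }

    // Pairs each language code with the friendly name at the same index.
    public static Dictionary<string, string> PairCodesWithNames(string[] codes, string[] names)
    {
        if (codes.Length != names.Length)
        {
            throw new ArgumentException("Expected one name per language code.");
        }

        Dictionary<string, string> result = new Dictionary<string, string>(codes.Length);

        for (int i = 0; i < codes.Length; i++)
        {
            result[codes[i]] = names[i];
        }

        return result;
    }
}
```

A lookup keyed by language code is usually more convenient for populating a drop-down list than two parallel arrays.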
Translating Phrases
The final piece of the puzzle is to actually translate a string. We can do this using the Translate service method, which is simple enough to use: you pass the text, source language and output language as query string parameters, and the translation is returned in the response body as an XML string.
You can also specify a category for the translation. I believe this is for use with the Microsoft Translator Hub, so I haven't yet experimented with this parameter.
The following example is the response returned when requesting a translation of Hello World! from English (en) to German (de).
<string xmlns="http://schemas.microsoft.com/2003/10/Serialization/">Hallo Welt!</string>
The request is similar to the other examples in this article. The only point to note is that, as the text query string argument will contain user-entered content, I'm encoding it using Uri.EscapeDataString to account for any special characters.
// Not shown in the original listing; assumed to be a serializer for a single string
private readonly DataContractSerializer _stringDataContractSerializer = new DataContractSerializer(typeof(string));

public string Translate(string text, string from, string to)
{
    HttpWebRequest request;
    string result;
    string queryString;

    this.CheckToken();

    queryString = string.Concat("text=", Uri.EscapeDataString(text), "&from=", from, "&to=", to);

    request = WebRequest.CreateHttp("https://api.microsofttranslator.com/v2/http.svc/Translate?" + queryString);
    request.Headers.Add("Authorization", "Bearer " + _authorizationToken);
    request.Accept = "application/xml";

    using (WebResponse response = request.GetResponse())
    {
        using (Stream stream = response.GetResponseStream())
        {
            result = (string)_stringDataContractSerializer.ReadObject(stream);
        }
    }

    return result;
}
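The two interesting parts of this call, building the query string and reading the single string response, can both be tried without touching the network. The sketch below assumes a hypothetical TranslateHelper class; unlike the method above, it also escapes the from and to values, which is harmless for plain language codes.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Text;

// Hypothetical helper, not part of the article's sample code.
public static class TranslateHelper
{
    // Builds the query string for the Translate call, escaping each value.
    public static string BuildQuery(string text, string from, string to)
    {
        return string.Concat(
            "text=", Uri.EscapeDataString(text),
            "&from=", Uri.EscapeDataString(from),
            "&to=", Uri.EscapeDataString(to));
    }

    // Reads the translated text out of the <string> response document.
    public static string ParseResponse(string xml)
    {
        DataContractSerializer serializer = new DataContractSerializer(typeof(string));

        using (MemoryStream stream = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            return (string)serializer.ReadObject(stream);
        }
    }
}
```

Escaping matters most for characters such as & and spaces, which would otherwise change the meaning of the query string.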
Other API Methods
The GetLanguagesForTranslate, GetLanguageNames and Translate API methods above cover the basics of using the translation services. The service API does offer additional functionality, such as the ability to translate multiple strings at once, to return multiple translations for a single string, or even to try to detect the language of a piece of text. These are for use in more advanced scenarios than what I'm currently interested in, and so I haven't looked further into these methods.
Sample Application
The code samples in this article are both overly verbose (lots of duplicate setup and processing code) and functionally lacking (no checking of status codes or handling of errors). The download sample accompanying this article includes a more robust TranslationClient class that can be easily used to add the basics of the translation APIs to your own applications.
Note that unlike most of my other articles/samples, this one won't run out of the box. The keys seen in the application and screenshots have been revoked, and you'll need to substitute the ones you received when you created your service using the Azure Portal.
History
- 22/05/2017 - Published on Code Project
- 05/05/2017 - First published on cyotek.com