Introduction
Microsoft has recently launched a new set of machine-learning APIs called "Project Oxford" that include functionality for face detection and recognition, speech recognition and synthesis, vision, and natural language understanding. The Face API has been demonstrated with a small application that went viral: how-old.net. In a few days, tens of millions of users tried hundreds of millions of pictures. Even though the age guessing was not that good, the demo showed how easily these APIs can be integrated into any application.
The Project Oxford services are exposed as RESTful APIs and the SDK includes .NET and Java (for Android) REST wrappers. The documentation provides samples in a variety of languages: JavaScript, C#, PHP, Python, Ruby, cURL, Java, and Objective-C. Since consuming RESTful services is also possible in C++ using the C++ REST SDK, I have decided to demonstrate how to integrate the Face APIs into an MFC application. A similar approach can be taken to integrate speech, vision, or natural language understanding.
Signing Up for the Service
The Project Oxford APIs are Windows Azure services available for free (and currently in a beta version), but all calls must be signed with a subscription key assigned to your Windows Azure account. In order to get this key, you must sign up for each service individually (face, speech, vision, etc.).
In the Windows Azure portal, go to Marketplace, choose New, and then select Marketplace again.
You will be able to select an app service. Search for Face API and select it.
You must select a plan (the only one currently available is free), a name for the service, and a few other settings.
On the last page, review the purchase and accept it.
Managing Your Subscription Keys
Once the service is provisioned, you can view it under the Marketplace.
Use the Manage command to view and (if needed) regenerate your subscription keys for the different app services.
Copy the subscription key from here to use it with the service calls.
Face APIs
The Project Oxford APIs are documented here and the reference documentation for the Face API is available here.
The Face API provides more than just face detection capabilities. It allows associating faces with persons, grouping persons, identifying a person in a group based on one or more input faces, etc. Face services include:
- detection of human faces in an image
- verification that two faces represent the same person
- identification of a person in a group by faces
- dividing a list of faces into groups based on face similarities
- finding similar faces in a list of faces
In this article, we will look only at face detection. Face detection means identifying human faces in an image. It returns the position of the face, the positions of the eyes, nose, and mouth (referred to as face landmarks), and additional attributes such as head pose, gender, and age. The last two are an experimental feature and, as how-old.net has shown, guessing the age is not very accurate for the moment.
Face detection APIs have several limitations, including:
- Supported image formats are BMP, PNG, JPEG and GIF
- Image size must not exceed 4MB (a simple client-side check of this and the format limit is sketched after this list)
- Faces are only detected if they are larger than 36x36 pixels and smaller than 4096x4096 pixels; moreover, the maximum number of returned faces is 64 and, for various technical reasons, not all faces may be detected
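To illustrate the first two limits, here is a minimal sketch (an assumption on my part, not part of the attached sources) that validates an image file before uploading it:

#include <cwctype>
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the file is within the 4MB limit
// and has one of the supported extensions (BMP, PNG, JPEG, GIF).
bool is_acceptable_image(std::wstring const & filename)
{
    std::ifstream file(filename, std::ios::binary | std::ios::ate);
    if(!file) return false;
    if(file.tellg() > 4LL * 1024 * 1024) return false; // 4MB limit
    auto const pos = filename.find_last_of(L'.');
    if(pos == std::wstring::npos) return false;
    auto ext = filename.substr(pos + 1);
    for(auto & c : ext) c = static_cast<wchar_t>(towlower(c));
    return ext == L"bmp" || ext == L"png" || ext == L"jpg" ||
           ext == L"jpeg" || ext == L"gif";
}

Note that checking the extension does not truly verify the file format; it is only a quick pre-flight check before making the (comparatively expensive) service call.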
It is possible to detect faces in an image either specified by a URL with a JSON content type, or uploaded as part of the request with the application/octet-stream content type. In this article, we will use the second option.
The detection API is documented here. For the following image, the service returns the JSON data shown below when requesting analysis of age, gender, and head pose (ignoring the face landmarks for simplicity).
[
   {
      "faceId":"4ad67da7-c86b-4dc8-8565-a224cda71253",
      "faceRectangle":{
         "top":47,
         "left":53,
         "width":58,
         "height":58
      },
      "attributes":{
         "headPose":{
            "pitch":0.0,
            "roll":2.4,
            "yaw":-3.4
         },
         "gender":"male",
         "age":32
      }
   }
]
As a side note, in this case the face analysis got the age wrong by only several years (above the actual age), which I consider to be within a reasonable error margin. With a smaller version of the picture though, the analysis returns a different result, this time much closer to the actual age (at the time the picture was taken).
Demo C++ App
To demonstrate the use of these APIs in a C++ application, I have prototyped a simple MFC application where you can load an image, run the face detection, and then show the detected faces with their age and gender on the image. Male faces are marked with a blue rectangle and female faces with a red one.
The application is very simplistic: it allows you to open a BMP or JPEG image, which is then painted in the window's client area. There is no resizing or scrolling, as this is only a demo. If you are interested, you can take a look at the attached source code to see how the loading and painting are done.
In order to consume the Face REST APIs, we need the C++ REST SDK. It is available as a NuGet package, so I used Visual Studio's NuGet package manager to search for and install it.
Notice that cpprest is an aggregate package that bundles all the individual packages targeting different platforms. The total size exceeds 1GB, so you probably want to download only cpprestsdk.v120.windesktop.msvcstl.dyn.rt-dyn, which is what you need to develop for Windows desktop with Visual Studio 2013. (See the C++ REST SDK 2.5.0 release notes for more details.)
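If you prefer the Package Manager Console instead of the UI, installing just that package should amount to the following one-liner (using the package name mentioned above):

Install-Package cpprestsdk.v120.windesktop.msvcstl.dyn.rt-dyn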
There are several components from the C++ REST SDK that we'll use: the http_client (used to connect to an HTTP service), JSON support, asynchronous file streams, and tasks. In order to use them, we have to include several headers.
#include "cpprest\json.h"
#include "cpprest\http_client.h"
#include "cpprest\filestream.h"
using namespace concurrency;
using namespace concurrency::streams;
using namespace web;
using namespace web::http;
using namespace web::http::client;
In order to get the analysis of the faces in an image, we have to do the following:
- Load the image in a file stream.
- When the stream is available, make the HTTP POST request to the detection API using an http_client object.
- When the response is available, extract the JSON content from it (if successful).
- When the result JSON is available, parse it and use the result to draw the rectangles and the age on the faces.
The description above indicates asynchronous processing, which is possible with the PPL task programming model: we start an operation that returns a task and set a continuation on each task that executes when the previous one is done. Put together, the code looks like this:
void detect_faces(
    std::function<void(web::json::value)> success,
    std::function<void(const char*)> error,
    utility::string_t const & filename,
    utility::string_t const & subscriptionKey,
    bool const analyzesFaceLandmarks,
    bool const analyzesAge,
    bool const analyzesGender,
    bool const analyzesHeadPose)
{
    // open the image file as an asynchronous input stream
    file_stream<unsigned char>::open_istream(filename)
    .then([=](pplx::task<basic_istream<unsigned char>> previousTask)
    {
        try
        {
            auto fileStream = previousTask.get();
            auto client = http_client{U("https://api.projectoxford.ai/face/v0/detections")};
            // build the query string with the requested analysis options
            auto query = uri_builder()
                .append_query(U("analyzesFaceLandmarks"),
                              analyzesFaceLandmarks ? "true" : "false")
                .append_query(U("analyzesAge"), analyzesAge ? "true" : "false")
                .append_query(U("analyzesGender"), analyzesGender ? "true" : "false")
                .append_query(U("analyzesHeadPose"), analyzesHeadPose ? "true" : "false")
                .append_query(U("subscription-key"), subscriptionKey)
                .to_string();

            // POST the image content, then extract the JSON from the response
            client
            .request(methods::POST, query, fileStream)
            .then([fileStream, success](pplx::task<http_response> previousTask)
            {
                fileStream.close();
                return previousTask.get().extract_json();
            })
            .then([success, error](pplx::task<json::value> previousTask)
            {
                try
                {
                    success(previousTask.get());
                }
                catch(http_exception const & e)
                {
                    error(e.what());
                }
            });
        }
        catch(std::system_error const & e)
        {
            error(e.what());
        }
    });
}
The detect_faces() function takes several parameters:
- Two callbacks: one for success, to which we pass the JSON value we got back, and one for errors, to which we pass a string representing the error message
- The path on disk of the image to be analyzed
- The subscription key
- Optional arguments that indicate what additional analysis is to be performed
The function can be called as shown below:
auto doc = GetDocument();
auto path = doc->GetImagePath();
auto stdpath = path.GetBuffer(path.GetLength());
path.ReleaseBuffer();

auto error = [](const char* error) {
    std::wostringstream ss;
    ss << error << std::endl;
    AfxMessageBox(ss.str().c_str());
};

auto werror = [](const wchar_t* error) {
    std::wostringstream ss;
    ss << error << std::endl;
    AfxMessageBox(ss.str().c_str());
};

auto success = [this, werror](web::json::value object) {
    m_faces = faceapi::parse_face_result(object, werror);
    this->Invalidate();
};

faceapi::detect_faces(success, error, stdpath,
                      U("your-subscription-key"), false, true, true, true);
Notice that you have to use the subscription key you got when you signed up for the application service.
If the call is successful, we get a response with a JSON payload that looks like the example shown above. If the call failed, we also get a JSON value back, with an error code and message. Such a message may look like this:
{
   "code":"InvalidImageSize",
   "message":"Image size is too small or too big."
}
The parse_face_result() function parses the JSON value we get back, extracting either the result of the analysis or the error message, and returns a collection of face objects.
std::vector<faceapi::face> parse_face_result(
    web::json::value object,
    std::function<void(wchar_t const *)> error)
{
    std::vector<faceapi::face> faces;
    if(!object.is_null())
    {
        // an error response contains a "code" and a "message" field
        if(object.has_field(U("code")))
        {
            auto message = object.at(U("message")).as_string();
            error(message.c_str());
        }
        else
        {
            auto arr = object.as_array();
            for(auto const & obj : arr)
            {
                try
                {
                    auto face = faceapi::face{};
                    face.faceId = obj.at(U("faceId")).as_string();

                    auto const & fr = obj.at(U("faceRectangle"));
                    face.faceRectangle.width  = fr.at(U("width")).as_integer();
                    face.faceRectangle.height = fr.at(U("height")).as_integer();
                    face.faceRectangle.top    = fr.at(U("top")).as_integer();
                    face.faceRectangle.left   = fr.at(U("left")).as_integer();

                    auto const & attr = obj.at(U("attributes")).as_object();
                    if(!attr.empty())
                    {
                        face.attributes.age = attr.at(U("age")).as_integer();
                        face.attributes.gender =
                            (attr.at(U("gender")).as_string() == U("male")) ?
                            faceapi::gender::male : faceapi::gender::female;

                        auto const & hpose = attr.at(U("headPose")).as_object();
                        if(!hpose.empty())
                        {
                            face.attributes.headPose.pitch = hpose.at(U("pitch")).as_double();
                            face.attributes.headPose.roll  = hpose.at(U("roll")).as_double();
                            face.attributes.headPose.yaw   = hpose.at(U("yaw")).as_double();
                        }
                    }
                    faces.push_back(face);
                }
                catch(std::exception const &)
                {
                    // skip faces that fail to parse
                }
            }
        }
    }
    return faces;
}
The face type and the other types are defined as follows:
namespace faceapi
{
    struct face_rectangle
    {
        int width = 0;
        int height = 0;
        int left = 0;
        int top = 0;
    };

    struct face_landmark
    {
        double x = 0;
        double y = 0;
    };

    struct face_landmarks
    {
        face_landmark pupilLeft;
        face_landmark pupilRight;
        face_landmark noseTip;
        face_landmark mouthLeft;
        face_landmark mouthRight;
        face_landmark eyebrowLeftOuter;
        face_landmark eyebrowLeftInner;
        face_landmark eyeLeftOuter;
        face_landmark eyeLeftTop;
        face_landmark eyeLeftBottom;
        face_landmark eyeLeftInner;
        face_landmark eyebrowRightInner;
        face_landmark eyebrowRightOuter;
        face_landmark eyeRightInner;
        face_landmark eyeRightTop;
        face_landmark eyeRightBottom;
        face_landmark eyeRightOuter;
        face_landmark noseRootLeft;
        face_landmark noseRootRight;
        face_landmark noseLeftAlarTop;
        face_landmark noseRightAlarTop;
        face_landmark noseLeftAlarOutTip;
        face_landmark noseRightAlarOutTip;
        face_landmark upperLipTop;
        face_landmark upperLipBottom;
        face_landmark underLipTop;
        face_landmark underLipBottom;
    };

    struct head_pose
    {
        double roll = 0;
        double yaw = 0;
        double pitch = 0;
    };

    enum class gender
    {
        female,
        male
    };

    struct face_attributes
    {
        int age = 0;
        gender gender = gender::female;
        head_pose headPose;
    };

    struct face
    {
        std::wstring faceId;
        face_rectangle faceRectangle;
        face_attributes attributes;
    };

    void detect_faces(
        std::function<void(web::json::value)> success,
        std::function<void(char const *)> error,
        utility::string_t const & filename,
        utility::string_t const & subscriptionKey,
        bool const analyzesFaceLandmarks = false,
        bool const analyzesAge = false,
        bool const analyzesGender = false,
        bool const analyzesHeadPose = false);

    std::vector<face> parse_face_result(
        web::json::value object,
        std::function<void(wchar_t const *)> error);
}
With the face information available, we can draw the rectangles and the age text on the image. This is done in the OnDraw() method of the view; the complete implementation is in the attached source code, as it is not that important for the purpose of this article, but a minimal sketch follows.
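The sketch below shows roughly what the drawing could look like; the view class name CFaceDemoView is hypothetical, the painting of the loaded image itself is omitted, and the attached sources may differ in the details:

// A minimal sketch of the drawing code (hypothetical view class name).
void CFaceDemoView::OnDraw(CDC* pDC)
{
    // ... painting of the loaded image omitted ...
    for(auto const & face : m_faces)
    {
        auto const & fr = face.faceRectangle;
        // blue for male faces, red for female faces
        CPen pen(PS_SOLID, 2,
                 face.attributes.gender == faceapi::gender::male ?
                 RGB(0, 0, 255) : RGB(255, 0, 0));
        auto oldPen = pDC->SelectObject(&pen);
        pDC->SelectStockObject(NULL_BRUSH); // outline only, no fill

        pDC->Rectangle(fr.left, fr.top, fr.left + fr.width, fr.top + fr.height);

        // show the estimated age just above the rectangle
        CString text;
        text.Format(L"%d", face.attributes.age);
        pDC->SetBkMode(TRANSPARENT);
        pDC->TextOut(fr.left, fr.top - 16, text);

        pDC->SelectObject(oldPen);
    }
}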
The following image shows how the detection works on a picture with a group of people (image source: Wikipedia):
Reworking the Code
The detect_faces() function takes functions as parameters that it calls back on success or failure. This can be reworked so that it actually returns a task, and we then set a continuation on that task to do something when the result is available. Also, exception handling can be moved out of this function into the last continuation, as any exception escaping from a task's body is caught and re-thrown from a wait() or get() call on the final task. The detect_faces() function can thus be re-implemented like this:
pplx::task<web::json::value> detect_faces_async(
    utility::string_t const & filename,
    utility::string_t const & subscriptionKey,
    bool const analyzesFaceLandmarks,
    bool const analyzesAge,
    bool const analyzesGender,
    bool const analyzesHeadPose,
    pplx::cancellation_token const & token)
{
    return file_stream<unsigned char>::open_istream(filename)
    .then([=](pplx::task<basic_istream<unsigned char>> previousTask)
    {
        if(!token.is_canceled())
        {
            auto fileStream = previousTask.get();
            auto client = http_client{U("https://api.projectoxford.ai/face/v0/detections")};
            auto query = uri_builder()
                .append_query(U("analyzesFaceLandmarks"),
                              analyzesFaceLandmarks ? "true" : "false")
                .append_query(U("analyzesAge"), analyzesAge ? "true" : "false")
                .append_query(U("analyzesGender"), analyzesGender ? "true" : "false")
                .append_query(U("analyzesHeadPose"), analyzesHeadPose ? "true" : "false")
                .append_query(U("subscription-key"), subscriptionKey)
                .to_string();

            return client
                .request(methods::POST, query, fileStream, token)
                .then([fileStream](pplx::task<http_response> previousTask)
                {
                    fileStream.close();
                    return previousTask.get().extract_json();
                });
        }

        // the operation was canceled; complete with an empty (null) JSON value
        return pplx::task_from_result(json::value());
    });
}
In this case, we'd also have to rework the calling code to the following:
auto doc = GetDocument();
auto path = doc->GetImagePath();
auto stdpath = path.GetBuffer(path.GetLength());
path.ReleaseBuffer();

auto error = [](const char* error) {
    std::wostringstream ss;
    ss << error << std::endl;
    AfxMessageBox(ss.str().c_str());
};

auto werror = [](const wchar_t* error) {
    std::wostringstream ss;
    ss << error << std::endl;
    AfxMessageBox(ss.str().c_str());
};

// the success callback is no longer needed; the work is done in the continuation
faceapi::detect_faces_async(stdpath, U("your-subscription-key"),
                            false, true, true, true,
                            pplx::cancellation_token::none())
.then([this, werror, error](pplx::task<web::json::value> previousTask) {
    try
    {
        m_faces = faceapi::parse_face_result(previousTask.get(), werror);
        this->Invalidate();
    }
    catch(std::exception const & e)
    {
        error(e.what());
    }
});
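One benefit of this version is the cancellation token parameter: a pending detection can be abandoned. Here is a minimal sketch (the member m_cts is a hypothetical addition, not part of the attached sources) using a pplx::cancellation_token_source:

// Hypothetical member of the view or document:
// pplx::cancellation_token_source m_cts;
faceapi::detect_faces_async(stdpath, U("your-subscription-key"),
                            false, true, true, true, m_cts.get_token());
// ... later, for instance when the user loads another image:
m_cts.cancel(); // the pending work observes the token and stops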
A similar implementation can be put in place for detecting faces in an image specified by a URL. In this case, we no longer have to load a file from disk; instead, we pass a JSON value in the body of the request.
pplx::task<web::json::value> detect_faces_from_url_async(
    utility::string_t const & url,
    utility::string_t const & subscriptionKey,
    bool const analyzesFaceLandmarks,
    bool const analyzesAge,
    bool const analyzesGender,
    bool const analyzesHeadPose,
    pplx::cancellation_token const & token)
{
    auto client = http_client{U("https://api.projectoxford.ai/face/v0/detections")};
    auto query = uri_builder()
        .append_query(U("analyzesFaceLandmarks"), analyzesFaceLandmarks ? "true" : "false")
        .append_query(U("analyzesAge"), analyzesAge ? "true" : "false")
        .append_query(U("analyzesGender"), analyzesGender ? "true" : "false")
        .append_query(U("analyzesHeadPose"), analyzesHeadPose ? "true" : "false")
        .append_query(U("subscription-key"), subscriptionKey)
        .to_string();

    // the request body is a JSON object containing the URL of the image
    auto content = web::json::value{};
    content[U("url")] = web::json::value(url);

    return client
        .request(methods::POST, query, content, token)
        .then([](pplx::task<http_response> previousTask)
        {
            return previousTask.get().extract_json();
        });
}
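The calling code would mirror the file-based version; a minimal usage sketch (with a placeholder image URL, and reusing the error and werror lambdas from above) might look like this:

faceapi::detect_faces_from_url_async(
        U("http://example.com/photo.jpg"), U("your-subscription-key"),
        false, true, true, true, pplx::cancellation_token::none())
.then([this, werror, error](pplx::task<web::json::value> previousTask) {
    try
    {
        m_faces = faceapi::parse_face_result(previousTask.get(), werror);
        this->Invalidate();
    }
    catch(std::exception const & e)
    {
        error(e.what());
    }
});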
Conclusion
Project Oxford provides a series of machine learning APIs that are available for free (though in beta for now) and can be easily integrated into your applications. In this article, I have shown what you have to do to start using the APIs and how you can consume some of the face APIs in a C++ application (with MFC) by using the C++ REST SDK.
History
- 8th May, 2015: Initial version