Intro

I recently came across an interesting face detection and recognition cloud service from Microsoft. Face API is part of Microsoft Cognitive Services, a collection of cloud services that help you solve AI problems, and it can be acquired through Azure. It is a very simple Web API that puts the power of face recognition into the hands of people who do not have any AI or computer vision (CV) skills. If you have ever wanted to do face recognition but were put off by all the knowledge required to set it up, this article is for you.


Pricing

Face API comes in two pricing tiers: Free and Standard.

The Free tier has some limitations:

20 API calls per minute
30,000 API calls per month
1,000 person groups, each holding up to 1,000 persons

The Standard tier also has some limitations:

10 API calls per second
1,000,000 person groups, each holding up to 10,000 persons

Check the official pricing page for more details, because these limits may change over time.
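
Because the Free tier allows only 20 calls per minute, a batch job will hit that limit quickly. Here is a minimal sketch of a client-side sliding-window throttle you could put in front of your calls. The SlidingWindowThrottle class is hypothetical and not part of any Face API SDK; the limits are constructor parameters so you can plug in the numbers for your own tier.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical client-side throttle: allows at most maxCalls calls inside any
// sliding time window (for example 20 calls per minute on the Free tier).
public sealed class SlidingWindowThrottle
{
    private readonly int maxCalls;
    private readonly TimeSpan window;
    private readonly Queue<DateTime> calls = new Queue<DateTime>();

    public SlidingWindowThrottle(int maxCalls, TimeSpan window)
    {
        this.maxCalls = maxCalls;
        this.window = window;
    }

    // Returns how long to wait before a call made at `now` stays within the limit.
    public TimeSpan GetDelay(DateTime now)
    {
        // Drop calls that are already outside the window.
        while (calls.Count > 0 && now - calls.Peek() >= window)
            calls.Dequeue();

        if (calls.Count < maxCalls)
            return TimeSpan.Zero;

        // Wait until the oldest call in the window expires.
        return calls.Peek() + window - now;
    }

    // Records that a call was sent at `now`.
    public void RecordCall(DateTime now) => calls.Enqueue(now);
}
```

Before each call, ask for GetDelay(DateTime.UtcNow), wait that long (for example with Task.Delay), then call RecordCall. Passing the time in explicitly keeps the class easy to test; it is not thread-safe, so share one instance per worker or add a lock.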


Usage

The best thing about Face API is that it is a simple Web API anyone can learn to use. It is well documented, and even though SDKs are available for Android, Python, iOS and Windows, you can use it from any programming language. The official documentation page contains tutorials and quickstarts for many popular programming languages, and the API reference is also available here. The simplest way to start using Face API is through curl (if you are on Linux) or Postman (if you are on Windows). Once you are comfortable with it, I suggest moving to Python, C# or any other language that has a Face API SDK. You often have to copy parameters from one response into the next call, so storing those values in variables lets you work faster and automate the process.
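
If you want to see what curl or Postman actually send, here is a minimal sketch of the same Face - Detect request built with plain HttpClient classes, without any SDK. It assumes the same v1.0 endpoint format as the client initialization in this section, the Ocp-Apim-Subscription-Key authentication header and the returnFaceId=true query parameter; the FaceDetectRequest class and its method names are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Hypothetical helper that builds (but does not send) Face - Detect requests.
public static class FaceDetectRequest
{
    // Image hosted online: passed as {"url": "..."} in a JSON body.
    public static HttpRequestMessage ForUrl(string endpoint, string apiKey, string imageUrl)
    {
        var request = Create(endpoint, apiKey);
        string body = JsonSerializer.Serialize(new { url = imageUrl });
        request.Content = new StringContent(body, Encoding.UTF8, "application/json");
        return request;
    }

    // Local image: sent as the raw request body (octet stream).
    public static HttpRequestMessage ForBytes(string endpoint, string apiKey, byte[] image)
    {
        var request = Create(endpoint, apiKey);
        request.Content = new ByteArrayContent(image);
        request.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
        return request;
    }

    private static HttpRequestMessage Create(string endpoint, string apiKey)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, endpoint.TrimEnd('/') + "/detect?returnFaceId=true");
        // The subscription key authenticates every call.
        request.Headers.Add("Ocp-Apim-Subscription-Key", apiKey);
        return request;
    }
}
```

To send one, pass the message to HttpClient.SendAsync and read the response body as a string; it contains the JSON array of detected faces.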

I chose C# .NET, and within seconds I had grabbed the SDK through the NuGet package manager in Visual Studio and had it ready to use. You can find basic SDK tutorials under the Tutorials section. I will quickly cover C# .NET usage here because it is really short. After installing the Microsoft.ProjectOxford.Face NuGet package, make sure you include the following namespaces:

using Microsoft.ProjectOxford.Face;
using Microsoft.ProjectOxford.Face.Contract;

Now you just have to initialize the client class. Replace the API key and API URL with your own:

private readonly IFaceServiceClient faceAPI = new FaceServiceClient("your_API_key", "https://eastus.api.cognitive.microsoft.com/face/v1.0");

And that’s it! You are now ready to make API calls by simply calling methods on your client class. Here is an example of calling face detection on an image URL:

var faces = await faceAPI.DetectAsync(url);


Obtaining Face Dataset

To demo Face API features, I needed a face dataset to work and test with. For demo purposes, you can use a dataset like MS Celeb, a huge dataset containing 1M faces of 10,000 celebrities. The sample set can be downloaded here. The data comes as a single TSV file whose structure is explained here. Column 2 contains the celebrity’s name and column 6 the face image, encoded as a Base64 string. Make sure you decode it into an image octet stream before sending it to Face API. Here is some example code:

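// Needs: using System; using System.IO; and must run inside an async method,
// because DetectAsync is awaited. faceAPI is the client created earlier.
using (TextReader reader = new StreamReader(path))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        string[] columns = line.Split('\t');
        string name = columns[1];   // column 2: celebrity name
        string data = columns[5];   // column 6: Base64-encoded face image
        byte[] decodedData = Convert.FromBase64String(data);

        // Dispose the stream once the call has completed.
        using (var stream = new MemoryStream(decodedData))
        {
            var faces = await faceAPI.DetectAsync(stream);
            // do a custom operation with name and faces
        }
    }
}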


API Call Structure

Before going any further and making any calls, you must understand how this API is intended to be used. If you take a look at the API reference, you will see terms like Person, Face, Person Group, Training, etc. What is all that? How do I detect a face and recognize who it belongs to? Let’s start from the beginning and summarize all those terms and their usage.

First, you have to create a Person Group. A Person Group contains people (Persons), and each Person has a name and a collection of Faces. A Person Group can contain up to 10,000 Persons on the Standard tier (1,000 on the Free tier), and each Person can have up to 248 faces. I have recently noticed that a Large Person Group is also available, pushing the limit up to 1M Persons per group on the Standard tier. After you add all the people and their faces to your Person Group, the group has to be trained. You can do this by calling the PersonGroup - Train API. Training can take some time depending on the size of your dataset, and you can check its progress by calling the PersonGroup - Get Training Status API. Here is a summary:

1. Create a Person Group
2. Create a Person inside the Person Group
3. Add the Person’s face:
   a. Call Face - Detect to get the face rectangle
   b. Call Add Person Face using the response data from the previous call
4. Train the group
5. Get the group’s training status
6. Recognize a person’s face:
   a. Call Face - Detect to get a faceId
   b. Call Face - Identify using the faceId from the previous call
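
The steps above map onto a handful of REST routes. Here is a sketch that collects them in one place, assuming the v1.0 paths from the API reference, relative to an endpoint such as https://eastus.api.cognitive.microsoft.com/face/v1.0. The FaceWorkflowRoutes class and its WaitForTrainingAsync polling helper are hypothetical; the caller supplies the function that actually fetches the training status.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: relative paths for the Face API v1.0 calls in the summary.
public static class FaceWorkflowRoutes
{
    // 1. PUT: create the Person Group.
    public static string CreatePersonGroup(string groupId) => $"persongroups/{groupId}";

    // 2. POST: create a Person inside the group (the response contains its personId).
    public static string CreatePerson(string groupId) => $"persongroups/{groupId}/persons";

    // 3b. POST: add a face, pointing at the rectangle returned by Face - Detect.
    public static string AddPersonFace(string groupId, Guid personId, int left, int top, int width, int height) =>
        $"persongroups/{groupId}/persons/{personId}/persistedFaces?targetFace={left},{top},{width},{height}";

    // 4. POST: start training.
    public static string Train(string groupId) => $"persongroups/{groupId}/train";

    // 5. GET: training status ("notstarted", "running", "succeeded" or "failed").
    public static string TrainingStatus(string groupId) => $"persongroups/{groupId}/training";

    // 3a and 6a. POST: detect faces. 6b. POST: identify detected faceIds.
    public const string Detect = "detect?returnFaceId=true";
    public const string Identify = "identify";

    // Polls until training has finished. getStatus is caller-supplied: it should GET
    // TrainingStatus(groupId) and return the "status" field of the response.
    public static async Task<string> WaitForTrainingAsync(Func<Task<string>> getStatus, TimeSpan pollInterval)
    {
        while (true)
        {
            string status = await getStatus();
            if (status == "succeeded" || status == "failed")
                return status;
            await Task.Delay(pollInterval);
        }
    }
}
```

The targetFace value in step 3b uses the left,top,width,height format, so a rectangle from Face - Detect can be passed through field by field.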

As you can see, storing a person’s face or identifying a person takes more than one API call. Face - Detect returns a JSON response containing the rectangles of all the faces detected in the image you sent as an octet stream or URL. Keep in mind that at most 64 faces can be detected in one picture.
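
Here is a minimal sketch of reading that JSON response with System.Text.Json (.NET Core 3.0 or later), assuming the default response shape: an array of objects, each with a faceId and a faceRectangle holding left, top, width and height. The FaceBox record and DetectResponseParser class are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical type for one detected face.
public sealed record FaceBox(Guid FaceId, int Left, int Top, int Width, int Height);

public static class DetectResponseParser
{
    // Turns the JSON array returned by Face - Detect into face ids and rectangles.
    public static List<FaceBox> Parse(string json)
    {
        var faces = new List<FaceBox>();
        using JsonDocument doc = JsonDocument.Parse(json);
        foreach (JsonElement face in doc.RootElement.EnumerateArray())
        {
            JsonElement rect = face.GetProperty("faceRectangle");
            faces.Add(new FaceBox(
                face.GetProperty("faceId").GetGuid(),
                rect.GetProperty("left").GetInt32(),
                rect.GetProperty("top").GetInt32(),
                rect.GetProperty("width").GetInt32(),
                rect.GetProperty("height").GetInt32()));
        }
        return faces;
    }
}
```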

In the case of Face - Identify, you have to call Face - Detect first, because it gives you the faceId parameter you need for the Face - Identify call. The faceIds returned by Face - Detect are temporary and expire after 24 hours. The faces you store when adding a person’s face are persistent and get their own ID (persistedFaceId). A single Face - Identify call accepts up to 10 faceIds.
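
Because Face - Identify takes at most 10 faceIds, a longer list has to be split into batches. Here is a sketch, assuming the identify request body carries personGroupId and faceIds as described in the API reference; the IdentifyBatches class is hypothetical, and optional fields such as maxNumOfCandidatesReturned are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class IdentifyBatches
{
    // Face - Identify accepts at most 10 faceIds per call.
    public const int MaxFaceIdsPerCall = 10;

    // Splits a list of faceIds into consecutive batches of at most 10.
    public static List<List<Guid>> Split(IReadOnlyList<Guid> faceIds)
    {
        var batches = new List<List<Guid>>();
        for (int i = 0; i < faceIds.Count; i += MaxFaceIdsPerCall)
            batches.Add(faceIds.Skip(i).Take(MaxFaceIdsPerCall).ToList());
        return batches;
    }

    // Builds the JSON body for one Face - Identify call against a Person Group.
    public static string BuildBody(string personGroupId, IEnumerable<Guid> batch) =>
        JsonSerializer.Serialize(new { personGroupId, faceIds = batch });
}
```

For example, 23 detected faces become three identify calls (10, 10 and 3 faceIds) instead of 23.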

There are also other features, like Face - Find Similar and Face - Verify, that are not covered here for the sake of simplicity. Feel free to experiment with them yourself. Also note that Face - Detect can return some interesting face attributes, such as age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur and exposure. You can turn this feature on by including a query string such as “returnFaceAttributes=age,gender” in your URL. Note that requesting attributes increases processing time.
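
Here is a sketch of building a detection URL with attributes, assuming the comma-separated returnFaceAttributes format shown above. The DetectUri helper is hypothetical; attribute names are passed through unchanged, so use the spellings from the API reference.

```csharp
using System;
using System.Collections.Generic;

public static class DetectUri
{
    // Builds a Face - Detect URL; attributes are joined into one comma-separated
    // returnFaceAttributes parameter, which is omitted when the list is empty.
    public static Uri Build(string endpoint, IEnumerable<string> attributes)
    {
        string query = "returnFaceId=true";
        string joined = string.Join(",", attributes);
        if (joined.Length > 0)
            query += "&returnFaceAttributes=" + joined;
        return new Uri(endpoint.TrimEnd('/') + "/detect?" + query);
    }
}
```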


Drawbacks

Face API has two drawbacks I would like to mention. First, once you store a person’s face, you cannot get the actual picture back. There is a Person - Get Face API call, but it only returns some metadata about the face, not the octet stream you originally sent. If you would like to look at the faces you have stored for a person, you won’t be able to unless you keep the pictures somewhere else and associate each one with the persistedFaceId returned when the face is stored in the cloud.
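
One way to work around this is to keep your own index next to the pictures. Here is a minimal sketch, assuming you record the persistedFaceId returned each time a face is added. The FaceImageIndex class and its tab-separated file format are hypothetical choices, not part of Face API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical local index: maps each persistedFaceId to the person's name
// and the path of the original picture, since Face API will not return it.
public sealed class FaceImageIndex
{
    private readonly Dictionary<Guid, (string Person, string ImagePath)> entries =
        new Dictionary<Guid, (string Person, string ImagePath)>();

    public void Add(Guid persistedFaceId, string person, string imagePath) =>
        entries[persistedFaceId] = (person, imagePath);

    public bool TryGet(Guid persistedFaceId, out (string Person, string ImagePath) entry) =>
        entries.TryGetValue(persistedFaceId, out entry);

    // Saves one line per face: persistedFaceId, person name, image path (tab-separated).
    public void Save(string path)
    {
        var lines = new List<string>();
        foreach (var pair in entries)
            lines.Add($"{pair.Key}\t{pair.Value.Person}\t{pair.Value.ImagePath}");
        File.WriteAllLines(path, lines);
    }

    public static FaceImageIndex Load(string path)
    {
        var index = new FaceImageIndex();
        foreach (string line in File.ReadAllLines(path))
        {
            string[] parts = line.Split('\t');
            index.Add(Guid.Parse(parts[0]), parts[1], parts[2]);
        }
        return index;
    }
}
```

Names or paths containing tab characters would break this simple file format; a small database would be more robust for large collections.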

The second drawback is that both storing and identifying a face require an extra Face - Detect call. If you have lots of faces to store, this doubles the cost, because you have to make two calls to do one thing. The same applies when identifying a face.


API Call Reduction Hacks

If you are doing face detection and recognition on a large scale (for example, 1M faces), you will probably want to do it with fewer calls. When storing faces, you could detect each face locally instead of making an extra call to Face API. I did a little experiment to see how this works in practice, even though Microsoft states: “If the provided 'targetFace' rectangle is not returned from Face - Detect, there’s no guarantee to detect and add the face successfully.”

To detect faces locally, I installed the Emgu CV NuGet package in my project. Emgu CV is a .NET wrapper for OpenCV. I wanted to see how its Haar cascade classifier compares with Face API’s detection. Here is the code:

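using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using Emgu.CV;
using Emgu.CV.Structure;

// Detects faces with OpenCV's Haar cascade. Uses the Image<Bgr, byte>(Bitmap)
// constructor available in Emgu CV 3.x.
private List<Rectangle> DetectFace(MemoryStream imageStream)
{
    var faces = new List<Rectangle>();
    using (var bmpImage = new Bitmap(imageStream))
    using (var img = new Image<Bgr, byte>(bmpImage))
    using (var gray = img.Convert<Gray, byte>())
    using (var classifier = new CascadeClassifier("HaarCascade/haarcascade_frontalface_alt2.xml"))
    {
        // scaleFactor 1.1, minNeighbors 10, no minimum face size
        Rectangle[] facesDetected = classifier.DetectMultiScale(gray, 1.1, 10, Size.Empty);
        faces.AddRange(facesDetected);
    }
    return faces;
}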


The result can be seen in the picture below. The blue rectangle is the face detected by OpenCV and the red rectangle comes from the Face - Detect call. OpenCV takes a slightly larger crop on every picture I tested. After adding faces using the OpenCV rectangles and running some tests, the result was good: I was able to identify the person correctly.
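
To put a number on how different the two rectangles are, you could compute their intersection over union (IoU). Here is a small sketch with rectangles as (left, top, width, height) tuples, the same fields Face - Detect returns; the RectangleOverlap helper is a hypothetical addition, not part of either library.

```csharp
using System;

public static class RectangleOverlap
{
    // Intersection over union of two axis-aligned rectangles: 1.0 means identical boxes.
    public static double IoU((int Left, int Top, int Width, int Height) a,
                             (int Left, int Top, int Width, int Height) b)
    {
        int left = Math.Max(a.Left, b.Left);
        int top = Math.Max(a.Top, b.Top);
        int right = Math.Min(a.Left + a.Width, b.Left + b.Width);
        int bottom = Math.Min(a.Top + a.Height, b.Top + b.Height);
        int intersection = Math.Max(0, right - left) * Math.Max(0, bottom - top);
        int union = a.Width * a.Height + b.Width * b.Height - intersection;
        return union == 0 ? 0.0 : (double)intersection / union;
    }

    // True when `outer` fully contains `inner`, e.g. a larger OpenCV crop around a Face API box.
    public static bool Contains((int Left, int Top, int Width, int Height) outer,
                                (int Left, int Top, int Width, int Height) inner) =>
        inner.Left >= outer.Left && inner.Top >= outer.Top &&
        inner.Left + inner.Width <= outer.Left + outer.Width &&
        inner.Top + inner.Height <= outer.Top + outer.Height;
}
```

A rectangle that contains the other while the IoU stays clearly below 1 is what a “slightly larger crop” looks like numerically.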

There are a few more hacks that came to mind but that I didn’t test. For example, Face - Detect can detect up to 64 faces in one picture. Theoretically, you should be able to stitch 64 faces into one picture so that it doesn’t exceed the 4096x4096 pixel limit. The response should then return 64 face rectangles for the price of one call, and you should be able to map those rectangles back to the pictures you stitched together. Face - Identify also accepts up to 10 faceIds per call, so you could save some calls there too.
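
Since the stitching idea is untested, treat this only as a sketch of the bookkeeping. It assumes 64 faces laid out in an 8x8 grid of 512x512 tiles, so the mosaic stays within the 4096x4096 figure above, and maps each returned rectangle back to the tile that contains its center. The FaceMosaic class is hypothetical; resizing each face to fit its tile, and keeping the file within the service’s size limit, are left out.

```csharp
using System;

// Hypothetical bookkeeping for stitching up to 64 single-face pictures into one image.
public static class FaceMosaic
{
    public const int MaxFacesPerDetect = 64;
    public const int MaxImageSide = 4096;
    public const int Columns = 8;                        // 8 x 8 = 64 tiles
    public const int TileSide = MaxImageSide / Columns;  // 512 px per tile

    // Top-left corner of the tile where source picture `index` is drawn.
    public static (int X, int Y) TileOrigin(int index) =>
        ((index % Columns) * TileSide, (index / Columns) * TileSide);

    // Maps a rectangle in mosaic coordinates back to (source index, rectangle in that picture),
    // using the tile that contains the rectangle's center. Assumes the center lies inside the grid.
    public static (int Index, int Left, int Top, int Width, int Height) MapBack(int left, int top, int width, int height)
    {
        int centerX = left + width / 2;
        int centerY = top + height / 2;
        int index = (centerY / TileSide) * Columns + centerX / TileSide;
        (int x, int y) = TileOrigin(index);
        return (index, left - x, top - y, width, height);
    }

    // Detect calls for `faces` single-face pictures: one per picture vs. one per 64-face mosaic.
    public static (int Separate, int Stitched) DetectCalls(int faces) =>
        (faces, (faces + MaxFacesPerDetect - 1) / MaxFacesPerDetect);
}
```

For 1M single-face pictures, DetectCalls(1_000_000) gives 1,000,000 separate calls versus 15,625 stitched ones.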


Conclusion

Face API is a great tool for fast, easy face detection and recognition without the need for any AI/CV skills. I really like the free tier, which lets you experiment with the API before committing to any expenses. The simple, logical usage combined with rich, well-written documentation makes this API worth including in your AI projects.