eGandalf's Happy Coding Corner: Face-Based Login with Episerver and Microsoft Cognitive Services

Last week was Episerver Ascend 2017 and as part of the last day a good number of us participated in a Microsoft-sponsored Code Bash - a miniaturized hackathon, if you will. While the event itself was not without flaw (we could all have used more time and there were scheduling conflicts with other sessions), the technology employed was exciting and fun to work with.

Our team, made up of Brian Browning (@brianbrowning), Jay Burling (@jayburling), Kenny Gutierrez (@kennygutierrez), Rob Folan (not on Twitter), and myself (@egandalf), chose to attempt to build out some code using Microsoft's Face API to provide a biometric authentication option. While this would have a lot of flaws and concerns to be addressed in a real-life implementation, it seemed quite reasonable.

The Microsoft Cognitive Services Face API is pretty robust and I'll be using this and other image processing tools for a side project in the near future. I assume Microsoft wants these APIs to become something of a playground for aspiring developers and entrepreneurs since they've set up this endpoint to be free for up to 30k calls per month. Given that setting up a person group, training, and then authentication itself can result in half a dozen calls at minimum, it's easy to rack up the numbers. Still, 30k is plenty to develop a good personal project or a proof of concept for something new.

The code I'm about to share below is NOT PRODUCTION quality. At all. Don't use it that way. I mean it. Don't. Use. It. In. Production. It's hackathon code, demo ware, or PoC. Whatever you want to call it.

That being said, let's begin.

Building a Face API-Powered Login Page

I started with a vanilla Alloy site and spun up a quick Login page type.

using EPiServer.Core;
using EPiServer.DataAnnotations;

namespace CodeBash.Models.Pages
{
    [ContentType(DisplayName = "Login Page", GUID = "d204762a-ded9-4b9e-b7e8-a7b8d9da94c4", Description = "")]
    public class LoginPage : SitePageData
    {
        public virtual XhtmlString Instructions { get; set; }
    }
}

And a ViewModel to which I added only a single field. Normally, if you want to upload file data, you would decorate the property with [DataType(DataType.Upload)] and use HttpPostedFileBase as the property type. In this case, we're capturing the image from the user's webcam (or the front-facing camera on their phone), so it's not so much transferring a file as it is capturing binary data in the form of a Base64 string and sending it to the server for processing. So the ViewModel contains a string property which will be output as a hidden field in the view.

public class LoginModel : PageViewModel
{
    public string ImageUpload { get; set; }

    public LoginModel(LoginPage currentPage) : base(currentPage)
    {
    }
}

Posting back to an Episerver-managed page can be a bit weird (as I've learned) so I created a separate model to accept the POST data. It's just a model with the same ImageUpload property.

public class LoginPostModel
{
    public string ImageUpload { get; set; }
}

Then we get to the LoginPageController and, ultimately, the View. The controller will have two action methods. The second is async, since it calls async methods and we want to make use of await. The core of it will look like this:

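public class LoginPageController : PageController<LoginPage>
{
    public ActionResult Index(LoginPage currentPage)
    {
        var model = new LoginModel(currentPage);

        return View(model);
    }

    // Returns Task<ActionResult> so the action can await the Face API calls.
    public async Task<ActionResult> Post(LoginPostModel pageData)
    {
        // Identification and sign-in go here (filled in later in this post).

        // redirect to start page after login
        return RedirectToAction("Index", new { node = ContentReference.StartPage });
    }
}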

And we'll fill in the rest of the Post method in a bit.

I'm going to simply post all of the contents of the View below. If you're a front-end person (I'm about 75% not) then it will make sense. Otherwise, suffice it to say that this could be done better. Also the large chunk of JS below is a cross-browser way of enabling video capture (streaming through a <video> element), then snagging a still from that video stream into a <canvas> which is then converted into a DataUrl for the <img> element. And it's that DataUrl, which is a prefix + a Base64-encoded byte array, that we'll be sending back to the server to perform a check for authentication. If you want to dig into the weeds on it, comment or tweet me and I'll reply.

@using EPiServer.Core
@using EPiServer.Web.Mvc.Html

@model LoginModel

<div>
    <h1 @Html.EditAttributes(m => m.CurrentPage.Name)>@Model.CurrentPage.Name</h1>
    @Html.PropertyFor(m => m.CurrentPage.Instructions)
    @using (Html.BeginForm("Post", "LoginPage"))
    {
        <div class="container">
            <div class="row">
                <div class="span4 text-center">
                    <h2>1. Enable Video</h2>
                    <p>Your browser may require your permission.</p>
                    <p>
                        <button type="button" id="videoButton" class="btn btn-info btn-small">Enable Video</button>
                    </p>
                    <video autoplay="autoplay" id="loginVideo" style="max-width: 100%;"></video>
                </div>
                <div class="span4 text-center">
                    <h2>2. Capture</h2>
                    <p>Face the camera and take a clear, well-lit photo.</p>
                    <p>
                        <button type="button" id="captureButton" class="btn btn-info btn-small">Capture</button>
                    </p>
                    <canvas style="display: none;" id="loginCanvas"></canvas>
                    <img src="" id="loginImage" style="max-width: 100%;" />
                </div>
                <div class="span4 text-center">
                    <h2>3. Login</h2>
                    <p>Click this absurdly large button.</p>
                    <p>
                        <button type="submit" class="btn btn-primary btn-large btn-success" style="width: 200px; height: 200px; max-width: initial;">Login</button>
                    </p>
                </div>
            </div>
        </div>
        @Html.HiddenFor(m => m.ImageUpload, new { id = "loginImageStore" })
    }
</div>

<script>
    var video = document.querySelector("#loginVideo");
    var canvas = document.querySelector("#loginCanvas");
    var ctx = canvas.getContext('2d');
    var image = document.querySelector("#loginImage");
    var videoBtn = document.querySelector("#videoButton");
    var captureBtn = document.querySelector("#captureButton");
    var store = document.querySelector("#loginImageStore");

    var localMediaStream = null;

    video.addEventListener('click', doCapture, false);
    videoBtn.addEventListener('click', startVideoCapture, false);
    captureBtn.addEventListener('click', doCapture, false);

    function hasGetUserMedia() {
        return !!(navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia || navigator.msGetUserMedia);
    }

    function doCapture() {
        if (localMediaStream) {
            canvas.width = video.videoWidth;
            canvas.height = video.videoHeight;
            ctx.drawImage(video, 0, 0);
            image.src = canvas.toDataURL('image/webp');
            store.value = canvas.toDataURL('image/png');
            console.log("done!");
        }
    }

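    var userMediaCallback = function (stream) {
        // Current browsers expose srcObject; createObjectURL(stream) is the legacy path.
        if ('srcObject' in video) {
            video.srcObject = stream;
        } else {
            video.src = window.URL.createObjectURL(stream);
        }
        localMediaStream = stream;
    }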

    var errorCallback = function (e) {
        console.log("Rejected!", e);
    }

    function startVideoCapture() {
        navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia || navigator.msGetUserMedia;
        // Call the function; a bare reference to it is always truthy.
        if (hasGetUserMedia()) {
            navigator.getUserMedia({ video: true, audio: false }, userMediaCallback, errorCallback);
        } else {
            // can't login via image
        }
    }
</script>

Note that the source for the image tag is generated as "image/webp." This isn't strictly necessary, but it's slightly optimized in Chrome, and other browsers will fall back to "image/png." The hidden field that gets posted to the server always uses "image/png."

When it comes to the Microsoft API, before we can do any authenticating, we need to "train" their service to recognize the people we have. So apart from adding some SQL users, which I've done, we also need code to upload images for each of those users to Microsoft. Obviously, the better the photo of the face, the better big-M's cognitive services will do at recognizing follow-up images. So I have a folder of images I'm using. A better approach would be to have profile images as part of the user store and to add/update "persons" with Microsoft as they register or as the app spins up. In my case, I'm simply making use of an initialization module and hard-coding the few users I want to have access to this feature.

To do this, we need methods to create a Group and to create a Person within that Group, plus a method my initializer will call to do those things. You'll also get errors if you try to create a new Person or Group with the same name as one that already exists. So I've named my methods FindOrCreate[Person|Group]Async. In a real-world implementation, I would probably split these out and have separate methods for Find versus Create. But as I'm only interested in making sure they're created for this code, they're one.

The methods will return TRUE if a Person or Group has been added and FALSE if it was found to already exist. This will let us determine whether the TrainPersonGroupAsync endpoint needs to be called - which will alert Microsoft that it has new images to process as part of our base collection.

Here are the three relevant methods:

public async Task CreatePeopleAsync()
{
    bool groupChanged = await FindOrCreateGroupAsync(_groupId, _groupName);

    bool friend1Changed = await FindOrCreatePersonAsync("Krista", _groupId);
    bool friend2Changed = await FindOrCreatePersonAsync("Jay", _groupId);
    bool friend3Changed = await FindOrCreatePersonAsync("Rob", _groupId);
    bool friend4Changed = await FindOrCreatePersonAsync("James", _groupId);


    if(groupChanged || friend1Changed || friend2Changed || friend3Changed || friend4Changed)
        await faceService.TrainPersonGroupAsync(_groupId);
}

public async Task<bool> FindOrCreateGroupAsync(string groupId, string groupName)
{
    var personGroup = await faceService.GetPersonGroupAsync(groupId);
    if (personGroup != null) return false;

    await faceService.CreatePersonGroupAsync(groupId, groupName);
    return true;
}

public async Task<bool> FindOrCreatePersonAsync(string name, string groupId)
{
    var persons = await faceService.GetPersonsAsync(groupId);
    if (persons.Any(p => p.Name == name)) return false;

    CreatePersonResult friend = await faceService.CreatePersonAsync(groupId, name);
    // Dispose the file stream once the upload completes.
    using (var file = File.OpenRead($@"c:\Bash\{name}.jpg"))
    {
        await faceService.AddPersonFaceAsync(groupId, friend.PersonId, file);
    }
    return true;
}

There are a LOT more things that could be done here. But as I said, demo ware.
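One of those things, as a rough sketch: the "return TRUE if something changed, then train once" idea doesn't need one bool per person. The version below runs against a made-up in-memory class (InMemoryPersonGroup and PeopleSeeder are hypothetical names of mine, not part of the Face SDK) so it can run without a subscription key. Swap the fake for real service calls at your own risk.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the remote person group; not part of the Face SDK.
public class InMemoryPersonGroup
{
    private readonly HashSet<string> _names = new HashSet<string>();

    // Counts how many times training was requested.
    public int TrainCalls { get; private set; }

    // Returns true when the person was added, false when already present.
    public Task<bool> FindOrCreatePersonAsync(string name)
    {
        return Task.FromResult(_names.Add(name));
    }

    public Task TrainAsync()
    {
        TrainCalls++;
        return Task.CompletedTask;
    }
}

public static class PeopleSeeder
{
    // Ensures every name exists and trains once, only if something changed.
    public static async Task<bool> EnsurePeopleAsync(InMemoryPersonGroup group, IEnumerable<string> names)
    {
        bool changed = false;
        foreach (var name in names)
        {
            // |= does not short-circuit, so every name is still processed
            // after the first change has been recorded.
            changed |= await group.FindOrCreatePersonAsync(name);
        }

        if (changed) await group.TrainAsync();
        return changed;
    }
}
```

Calling EnsurePeopleAsync twice with the same names should add everyone and train on the first pass, then do nothing on the second.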

I also will need methods to eventually support the authentication calls, meaning we'll need to detect faces in a photo and then confirm their identity against our established PersonGroup.

Microsoft's Face Detection endpoint can accept either a Stream or a Url. Since I wasn't sure exactly how I would approach it, I decided to build methods that I could call that would accept either option to detect all faces in a photo then pass the data along to their Identify endpoint.

public async Task<string> DoFaceDetectionAsync(Stream imageStream)
{
    var faces = await faceService.DetectAsync(imageStream);
    return await GetFaceIdentityAsync(faces);
}

public async Task<string> DoFaceDetectionAsync(string imageUrl)
{
    var faces = await faceService.DetectAsync(imageUrl);
    return await GetFaceIdentityAsync(faces);
}

private async Task<string> GetFaceIdentityAsync(IEnumerable<Microsoft.ProjectOxford.Face.Contract.Face> faces)
{
    var faceIds = faces.Select(face => face.FaceId).ToArray();
    var results = await faceService.IdentifyAsync(_groupId, faceIds);
    foreach(var identifyResult in results)
    {
        if (!identifyResult.Candidates.Any()) continue;

        var candidateId = identifyResult.Candidates[0].PersonId;
        var person = await faceService.GetPersonAsync(_groupId, candidateId);
        return person.Name;
    }
    return string.Empty;
}

The complete class, with using statements, is included at the bottom of this post.
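Worth pointing out: the identify code above takes the first candidate for the first face that has one, with no questions asked. For anything beyond a demo I'd want a stricter rule. Here's a minimal sketch of one, assuming the identify results carry a confidence score per candidate. FaceCandidate, FaceMatch, and CandidatePicker are hypothetical types I made up to mimic that shape so the example stands alone, and the threshold value is yours to tune.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes that mimic an identify result; not the SDK's own types.
public class FaceCandidate
{
    public Guid PersonId { get; set; }
    public double Confidence { get; set; }
}

public class FaceMatch
{
    public List<FaceCandidate> Candidates { get; set; } = new List<FaceCandidate>();
}

public static class CandidatePicker
{
    // Returns the person id only when exactly one face was submitted and its
    // best candidate meets the threshold; otherwise returns null.
    public static Guid? PickPerson(IReadOnlyList<FaceMatch> matches, double minConfidence)
    {
        // Zero faces or several faces is ambiguous for a login, so refuse.
        if (matches == null || matches.Count != 1) return null;

        var best = matches[0].Candidates
            .OrderByDescending(c => c.Confidence)
            .FirstOrDefault();

        if (best == null || best.Confidence < minConfidence) return null;
        return best.PersonId;
    }
}
```

The design choice here is to fail closed: a photo with two people in it, or a weak match, gets no identity at all rather than a best guess.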

With that done, I am simply using an Initialization module to create my Group, Persons, and to trigger the Train method with Microsoft. It's as simple as a single call to CreatePeopleAsync.

[InitializableModule]
[ModuleDependency(typeof(EPiServer.Web.InitializationModule))]
public class FaceAuthInitialization : IInitializableModule
{
    public async void Initialize(InitializationEngine context)
    {
        var fi = new FaceIdentity();
        await fi.CreatePeopleAsync();
    }

    public void Uninitialize(InitializationEngine context)
    {
        // Nothing to do
    }
}
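A caveat on the async void above: once Initialize hits its first await, it returns to the caller, and any exception thrown after that point has nobody to catch it. A small helper along these lines is one way to keep startup non-blocking while still seeing failures. BackgroundStartup is a hypothetical name, and what you do in the error handler (log it, most likely) is up to you.

```csharp
using System;
using System.Threading.Tasks;

public static class BackgroundStartup
{
    // Starts the work without blocking the caller and routes any exception
    // to the supplied handler instead of letting it escape unobserved.
    public static Task RunAsync(Func<Task> work, Action<Exception> onError)
    {
        return Task.Run(async () =>
        {
            try
            {
                await work();
            }
            catch (Exception ex)
            {
                onError(ex);
            }
        });
    }
}
```

With something like this, Initialize could stay a plain void method and hand CreatePeopleAsync to RunAsync along with a logging callback.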

Which brings us back to the controller and our POST action.

There are two things this action method needs to do: send the image to Microsoft for Person identification and then, upon success, force login the user. Each of those can be broken down into further individual steps. So rather than split things up, I added comments and included the contents of that method below.

public async Task<ActionResult> Post(LoginPostModel pageData)
{
    // Verify we have data from the form. You would want to return some error message here rather than fail blindly.
    if (string.IsNullOrEmpty(pageData.ImageUpload)) return RedirectToAction("Index");

    // Extract the Base64 file data from the DataUrl and convert it into a Stream. A DataUrl will not work passed as a string to Microsoft (I tried).
    var base64string = Regex.Match(pageData.ImageUpload, @"data:image/(?<type>.+?),(?<data>.+)").Groups["data"].Value;
    var data = Convert.FromBase64String(base64string);
    var stream = new MemoryStream(data);

    // Perform the Identification with Microsoft's services. This will return the Person Name which is the same as my SQL User's Username.
    var faceIdentity = new FaceIdentity();
    var identity = await faceIdentity.DoFaceDetectionAsync(stream);

    // Exit if not found by Microsoft services.
    if (string.IsNullOrEmpty(identity)) return RedirectToAction("Index");

    // Setting up the SignInManager so we can force sign-in the user.
    var dbContext = new ApplicationDbContext<ApplicationUser>();
    var us = new UserStore<ApplicationUser>(dbContext);
    var um = new ApplicationUserManager<ApplicationUser>(us);

    var signInManager = new SignInManager<ApplicationUser, string>(um, HttpContext.GetOwinContext().Authentication);
    var user = await signInManager.UserManager.FindByNameAsync(identity);

    // Exit if not found in SQL
    if (user == null) return RedirectToAction("Index");

    // Force sign in
    await signInManager.SignInAsync(user, false, false);

    // redirect to start page after login
    return RedirectToAction("Index", new { node = ContentReference.StartPage });
}
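One weak spot in that method is the DataUrl handling: the regex is loose, and Convert.FromBase64String throws if the payload is malformed. Below is a minimal sketch of a stricter parser, assuming the browser always sends the "data:image/<type>;base64,<payload>" form. DataUrl and TryDecode are names I made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DataUrl
{
    // Matches "data:image/<subtype>;base64,<payload>" and nothing else.
    private static readonly Regex Pattern = new Regex(
        @"^data:image/(?<type>[a-zA-Z0-9.+-]+);base64,(?<data>[A-Za-z0-9+/=]+)$",
        RegexOptions.Compiled);

    // Returns true and the decoded bytes when the input is a well-formed
    // Base64 image data URL; returns false otherwise instead of throwing.
    public static bool TryDecode(string dataUrl, out byte[] bytes)
    {
        bytes = null;
        if (string.IsNullOrEmpty(dataUrl)) return false;

        var match = Pattern.Match(dataUrl);
        if (!match.Success) return false;

        try
        {
            bytes = Convert.FromBase64String(match.Groups["data"].Value);
            return true;
        }
        catch (FormatException)
        {
            // The payload used valid characters but invalid length or padding.
            return false;
        }
    }
}
```

In the Post action, a false return would be treated the same as an empty ImageUpload: redirect back to Index rather than call Microsoft with garbage.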

And that's about it. Signing in to Episerver with your face! Give it a try and let me know what you think in the comments.

Happy coding.

Complete FaceIdentity class file contents:

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.ProjectOxford.Face;
using Microsoft.ProjectOxford.Face.Contract;

namespace CodeBash.Business.Face
{
    public class FaceIdentity
    {
        private string _groupId = "codebrewers";
        private string _groupName = "Code Brewers";

        private FaceServiceClient faceService = new FaceServiceClient("[insert your key here]", "https://westus.api.cognitive.microsoft.com/face/v1.0");

        public async Task<string> DoFaceDetectionAsync(Stream imageStream)
        {
            var faces = await faceService.DetectAsync(imageStream);
            return await GetFaceIdentityAsync(faces);
        }

        public async Task<string> DoFaceDetectionAsync(string imageUrl)
        {
            var faces = await faceService.DetectAsync(imageUrl);
            return await GetFaceIdentityAsync(faces);
        }

        private async Task<string> GetFaceIdentityAsync(IEnumerable<Microsoft.ProjectOxford.Face.Contract.Face> faces)
        {
            var faceIds = faces.Select(face => face.FaceId).ToArray();
            var results = await faceService.IdentifyAsync(_groupId, faceIds);
            foreach(var identifyResult in results)
            {
                if (!identifyResult.Candidates.Any()) continue;

                var candidateId = identifyResult.Candidates[0].PersonId;
                var person = await faceService.GetPersonAsync(_groupId, candidateId);
                return person.Name;
            }
            return string.Empty;
        }

        public async Task CreatePeopleAsync()
        {
            bool groupChanged = await FindOrCreateGroupAsync(_groupId, _groupName);

            bool friend1Changed = await FindOrCreatePersonAsync("Krista", _groupId);
            bool friend2Changed = await FindOrCreatePersonAsync("Jay", _groupId);
            bool friend3Changed = await FindOrCreatePersonAsync("Rob", _groupId);
            bool friend4Changed = await FindOrCreatePersonAsync("James", _groupId);


            if(groupChanged || friend1Changed || friend2Changed || friend3Changed || friend4Changed)
                await faceService.TrainPersonGroupAsync(_groupId);
        }

        public async Task<bool> FindOrCreateGroupAsync(string groupId, string groupName)
        {
            var personGroup = await faceService.GetPersonGroupAsync(groupId);
            if (personGroup != null) return false;

            await faceService.CreatePersonGroupAsync(groupId, groupName);
            return true;
        }

        public async Task<bool> FindOrCreatePersonAsync(string name, string groupId)
        {
            var persons = await faceService.GetPersonsAsync(groupId);
            if (persons.Any(p => p.Name == name)) return false;

            CreatePersonResult friend = await faceService.CreatePersonAsync(groupId, name);
            // Dispose the file stream once the upload completes.
            using (var file = File.OpenRead($@"c:\Bash\{name}.jpg"))
            {
                await faceService.AddPersonFaceAsync(groupId, friend.PersonId, file);
            }
            return true;
        }
    }
}

March 6, 2017
