How I stopped worrying and embraced Docker Microservices

Hello world,

If you are like us here at UnifyID then you’re really passionate about programming, programming languages and their runtimes. You will argue passionately about how Erlang has the best Distributed Systems model (2M TCP connections in one box), Haskell has the best type system, and how all our ML back-end should be written in Lua (Torch). If you are like me and you start a company with other people, you will argue for hours, and nobody’s feelings are gonna be left intact.

That was the first problem we had in the design phase of our Machine Learning back-end. The second problem will become obvious when you get a short introduction to what we do at UnifyID:

We data-mine a lot of sensors on your phone, do some signal processing and encryption on the phone, then opportunistically  send the data from everybody’s phone into our Deep-Learning backend where the rest of the processing and actual authentication take place.

This way, the processing load is shared between the mobile device and our Deep Learning backend. Multiple GPU machines power our Deep Learning, running our proprietary Machine Learning algorithms, across all of users’ data.

These are expensive machines and we’re a startup with finite money, so here’s the second problem; Scalability. We don’t want these machines sitting around when no jobs are scheduled and we also don’t want them struggling when a traffic spike hits. This is a classic auto-scaling problem.

This post describes how we killed two birds;

  1. Many programming runtimes for DL.
  2. Many machines.

With one stone. By utilizing the sweeping force of Docker microservices! This has been the next big thing in distributed systems for a while, Twitter and Netflix use this heavily and this talk is a great place to start. Since we have a lot of factors we verify against like Facial Recognition, Gait Analysis and Keystroke Analysis, it made sense to make them modular. We packaged each one in its own container, wrote a small HTTP server that satisfies the following REST API and done!

POST /train
Body: { Files: [ <s3 file1>, <s3 file2>,...] }
Response: { jobId: <jobId> }
POST /input
Body: { Files: [ <s3 file1>, <s3 file2>,...] }
Response: { jobId: <jobId> }
POST /output
Body: { Files: [ <s3 file1>, <s3 file2>,...] }
Response: { outputVector: <output vector> } 

GET /status?jobId=<jobId> 
Response: { status: [running|done|error] }

This API can be useful because every Machine Learning algorithm has pretty much the same API; training inputs, normal inputs and outputs.  It’s so useful we decided to open-source our microservice wrapper for Torch v7/Lua and for Python. Hopefully more people can use it and we can all start forking and pushing entire machine learning services in dockerhub.

But wait, there’s more! Now that we containerized our ML code, the scalability problem has moved from a development problem to an infrastructure problem. To handle scaling each microservice according to their GPU and Network usage, we rely on Amazon ECS. We looked into Kubernetes, as a way to load-balance containers, however its support for NVIDIA GPU based load-balancing is not there yet (There’s a MR and some people who claim they made it work). Mesos was the other alternative, with NVIDIA support, but we just didn’t like all the Java.

In the end, this is how our ML infrastructure looks like.

Top-down approach to scalable ML microservices

Those EB trapezoids represent Amazon EB (Elastic Beanstalk), another Amazon service which can replicate machines (even GPU heavy machines!) using custom-set rules. The inspiration for load-balancing our GPU cluster with ECS and EB came from this article from Amazon’s Personalization team.

For our Database we use a mix of Amazon S3 and a traditional PostgreSQL database linked and used as a local cache for each container. This way, shared data becomes as easy as sharing S3 paths, while each container can modularly keep its own state in PostgreSQL.

So there you have it, both birds killed. Our ML people are happy since they can write in whatever runtime they want as long as there is an HTTP server library for it. We don’t really worry about scalability as all our services are small and nicely containerized. We’re ready to scale to as many as 100,000 users and I doubt our microservices fleet would even flinch. We’ll be presenting our setup in the coming Dockercon 2017 (hopefully, waiting for the CFP to open) and we’re looking to hire new ML and full-stack engineers. Come help us bring the vision of passwordless implicit authentication to everyone!

Distributed Testing with Selenium & Docker Containers (Part I)

As seen at TechCrunch Disrupt, the UnifyID Google Chrome extension plays a prominent place in the user’s first impression with the product. As such, it is critical that the experience be smooth, user-friendly, and most important, reliable. In minute 2:10 of the demo, the Amazon login fields are replaced with the UnifyID 1-click login. This requires DOM (document object model) manipulation on the web page which isn’t challenging on a single website; however, when scaling your code across sites in multiple languages and formats, finding the right login form and place to insert our logic is challenging.

How to find the right elements in here?
How to find the right elements in here and generalize well?

Parsing through the noise of variations on HTML structures, CSS naming conventions, and general web development is like the wild west–but despite this, there are ways to gain sanity, namely in testing.

As most UIs are tested, we utilize Selenium and its Javascript binding. For developers who have had some experience with Selenium, it becomes painfully obvious very early on that speed is not its forte. The testing environment must handle testing functionality across dozens, hundreds, and thousands of sites while maintaining the fast, iterative nature of development. If every test case takes an hour to finish, then our continuous integration would lose its magical properties (you know, moving fast without breaking things).

Surfing at speed through the codebase.
Surfing at speed through the codebase, knowing everything will be ok.

The solution to distribute testing across multiple computers will require the following:

  1. Must run multiple instances of the tests with the same code but on different websites.
  2. Must run all of the tests in parallel.
  3. Must collect all results.

In order to simplify my work, I discovered that Selenium has this neat feature called Selenium Grid that allows you to control testing in a remote machine so that the local machine doesn’t have to run the tests itself. You just run the grid in another machine, expose the port, and remote connect to that port by using settings found in your chromedriver.

This was amazing! This meant that we could have machines entirely dedicated to running a bunch of Selenium webdrivers (which are considerably RAM consuming) without killing our CI server. Plus, we could easily scale by just spawning more of these machines and more of these web drivers.

Finally! Scalability!

As a result, the design evolved to this:

  1. Set up a Selenium server to run a grid of browsers. Run Chrome.
    1. Package under a Docker image for easy deployment.
    2. Run under an Amazon AWS instance for easy scaling.
  2. The CI server has to be able to send requests to the Grid.
  3. The tests have to run in parallel.
    1. Create a Javascript library to make tests into chunks that can be sent over to the server.
  4. Finally, the results have to be merged together and sent back to the CI server.

For 1, I downloaded selenium-server-standalone-2.53.1.jar from “” and saved as selenium-server-standalone.jar.

Run hub with:

java -jar selenium-server-standalone.jar\
-role hub

Run node with:

java -jar selenium-server-standalone.jar -role node\
-hub http://localhost:4444/grid/register\
-browser browserName=chrome,version=52.0,\

Now to connect to the grid, we can run from JavaScript:

Look out for my next post which will delve deeper into parts 2, 3, and 4 above.

Thanks for Hacking!