A load balancer that learns, WebTorch

In my previous blog post “How I stopped worrying and embraced docker microservices” I talked about why Microservices are the bee’s knees for scaling Machine Learning in production. A fair amount of time has passed since then (almost a year, whoa), and it has shown that building Deep Learning pipelines in production is a more complex, multi-faceted problem. Yes, microservices are an amazing tool for software reuse, distributed systems design, quick failure and recovery, yada yada. But what seems very obvious now is that Machine Learning services are very stateful, and statefulness is a problem for horizontal scaling.

Context switching latency

An easy way to deal with this issue is to understand that ML models are large, and thus should not be context switched. If a model is started on instance A, you should try to keep it on instance A as long as possible. Nginx Plus comes with support for sticky sessions, which means that requests from a given client can always be load balanced to the same upstream, a super useful feature. That was 30% of the message of my Nginxconf 2017 talk.
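
For a rough idea of what that looks like in practice, here is a minimal sketch of an NGINX Plus upstream with sticky sessions; the upstream names, ports and paths are placeholders, not our actual config.

# Hypothetical pool of model-serving instances (names and ports are made up).
upstream ml_models {
    server model-a.internal:8000;
    server model-b.internal:8000;

    # NGINX Plus feature: pin each client to the upstream that already has
    # its model loaded in memory, using a session cookie.
    sticky cookie srv_id expires=1h path=/;
}

server {
    listen 80;
    location /predict {
        proxy_pass http://ml_models;
    }
}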

The other 70% of my message was urging people to move AWAY from microservices for Machine Learning. As an extreme example, we announced WebTorch, a full-on Deep Learning stack on top of an HTTP server, running as a single program. For your reference, a Deep Learning stack looks like this:

Pipeline required for Deep Learning in production.
What is this data, why is it so dirty, alright now it’s clean but my Neural net still doesn’t get it, finally it gets it!

Now consider the two extremes in implementing this pipeline:

  1. Every stage is a microservice.
  2. The whole thing is one service.

Both seem equally terrible for different reasons and here I will explain why designing an ML pipeline is a zero-sum problem.

Communication latency

If every stage of the pipeline is a microservice, you introduce a huge communication overhead between microservices. This is because the very large dataframes that need to be passed between services also need to be:

  1. Serialized
  2. Compressed (+ Encrypted)
  3. Queued
  4. Transferred
  5. Dequeued
  6. Decompressed (+ Decrypted)
  7. Deserialized

What a pain, what a terrible thing to spend cycles on. All of these actions need to be repeated every time a microservice boundary is crossed. The horror, the terrible end-to-end performance horror!
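
To make that tax concrete, here is a minimal Torch/Lua sketch of just steps 1 and 7 for a single boundary crossing; the tensor size is made up for illustration.

require 'torch'

-- A hypothetical batch of features: 10,000 rows x 512 columns of doubles (~40 MB).
local batch = torch.DoubleTensor(10000, 512):uniform()

-- Sending side, step 1: serialize the tensor into a string ...
local payload = torch.serialize(batch)

-- ... which still has to be compressed, queued, transferred, dequeued and
-- decompressed before the receiving service can undo the work in step 7:
local received = torch.deserialize(payload)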

In the opposite case, you’re writing a monolith that is hard to maintain; you’re probably stuck with uncomfortable semantics for either the HTTP server or the ML part, you can’t monitor the in-between stages, etc. Like I said, writing an ML pipeline for production is a zero-sum problem.

An extreme example: all-in-one deep learning

Venn diagram of torch, nginx
Torch and Nginx have one thing in common: the amazing LuaJIT
That’s right, you’ll need to look at your use case and decide where you draw the line: where does the HTTP server stop and where does the ML back-end start? If only there were a tool that made this decision easy and allowed you to even go to the extreme case of writing a monolith, without sacrificing either HTTP performance (and pretty HTTP server semantics) or ML performance and relevance in the rapidly growing Deep Learning market. Now such a tool is here (in alpha) and it’s called WebTorch.

WebTorch is the freak child of the fastest, most stable HTTP server, nginx, and the fastest, most relevant Deep Learning framework, Torch.

Now of course that doesn’t mean WebTorch is either the best-performing HTTP server or the best-performing Deep Learning framework, but it’s at least worth a look, right? So I ran some benchmarks: I loaded the XOR neural network found on the Torch training page and used another popular Lua tool, wrk, to benchmark my server, sending serialized Torch 2D DoubleTensors in POST requests to train on. Here are the results:

Huzzah! Over 1,000 req/sec on my MacBook Air, with no CUDA support and 2 Intel cores!
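
For reference, the model side of that benchmark is essentially the XOR network from the Torch tutorials; the sketch below (including the training-step helper and the batch layout) is illustrative rather than the exact WebTorch handler.

-- XOR network, essentially as in the Torch tutorials.
require 'nn'

local mlp = nn.Sequential()
mlp:add(nn.Linear(2, 20))    -- 2 inputs -> 20 hidden units
mlp:add(nn.Tanh())
mlp:add(nn.Linear(20, 1))    -- 20 hidden units -> 1 output

local criterion = nn.MSECriterion()

-- One training step on a serialized 2D DoubleTensor received in a POST body.
local function train_step(body)
  local batch   = torch.deserialize(body)   -- N x 3: two inputs plus a target column
  local inputs  = batch[{ {}, {1, 2} }]
  local targets = batch[{ {}, {3} }]
  local loss = criterion:forward(mlp:forward(inputs), targets)
  mlp:zeroGradParameters()
  mlp:backward(inputs, criterion:backward(mlp.output, targets))
  mlp:updateParameters(0.01)
  return loss
end

On the client side, wrk can replay such a POST with a small Lua script that sets wrk.method, wrk.body and wrk.headers, invoked with something like wrk -t2 -c64 -d30s -s post_tensor.lua http://localhost:8080/train (the script name and endpoint are placeholders).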

So there, plug that into a CUDA machine and see how much performance you can squeeze out of that bad boy. I hope I have convinced you that sometimes mixing two great things CAN lead to something great, and that WebTorch is an ambitious and interesting open source project! Check out the GitHub repo and give it a star if you like the idea.

https://github.com/UnifyID/WebTorch

And hopefully, in due time, it will become a fast, production-level server which makes it easy for Data Scientists to deploy their models in the cloud (do people still say cloud?) and for DevOps people to deploy and scale them.

Possible applications of such a tool include, but are not limited to:

  • Classification of streaming data
  • Adaptive load balancing
  • DDoS attack/intrusion detection
  • Detect and adapt to upstream failures
  • Train and serve NNs
  • Use cuDNN, cuNN and cuTorch inside NGINX
  • Write GPGPU code on NGINX
  • Machine learning NGINX plugins
  • Easily serve GPGPU code
  • Rapid prototyping of Deep Learning solutions

Maybe your own?

Credential Stuffing: How the PRC almost hacked my Steam

Recently we’ve witnessed some pretty big password leaks. First 6.4m unsalted passwords leaked from LinkedIn, then 500m passwords leaked from Yahoo, a figure which today grew to 1 billion accounts. This is truly scary even if you haven’t been using your Yahoo account. To see why, let us go back a couple of months, to when I almost fell victim to a credential stuffing attack from China.

First of all, “credential stuffing” is a fancy name for exploiting password reuse. All it takes is somebody with fairly basic computer security knowledge looking up the password dumps from Yahoo or LinkedIn (widely available), then trying the exact same credentials on as many different sites as possible until there is a match. In my case, I logged into my Steam account and saw something like this:

Steam, what it looks like when you have been hacked

Unfortunately, Steam does not specify whether this was a credential stuffing attempt, but it was only a week after the big LinkedIn leak, and I may well have been reusing the same password for LinkedIn and Steam, so all the pieces fit. Steam was very helpful in telling me the following:

  1. Somebody had tried to access my account from the PRC.
  2. They had both my username and password.
  3. The attempt was blocked since I had never accessed Steam from the PRC.
  4. I needed to change my password to regain access to my account.

At that time I deeply appreciated all those otherwise annoying security features: Facebook asking me to identify my friends, Google sending me text messages, and now Steam using geolocation to see where my impersonator lives. I quickly updated my password on Steam and on 5-6 other websites.

My new password was the same as the old one, with the last letter changed from a ‘d’ to an ‘e’, meaning this was the fifth time I had updated my Steam password for one reason or another. The rest of the password was pretty good in terms of entropy: caps, lowercase letters, numbers, and symbols, randomly generated yet pronounceable, courtesy of pwgen, a great CLI tool for generating strong, memorable passwords.

pwgen producing secure, memorable passwords

But this is not great overall. It only takes a small leap for attackers to realize how hard it is to remember a password, and that users therefore tend to postfix their existing ones with predictable components, such as an incrementing counter. I’ve read posts about people using the same password everywhere and simply prefixing it with the site name. So if your main password is “d3adb33f”, then for Amazon it becomes “amazon_d3adb33f”, for Chase “chase_d3adb33f”, or something along those lines.

I believe I have a good understanding of the security concepts behind passwords, and I think I’m doing better in terms of passwords than 99% of the people out there, since my password is not “password” or “123456” (proof). On the other hand, here I found myself coming up with predictable password patterns. Then it hit me, the bigger issue exposed by credential stuffing attacks and password reuse:

Either we all do passwords right, or nobody does.

Either nobody gets hacked, or we might as well all be, as long as users can’t help but reuse the same passwords and predictable patterns over and over again.

So what does it mean for everyone to do passwords right? If you want to be really safe, you’ve got to be a bit paranoid and lean completely on the side of security versus convenience.

  1. A password should be completely unpredictable (no pet names, dates of birth, middle names, children’s names, childhood heroes, or favorite books; in fact, no English words at all).
  2. A password should have capital letters, lowercase letters, numbers, and symbols, and be at least 16 characters long (for 128-bit keys; see the quick check after this list).
  3. A different such password should be used for each website, changed every 3 months, with no logical correlation between them.
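
As a quick back-of-the-envelope check on that length rule (my arithmetic, not a formal standard): a truly random password drawn from the 94 printable ASCII characters carries about 6.55 bits per character.

-- Entropy of a uniformly random password over the 94 printable ASCII characters.
local bits_per_char = math.log(94) / math.log(2)    -- ~6.55 bits per character
print(16 * bits_per_char)                           -- ~105 bits for 16 characters
print(math.ceil(128 / bits_per_char))               -- ~20 characters to clear 128 bits

So 16 fully random characters are already far beyond anything guessable, and roughly 20 get you to a true 128-bit key; pronounceable pwgen-style passwords carry somewhat less entropy per character, and any reuse or predictable pattern makes the arithmetic irrelevant anyway.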

It is indeed impossible to be truly secure using passwords. How about password managers then, letting them handle the complexity of passwords? Not a bad idea on first thought: just tie all passwords to the user’s machine. But then you get this:

The problem with password managers: you’re not your laptop

Password managers basically turn the problem of cyber security into a problem of physical security of your devices. If I can get my hands on an open laptop, I can access pretty much any website, as long as cookies are enabled or a password manager has been used. And that’s pretty terrible.

In the end, there is no solution today that takes care of every aspect of identity security. It’s either what you know (a password) or what you have (a device), and now we’re finally moving into the age of what you are.

TechCrunch Disrupt 2016, where we won runner-up in the Disrupt Battlefield.

At UnifyID, we think of the human as the central point of identity management. Think about every bit of information that makes you, You. How long is your stride, do you walk fast or slow, how long are your arms, which floor is your house on, how fast do you drive to work? This is all information that we feed into our machine learning system as input. The output is binary: either it is you, or it isn’t. Since we only require 1 bit of information at the time of authentication, we can log you in with one click.

One-click secure login with UnifyID

Our system works with existing password infrastructures. We generate a large, random password for every website you log in to, and secure it with You as the key. In fact, we don’t even need to know that password: part of it stays with you and part of it lives on our servers. This way, even if your devices get stolen, even if we get hacked, you’re safe. There’s no single point of failure in the UnifyID system.

In addition, UnifyID works across devices. Your computer knows about your phone, and they share the same credentials. Remember that time you left your laptop unattended for 5 minutes and your Facebook wall filled up with questionable posts? Not anymore. We can detect when you stand up and walk away. In fact, we can do that for every website: banks, e-shops, federal websites. Take your identity with you when you leave the room.

Here at UnifyID we take your security seriously. Passwords are an inconvenience and they will soon go the way of the floppy drive. Machine learning and implicit authentication can help you, and we know exactly how. Sign up for our private beta!

How I stopped worrying and embraced Docker Microservices

Hello world,

If you are like us here at UnifyID, then you’re really passionate about programming, programming languages, and their runtimes. You will argue passionately about how Erlang has the best distributed systems model (2M TCP connections in one box), how Haskell has the best type system, and how all our ML back-end should be written in Lua (Torch). If you are like me and you start a company with other people, you will argue for hours, and nobody’s feelings are gonna be left intact.

That was the first problem we had in the design phase of our Machine Learning back-end. The second problem will become obvious when you get a short introduction to what we do at UnifyID:

We data-mine a lot of sensors on your phone, do some signal processing and encryption on the phone, then opportunistically send the data from everybody’s phone to our Deep Learning backend, where the rest of the processing and the actual authentication take place.

This way, the processing load is shared between the mobile device and our Deep Learning backend. Multiple GPU machines power that backend, running our proprietary Machine Learning algorithms across all of our users’ data.

These are expensive machines and we’re a startup with finite money, so here’s the second problem: scalability. We don’t want these machines sitting around when no jobs are scheduled, and we also don’t want them struggling when a traffic spike hits. This is a classic auto-scaling problem.

This post describes how we killed two birds:

  1. Many programming runtimes for DL.
  2. Many machines.

With one stone. By utilizing the sweeping force of Docker microservices! This has been the next big thing in distributed systems for a while; Twitter and Netflix use it heavily, and this talk is a great place to start. Since we have a lot of factors to verify against, like Facial Recognition, Gait Analysis, and Keystroke Analysis, it made sense to make them modular. We packaged each one in its own container, wrote a small HTTP server that satisfies the following REST API, and we were done!

POST /train
Body: { Files: [ <s3 file1>, <s3 file2>, ... ] }
Response: { jobId: <jobId> }

POST /input
Body: { Files: [ <s3 file1>, <s3 file2>, ... ] }
Response: { jobId: <jobId> }

POST /output
Body: { Files: [ <s3 file1>, <s3 file2>, ... ] }
Response: { outputVector: <output vector> }

GET /status?jobId=<jobId>
Response: { status: [running|done|error] }

This API can be useful because every Machine Learning algorithm has pretty much the same interface: training inputs, normal inputs, and outputs. It’s so useful that we decided to open-source our microservice wrapper for Torch v7/Lua and for Python. Hopefully more people can use it and we can all start forking and pushing entire machine learning services to Docker Hub.
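
To give a flavor of how small such a wrapper can be, here is a minimal sketch of the POST /train endpoint written as an OpenResty/ngx_lua handler; this is illustrative only (not necessarily how the open-sourced wrappers are structured), and schedule_training is a hypothetical stand-in for whatever enqueues the actual job.

-- Illustrative content_by_lua handler for POST /train.
local cjson = require "cjson"

-- Hypothetical stand-in: enqueue a training job over the given S3 paths
-- and hand back an identifier for GET /status polling.
local function schedule_training(files)
  return tostring(ngx.now()) .. "-" .. tostring(math.random(1000000))
end

ngx.req.read_body()
local body = ngx.req.get_body_data()
if not body then
  ngx.status = 400
  ngx.say(cjson.encode({ error = "missing body" }))
  return
end

local request = cjson.decode(body)                 -- { Files = { "<s3 file1>", ... } }
local job_id  = schedule_training(request.Files)

ngx.say(cjson.encode({ jobId = job_id }))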

But wait, there’s more! Now that we have containerized our ML code, the scalability problem has moved from a development problem to an infrastructure problem. To scale each microservice according to its GPU and network usage, we rely on Amazon ECS. We looked into Kubernetes as a way to load-balance containers; however, its support for NVIDIA GPU based load balancing is not there yet (there’s an MR and some people who claim they made it work). Mesos was the other alternative, with NVIDIA support, but we just didn’t like all the Java.

In the end, this is what our ML infrastructure looks like.

Top-down approach to scalable ML microservices

Those EB trapezoids represent Amazon EB (Elastic Beanstalk), another Amazon service that can replicate machines (even GPU-heavy machines!) using custom-set rules. The inspiration for load-balancing our GPU cluster with ECS and EB came from this article by Amazon’s Personalization team.

For our database we use a mix of Amazon S3 and a traditional PostgreSQL database, linked to each container and used as a local cache. This way, sharing data becomes as easy as sharing S3 paths, while each container can modularly keep its own state in PostgreSQL.

So there you have it, both birds killed. Our ML people are happy since they can write in whatever runtime they want, as long as there is an HTTP server library for it. We don’t really worry about scalability, as all our services are small and nicely containerized. We’re ready to scale to as many as 100,000 users, and I doubt our microservices fleet would even flinch. We’ll be presenting our setup at the coming DockerCon 2017 (hopefully, we’re waiting for the CFP to open) and we’re looking to hire new ML and full-stack engineers. Come help us bring the vision of passwordless implicit authentication to everyone!