A load balancer that learns, WebTorch

In my previous blog post “How I stopped worrying and embraced docker microservices” I talked about why Microservices are the bee’s knees for scaling Machine Learning in production. A fair amount of time has passed since then (almost a year, whoa), and it has proved that building Deep Learning pipelines in production is a more complex, multi-faceted problem. Yes, microservices are an amazing tool for software reuse, distributed systems design, quick failure and recovery, yada yada. But what seems very obvious now is that Machine Learning services are very stateful, and statefulness is a problem for horizontal scaling.

Context switching latency

An easy way to deal with this issue is to understand that ML models are large, and thus should not be context switched. If a model is started on instance A, you should try to keep it on instance A as long as possible. Nginx Plus comes with support for sticky sessions, which means that requests from a given client can always be load balanced onto the same upstream, a super useful feature. That was 30% of the message of my Nginxconf 2017 talk.

The other 70% of my message was urging people to move AWAY from microservices for Machine Learning. In an extreme example, we announced WebTorch, a full-on Deep Learning stack on top of an HTTP server, running as a single program. For your reference, a Deep Learning stack looks like this.

Pipeline required for Deep Learning in production.
What is this data, why is it so dirty, alright now it’s clean but my Neural net still doesn’t get it, finally it gets it!

Now consider the two extremes in implementing this pipeline:

  1. Every stage is a microservice.
  2. The whole thing is one service.

Both seem equally terrible for different reasons and here I will explain why designing an ML pipeline is a zero-sum problem.

Communication latency

If every stage of the pipeline is a microservice, this introduces a huge communication overhead between services. This is because the very large dataframes which need to be passed between services also need to be:

  1. Serialized
  2. Compressed (+ Encrypted)
  3. Queued
  4. Transferred
  5. Dequeued
  6. Decompressed (+ Decrypted)
  7. Deserialized

What a pain, what a terrible thing to spend cycles on. All of these actions need to be repeated every time a microservice boundary is crossed. The horror, the terrible end-to-end performance horror!
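
To make the cost concrete, here is a minimal Python sketch of one such hop between services; the pickle/zlib calls and the queue object are just stand-ins for whatever serialization, compression and queueing your stack actually uses:

import pickle
import zlib

def send_to_next_service(dataframe, queue):
    # Every hop out of a microservice pays this tax...
    payload = pickle.dumps(dataframe)    # 1. serialize
    payload = zlib.compress(payload)     # 2. compress (encryption would go here too)
    queue.put(payload)                   # 3. queue, 4. transfer

def receive_from_previous_service(queue):
    # ...and pays it again on the way in.
    payload = queue.get()                # 5. dequeue
    payload = zlib.decompress(payload)   # 6. decompress (and decrypt)
    return pickle.loads(payload)         # 7. deserialize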

In the opposite case, you’re writing a monolith which is hard to maintain; you’re probably stuck with uncomfortable semantics for either the HTTP server or the ML part, you can’t monitor the in-between stages, etc. Like I said, writing an ML pipeline for production is a zero-sum problem.

An extreme example: All-in-one deep learning

That’s right, you’ll need to look at your use case and decide where you draw the line: where does the HTTP server stop and where does the ML back-end start? If only there was a tool that made this decision easy and allowed you to even go to the extreme case of writing a monolith, without sacrificing either HTTP performance (and pretty HTTP server semantics) or ML performance and relevance in the rapidly growing Deep Learning market. Now such a tool is here (in alpha) and it’s called WebTorch.

WebTorch is the freak child of the fastest, most stable HTTP server, NGINX, and the fastest, most relevant Deep Learning framework, Torch.

Venn diagram of Torch and NGINX
Torch and NGINX have one thing in common: the amazing LuaJIT

Now of course that doesn’t mean WebTorch is either the best-performing HTTP server or the best-performing Deep Learning framework, but it’s at least worth a look, right? So I ran some benchmarks: I loaded the XOR neural network found on the Torch training page and used another popular Lua tool, wrk, to benchmark my server, sending serialized Torch 2D DoubleTensors to the server in POST requests to train on. Here are the results:

Huzzah! Over 1000 req/sec on my MacBook Air, with no CUDA support and 2 Intel cores!
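
For reference, the client side of that benchmark boils down to POSTing a serialized tensor at the server. The real load generator was wrk driving a Lua script, but a rough Python equivalent would look something like this (the /train path, port and payload file are assumptions for illustration):

import requests

# Hypothetical sketch: POST a pre-serialized Torch tensor blob to a training endpoint.
with open('xor_batch.t7', 'rb') as f:  # assumed file holding a serialized 2D DoubleTensor
    payload = f.read()

resp = requests.post('http://localhost:8080/train',
                     data=payload,
                     headers={'Content-Type': 'application/octet-stream'})
print(resp.status_code, resp.text)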

So there, plug that into a CUDA machine and see how much performance you squeeze out of that bad baby. I hope I have convinced you that sometimes, mixing two great things CAN lead to something great and that WebTorch is an ambitious and interesting open source project! Check out the Github repo and give it a star if you like the idea.

https://github.com/UnifyID/WebTorch

And hopefully, in due time it will become a fast, production-level server which makes it easy for Data Scientists to deploy their models in the cloud (do people still say cloud?) and for DevOps people to deploy and scale.

Possible applications of such a tool include, but are not limited to:

  • Classification of streaming data
  • Adaptive load balancing
  • DDoS attack/intrusion detection
  • Detect and adapt to upstream failures
  • Train and serve NNs
  • Use cuDNN, cuNN and cuTorch inside NGINX
  • Write GPGPU code on NGINX
  • Machine learning NGINX plugins
  • Easily serve GPGPU code
  • Rapid prototyping of Deep Learning solutions

Maybe your own?

Docker and Beanstalk: Welcome to the Gaps

At UnifyID we’re big fans of microservices à la Docker and Elastic Beanstalk. And for good reason. Containerization simplifies environment generation, and Beanstalk makes it easy to deploy and scale.

Both promise an easier life for developers, and both deliver, mostly. But as with all simple ideas, things get less, well, simple, as the idea becomes more widely adopted, and then adapted into other tools and services with different goals.

Soon there are overlaps in functionality, and gaps in the knowledge base (the Internet) quickly follow. Let’s take an example.

When you first jump into Docker, it makes total sense. You have this utility docker and you write a Dockerfile that describes a system. You then tell docker to read this file and magically, a full blown programming environment is born. Bliss.

But what about running multiple containers? You’ll never be able to do it all with just a single service. Enter docker-compose, a great utility for handling just this. But suddenly, what was so clear before is now less clear:

  • Is the docker-compose.yml supposed to replace the Dockerfile? Complement it?
  • If they’re complementary, do options overlap? (Yes.)
  • If options overlap, which should go where?
  • How do the containers address each other given a specific service? Still localhost? (Not necessarily.)

Add in something like Elastic Beanstalk, its Dockerrun.aws.json file, doing eb local run, and things get even more fun to sort out.

In this post I want to highlight a few places where the answers weren’t so obvious when trying to implement a Flask service with MongoDB.

To start off, it’s a pretty straightforward setup. One container runs Flask and serves HTTP, and a second container serves MongoDB. Both are externally accessible. The MongoDB instance is password-protected, naturally, and in no way am I going to write my passwords down in a config file. They must come from the environment.
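
On the Flask side, that boils down to reading the credentials from the environment at startup; a minimal sketch (the config dict itself is illustrative, the variable names match the compose files shown below):

import os

# Credentials come from the environment, never from a checked-in config file.
MONGO_CONFIG = {
    'MONGO_USER': os.environ['MONGO_USER'],  # fail loudly if the variable is missing
    'MONGO_PASS': os.environ['MONGO_PASS'],
}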

Use the Dockerfile just for provisioning

The project began its life with a single Dockerfile containing an ENTRYPOINT to start the app. This was fine while I was still in the early stages of development — I was still mocking out parts of external functionality, or not even handling it yet.

But then I needed the same setup to provide a development environment with actual external services running, and the ENTRYPOINT in the Dockerfile became problematic. And then I realized — you don’t need it in the Dockerfile, so ditch it. Let the Dockerfile do all the provisioning, and specify your entrypoint in one of the other ways. From the command line:

docker run --entrypoint make myserver run-tests

Or, from your docker-compose.yml, you can do it like this:

version: '2'
services:
  myserver:
    ...
    entrypoint: make dev-env

This handily solved the problem of having a single environment oriented to different needs, i.e. test runs and a live development environment.

Don’t be afraid of multiple Dockerfiles

The docker command looks locally for a file named Dockerfile. But this is just the default behavior, and it’s pretty common to have slightly different configs for an environment. E.g. our dev and production environments are very similar, but we have some extra stuff in dev that we want to weed out for production.

You can easily specify the Dockerfile you want by using docker build -f Dockerfile.dev ., or by simply using a symlink: ln -s Dockerfile.dev Dockerfile && docker build ...

If your docker-compose.yml specifies multiple containers, you may find yourself in the situation where you not only have multiple Dockerfiles for a given service, but a Dockerfile (or several) for each service. To demonstrate, let’s say we have the following docker-compose.yml:

version: '2'
services:
  flask:
    build: .
    image: myserver:prod
    volumes:
      - .:/app
    links:
      - mongodb
    environment:
      - MONGO_USER=${MONGO_USER}
      - MONGO_PASS=${MONGO_PASS}
    ports:
      - '80:5000'
    entrypoint: make run-server
  mongodb:
    build: ./docker/mongo
    image: myserver:mongo
    environment:
      - MONGO_USER=${MONGO_USER}
      - MONGO_PASS=${MONGO_PASS}
    ports:
      - '27017:27017'
    volumes:
      - ./mongo-data:/data/mongo
    entrypoint: bash /tmp/init.sh

In the source tree for the above, we have Dockerfiles in the following locations:

Dockerfile.dev
Dockerfile.prod
docker/mongo/Dockerfile

The docker-compose command uses the build option to tell it where to find the Dockerfile for a given service. The top two files are for the Flask service, and the appropriate Dockerfile is chosen using the symlink strategy mentioned above. The mongodb service uses its own Dockerfile kept in its own folder. The line

build: ./docker/mongo

tells docker where to look for it.

Dockerrun.aws.json, the same, but different

Enter Elastic Beanstalk and Dockerrun.aws.json. Now you have yet another file, and it pretty much duplicates docker-compose.yml — but of course with its own personality.

You use Dockerrun.aws.json v2 to deploy multiple containers to Elastic Beanstalk. Also, when you do eb local run, the file .elasticbeanstalk/docker-compose.yml is generated from it.

Here’s what the Dockerrun.aws.json counterpart of the above docker-compose.yml file looks like:

{
  "AWSEBDockerrunVersion": 2,
  "volumes": [
    {
      "name": "mongo-data",
      "host": {
        "sourcePath": "/var/app/mongo-data"
      }
    }
  ],
  "containerDefinitions": [
    {
      "name": "myserver",
      "image": "SOME-ECS-REPOSITORY.amazonaws.com/myserver:latest",
      "environment": [
          {
            "name": "MONGO_USER",
            "value": "changemeuser"
          },
          {
            "name": "MONGO_PASS",
            "value": "changemepass"
          },
          {
            "name": "MONGO_SERVER",
            "value": "mongo-server"
          }
      ],
      "portMappings": [
        {
          "hostPort": 80,
          "containerPort": 5000
        }
      ],
      "links": [
        "mongo-server"
      ],
      "command": [
        "make", "run-server-prod"
      ]
    },
    {
      "name": "mongo-server",
      "image": "SOME-ECS-REPOSITORY.amazonaws.com/mongo-server:latest",
      "environment": [
          {
            "name": "MONGO_USER",
            "value": "changemeuser"
          },
          {
            "name": "MONGO_PASS",
            "value": "changemepass"
          }
      ],
      "mountPoints": [
        {
          "sourceVolume": "mongo-data",
          "containerPath": "/data/mongo"
        }
      ],
      "portMappings": [
        {
          "hostPort": 27017,
          "containerPort": 27017
        }
      ],
      "command": [
        "/bin/bash", "/tmp/init.sh"
      ]
    }
  ]
}

Let’s highlight a few things. First, you’ll see that the image option is different, i.e.

      "image": "SOME-ECS-REPOSITORY.amazonaws.com/myserver:latest",

This is because we build our Docker images and push them to a private repository on Amazon ECR (the EC2 Container Registry). On deploy, Beanstalk looks for the image tagged latest, pulls it, and launches.

Next, you may have noticed that in docker-compose.yml we have the entrypoint option to start the servers. However, in Dockerrun.aws.json we’re using "command".

There are some subtle differences between ENTRYPOINT and CMD. But in this case, it’s even simpler. Even though Dockerrun.aws.json has an "entryPoint" option, the server commands wouldn’t run. I had to switch to "command" before I could get eb local run to work. Shrug.

Another thing to notice is that in docker-compose.yml we’re getting variables from the host environment and setting them into the container environment:

    environment:
      - MONGO_USER=${MONGO_USER}
      - MONGO_PASS=${MONGO_PASS}

Very convenient. However, you can’t do this with Dockerrun.aws.json. You’ll have to rewrite the file with the appropriate values, then reset it. The next bit will demonstrate this.

We’re setting a local volume for MongoDB with the following block:

  "volumes": [
    {
      "name": "mongo-data",
      "host": {
        "sourcePath": "/var/app/mongo-data"
      }
    }
  ]

The above path is production-specific. This causes a problem with eb local run, mainly because of permissions on your host machine. If you set a relative path, e.g.

        "sourcePath": "mongo-data"

the volume is created under .elasticbeanstalk/mongo-data, and everything works fine. On a system with Bash, you can solve this pretty easily by doing something along the following lines:

cp Dockerrun.aws.json Dockerrun.aws.json.BAK
sed -i '' "s/\/var\/app\///g" Dockerrun.aws.json
eb local run ; mv Dockerrun.aws.json.BAK Dockerrun.aws.json

We just delete the /var/app/ part, run the container locally, and return the file back to how it’s supposed to be for deploys. This is also how we set the password — changemepass — from the environment on deploy.

Last, you’d think running eb local run, which is designed to simulate an Elastic Beanstalk environment locally via Docker, would execute pretty much the same as when you invoke with docker-compose up.

However, I discovered one frustrating gotcha. In our Flask configuration, we are addressing the MongoDB server with mongodb://mongodb (instead of mongodb://localhost) in order to make the connection work between containers.

This simply did not work in eb local run. Neither did using localhost. It turns out the solution is to use another environment variable, MONGO_SERVER. In our Flask config, we do the following, which defaults to mongodb://mongodb:

    'MONGO_SERVER': os.environ.get('MONGO_SERVER', 'mongodb'),

In Dockerrun.aws.json, we specify this value as

          {
            "name": "MONGO_SERVER",
            "value": "mongo-server"
          }

Why? Because the "name" of our container is mongo-server and eb generates an entry in /etc/hosts based on that.

So now everything works between docker-compose up, which uses mongodb://mongodb, and eb local run, which uses mongodb://mongo-server.
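
Putting the pieces together, the connection string is assembled from the three environment variables; roughly like this (the exact URI format and database name here are illustrative, not our exact code):

import os

user = os.environ['MONGO_USER']
password = os.environ['MONGO_PASS']
# Defaults to the docker-compose service name; eb local run overrides it with mongo-server.
server = os.environ.get('MONGO_SERVER', 'mongodb')

MONGO_URI = 'mongodb://{}:{}@{}:27017/mydb'.format(user, password, server)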

These are just a few of the things that might confound you when trying to do more than just the basics with Docker and Elastic Beanstalk. Both have a lot to offer, and you should definitely jump in if you haven’t already. Just watch out for the gaps!