Is serverless ready to use for ML? – The Cloud Report

I work almost entirely with serverless applications, and one of the most common questions that comes up is which applications serverless is good for and which it is less well suited to. Classically, when asked ‘what can’t serverless do?’, a ready answer was ‘machine learning’, since the immediate constraints on serverless compute seemed to preclude ML. However, shifts in thinking and in the available tool set for AWS Lambda mean that some ML tasks now make more sense for serverless.

To clarify, this piece will focus on AWS and its serverless compute offering, AWS Lambda. The reality of the serverless landscape is that AWS is dominant in both market share and the maturity of its product offering. It is also my area of expertise, and the suitability of serverless for ML depends on specific product offerings from AWS.

Why ML on Lambdas sounded like a bad idea

Several key issues limited Lambda’s usefulness:

Limited code package size: AWS Lambda has a hard limit of 50MB for a compressed deployment package and 250MB for the uncompressed package. This kept projects that relied on massive third-party libraries from deploying at all.

Maximum run time: Until late 2018, the maximum time a Lambda function could run was 5 minutes. This limitation, like the others on this list, makes a ton of sense if you are using Lambdas to handle web requests. If the end user has been waiting four and a half minutes for a response, it’s likely they’ve given up in frustration, and since Lambdas are billed by how long they run, a task that unexpectedly runs for hours becomes quite expensive! For engineers trying to build a model, this run time limit was comically low, and even the current limit of 15 minutes is hard to work with.

Payload limits: The incoming data (payload) for a synchronous Lambda invocation is capped at 6MB, so it is not easy to pass large in-process files between Lambdas. Splitting a job across several functions and handing the file along in the payload is therefore not a viable way to work around the run time limit.
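To make the package size limits concrete, here is a minimal sketch of a pre-deployment check. The function name `check_package` and its parameters are hypothetical, and treating the 50MB and 250MB figures as binary megabytes is an assumption.

```python
import io
import zipfile

# Limits quoted in the list above; interpreting MB as 1024 * 1024 bytes
# is an assumption made for this sketch.
MAX_ZIPPED_BYTES = 50 * 1024 * 1024
MAX_UNZIPPED_BYTES = 250 * 1024 * 1024


def check_package(zip_bytes, max_zipped=MAX_ZIPPED_BYTES,
                  max_unzipped=MAX_UNZIPPED_BYTES):
    """Report whether a zipped deployment package fits both size limits."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # file_size is the uncompressed size of each archive member.
        unzipped = sum(info.file_size for info in archive.infolist())
    zipped = len(zip_bytes)
    return {
        "zipped_bytes": zipped,
        "unzipped_bytes": unzipped,
        "fits": zipped <= max_zipped and unzipped <= max_unzipped,
    }
```

Running something like this in a build step surfaces an oversized package before the upload fails.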

AWS has since addressed these problems

The three concerns above have been addressed by recent changes to AWS Lambda.

Limited package size? Try Layers.

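Lambda Layers allow you to distribute extra modules, or even a wrapper for your current functions, as part of a ‘layer’ that is uploaded and versioned separately from your function’s deployment package. Your deployment can remain a slim package that is easy to upload and manage but can still bring in big libraries. Note that layers do not lift the ceiling entirely: the function and all of its layers together must still fit within the 250MB uncompressed limit.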
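As a rough illustration of what a layer looks like on disk, the sketch below zips a set of modules under the `python/` prefix, which is where the Python runtimes look for layer contents. The helper name `build_layer_zip` and the module names in the usage note are hypothetical.

```python
import io
import zipfile


def build_layer_zip(modules):
    """Zip {relative_path: source} under the python/ prefix used by Python layers."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for relative_path, source in modules.items():
            # Layers are extracted under /opt, and /opt/python is on the
            # import path for Python runtimes.
            archive.writestr(f"python/{relative_path}", source)
    return buffer.getvalue()
```

For example, `build_layer_zip({"mylib/__init__.py": "X = 1\n"})` returns bytes that could then be published as a layer, for instance with boto3's `publish_layer_version`, leaving the function's own package small.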
Limited run time? Use Lambda with ML-specific products

While the maximum run time on Lambdas was increased to 15 minutes, you may still need hours while building models. Rather than tricking AWS into letting your Lambda run for a week, you can use AWS’s ML-specific tools like SageMaker, with Lambdas as the routing/executive layer. For example, a Lambda can direct text input to Amazon Comprehend for building a language model.
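One way to picture Lambda as the routing layer is a handler that starts a SageMaker training job and returns right away, so the multi-hour budget belongs to the job rather than the function. This is a sketch under assumptions: the event fields (`job_name`, `image_uri`, `role_arn`, `train_s3_uri`, `output_s3_uri`), the instance type, and the six-hour cap are all hypothetical choices, not values from any real project.

```python
def start_training(event, sagemaker_client):
    """Kick off a SageMaker training job and return without waiting for it."""
    response = sagemaker_client.create_training_job(
        TrainingJobName=event["job_name"],
        AlgorithmSpecification={
            "TrainingImage": event["image_uri"],
            "TrainingInputMode": "File",
        },
        RoleArn=event["role_arn"],
        InputDataConfig=[{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": event["train_s3_uri"],
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        OutputDataConfig={"S3OutputPath": event["output_s3_uri"]},
        ResourceConfig={
            "InstanceType": "ml.m5.xlarge",  # illustrative choice
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        # The training job, not the Lambda, gets the multi-hour limit.
        StoppingCondition={"MaxRuntimeInSeconds": 6 * 60 * 60},
    )
    return {"training_job_arn": response["TrainingJobArn"]}


def handler(event, context):
    # boto3 ships with the Lambda Python runtimes; importing it here keeps
    # start_training testable locally with a stub client.
    import boto3

    return start_training(event, boto3.client("sagemaker"))
```

The Lambda itself finishes in well under a second; a second function triggered by the job's completion event could pick up the results.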

Limited payloads? Shared Storage is Key

Handing files directly to other Lambdas as part of a payload is a non-starter with a 6MB payload size limit. Even coming close to the limit with compressed images will be a frustrating experience, as header data adds size and causes requests to fail intermittently. Using shared storage is much more effective. When processing direct user input, the standard solution is to have the user upload to S3, then pass an object reference to your Lambda function. For files passed between two Lambdas, the recently released integration between Lambda and Elastic File System (EFS) is a great tool.
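The pass-a-reference pattern can be sketched as a handler that receives an S3 event notification, pulls out the bucket and key, and only then fetches the object. The helper name `object_refs` and the optional `s3_client` parameter are hypothetical conveniences for local testing.

```python
from urllib.parse import unquote_plus


def object_refs(event):
    """Extract (bucket, key) pairs from an S3 event notification."""
    refs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        refs.append((bucket, key))
    return refs


def handler(event, context, s3_client=None):
    if s3_client is None:
        import boto3  # bundled with the Lambda Python runtimes

        s3_client = boto3.client("s3")
    sizes = {}
    for bucket, key in object_refs(event):
        # The payload carried only the reference; the bytes come from S3.
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        sizes[key] = len(body)
    return sizes
```

The event stays a few hundred bytes no matter how large the uploaded file is, which keeps it well clear of the payload limit.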

Worried about costs? Provision instances

For tasks with large and predictable requirements, the advent of provisioned concurrency opens up new possibilities. Provisioned instances of an AWS Lambda do not suffer cold starts (cases where spinning up a new virtual execution instance slows the response to the first request). They can also be cheaper: compute time on provisioned instances is billed at a markedly lower rate than on-demand, although you also pay a flat charge for keeping the instances provisioned, so the savings only appear when the instances are kept busy.
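How much provisioned concurrency saves depends on how busy the instances are. The arithmetic below is a rough sketch: the three per-GB-second rates are assumptions for illustration (roughly in line with rates AWS has listed for one region) and should be replaced with figures from the current pricing page. Per-request charges are left out.

```python
# Assumed example rates in USD per GB-second; illustrative only.
ON_DEMAND_DURATION = 0.0000166667
PROVISIONED_DURATION = 0.0000097222
PROVISIONED_STANDBY = 0.0000041667  # charged whether or not the function runs


def on_demand_cost(gb_seconds_used):
    """Cost of running on demand for the given compute usage."""
    return gb_seconds_used * ON_DEMAND_DURATION


def provisioned_cost(gb_seconds_provisioned, gb_seconds_used):
    """Standby charge for the provisioned capacity plus discounted run time."""
    return (gb_seconds_provisioned * PROVISIONED_STANDBY
            + gb_seconds_used * PROVISIONED_DURATION)


def break_even_utilization():
    """Fraction of provisioned time that must be busy for the costs to match."""
    return PROVISIONED_STANDBY / (ON_DEMAND_DURATION - PROVISIONED_DURATION)
```

Under these assumed rates the break-even point sits at roughly 60% utilization: below it, on-demand is cheaper, and above it, provisioned instances pull ahead.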

ML on Serverless is Possible

Those who are comfortable running ML on bare metal servers will find cloud environments challenging, and the stateless, limited world of serverless even more so. But for those already using cloud-hosted virtual machines, serverless offers a scalable and cost-efficient choice for building new models.

Traditional ML tasks often involved contracting for significant expenditure on hardware just to get started, but the serverless world allows engineers to pay for only the precise amount of computation that they need.

Another key realization for ML on serverless is that serverless on AWS is much more than just Lambda compute functions. S3, one of the very first AWS services, is serverless. After all, it’s not like you ever configure your storage server with S3!

In the end, serverless is not defined by a particular technology or product. It’s a design goal that says ‘let the vendor handle the undifferentiated heavy lifting of the technology base layer, while I focus on the things that my customers want and that differentiate our business.’ For AWS Lambda, that means not handling the traditional configuration of a server. For database tools, it means not handling database server setup, authentication, or the network layer. With that definition in mind, it’s clear that many machine-learning-focused businesses don’t really want to increase their competence in starting up and running servers. As such, serverless should be a part of any modern application stack.

Author:

Nočnica Fee

Nočnica Fee is a Dev Advocate for New Relic specializing in serverless. She’s a frequent contributor to The New Stack and Dev.to. In her downtime, she enjoys drum machines and woodworking.