Geo-routing, A/B testing and dynamic configuration with Lambda@Edge — part 1

Working at the edge is one of the fantastic opportunities offered by Amazon, and AWS Lambda is the key component for enhancing our infrastructure on the edge.
More recently other vendors have started to offer similar services, Cloudflare with its edge workers for instance, and in general many CDN providers are looking to add something comparable to the AWS one.

Lambda@Edge introduction

Lambda@Edge lets you extend the behaviour of a request or response directly on the edge.
This paradigm, in conjunction with the serverless one, gives us great flexibility in structuring our applications, and it can keep many requests from ever hitting our application servers by executing operations directly on the edge: header manipulation, access control, redirection rules, image manipulation and so on.
To work with Lambda@Edge in AWS we just need to set up a CloudFront distribution in front of our infrastructure. The distribution lets us place our logic on the edge because we can intercept the 4 states of a request and interact with them.
In fact Lambda@Edge can be triggered at:

· viewer request: the Lambda is triggered when CloudFront receives a request from a client
· origin request: the Lambda is triggered before CloudFront forwards the request to the origin
· origin response: the Lambda is triggered after the origin replies and CloudFront receives the response
· viewer response: the Lambda is triggered before CloudFront forwards the response to the client
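
To make the four trigger points concrete, here is a minimal pass-through sketch in Node.js. It relies on the documented Lambda@Edge event shape (event.Records[0].cf, with the trigger name in cf.config.eventType); the x-edge-trigger header and the passThrough function name are hypothetical, used only for illustration.

```javascript
// The four CloudFront trigger points, as reported in cf.config.eventType.
const EVENT_TYPES = ['viewer-request', 'origin-request', 'origin-response', 'viewer-response'];

// Returns the request (request triggers) or the response (response triggers).
// Responses are tagged with a hypothetical debug header naming the trigger.
function passThrough(event) {
  const cf = event.Records[0].cf;
  const eventType = cf.config.eventType;
  if (!EVENT_TYPES.includes(eventType)) {
    throw new Error(`Unexpected event type: ${eventType}`);
  }
  if (eventType.endsWith('-request')) {
    return cf.request;
  }
  const response = cf.response;
  response.headers['x-edge-trigger'] = [{ key: 'X-Edge-Trigger', value: eventType }];
  return response;
}

// Lambda entry point: returning the object tells CloudFront to carry on with it.
exports.handler = async (event) => passThrough(event);
```

The same function could be associated with any of the four triggers; in practice each trigger usually gets its own, smaller function.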

CloudFront is a global CDN, therefore Lambda@Edge can be triggered on any edge location across the world.
This means that, regardless of the region where we host our data centre, we can manipulate or serve responses on the edge before requests even reach our application servers.

I spent some time working on a spike for handling 2 specific features, geo-routing and A/B testing. In the mix I also tried to dynamically retrieve configuration parameters for the Lambda, in order to avoid a code deployment for every change I needed to make.
I’ll share the results of my spike across 2 posts. This one covers the initial thoughts, goals and results achieved with Lambda@Edge; the second one will be more technical and will explain how I configured CloudFront, the Lambda code and the overall setup for interacting with other AWS services on the edge.

Spike goals

Before we start, I think it’s valuable to understand what I wanted to achieve with this spike. The goals are:

1. geo-routing a request to a specific static file stored in an S3 bucket based on the user’s country
2. A/B testing different applications, always maintaining a sticky session per browser so a user always interacts with the same version
3. the previous 2 goals shouldn’t significantly impact the response time: Lambda@Edge has to redirect the user to the correct static file in 50ms or so
4. a “bonus goal”: the possibility to dynamically apply different configurations without redeploying the Lambda code and without impacting the response time too much

Lambda@Edge configuration and limits

All that glitters ain’t gold! Lambda@Edge has several limitations that we need to be aware of before taking it into consideration.
In order to achieve geo-routing we need to receive the viewer country header from CloudFront, which is used to determine where the request is coming from.
CloudFront adds this header after the viewer request state, so we won’t receive it there; our Lambda therefore has to be triggered on origin request.
Bear in mind that CloudFront strips every header that isn’t whitelisted, in order to increase response cacheability, so when we configure our distribution we need to whitelist some specific headers or all of them, depending on our needs.

Lambda@Edge is a particular kind of Lambda, with different limitations from the ones we are used to when working inside an AWS data centre.
First of all, a Lambda@Edge function has to be created in North Virginia only, and we can associate only a numbered version with CloudFront, not $LATEST.
When we debug our Lambda@Edge we need to remember that the CloudWatch logs won’t be available in North Virginia only, but in the region nearest to where the Lambda was executed. For instance, I’m based in London, so during the spike all the logs on CloudWatch were available in the London AWS region only.
Finally, we also have some soft limits to take into consideration, like the max memory associated with our Lambda@Edge, concurrent executions and so on. You can find them in the image below:

An important thing to remember is how to debug our Lambda@Edge. Luckily AWS thought this through pretty well: the Lambda console provides a way to simulate an origin request, customising the parameters to send:

Geo-routing

The geo-routing feature was really easy to achieve, considering that CloudFront provides everything we need out-of-the-box.
After whitelisting the cloudfront-viewer-country header, our Lambda@Edge receives the country the request is coming from and we can apply the behaviour we need for that specific country.

After configuring the CloudFront distribution properly we can describe our behaviour in the Lambda@Edge. In this case I used Node.js for defining the logic:
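
As a rough illustration of this kind of logic (not the code from the spike), here is a minimal origin request sketch. It assumes the cloudfront-viewer-country header is whitelisted and that the routing is done by rewriting the request URI; the country-to-file mapping, the default path and the routeByCountry name are all hypothetical.

```javascript
// Hypothetical mapping from ISO 3166-1 alpha-2 country code to a static file
// in the S3 origin. Real paths depend on how the bucket is organised.
const PATH_BY_COUNTRY = new Map([
  ['GB', '/gb/index.html'],
  ['IT', '/it/index.html'],
]);
const DEFAULT_PATH = '/default/index.html';

// Rewrites the request URI according to the viewer's country.
// Falls back to DEFAULT_PATH when the header is missing or the country is unmapped.
function routeByCountry(request) {
  const header = request.headers['cloudfront-viewer-country'];
  const country = header && header.length > 0 ? header[0].value : undefined;
  request.uri = PATH_BY_COUNTRY.get(country) || DEFAULT_PATH;
  return request;
}

// Origin request entry point: the returned request is what CloudFront sends to S3.
exports.handler = async (event) => routeByCountry(event.Records[0].cf.request);
```

Rewriting the URI keeps the routing invisible to the browser. Returning a 302 response from the same trigger is the alternative when the user should actually see a different URL.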

These are the headers we are going to receive from CloudFront:
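
For reference, Lambda@Edge exposes headers as an object keyed by lowercased header name, where each value is an array of { key, value } pairs. The snippet below is an illustrative subset with made-up values, not a capture from the spike, and getHeader is a hypothetical helper.

```javascript
// Illustrative subset of request.headers on an origin request event.
// The concrete values are invented for the example.
const sampleHeaders = {
  'cloudfront-viewer-country': [{ key: 'CloudFront-Viewer-Country', value: 'GB' }],
  host: [{ key: 'Host', value: 'example-bucket.s3.amazonaws.com' }],
  cookie: [{ key: 'Cookie', value: 'experiment=b' }],
};

// Returns the first value of a header, or undefined when it was not forwarded.
function getHeader(headers, name) {
  const entries = headers[name.toLowerCase()];
  return entries && entries.length > 0 ? entries[0].value : undefined;
}

console.log(getHeader(sampleHeaders, 'CloudFront-Viewer-Country')); // GB
```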

As we can see, the implementation is really easy, even if the APIs could have been designed in a more “friendly” way. I do appreciate that they are very extensible and flexible, which allows new features to be introduced while maintaining backwards compatibility.

A/B testing

For A/B testing, the best way to achieve it on the edge is to use cookies once we have selected the static file to serve. In this case too we need to configure the CloudFront distribution properly, so that cookies are forwarded to the Lambda@Edge:

In this spike, for a specific country, I was able to redirect a certain percentage of users to one static file and all the others to another one.
This mechanism becomes very handy when we want to test new versions of our web applications: we can do a canary release and see how the new version behaves compared to the previous one.
Combining the A/B testing functionality with the geo-routing feature, we can deploy a new artifact in a specific country with a small percentage of users redirected to it, leaving the majority of our users on the previous version.
Considering we are using cookies, bear in mind that the latest versions of all browsers allow a cookie to be stored before a redirection, but if you are targeting older browsers like IE 10 or similar it’s better to test it first.
The same concern applies if your logic has multiple redirections in place.
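
A sticky split along these lines could be sketched as below. This is an assumption-laden illustration rather than the spike's code: the cookie name, the 10% share, the target paths and the function names are hypothetical, and it assumes the function returns a 302 response with a Set-Cookie header from the origin request trigger.

```javascript
// Hypothetical experiment settings.
const COOKIE_NAME = 'experiment';
const SHARE_B = 0.1; // fraction of new users sent to variant b
const TARGETS = { a: '/app-v1/index.html', b: '/app-v2/index.html' };

// Reads a valid variant from the forwarded Cookie header(s), if any.
function readVariant(headers) {
  for (const { value } of headers.cookie || []) {
    for (const pair of value.split(';')) {
      const [name, val] = pair.trim().split('=');
      if (name === COOKIE_NAME && Object.prototype.hasOwnProperty.call(TARGETS, val)) {
        return val;
      }
    }
  }
  return undefined;
}

// Returns the request untouched when it already targets a variant file,
// otherwise a 302 that pins the browser to a variant through a cookie.
// `random` is injectable so the split can be tested deterministically.
function assignVariant(request, random = Math.random) {
  if (Object.values(TARGETS).includes(request.uri)) {
    return request; // avoids a redirect loop on the variant files themselves
  }
  const variant = readVariant(request.headers) || (random() < SHARE_B ? 'b' : 'a');
  return {
    status: '302',
    statusDescription: 'Found',
    headers: {
      location: [{ key: 'Location', value: TARGETS[variant] }],
      'set-cookie': [{ key: 'Set-Cookie', value: `${COOKIE_NAME}=${variant}; Path=/; Max-Age=2592000` }],
      // Meant to discourage caching of the assignment itself.
      'cache-control': [{ key: 'Cache-Control', value: 'no-store' }],
    },
  };
}

exports.handler = async (event) => assignVariant(event.Records[0].cf.request);
```

A returning user keeps the variant stored in the cookie, so the random draw only happens on the first visit.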

Dynamic Configuration

The bonus feature for the spike was retrieving a configuration for the URLs and the percentage of users we need to redirect on a per-region basis.
In this case I tried a couple of solutions, the first one with DynamoDB and the second with an S3 bucket where I stored a JSON file.
First of all, I need to admit I was surprised that I was able to access a Dynamo table from the edge, considering Dynamo is not global like S3 or CloudFront, so I decided to give it a go.
I structured my table with an ID for every piece of information I wanted to make dynamic (the URL of an experiment, the percentage of users to redirect to one static file or the other…) and then I tried 2 approaches, one using the scan method and the other using the get method.
Using scan was slightly slower than using get, but in both cases the Lambda execution took over 50ms, therefore DynamoDB wasn’t a viable option for this use case.

I then tried a simple JSON file stored in an S3 bucket. In this case I was able to quickly configure my Lambda, retrieving all the parameters I needed for changing URLs or the percentage of users redirected to a specific experiment without redeploying the Lambda code.

This could look like a small win, but you will soon understand that a new version of a Lambda@Edge needs around 15 minutes before it’s fully propagated across the world.
Retrieving the parameters from an external file allowed me to change the key part of the script in a matter of seconds: just the time to make the change in the JSON file and upload it to S3.
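
To give an idea of what loading such a file can look like, here is a minimal sketch. The bucket name, object key, JSON shape and function names are hypothetical; the S3 client is passed in and is assumed to behave like the AWS SDK for JavaScript v2 client (getObject(...).promise()), which also makes it easy to stub.

```javascript
// Hypothetical location of the configuration file.
const CONFIG_BUCKET = 'my-edge-config';
const CONFIG_KEY = 'experiments.json';

// Assumed JSON shape, keyed by country code with a "default" fallback:
// { "default": { "shareB": 0 }, "GB": { "shareB": 0.1 } }

// Downloads and parses the configuration.
// `s3` is expected to look like `new (require('aws-sdk').S3)()`.
async function loadConfig(s3) {
  const object = await s3.getObject({ Bucket: CONFIG_BUCKET, Key: CONFIG_KEY }).promise();
  return JSON.parse(object.Body.toString('utf-8'));
}

// Picks the settings for a country, falling back to the "default" entry.
function settingsFor(config, country) {
  return Object.prototype.hasOwnProperty.call(config, country)
    ? config[country]
    : config.default;
}
```

Every invocation that calls loadConfig pays for one S3 round trip, which is the cost to weigh against the 50ms target.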

Considering the operations we usually handle on the edge are very delicate for our infrastructure, having a mechanism to quickly change the behaviour of our logic is fundamental, in particular if you are running a B2C service like in my case.

Performance

I haven’t mentioned any metrics in the post so far because I wanted to gather them in one place to compare them easily, so I created this table with the execution time of each test I ran to achieve the full functionality.
In the table you will see 4 scenarios:
· without any external configuration
· retrieving a JSON file from an S3 bucket
· using DynamoDB with the get method
· using DynamoDB with the scan method

I’d like to add a bit more context, just to make sure you understand how these metrics were retrieved.
The DynamoDB table was created in North Virginia and the Lambda was running from Europe. I haven’t tried working with Global Tables yet; that will be part of a new spike I need to do in the next month. Global Tables could effectively change the execution time of the Lambda, but we’ll see.
The execution time for each Lambda was gathered from CloudWatch by running each scenario at least 10 times; the table shows the average.

I’ll leave it to you to draw conclusions about the different methods used during the spike.

Useful resources

Below I have gathered some useful resources for starting your journey… on the edge

Lambda@Edge limits
Lambda@Edge restrictions
Lambda@Edge pricing
Cloudfront and Lambda@Edge
Routing at the edge tutorial

Wrap up

As we have seen in this post, Lambda@Edge can be very handy in many situations, in particular for alleviating the load on our application servers.
The spike described above is just one of the many possibilities that Lambda provides on the edge. Searching the web you can find interesting tutorials on things like image manipulation or JWT validation on the edge. This could be just the beginning of a new way to structure our applications, providing better performance and easier configuration for our web applications or APIs.
If you are wondering what the cost of using Lambda at the edge would be, I’ll leave you with this cost scenario available on the AWS website:

In the next post I’ll present how I set up CloudFront, S3 and the Lambda code to achieve the goals of my spike. Keep an eye on my social accounts 😉