Building a scalable web environment
Disclaimer: This is not intended to be a technical post. This article serves as a brief overview of how we configured AWS for a client.
When developing a web system you generally have a local web server which contains your database, source files and static elements. The same may be true for your testing and staging environments. This is great for development as it keeps everything in one place and makes development easy.
What happens when you release the system and it starts gaining traction? All of a sudden you’re running into server issues because it can’t keep up with serving the database, the content and servicing requests all at the same time. You can add more horsepower but that’s just delaying the problem.
You need a distributed and scalable system to spread the load evenly across multiple machines. Images and static content should be served from a CDN (Content Delivery Network), databases should be centralised and replicated, and web requests should be spread across multiple servers.
Sounds complicated? Well, it is, but it’s something that must be done if your system is to keep up with demand. Here we’re going to give an example of a system we recently set up and the technologies we used to harness the power of distributed hosting.
When deploying a site for our client Oppima (oppima.co.uk) we decided to use AWS (Amazon Web Services) as the platform. This gives us all the tools we need to build a robust, scalable system. Let’s look at how we went from a single server to a distributed system:
We started by configuring Elastic Beanstalk. This is the tool that gives us the ability to have a pool of servers to handle incoming requests. A key feature of this tool is the auto-scaling. This lets us configure some criteria to allow the number of servers to automatically grow and shrink according to demand.
We set the minimum number of servers to 2, which means we always have 2 lightweight web servers servicing demand. The maximum we chose was 6, as we felt this would be most suitable for current demand. This can of course be increased at any time as demand grows.
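To make the scaling rule concrete, here is a minimal sketch of the kind of decision auto-scaling makes on our behalf. The limits of 2 and 6 come from the setup above; the CPU thresholds and the function name are assumptions chosen for illustration, not the criteria we actually configured.

```python
MIN_SERVERS = 2  # always-on minimum described above
MAX_SERVERS = 6  # ceiling chosen for current demand


def desired_server_count(current: int, avg_cpu_percent: float,
                         scale_up_at: float = 70.0,
                         scale_down_at: float = 30.0) -> int:
    """Return how many servers should run next (thresholds are hypothetical)."""
    if avg_cpu_percent > scale_up_at:
        target = current + 1   # busy: add a server
    elif avg_cpu_percent < scale_down_at:
        target = current - 1   # quiet: remove a server
    else:
        target = current       # within the comfortable band: no change
    # Never go below the minimum or above the maximum.
    return max(MIN_SERVERS, min(MAX_SERVERS, target))
```

In practice Elastic Beanstalk evaluates rules like this for us; the point is simply that the pool grows and shrinks one step at a time inside fixed limits.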
We can’t have a separate database on each web server. That would lead to some very inconsistent results for our users. The answer is to use RDS (Relational Database Service), a managed database service that supports MySQL among other engines. The big advantage of RDS is that it can be scaled quickly with minimal downtime. We also have the luxury of read replicas, which take read queries off the primary database and greatly improve read performance across the system.
Once RDS is configured and we have migrated our data to the new instance, we can connect our EC2 instances. Fortunately, through magic (or maybe environment variables), the servers already know how to connect to the database, so they are up and running against it straight away.
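As a rough illustration of how read replicas help, the sketch below sends read queries to replicas in turn and everything else to the primary. The class name, the hostnames in the usage note and the "starts with SELECT" test are all assumptions made for this example; it is not how our application or RDS itself routes queries.

```python
import itertools


class ConnectionRouter:
    """Pick a database endpoint per query (illustrative only)."""

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        # Cycle through replicas round-robin; None means no replicas exist.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        # Simplistic check: treat statements starting with SELECT as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes (and reads when there are no replicas) go to the primary.
        return self.primary
```

For example, with a hypothetical primary `db-primary.example` and replicas `db-replica-1.example` and `db-replica-2.example`, successive `SELECT` queries alternate between the two replicas while an `INSERT` always goes to the primary.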
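The "magic" looks roughly like the sketch below. It assumes the `RDS_*` variable names that Elastic Beanstalk sets when a database is attached to an environment; a setup that defines its own variables would use different names, and the function name here is ours, purely for illustration.

```python
import os


def database_settings(env=os.environ) -> dict:
    """Read database connection details from environment variables.

    The RDS_* names are an assumption about how the environment is
    configured; adjust them to whatever your environment defines.
    """
    return {
        "host": env["RDS_HOSTNAME"],
        "port": int(env.get("RDS_PORT", "3306")),  # 3306 is the MySQL default
        "name": env["RDS_DB_NAME"],
        "user": env["RDS_USERNAME"],
        "password": env["RDS_PASSWORD"],
    }
```

Because no hostname or password is written into the code, every server in the pool picks up the same connection details as soon as it starts.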
S3 / CloudFront
Now that we’ve got a scalable front-end and a scalable database, we have another issue to contend with. The system lets users upload images. The nature of Elastic Beanstalk means that servers get started up and shut down automatically, so anything stored on an individual server can disappear at any time and is never visible to the other servers. The answer to this problem is to store our images in S3 (Simple Storage Service). By offloading all our uploads to S3, all servers have access to the same files even if they get shut down or recreated.
But why go to the trouble of storing the images on S3 and then piping them through the web servers? We’d just be adding to the load and increasing our costs. CloudFront to the rescue! CloudFront enables us to serve our S3 content via a CDN (Content Delivery Network). This means that data is cached and served from locations near the user, which reduces page load times for the website. By loading content from the CDN, we let the web servers send just the bare minimum to the client. This means our lightweight servers can now handle a far greater number of requests!
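In the application, this mostly comes down to which URL we hand to the browser. The sketch below turns the key of an uploaded file into a CDN address; the CloudFront domain and the function name are hypothetical placeholders, not our client's real distribution.

```python
from urllib.parse import quote

# Hypothetical CloudFront domain, used only for illustration.
CDN_DOMAIN = "d1234example.cloudfront.net"


def cdn_url(object_key: str, cdn_domain: str = CDN_DOMAIN) -> str:
    """Build the public CDN URL for a file stored under `object_key` in S3."""
    # quote() escapes spaces and other unsafe characters but keeps "/".
    return f"https://{cdn_domain}/{quote(object_key.lstrip('/'))}"
```

The page then links to the CDN address directly, so the image request never reaches our web servers at all.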
Now we have our database, web servers and content all optimised and fully scalable. However, we still have one final issue. Our system uses queues to send out emails and push notifications and to perform other tasks that shouldn’t keep the user waiting. What happens to these? Either every server competes to perform the tasks, which results in race conditions and duplicate messages being sent, or, as is the reality, nothing gets sent, because Elastic Beanstalk web server environments are not designed to run timed tasks (cron jobs) on the server instances.
So, how do we get around this? We set up a worker environment, which is very similar to the Elastic Beanstalk environment we configured for the web servers, but instead of serving web content it simply runs the background and repeated tasks.
By having our web servers push tasks into SQS (Simple Queue Service), we can have these processed in the background by our worker environment. This means consistent and reliable processing of emails and other tasks, and it further reduces the load on our web servers!
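The pattern itself is simple, as the sketch below shows. To keep the example self-contained, an in-memory `queue.Queue` stands in for SQS, and the function names and message fields are assumptions made for illustration; in the real system the web servers send the message to SQS and the worker environment receives it.

```python
import json
import queue

# Stand-in for the SQS queue so the example runs without AWS.
task_queue = queue.Queue()


def enqueue_email(to: str, subject: str) -> None:
    """Web server side: describe the task as JSON and queue it, then return."""
    task_queue.put(json.dumps({"type": "email", "to": to, "subject": subject}))


def process_next(sent: list) -> bool:
    """Worker side: handle one queued task. Returns False if the queue is empty."""
    try:
        body = task_queue.get_nowait()
    except queue.Empty:
        return False
    task = json.loads(body)
    if task["type"] == "email":
        # A real worker would send the email here; we just record it.
        sent.append((task["to"], task["subject"]))
    return True
```

The web request finishes as soon as the message is queued, and because each message is taken off the queue by one worker, the task is not repeated by every server.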
By pulling together a raft of technologies we have created a system that not only scales to meet demand automatically but is fault tolerant and able to serve all the requests we can throw at it!
At Fusion, we love AWS, and this article has only covered a small number of the tools it provides to businesses. There are many other fantastic services, which you can see at https://aws.amazon.com.
If you find your website falling over and being unable to handle demand, get in touch and see if we can help you move to a more suitable platform.