Innova Solutions > Perspectives > Preventing Un-Availability Zones

Innova’s recommendations on setting up your AWS architecture, so your resources are always available

Virtual Private Clouds (VPCs) are the backbone of any organization’s AWS Cloud solution. They mimic a traditional network setup – but without the considerable downsides of a self-managed, on-premises data center, and with room to scale your infrastructure as needed. It’s a great way of saving money, staying up to date, and ensuring infrastructure security, which is why moving critical applications and data to the cloud has become a necessity for modern organizations.

However, there is a drawback: a subnet (that is, the range of IP addresses that belongs to a given group of resources inside a VPC) can exist inside only one Availability Zone – and therefore one Region – at a time…and the AZs inside those Regions can sometimes fail, turning an AWS Availability Zone into an UN-availability zone.

Happily, AWS recommends a solution that ensures organizations’ clients, customers, and staff retain access to their vital resources even if the worst happens: architecting for high availability by utilizing multiple AZs in their VPC design.


Most customer cloud resources (such as virtual machines and databases) are located inside Virtual Private Clouds (VPCs), housed in AWS data centers worldwide: collectively, AWS’ Cloud Infrastructure. At the time of writing, the AWS Global Cloud infrastructure comprises 25 separate AWS Regions, each made up of several Availability Zones (AZs) – clusters of data centers in close geographic proximity to one another – with 80 AZs in total. That extensive network means that a Region covers a lot of ground. But because an individual data center can – and does – go down, multi-AZ architecture is essential for any reliable Cloud implementation.

Good to know: a VPC, which is essentially a private virtual network in the cloud, cannot span Regions. But it CAN and SHOULD span multiple AZs!

A VPC network’s IP addresses allow its resources to communicate both amongst themselves and with other, external internet-connected resources. That address space is split up into subnets. As previously mentioned, while a VPC can span AZs, a subnet cannot: each subnet belongs to exactly one AZ (though an AZ can contain many subnets). That means that if an AZ goes down, the subnets it encloses are down as well.
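To make the VPC/subnet split concrete, here is a minimal sketch using Python’s standard ipaddress module. The VPC CIDR, the number of subnets, and the AZ names are entirely hypothetical – this just shows how one address space is carved into per-AZ subnets:

```python
import ipaddress

# Hypothetical VPC address space.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Carve it into /24 subnets and take one per (illustrative) AZ.
subnets = list(vpc.subnets(new_prefix=24))[:4]
for az, net in zip(["az-a", "az-b", "az-c", "az-d"], subnets):
    print(az, net)  # each subnet will live in exactly one AZ
```

Each resulting /24 would be attached to a single Availability Zone, which is exactly why losing an AZ takes its subnets down with it.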

Unlike in a traditional data center, where specialized hardware networking devices use dedicated protocols to let one IP address move between multiple endpoints, IPs do not “float” between endpoints in the public cloud: AWS and other public cloud providers do not implement those protocols for you. Instead, cloud architects like Innova sidestep the problem by making as much infrastructure “multi-AZ” as possible. We build endpoints inside multiple subnets (and therefore multiple AZs), and then use a simple upstream mechanism to map a single IP or name onto the downstream IP addresses inside each AZ.

As a result, when an Availability Zone goes down and takes a downstream IP with it, the upstream mapping service detects the failure and directs traffic to the AZ/subnet that is still up, ensuring seamless continuity of service. In the public cloud, Innova’s mechanisms of choice for that ‘mapping’ are Amazon’s Route53 DNS service and Elastic Load Balancing.
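As a minimal illustration of that mapping idea – with hypothetical AZ names and private IPs, and with health status supplied by the caller rather than by a real health-check service – the upstream lookup might behave like this:

```python
# One downstream endpoint IP per Availability Zone / subnet.
# AZ names and addresses are illustrative, not real AWS values.
ENDPOINTS = {
    "us-east-1a": "10.0.1.10",
    "us-east-1b": "10.0.2.10",
}

def resolve_upstream(healthy_azs):
    """Return the downstream IPs the upstream name should map to,
    skipping any AZ that is currently down."""
    return [ip for az, ip in ENDPOINTS.items() if az in healthy_azs]

# Both AZs up: traffic may go to either endpoint.
print(resolve_upstream({"us-east-1a", "us-east-1b"}))
# us-east-1a fails: the mapping now returns only the surviving AZ's IP.
print(resolve_upstream({"us-east-1b"}))
```

In real deployments that lookup is performed by Route53 or by a load balancer, as the next sections describe.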

How Route53 Can Help

Let’s say you have created a public name in DNS and pointed it at one or both of your AZ/subnet endpoints – and then one of them goes down.

If you pointed the name at both endpoints, with half of all requests going to each, roughly half of the hits would fail: DNS by itself is “dumb,” and it keeps handing out the dead endpoint’s IP with no idea that anything behind it has changed. If you pointed it at only one endpoint, and that is the one that failed, every request fails.

Luckily, Route53 can solve both of those problems. Health checks available in the service can test an endpoint’s up status and automatically stop giving out its IP when it is down. Likewise, a DNS entry that points to only one endpoint (out of two) can be updated – manually or automatically – to point at the surviving endpoint if the first one fails.
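To make that concrete, here is a hedged sketch of what a Route53 failover record pair might look like, expressed as the change batch you would hand to the ChangeResourceRecordSets API (for example via boto3). The hosted-zone ID, health-check ID, domain name, and IPs are all placeholders:

```python
# Hypothetical Route53 change batch for a PRIMARY/SECONDARY failover pair.
# All identifiers and addresses below are placeholders, not real values.
change_batch = {
    "Changes": [
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com.",
                "Type": "A",
                "SetIdentifier": "az-a-primary",
                "Failover": "PRIMARY",
                "TTL": 60,
                "HealthCheckId": "hc-placeholder",  # health check on the AZ-a endpoint
                "ResourceRecords": [{"Value": "203.0.113.10"}],
            },
        },
        {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "app.example.com.",
                "Type": "A",
                "SetIdentifier": "az-b-secondary",
                "Failover": "SECONDARY",  # answered only when the primary is unhealthy
                "TTL": 60,
                "ResourceRecords": [{"Value": "203.0.113.20"}],
            },
        },
    ]
}

# With boto3, this batch would be submitted roughly as:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="Z-PLACEHOLDER", ChangeBatch=change_batch)
```

While the primary’s health check passes, Route53 answers with the primary’s IP; when it fails, Route53 automatically starts answering with the secondary instead.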

How load balancing can help

Load balancing is similar in spirit to Route53. However, instead of a service (DNS) answering hostname queries with IP address information, a load balancer presents a single IP address to clients and then distributes their requests across many backend endpoints. Like Route53, load balancers can perform health checking. And because they sit directly in the traffic path between the client and the endpoint, load balancers can manipulate and re-route data in transit. That’s good news when an endpoint fails.
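The core behavior – spread requests across backends, but skip any backend that fails its health check – can be sketched in a few lines of Python. The IPs and the health-check function here are purely illustrative:

```python
import itertools

class TinyLoadBalancer:
    """Minimal illustration of the pattern above: round-robin across
    backends, skipping any that fail a health check. Not production code."""

    def __init__(self, backends, health_check):
        self.backends = backends
        self.health_check = health_check
        self._cursor = itertools.cycle(range(len(backends)))

    def pick(self):
        # Try each backend at most once per request.
        for _ in range(len(self.backends)):
            backend = self.backends[next(self._cursor)]
            if self.health_check(backend):
                return backend
        raise RuntimeError("no healthy backends")

# Two backend endpoints; pretend 10.0.1.10's AZ has gone down.
healthy = {"10.0.2.10"}
lb = TinyLoadBalancer(["10.0.1.10", "10.0.2.10"], lambda ip: ip in healthy)
print(lb.pick())  # always the surviving endpoint: 10.0.2.10
```

A real ELB does far more (connection handling, TLS, target registration), but the rotation-with-health-checks idea is the same.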

What’s the catch? The load balancer lives inside a single AZ. And, as with subnets, if that AZ goes down, so does the load balancer. That’s NOT such good news.

To solve this problem, load balancers can be made “multi-AZ” as well. In practice, this means creating two load balancers in two different Availability Zones and “joining them at the hip” so they can be managed as one unit. Of course, in this situation, the two separate-but-joined load balancers still present two IPs to hit, one at the front of each AZ/subnet.

…and we are seemingly back where we started! So what do we do? We combine both things, Route53 DNS and multi-AZ load balancers, for the complete solution.

A Complete Solution

To ensure complete availability at all times, Innova recommends that organizations set up a Route53 name that resolves to both load balancers, with Route53 health checks that test whether each is working.

This way, if one load balancer fails (because its AZ fails), Route53 stops answering with the unavailable load balancer’s address so that requests are never sent to a device that won’t respond. What’s more, the load balancers themselves perform health checks on their actual server endpoints as well: if one of the real servers goes down, the load balancer takes it out of rotation.
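The two layers of health checks described above can be sketched end to end. In this toy model – with hypothetical load balancer names, IPs, and health state passed in as plain sets – Route53 only hands out load balancers that pass its checks, and the chosen load balancer only forwards to servers that pass its own:

```python
# Two hypothetical per-AZ load balancers, each fronting its own server pool.
LBS = {
    "lb-az-a": ["10.0.1.10", "10.0.1.11"],
    "lb-az-b": ["10.0.2.10", "10.0.2.11"],
}

def route_request(healthy_lbs, healthy_servers):
    """Layer 1: Route53 answers only with load balancers passing its checks.
    Layer 2: the chosen load balancer forwards only to healthy servers."""
    for lb, servers in LBS.items():
        if lb not in healthy_lbs:
            continue  # Route53 stops answering with this LB's address
        for server in servers:
            if server in healthy_servers:
                return lb, server  # the LB keeps this server in rotation
    raise RuntimeError("no healthy path to any server")

# AZ-a is down entirely, and one AZ-b server is down too:
# traffic still lands on the one surviving server.
print(route_request({"lb-az-b"}, {"10.0.2.11"}))
```

As long as any one load balancer and any one server behind it remain healthy, a request still finds a path – which is the whole point of the combined design.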

That means always-on, always-available, dynamically responding infrastructure for your most essential resources – so your business can stay on track and on target.