AWS RDS Replication and Recoverability

Intro

This article provides a walkthrough on enhancing your AWS RDS setup’s resilience.

We’ll go through the following steps:

  • Set up a MySQL RDS instance and its replica in separate AWS regions.
  • Use AWS CloudFormation for the initial networking setup, then create two EC2 instances in the two regions to interact with the databases.
  • Monitor the replication lag using CloudWatch and configure alarms for email notifications via SNS.
  • Finally, we’ll promote the replica to a standalone instance.

This project was completed as part of the AWS Cloud Architect Nanodegree at Udacity.

Exploring the CloudFormation Template

AWS CloudFormation allows you to describe and provision all your AWS resources in your environment using a template.

For this demo, I’ve prepared a CF template for creating the required networking setup. We’ll apply this template to both regions where we spin up the RDS instances.

Feel free to explore the CF template in detail on GitHub, but here’s a brief overview of the core elements:

  • VPC – hosts the RDS and EC2 instances
  • Public Subnets – make the EC2 instances publicly accessible
  • Private Subnets – host the RDS instances
  • Application Security Group – allows SSH access to the EC2 instances
  • Database Security Group – allows inbound traffic to the database from the EC2 instances

Loading the CloudFormation Template

The next step is to create the CloudFormation stacks, applying the template to both regions. For this demo, I’ll use the AWS Console. Let’s see how to do that for the primary region.

Open your AWS console, go to CloudFormation, click Create Stack, and upload the CF template file.

Next, fill in the parameter values defined in the template. For the VPC in the primary region, you might use values like:

  • VpcName: RDSRecPrimaryVPC
  • VpcCIDR: 10.1.0.0/16
  • PublicSubnet1CIDR: 10.1.10.0/24
  • PublicSubnet2CIDR: 10.1.11.0/24
  • PrivateSubnet1CIDR: 10.1.20.0/24
  • PrivateSubnet2CIDR: 10.1.21.0/24

Once all the parameters are filled in, review your settings and click “Create Stack.” Your VPC, along with the other resources specified in the template, will now be created.
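If you prefer scripting over clicking through the console, the same stack could be created with boto3. This is only a minimal sketch: it assumes cfn_client is a boto3 CloudFormation client for the target region, and the stack name and template body are whatever you choose to pass in. The parameter keys are the ones listed above.

```python
# Parameter values from the walkthrough for the primary region.
PRIMARY_VPC_VALUES = {
    "VpcName": "RDSRecPrimaryVPC",
    "VpcCIDR": "10.1.0.0/16",
    "PublicSubnet1CIDR": "10.1.10.0/24",
    "PublicSubnet2CIDR": "10.1.11.0/24",
    "PrivateSubnet1CIDR": "10.1.20.0/24",
    "PrivateSubnet2CIDR": "10.1.21.0/24",
}


def to_cfn_parameters(values):
    """Convert a plain dict into the Parameters list CloudFormation expects."""
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in values.items()
    ]


def create_network_stack(cfn_client, stack_name, template_body, values):
    """Create the stack, then wait until CloudFormation reports completion.

    cfn_client is assumed to be a boto3 CloudFormation client for the
    target region; stack_name is a hypothetical name of your choosing.
    """
    cfn_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=to_cfn_parameters(values),
    )
    cfn_client.get_waiter("stack_create_complete").wait(StackName=stack_name)
```

With boto3 installed, a call might look like create_network_stack(boto3.client("cloudformation", region_name="us-east-1"), "rdsrec-primary", template_text, PRIMARY_VPC_VALUES), where the region and stack name are placeholders.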

Now, repeat the same steps for the secondary region. It’s crucial that the CIDR blocks of the subnets don’t overlap with those in the primary region. For example, you can use values such as:

  • VpcName: RDSRecSecondaryVPC
  • VpcCIDR: 10.2.0.0/16
  • PublicSubnet1CIDR: 10.2.10.0/24
  • PublicSubnet2CIDR: 10.2.11.0/24
  • PrivateSubnet1CIDR: 10.2.20.0/24
  • PrivateSubnet2CIDR: 10.2.21.0/24
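Before creating the second stack, it can be worth double-checking the address ranges programmatically. The sketch below uses only Python's standard ipaddress module and the CIDR values listed in this article; the helper names are my own.

```python
from ipaddress import ip_network
from itertools import combinations

VPC_CIDRS = {"primary": "10.1.0.0/16", "secondary": "10.2.0.0/16"}

SUBNET_CIDRS = {
    "primary": ["10.1.10.0/24", "10.1.11.0/24", "10.1.20.0/24", "10.1.21.0/24"],
    "secondary": ["10.2.10.0/24", "10.2.11.0/24", "10.2.20.0/24", "10.2.21.0/24"],
}


def find_overlaps(cidrs):
    """Return every pair of CIDR strings whose address ranges overlap."""
    networks = [ip_network(cidr) for cidr in cidrs]
    return [
        (str(a), str(b))
        for a, b in combinations(networks, 2)
        if a.overlaps(b)
    ]


def subnets_outside_vpc(vpc_cidr, subnet_cidrs):
    """Return the subnets that are not contained in the given VPC range."""
    vpc = ip_network(vpc_cidr)
    return [cidr for cidr in subnet_cidrs if not ip_network(cidr).subnet_of(vpc)]


all_subnets = SUBNET_CIDRS["primary"] + SUBNET_CIDRS["secondary"]
print(find_overlaps(VPC_CIDRS.values()))  # overlapping VPC ranges, if any
print(find_overlaps(all_subnets))         # overlapping subnets, if any
```

Both lists come back empty for the values used here, and subnets_outside_vpc can confirm that each subnet actually sits inside its own VPC range.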

Creating the Primary and Replica RDS Instances

Before creating each of the RDS instances, you’ll need to set up an RDS subnet group in each region. This subnet group needs to contain the private subnets that were created via the CF template.

Here’s a sample configuration for the subnet group in the primary region:
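As a rough scripted equivalent, the subnet group could be created through the boto3 RDS client's create_db_subnet_group call. In this sketch the group name, the description, and the subnet IDs are hypothetical placeholders; in practice the IDs would be those of the private subnets created by the CF stack.

```python
def subnet_group_kwargs(name, private_subnet_ids):
    """Build the arguments for the RDS create_db_subnet_group call."""
    subnet_ids = list(private_subnet_ids)
    if len(subnet_ids) < 2:
        # RDS expects a subnet group to span at least two Availability Zones.
        raise ValueError("provide the private subnets from at least two AZs")
    return {
        "DBSubnetGroupName": name,
        "DBSubnetGroupDescription": "Private subnets for the RDS demo",
        "SubnetIds": subnet_ids,
    }


def create_subnet_group(rds_client, name, private_subnet_ids):
    """rds_client is assumed to be a boto3 RDS client for the target region."""
    return rds_client.create_db_subnet_group(
        **subnet_group_kwargs(name, private_subnet_ids)
    )
```

The same helper can be reused for the secondary region by passing a client created for that region and the private subnet IDs of the secondary VPC.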

Setting Up the Primary MySQL RDS Instance

We can now proceed to create our primary RDS instance.

In the AWS RDS Console, click on “Create database” and choose MySQL as the database engine.

Fill in the necessary details like instance configuration, master username, and master password:

For the demo environment, you can choose a single DB instance (not a Multi-AZ deployment), the burstable instance classes, and the db.t3.micro instance type.

Under the networking options, select the appropriate VPC and RDS Subnet Group you just created. You also need to select the database security group.

Confirm the other settings and click “Create database.”
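For reference, the console choices above map roughly onto the arguments of the boto3 RDS create_db_instance call. The sketch below only builds the argument dict; the identifier, credentials, subnet group name, and security group ID are placeholders, and the storage size and backup retention are my assumptions rather than values from the original setup.

```python
def primary_instance_kwargs(identifier, username, password,
                            subnet_group, db_security_group_id):
    """Build create_db_instance arguments for the demo's primary MySQL instance."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",
        "DBInstanceClass": "db.t3.micro",  # burstable class, as in the demo
        "AllocatedStorage": 20,            # GiB; assumed size for a small demo
        "MasterUsername": username,
        "MasterUserPassword": password,
        "DBSubnetGroupName": subnet_group,
        "VpcSecurityGroupIds": [db_security_group_id],
        "MultiAZ": False,                  # single instance, not Multi-AZ
        "PubliclyAccessible": False,       # the instance lives in private subnets
        # Automated backups must be enabled on the source (retention > 0)
        # before a read replica can be created from it.
        "BackupRetentionPeriod": 1,
    }
```

With a boto3 RDS client for the primary region, the call would then be rds_client.create_db_instance(**primary_instance_kwargs(...)).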

Replicating the Primary RDS to the Standby Region

Once the primary DB instance is created, select it in the RDS Console, and from the Actions dropdown, click “Create read replica”:

Fill in the required parameters as you did when creating the primary instance:

You must also pick the secondary region:

Make sure you fill in the connectivity options properly:
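Scripted, a cross-region replica could be requested with the boto3 RDS create_db_instance_read_replica call, issued against a client in the secondary region. For a cross-region replica the source has to be referenced by its ARN. The sketch below assumes placeholder values for the account ID, regions, identifiers, and security group; none of them come from the original setup.

```python
def rds_instance_arn(region, account_id, identifier):
    """Build the ARN of an RDS DB instance."""
    return f"arn:aws:rds:{region}:{account_id}:db:{identifier}"


def read_replica_kwargs(replica_id, source_arn, source_region,
                        subnet_group, db_security_group_id):
    """Build create_db_instance_read_replica arguments for a cross-region replica."""
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_arn,
        # boto3 convenience parameter: the region of the source instance.
        "SourceRegion": source_region,
        "DBInstanceClass": "db.t3.micro",
        # Subnet group and security group from the secondary region's stack.
        "DBSubnetGroupName": subnet_group,
        "VpcSecurityGroupIds": [db_security_group_id],
        "PubliclyAccessible": False,
    }


# Hypothetical example values.
source_arn = rds_instance_arn("us-east-1", "123456789012", "rdsrec-primary")
replica_kwargs = read_replica_kwargs(
    "rdsrec-replica", source_arn, "us-east-1",
    "rdsrec-secondary-subnet-group", "sg-0456",
)
```

The resulting dict would be passed as rds_client.create_db_instance_read_replica(**replica_kwargs), where rds_client is created for the secondary region.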

Accessing the Primary DB and Standby Replica through EC2 Instances

In this section, we’ll verify the types of database operations we can perform on both the primary and replica RDS instances. Specifically, we’ll use EC2 instances as clients to confirm that writing (inserting) and reading data is possible on the primary instance, but only reading is allowed on the standby instance in the secondary region. We’ll assume that EC2 hosts are already launched in the public subnets.

I won’t go into the details of spinning up the EC2 instances and SSH-ing into them. There are plenty of guides to follow, and it’s not the focus of this demo. You can use any AMI you prefer and any means of connecting to it, e.g. SSH with an EC2 key pair or directly from the AWS EC2 console. The important thing is to install a MySQL client on both EC2 instances so you can connect to the primary and replica MySQL servers. You must also attach the RDSRec-Application security group, created as part of the CF stack, to the EC2 instances; it gives them access to the DB instances.

Inserting Data to the Primary RDS Instance

Let’s connect to the EC2 instance in the primary region and create a database, EmployeeDB. Then, we’ll create a table, EmployeeDetails, and insert a record.
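The statements involved could look like the sketch below. The database and table names are the ones used in this article, but the column layout is a hypothetical one I picked for illustration. The run_plan helper assumes a DB-API connection from a MySQL driver such as PyMySQL, which uses %s placeholders.

```python
# Hypothetical column layout for the demo table.
CREATE_TABLE = (
    "CREATE TABLE IF NOT EXISTS EmployeeDetails ("
    "id INT AUTO_INCREMENT PRIMARY KEY, "
    "name VARCHAR(100) NOT NULL, "
    "role VARCHAR(100) NOT NULL)"
)


def seed_plan(name, role):
    """Return the (sql, params) pairs that create the schema and insert one row."""
    return [
        ("CREATE DATABASE IF NOT EXISTS EmployeeDB", ()),
        ("USE EmployeeDB", ()),
        (CREATE_TABLE, ()),
        ("INSERT INTO EmployeeDetails (name, role) VALUES (%s, %s)", (name, role)),
    ]


def run_plan(connection, plan):
    """Execute the plan on a DB-API connection to the primary and commit."""
    cursor = connection.cursor()
    for sql, params in plan:
        if params:
            cursor.execute(sql, params)
        else:
            cursor.execute(sql)
    connection.commit()
```

The same statements can of course be typed directly into the mysql command-line client on the EC2 instance.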

Reading Data from the Replica

Let’s confirm the data replication by querying the data on the replica, and then try to insert a record to confirm that the replica is read-only.
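On a read replica, MySQL rejects writes from regular users with error 1290 ("The MySQL server is running with the --read-only option so it cannot execute this statement"). A script could detect that case as sketched below. The helpers are my own and assume a DB-API driver whose exceptions carry the MySQL error code either as an errno attribute or as the first element of args.

```python
READ_QUERY = "SELECT * FROM EmployeeDB.EmployeeDetails"
MYSQL_READ_ONLY_ERRNO = 1290  # server is running with the --read-only option


def is_read_only_error(exc):
    """Return True if the exception carries MySQL error code 1290."""
    code = getattr(exc, "errno", None)
    if not isinstance(code, int) and exc.args:
        code = exc.args[0]
    return code == MYSQL_READ_ONLY_ERRNO


def insert_is_rejected(connection, insert_sql, params):
    """Try an insert and report whether the server refused it as read-only."""
    cursor = connection.cursor()
    try:
        cursor.execute(insert_sql, params)
    except Exception as exc:
        if is_read_only_error(exc):
            return True
        raise
    # The insert unexpectedly worked; undo it so the check has no side effects.
    connection.rollback()
    return False
```

Running READ_QUERY against the replica should return the row inserted on the primary, while insert_is_rejected is expected to return True there.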

Monitoring Replication Lag and Setting Up CloudWatch Alarms

Replication lag is the delay between an event occurring in the primary database and that event being reflected in the read replica. As you scale your application, monitoring replication lag becomes critical to ensure data consistency and application performance.

Setting Up a CloudWatch Alarm for Replication Lag

AWS CloudWatch allows you to set alarms for various metrics, including RDS replication lag. Follow these steps to set up a CloudWatch alarm.

In the region with the replica database, open the AWS CloudWatch console and click “Alarms”, then “Create alarm”. Select the ReplicaLag metric:

To demonstrate the email notification, I will trigger the alarm when the “ReplicaLag” value is below one second. In a realistic scenario, the rule should be the opposite: the alarm should fire when the lag rises above some threshold. It’s quite hard to simulate a higher replication lag, so I just flipped the logic for demo purposes.

I have pre-created an SNS topic to receive an email when the alarm goes off.
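The alarm could also be defined in code through the boto3 CloudWatch put_metric_alarm call. The sketch below builds the arguments for both the flipped demo rule and the realistic one. The alarm name, period, and evaluation settings are assumptions of mine, and the replica identifier and SNS topic ARN are placeholders.

```python
def replica_lag_alarm_kwargs(replica_id, sns_topic_arn,
                             threshold_seconds=1.0, demo_mode=False):
    """Build put_metric_alarm arguments for the RDS ReplicaLag metric.

    demo_mode=True flips the comparison so the alarm fires when the lag is
    BELOW the threshold, which is only useful for testing notifications.
    """
    operator = "LessThanThreshold" if demo_mode else "GreaterThanThreshold"
    return {
        "AlarmName": f"{replica_id}-replica-lag",
        "Namespace": "AWS/RDS",
        "MetricName": "ReplicaLag",  # reported in seconds
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": replica_id}],
        "Statistic": "Average",
        "Period": 60,                # evaluate one-minute averages
        "EvaluationPeriods": 1,
        "Threshold": threshold_seconds,
        "ComparisonOperator": operator,
        "AlarmActions": [sns_topic_arn],  # SNS topic that sends the email
    }
```

With a boto3 CloudWatch client for the replica's region, the alarm would be created via cloudwatch_client.put_metric_alarm(**replica_lag_alarm_kwargs(...)).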

After the alarm is created, you need to wait for a moment, and it will go into an “In Alarm” state:

You should also receive an email similar to this:

Promoting the Standby Replica

When a read replica gets promoted, it essentially becomes an independent database instance accepting both reads and writes.

Here are some situations where promoting a replica may be required:

  1. Failover: If the primary RDS instance faces an issue that makes it unusable, you may need to quickly promote the read replica to ensure business continuity.
  2. Maintenance: You may want to perform maintenance or upgrades on the primary instance and can’t afford any downtime.
  3. Data Center Migration: If you’re moving your resources to another geographical region for compliance or latency reasons, promoting a read replica can make the process smoother.

To promote the read replica, navigate to the RDS dashboard, select the replica instance, open the Actions menu, and click “Promote”.
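The same action is available through the boto3 RDS promote_read_replica call. A minimal sketch follows, assuming rds_client is a boto3 RDS client for the replica's region; the backup retention value is my assumption. Keep in mind that promotion breaks replication for good: the promoted instance cannot be turned back into a replica.

```python
def promote_kwargs(replica_id, backup_retention_days=1):
    """Build the arguments for the RDS promote_read_replica call."""
    return {
        "DBInstanceIdentifier": replica_id,
        # Enable automated backups on the new standalone instance.
        "BackupRetentionPeriod": backup_retention_days,
    }


def promote_replica(rds_client, replica_id, backup_retention_days=1):
    """Promote the replica and return the status reported by the API."""
    response = rds_client.promote_read_replica(
        **promote_kwargs(replica_id, backup_retention_days)
    )
    return response["DBInstance"]["DBInstanceStatus"]
```

The promotion takes a few minutes, so the instance should be back in the available state before you try writing to it.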

Once the replica is promoted, we can verify that it now accepts new data:

Summary

In this article, we examined setting up a resilient AWS architecture with RDS replication across primary and backup regions.

Using AWS CloudFormation, EC2 instances, and RDS, we demonstrated how to establish a primary instance and a read-only replica, set up monitoring via CloudWatch alarms, and promote a read replica.

See you next time!

Resources

  1. Udacity AWS Cloud Architect Nanodegree
