Sync Your Data Seamlessly Across AWS S3 Buckets

Objective:

Learn how to efficiently synchronize S3 buckets across different AWS regions using the AWS CLI command aws s3 sync.

TL;DR:

This guide covers the synchronization of AWS S3 buckets between regions to optimize data management and availability. Learn to set up and execute the aws s3 sync command for efficient data transfer.

Intro:

In scenarios where you need to copy data from a regional S3 bucket to a central one in US-EAST-1 for staging purposes, the AWS CLI offers a robust solution. This is particularly useful for data engineers managing data across multiple geographic locations, ensuring that all regional data is centrally accessible for processing and analytics.

Step-by-Step Guide:

  1. Prepare the AWS CLI Environment:
    • Ensure that the AWS CLI is installed and configured with the appropriate credentials and default region settings. See the AWS docs for installation and configuration details.
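  2. Create Destination Structure in S3:
    • Use the AWS Management Console or CLI to create the destination bucket s3://us-east-1-bucket-name/ if it does not already exist. Bucket names cannot contain spaces, underscores, or uppercase letters. S3 has no real folders: a "folder" is a key prefix that appears once an object is written under it, so the sync creates any folder and subfolder structure for you. For example:
aws s3 mb s3://us-east-1-bucket-name --region us-east-1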
  3. Set Up the Synchronization Command:
    • Format the aws s3 sync command to specify the source and target buckets, replacing source-bucket-name and us-east-1-bucket-name with your actual bucket names. Here’s how you set up the command:
aws s3 sync s3://source-bucket-name s3://us-east-1-bucket-name
  4. Execute the Sync Command:
    • Run the command in your terminal. This starts syncing files from the source bucket to the destination bucket. Only objects that are new or changed in the source are copied. Monitor the output to ensure files are transferred successfully.
  5. Verify the Data Transfer:
    • Check the target bucket for the presence of the copied data. You can use the aws s3 ls command to list the contents:
aws s3 ls s3://us-east-1-bucket-name
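
Listing the target bucket confirms that data arrived, but not that everything arrived. One rough way to check, sketched below, is to compare object counts taken from the summary that aws s3 ls prints when given --recursive --summarize. The helper names total_objects and counts_match are hypothetical, introduced only for this illustration, and the bucket names and source region in the commented example are placeholders.

```shell
# Hypothetical helper: read `aws s3 ls --recursive --summarize` output on stdin
# and print the number that follows "Total Objects:".
total_objects() {
    awk '/^ *Total Objects:/ { print $3 }'
}

# Hypothetical helper: succeed only if both counts are non-empty and equal.
counts_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example (needs valid credentials, so it is left commented out):
# src=$(aws s3 ls s3://source-bucket-name --recursive --summarize --region eu-west-1 | total_objects)
# dst=$(aws s3 ls s3://us-east-1-bucket-name --recursive --summarize --region us-east-1 | total_objects)
# counts_match "$src" "$dst" && echo "object counts match"
```

Equal counts are a sanity check rather than proof of integrity. Because aws s3 sync does not delete anything from the destination unless --delete is passed, the target can also legitimately hold more objects than the source.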

Full code:

# Step 1: Navigate to your AWS CLI setup (only needed if aws is not on your PATH)
cd path/to/your/aws/cli
# Step 2: Execute the sync command
aws s3 sync s3://source-bucket-name s3://us-east-1-bucket-name
# Step 3: Verify the sync
aws s3 ls s3://us-east-1-bucket-name
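
Since the source and destination buckets live in different regions, it can help to state both regions explicitly and to preview the transfer before copying anything. The sketch below wraps the sync call in a small function that uses the CLI's --source-region, --region, and --dryrun options. The function name sync_to_central, the AWS_BIN variable, and the "--apply" argument are hypothetical conveniences for this example and are not part of the AWS CLI.

```shell
# AWS_BIN lets you swap in another executable (for example `echo`)
# to print the arguments instead of calling the real CLI.
AWS_BIN="${AWS_BIN:-aws}"

# Hypothetical helper: sync a regional bucket into the central us-east-1 bucket.
#   $1 = source bucket name
#   $2 = destination bucket name (assumed to be in us-east-1)
#   $3 = region of the source bucket
#   $4 = pass "--apply" to really copy; otherwise the CLI only previews (--dryrun)
sync_to_central() {
    if [ "${4:-}" = "--apply" ]; then
        "$AWS_BIN" s3 sync "s3://$1" "s3://$2" \
            --source-region "$3" --region us-east-1
    else
        "$AWS_BIN" s3 sync "s3://$1" "s3://$2" \
            --source-region "$3" --region us-east-1 --dryrun
    fi
}

# Example calls (bucket names and region are placeholders):
# sync_to_central source-bucket-name us-east-1-bucket-name eu-west-1
# sync_to_central source-bucket-name us-east-1-bucket-name eu-west-1 --apply
```

Defaulting to the dry run is a deliberate choice here: the first call lists what would be transferred, and the copy happens only once "--apply" is added.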

This tutorial demonstrates a practical application of the aws s3 sync command to ensure your data is replicated across regions effectively.