I work at a company where we depend on real-time data and have no tolerance for high latencies. We provide the largest catalog of antiques in the US, with over 700 million records, and we store the descriptions for all of these records in DynamoDB. Implementing DynamoDB backups using AWS Data Pipeline was a hassle and too complicated.
After some research I came across https://github.com/bchew/dynamodump, and it was an ideal solution for what I wanted.
It worked amazingly well for me and was simple to set up with Boto3 and an AWS account. You can run the tool from your local machine or from an EC2 instance, depending on your connectivity and data size. I created a 3 TB EBS volume on an EC2 instance, since our DynamoDB table was close to 1 TB in size; I also wasn't sure whether the script would add extra metadata here and there.
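As a back-of-the-envelope check, here is the sizing logic as a small sketch. The 3x headroom factor mirrors my choice above (a 3 TB volume for a ~1 TB table) and is just a rule of thumb, not a dynamodump requirement:

```python
# Rough EBS sizing for a table dump. The 3x headroom is an assumption,
# chosen to leave room for any extra metadata the script might write.
def ebs_size_gb(table_size_gb, headroom=3.0):
    """Suggested EBS volume size in GB for dumping a table of the given size."""
    return int(table_size_gb * headroom)

print(ebs_size_gb(1000))  # a ~1 TB table -> 3000 GB volume
```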
You need to set up an AWS IAM user with an access key and secret key, then grant that user full DynamoDB permissions. There is an AWS managed policy called AmazonDynamoDBFullAccess that covers the entire operation. Let's back up a table called test_table in the us-east-1 (N. Virginia) region:
python dynamodump.py -m backup -r us-east-1 -s test_table
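If you have many tables, it can help to build the invocation programmatically and launch each backup from a loop. A minimal sketch; the helper name and the subprocess call are mine, not part of dynamodump:

```python
import subprocess  # only needed if you actually launch the command

# Hypothetical helper: build the argv list for a dynamodump backup of one
# table, matching the command line shown above.
def backup_command(table, region="us-east-1"):
    return ["python", "dynamodump.py", "-m", "backup", "-r", region, "-s", table]

cmd = backup_command("test_table")
print(" ".join(cmd))
# To run it for real (requires dynamodump.py and AWS credentials):
# subprocess.run(cmd, check=True)
```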
Our table was extremely large, so I needed the script to keep running in the background even after the terminal exited. I did:

nohup python dynamodump.py -m backup -r us-east-1 -s test_table >> test_table.log &
It worked like a charm. Make sure you set the DATA_DUMP variable in the script to point to the mount point of your EBS volume. If you are not interested in setting up Boto3 and the AWS CLI, you can run the script by passing the AWS key and secret directly:
python dynamodump.py -m backup -r us-east-1 -s test_table --accessKey AWS_ACCESS_KEY --secretKey AWS_SECRET_KEY
Before you restore a table, create an empty target table in DynamoDB using the AWS console or CLI; if the table already contains data, the script may fail. In this example, we restore into an empty table named test_table2 in the same us-east-1 region:
python dynamodump.py -m restore -r us-east-1 -s test_table2
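Creating the empty target table can also be scripted with Boto3. A sketch, assuming a simple string hash key named id and on-demand billing; both are placeholders, so match them to your table's real schema:

```python
# Hedged sketch: the "id" hash key and PAY_PER_REQUEST billing mode are
# assumptions, not something dynamodump requires.
def create_table_spec(name):
    return {
        "TableName": name,
        "AttributeDefinitions": [{"AttributeName": "id", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "id", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",
    }

spec = create_table_spec("test_table2")
# To actually create the table (requires boto3 and AWS credentials):
# import boto3
# boto3.client("dynamodb", region_name="us-east-1").create_table(**spec)
```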
The script allows you to back up or restore the data only, the schema only, or both, as shown above. Remember, you can set up Boto3 and the AWS CLI, or you can pass your AWS key and secret directly when using the script:
python dynamodump.py -m restore -r us-east-1 -s test_table2 --accessKey AWS_ACCESS_KEY --secretKey AWS_SECRET_KEY --schemaOnly
python dynamodump.py -m restore -r us-east-1 -s test_table2 --accessKey AWS_ACCESS_KEY --secretKey AWS_SECRET_KEY --dataOnly
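If you split the restore into the two steps above, a small wrapper can keep them in order (schema first, then data). The helper below is my own sketch and omits the credential flags for brevity:

```python
# Hypothetical helper: build the two restore invocations shown above,
# schema first so the table exists before data is inserted.
def restore_commands(table, region="us-east-1"):
    base = ["python", "dynamodump.py", "-m", "restore", "-r", region, "-s", table]
    return [base + ["--schemaOnly"], base + ["--dataOnly"]]

for cmd in restore_commands("test_table2"):
    print(" ".join(cmd))
```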
That's all for now, folks. You can check out the script's GitHub page for more examples: https://github.com/bchew/dynamodump