DynamoDB is a NoSQL key-value database fully managed in the cloud by AWS. Although fast and scalable, its proprietary nature binds your business to AWS. When offering your software to other businesses and integrating with them, this might limit your company’s opportunity to reach more clients. For example, if a potential client’s systems already operate on Google Cloud Platform or Microsoft Azure, you might be required to set up your software on their cloud instead.
When looking for alternatives to DynamoDB, many might consider MongoDB Atlas. MongoDB is also a NoSQL database, but the fully managed Atlas service is multi-cloud by design, so it can be deployed as easily on GCP or Azure as on AWS. Another reason to choose MongoDB is the limitations of DynamoDB indexes and transactions, shown in the table below, as well as the fact that fields added after table creation cannot be indexed in a strongly consistent manner on DynamoDB. This second limitation exists because in DynamoDB a Local Secondary Index (the only index type that supports strongly consistent reads) must be defined at table creation and cannot be added later, which may hinder development of your constantly evolving software.
For those looking to gain from the flexibility of MongoDB, there is fortunately a clear migration path from DynamoDB. For live production systems any downtime or data loss can translate into a loss of income or reputation, so the migration strategy must focus on achieving minimal or zero downtime with no data loss.
The following sections will explore an approach that can achieve this, first examining how data can be mapped and then the process of migration itself.
JSON Format Comparison
Both MongoDB and DynamoDB use JSON to export and import their data; however, their data types and schemas are not the same. Let us compare DynamoDB JSON with MongoDB Extended JSON v2:
We can see that DynamoDB JSON places type descriptors such as “N”, “S” or “SS” between the field name and the value. MongoDB Extended JSON in Canonical Format also has such types, like “$date”, “$numberDouble” or “$numberInt”, but they are different. One way to deal with this would be to write a script that converts the database dump to MongoDB’s expected format, but let us first explore an alternative that does not require writing custom code.
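Should you prefer the script route, the conversion is fairly small. The sketch below (pure Python, no library dependencies, with illustrative field names) recursively strips the DynamoDB type descriptors to produce plain JSON-ready values:

```python
# Sketch of a custom conversion script: recursively strip DynamoDB JSON
# type descriptors ("S", "N", "BOOL", "L", "M", "SS", ...) so the result
# can be serialised as plain JSON for mongoimport.

def from_dynamodb(value):
    """Convert a single DynamoDB-typed JSON value to a plain Python value."""
    (type_tag, inner), = value.items()  # each value is a one-key dict
    if type_tag == "S":
        return inner
    if type_tag == "N":
        # DynamoDB encodes all numbers as strings
        return float(inner) if "." in inner else int(inner)
    if type_tag == "BOOL":
        return inner
    if type_tag == "NULL":
        return None
    if type_tag in ("SS", "NS"):
        # string set / number set: decode each element as "S" / "N"
        return [from_dynamodb({type_tag[0]: v}) for v in inner]
    if type_tag == "L":
        return [from_dynamodb(v) for v in inner]
    if type_tag == "M":
        return {k: from_dynamodb(v) for k, v in inner.items()}
    raise ValueError(f"Unhandled DynamoDB type: {type_tag}")

def item_to_json(item):
    """Convert one exported DynamoDB item to a plain JSON-ready dict."""
    return {k: from_dynamodb(v) for k, v in item.items()}
```

For example, `item_to_json({"name": {"S": "Ann"}, "age": {"N": "7"}})` yields `{"name": "Ann", "age": 7}`.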
An alternative format that we can choose when exporting a DynamoDB table is Amazon Ion. It is still incompatible with MongoDB JSON, but it can easily be down-converted to type-less JSON, as seen below. This down-converted, type-less JSON is compatible with MongoDB’s simplified Relaxed Format JSON schema seen in the table above.
Data Type Compatibility
Although type information is lost during the conversion described above, the mongoimport tool can automatically detect basic types such as string, int, double, array or document. Since DynamoDB does not natively support a date-time data type, dates are stored as ISO-8601 UTC formatted strings (e.g. “2022-08-31T12:31:18Z”), chosen for their lexicographic ordering property. If present, blobs and clobs need to stay under the 16 MB document size limit and will be converted to Base64-encoded strings and ASCII-encoded strings, respectively.
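The lexicographic ordering property means that sorting the raw strings also sorts the timestamps chronologically, so range queries on such fields keep working after the import. A quick illustration:

```python
# ISO-8601 UTC timestamps sort chronologically even when compared as
# plain strings, which is why date fields can survive the migration as
# strings and still support sorting and range queries.
timestamps = [
    "2022-08-31T12:31:18Z",
    "2021-01-05T09:00:00Z",
    "2022-08-31T12:31:17Z",
]
# Plain string sort == chronological sort
print(sorted(timestamps))
```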
The migration will take place in three key phases to achieve the migration goals in full: a full load, a forward sync and a backward sync.
In this phase we take a dump of the DynamoDB table in Ion format and store it on S3; this can be executed from the AWS Management Console. Then we can log on to ECS, access the dump file on S3 and execute the simple conversion script available from AWS to down-convert Ion to JSON. Finally, we can run the mongoimport tool to import the resulting Relaxed Format JSON into MongoDB.
While the full load phase was running, new data will have been produced in the live original application. To bring our new MongoDB database up to date we start by enabling DynamoDB Streams, a Change Data Capture (CDC) mechanism for item-level modifications on DynamoDB. Since events in that stream are ordered and deduplicated, it is a great fit for our use case. We can connect this stream to AWS Lambda and write custom code that parses each event and upserts (or deletes) the modified documents.
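A minimal sketch of that custom Lambda code is shown below. All field names are illustrative assumptions, and the `collection` argument stands in for a pymongo collection (anything exposing pymongo-style `replace_one`/`delete_one` methods), injected here only to keep the sketch self-contained:

```python
# Sketch of the custom Lambda: parse DynamoDB Stream records and upsert
# (or delete) the corresponding MongoDB documents.

def simplify(value):
    """Strip the DynamoDB type descriptor from a stream image value."""
    (tag, inner), = value.items()
    if tag == "N":
        return float(inner) if "." in inner else int(inner)
    if tag == "M":
        return {k: simplify(v) for k, v in inner.items()}
    if tag == "L":
        return [simplify(v) for v in inner]
    return inner  # "S", "BOOL", etc. need no further decoding here

def handle_record(record, collection):
    # The stream record carries the item's primary key and, for inserts
    # and modifications, the full new image of the item.
    keys = {k: simplify(v) for k, v in record["dynamodb"]["Keys"].items()}
    if record["eventName"] == "REMOVE":
        collection.delete_one(keys)
    else:  # INSERT or MODIFY: idempotent upsert of the whole document
        doc = {k: simplify(v) for k, v in record["dynamodb"]["NewImage"].items()}
        collection.replace_one(keys, doc, upsert=True)

def lambda_handler(event, context, collection=None):
    # In a real deployment "collection" would be a pymongo collection
    # initialised once outside the handler.
    for record in event["Records"]:
        handle_record(record, collection)
```

Because the upsert replaces the whole document keyed by the old primary key, replaying the same event twice leaves the data unchanged.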
Since we do not want to miss any events, DynamoDB Streams should be enabled before the table export from the previous phase, and the Lambda function enabled only after the import has finished. That will result in some unnecessary events being processed, but since upserts are idempotent, applying them again has no impact on the data, and we can simply ignore deletes of already-absent documents. Since there will be a backlog of messages to process, we can monitor the Lambda IteratorAge metric on CloudWatch to know when we are fully synchronised. We also need to mark entities migrated in this way as ”forward-synced” in MongoDB – we will need that in the next phase.
In the last phase of the migration we want to send any new data created in the MongoDB database back to the old DynamoDB one. This allows us to keep the old system active while proving out the new one, and enables us to start using the new MongoDB database before shutting down DynamoDB. This ‘backward sync’ can be easily achieved with AWS Database Migration Service (DMS), an automated database migration tool that supports MongoDB-to-DynamoDB migration out of the box. We need to choose the “CDC only” migration type to replicate only the data changes rather than the whole database.
To avoid backward-syncing entities that have already been forward-synced, thus creating a loop, we can create a DMS filter that skips entities marked as “forward-synced” in the previous phase.
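As a sketch, a DMS table-mapping selection rule along these lines could include only documents that do not carry the marker field. The database, collection and field names are illustrative, and the exact filter operators available for a MongoDB source should be verified against the DMS documentation:

```json
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "skip-forward-synced",
      "object-locator": {
        "schema-name": "appdb",
        "table-name": "orders"
      },
      "rule-action": "include",
      "filters": [
        {
          "filter-type": "source",
          "column-name": "forward_synced",
          "filter-conditions": [
            { "filter-operator": "null" }
          ]
        }
      ]
    }
  ]
}
```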
The MongoDB _id entity identifier will be generated for us automatically during the import. DynamoDB items are usually identified by a Composite Primary Key composed of a Hash Key (HK) and a Sort Key (SK). There might also be Local Secondary Indexes (LSI) and Global Secondary Indexes (GSI), each with their own HK-SK pair.
For all of the above HK-SK pairs we need to create a MongoDB index to allow entities to be queried as efficiently as before. This will also introduce the strong index consistency that GSIs were lacking. Additionally, if the data our application returns needs to be sorted on a different field than the one the range filtering is done on, we can take advantage of MongoDB’s Equality-Sort-Range (ESR) rule: map the Hash Key to the Equality part, the Sort Key to the Range part and the sorting field to the Sort part for improved performance.
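As an illustration of the ESR ordering for one migrated HK-SK pair (all field names below are hypothetical examples, not a prescribed schema):

```python
# Equality-Sort-Range (ESR) ordered compound index for one migrated
# HK-SK pair. All field names are hypothetical examples.
esr_index_keys = [
    ("customer_id", 1),  # Equality: the old DynamoDB Hash Key
    ("status", 1),       # Sort: the field query results are sorted on
    ("created_at", 1),   # Range: the old DynamoDB Sort Key (range filtered)
]
# With pymongo this specification would be applied as:
#   collection.create_index(esr_index_keys)
```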
The main limitation of this migration approach stems from DynamoDB Streams having a data retention limit of 24 hours. Events older than that might be removed at any moment, which means we have a maximum of 24 hours to complete the full load phase before starting the forward sync and consuming changes. To mitigate this, the migration can be executed on a table-by-table basis, fully migrating one table before moving on to the next. We can also delay the creation of indexes until after the migration is done to minimise write time during the procedure.
The final downtime during the switch to the new application will depend on the replication latency of the forward sync phase. With low, sub-second latency we could opt for a zero-downtime migration, but higher latencies might require some downtime to allow the data to fully propagate to the new instance. The final decision on the approach will depend on what those delays mean for your business.
In this article we have shown that a live migration from DynamoDB to MongoDB is possible using the available import/export tools and converters. A minimal or zero downtime migration can be achieved using DynamoDB Streams CDC and AWS Lambda, with the Lambda function being the only custom code that needs to be written. Thanks to AWS Database Migration Service, we have also preserved the ability to fall back to the old application without losing data created on the new one. Finally, we have shown how DynamoDB’s limitations around consistent indexes and transactions can be overcome with MongoDB.
Have you migrated from DynamoDB to MongoDB, or are you interested in doing so? Please get in touch at email@example.com
Talk to us, email us or even listen to us by subscribing to our Digital Matters podcast for interviews and discussions with digital thought leaders, pioneers and industry experts.
Available via Apple podcasts, Google Podcasts and Spotify.