Working with non-relational databases like MongoDB feels, in many ways, different from working with traditional relational ones. Instead of a series of interrelated tables, your data is stored in documents. Relational databases require you to define a structure before entering data; with NoSQL databases like MongoDB, you can shape your data on the fly. While MongoDB does not provide as much mathematical structure and internal control as a traditional relational database like MySQL or PostgreSQL, it is built to scale out across servers and can perform very well for online applications.
MongoDB Data Structure
Before we get into the backup procedure, let’s take a brief refresher on how data is structured within your MongoDB databases.
A document entry in MongoDB is written in JSON format, as key-value pairs. A value may also be an array.
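As an illustration, a simple document might look like the one in the sketch below. Every field name and value here is made up for this example, as is the file name; the script only writes the document to a scratch file and prints it.

```shell
#!/bin/sh
# Write a hypothetical customer document to a scratch file.
SAMPLE_FILE="${TMPDIR:-/tmp}/sample_customer.json"

cat > "$SAMPLE_FILE" <<'EOF'
{
  "name": "Jane Doe",
  "email": "jane@example.com",
  "orders": ["A1001", "A1002"],
  "address": { "city": "Springfield", "zip": "12345" }
}
EOF

# Show the document: "orders" holds an array, "address" holds nested keys.
cat "$SAMPLE_FILE"

# With a running MongoDB instance, the file could then be loaded with, e.g.:
#   mongoimport --db customers --collection people --file "$SAMPLE_FILE"
```

The collection name people in the commented-out mongoimport line is likewise an assumption.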
JSON is great for managing hierarchical relationships of data, and it is very flexible. However, it does not by itself handle many of the data types you typically want to store in a database, so MongoDB stores data as BSON, a binary encoding of JSON-like documents that supports additional data types. MongoDB is then able to create indexes and run queries against these objects, with access to both top-level and nested BSON keys.
However, because data in MongoDB is stored as binary objects, for backup purposes we will need to run through the mongodump procedure. Despite the differences in how data is stored and managed, the process for backing up your data, while slightly different from that of traditional relational databases like MySQL or PostgreSQL, is in some ways very similar.
Backup Procedure
To back up your MongoDB database, you will need to create a dump with mongodump. There are a few very good graphical interface tools for managing your NoSQL databases, like Robo 3T (formerly known as Robomongo) or NoSQLBooster, which are very helpful for managing your data visually. As most of us tend to process information this way, it’s not a bad idea to use these; perhaps you already are. You can certainly run backup procedures using tools like this.
However, for the purposes of this tutorial, we need to be working with the command line, as it is far more powerful for creating and running backups. To be able to run cronjobs (or a batch file and Task Scheduler if you are on a Windows server), you need to be able to understand command-line syntax. The mongodump command will take your MongoDB database and store it in a binary format.
Here is an example of a mongodump command, run through a root-user account:
sudo mongodump --db customers --out /var/backups/mongobackups/`date +"%m-%d-%y"`
However, typically you would run your backups from a non-root account. This is a much better idea, especially since you’ll be running the backup through a shell script, which should not be run as root.
Let’s break this apart.
sudo tells your system to run the command as a superuser, and mongodump is, of course, the mongodump command itself. There are a couple of options here. --db customers tells mongodump to use the database customers. If you leave this out, it will back up all MongoDB databases on your system; typically you will want to specify one for granularity and ease of backup. --out /var/backups/mongobackups/`date +"%m-%d-%y"` tells mongodump to write the database into /var/backups/mongobackups/[the date of the backup]/customers.
Note that inside the directory you have defined, mongodump creates a subdirectory named with the date stamp, and inside that a subdirectory named “customers.” The point of this is that you can store multiple backups and know when each one was created. This will be crucial when we set up the automation through a cronjob.
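The resulting layout can be sketched with a small helper. The function name backup_dir is hypothetical; it only builds and prints the path and does not run a dump.

```shell
#!/bin/sh
# backup_dir BASE [STAMP]: print the dated directory that --out would point at.
# STAMP defaults to today's date in mm-dd-yy form.
backup_dir() {
    base="$1"
    stamp="${2:-$(date +%m-%d-%y)}"
    printf '%s/%s\n' "$base" "$stamp"
}

# mongodump adds one subdirectory per database beneath the dated directory.
OUT_DIR=$(backup_dir /var/backups/mongobackups)
echo "Dump directory: $OUT_DIR"
echo "Database files: $OUT_DIR/customers"
```

One design point to consider: mm-dd-yy names do not sort chronologically across years, so a format such as `date +%Y-%m-%d` may be more convenient if you plan to list the backups in order.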
There are a few other useful options. You can choose to dump only specific entries using the --query option to limit the output of the mongodump according to your specifications. This uses the following syntax (note, the query itself needs to be enclosed in single quotes to avoid interacting with your shell):
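A minimal sketch of a filtered dump follows. The collection name accounts and the status field are hypothetical. --query applies to a single collection, so --collection is given as well, and recent versions of the tools expect the filter as JSON with double-quoted field names.

```shell
#!/bin/sh
# Hypothetical filter: only documents whose "status" field equals "active".
# Single quotes stop the shell from interpreting the braces and double quotes.
QUERY='{"status": "active"}'

# dump_active_accounts OUT_DIR: dump the matching documents from one collection.
dump_active_accounts() {
    # --query applies to a single collection, so --collection must be given.
    mongodump --db customers --collection accounts \
        --query "$QUERY" --out "$1"
}

printf 'Filter to be applied: %s\n' "$QUERY"
```

Calling dump_active_accounts "/var/backups/mongobackups/$(date +%m-%d-%y)" on a machine with mongodump installed would then write only the matching documents.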
In case your database is very large, you can add --gzip to compress the files that your dump produces.
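As a sketch, a compressed dump of the customers database might look like this. The helper name dump_compressed is an assumption; the dump only runs where mongodump is actually installed.

```shell
#!/bin/sh
# dump_compressed DB OUT_DIR: dump one database with gzip compression.
dump_compressed() {
    # With --gzip, the dumped data and metadata files are written compressed.
    mongodump --db "$1" --gzip --out "$2"
}

# Run the dump only where mongodump is available.
if command -v mongodump >/dev/null 2>&1; then
    dump_compressed customers "/var/backups/mongobackups/$(date +%m-%d-%y)"
else
    echo "mongodump not found; nothing dumped" >&2
fi
```

A dump created this way needs the matching --gzip option on mongorestore when it is restored.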
Scheduling
Now that you have a method for running a mongodump, you will want to automate this process. Of course you could do this manually, but you would need to remember to do it on a regular basis. Relying on your memory, or even a reminder, is a bad practice, as (I assume) you are human. Use your system instead.
To do this, put your entire command as structured above into a shell script and name it something like backup.sh.
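One possible shape for backup.sh is sketched below. The variable names, the function name, and the messages are assumptions for this example rather than a required layout. The PATH check is there because cron often runs jobs with a shorter PATH than an interactive shell.

```shell
#!/bin/sh
# backup.sh: dump one database into a date-stamped directory.
# BACKUP_ROOT and DB_NAME may be overridden from the environment.
BACKUP_ROOT="${BACKUP_ROOT:-/var/backups/mongobackups}"
DB_NAME="${DB_NAME:-customers}"

backup_database() {
    target="$BACKUP_ROOT/$(date +%m-%d-%y)"
    if ! command -v mongodump >/dev/null 2>&1; then
        echo "backup.sh: mongodump not found in PATH" >&2
        return 1
    fi
    mongodump --db "$DB_NAME" --out "$target"
}

# Report the outcome so that any output cron captures shows what happened.
if backup_database; then
    echo "backup.sh: dumped $DB_NAME"
else
    echo "backup.sh: backup of $DB_NAME failed" >&2
fi
```

To back up a different database with the same script, you could run it as DB_NAME=orders sh backup.sh, where orders is a hypothetical database name.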
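To run it at 1 AM every day, simply open your crontab file by running sudo crontab -e from the command line, and enter the following line (use the full path to backup.sh if it is not in the crontab owner’s home directory, which is typically where cron starts the job):
0 1 * * * /bin/sh backup.sh
Note: you could theoretically run the entire mongodump command directly in the crontab (where any % characters would need to be escaped); however, my personal preference is to store it in a shell script. This is helpful if you have a number of jobs you’d like to run at the same time, such as backups of several databases. It’s a matter of personal choice.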
Side note: it’s not a bad idea to periodically monitor these directories to make sure files are still there and that everything is still running.
If you are running this on a Windows server, instead of using backup.sh, name it something like backup.bat and then instead of running a crontab, which does not exist in Windows, use the task scheduler.
Cleaning out Obsolete Backups
You will want to periodically clean out the directory with your backups, especially if you have a large database. As you are creating a copy of your entire database each time this backup process runs, you will start eating up server space very quickly.
Of course, like before, it’s easy to forget, so you will want to create a shell script to clean out these old copies. Create another shell script, and name it something like cleanup.sh
In cleanup.sh, you would put something like this:
rm -rf /var/backups/mongobackups/*
which will delete all subdirectories and files below the level of /mongobackups, including the most recent backup, while leaving the directory itself in place.
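If you would rather keep the most recent backups, a variation is to delete only the dated directories older than a certain age. The following is a sketch of that alternative, not the command used above. The function name and the seven-day cutoff are assumptions, and it relies on the -mindepth and -maxdepth options found in GNU and BSD find.

```shell
#!/bin/sh
# cleanup.sh (variation): remove dated backup directories older than DAYS days.
# prune_old_backups ROOT DAYS
prune_old_backups() {
    root="$1"
    days="$2"
    # Nothing to do if the backup directory does not exist yet.
    [ -d "$root" ] || return 0
    # Only consider the dated directories directly under ROOT, and delete
    # those that were last modified more than DAYS days ago.
    find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" \
        -exec rm -rf {} +
}

prune_old_backups /var/backups/mongobackups 7
```

With this approach the last week of backups stays available for a restore, at the cost of a little more disk space.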
Then, to run this weekly on Sunday at 2 AM, enter the following line in your crontab after the previous entry (again, use the full path to cleanup.sh if it is not in the crontab owner’s home directory):
0 2 * * 0 /bin/sh cleanup.sh
Restoring your MongoDB database
Okay, so the unpleasant has occurred. For whatever reason, your existing database is gone or has been corrupted. Thankfully, due to the above procedures, you have a backup; and typically multiple backups. Restoring your MongoDB from backup files is no more difficult than creating the files in the first place.
To restore your database from one of your backups you simply need to run the following command:
sudo mongorestore --db customers --drop /var/backups/mongobackups/[specific date]/customers/
Breaking this command down: as a superuser, it runs mongorestore on your database named “customers” using the copy of your MongoDB database which is stored in your mongobackups/[date]/customers directory.
The --drop option instructs the restore process to drop each existing collection in the “customers” database before restoring it from the backup. If nothing’s there, it does no harm. You also have other options, such as creating a copy of the database on another server. To do so, simply copy your backup to that server, and you can use the same mongorestore command there.
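The restore step can be wrapped in a small helper that checks the backup exists before anything is dropped. This is a sketch only: the function name restore_customers and the BACKUP_ROOT variable are assumptions, and nothing is restored until the function is called.

```shell
#!/bin/sh
BACKUP_ROOT="${BACKUP_ROOT:-/var/backups/mongobackups}"

# restore_customers STAMP: restore the customers database from the backup
# taken on STAMP (mm-dd-yy, matching the directory names made by the dump).
restore_customers() {
    src="$BACKUP_ROOT/$1/customers"
    if [ ! -d "$src" ]; then
        echo "No backup found at $src" >&2
        return 1
    fi
    mongorestore --db customers --drop "$src"
}

# List the available backup dates, if any, to help pick one.
ls "$BACKUP_ROOT" 2>/dev/null || echo "No backups under $BACKUP_ROOT" >&2
```

For example, restore_customers 01-31-24 would restore from the backup taken on that (hypothetical) date. Checking for the directory first avoids starting a restore against a path that does not exist.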
One important note:
You have backups on your server now, and you know how to recover your data. This is great; you have a way of restoring your database in the event of a disaster. However, you need to consider the possibility that the entire server could become compromised. No matter how many copies you have made of your database, if they are still all on the server, you are still vulnerable. For this reason it’s a good idea to make sure that these files are backed up somewhere else as well, either on a separate computer or in the cloud. While this is a bit beyond the scope of this particular tutorial, you want to be certain you have a procedure for backing up your server in a worst-case scenario. There are many services which do this, or you may wish to manually make copies yourself; there’s no such thing as too many options.
Conclusion
This is, of course, only a cursory explanation of backing up your MongoDB data. There are other options beyond the scope of this introductory backup tutorial, such as distributing your data across multiple servers, which not only provides better backup and replication but also increases the amount of data that can be stored and improves speed for large datasets. However, as demonstrated here, understanding how the backup process works will give you a great start toward making sure that your data is safe.