
MongoDB Backup Strategies: mongodump, Snapshots, and Point-in-Time Recovery


MongoDB's flexibility is the reason teams choose it and the reason backing it up is more nuanced than it first appears. There is no single "right" MongoDB backup; there is a small menu of strategies (mongodump, filesystem snapshots, oplog-based point-in-time recovery, and Atlas's built-in backups), each with different answers to the questions that matter: how much data can you lose, how long does a restore take, and what does it cost to operate?

This guide walks through each strategy with the commands, the trade-offs, and the failure modes that bite production teams. By the end you will know exactly which combination fits your deployment, whether that is a single replica set on a droplet or a sharded cluster on Atlas.

Start with the two numbers that decide everything

Before comparing tools, write down your RPO and RTO. RPO (recovery point objective) is the maximum data loss you can tolerate, measured as time: if your last good backup is from 11 PM and the incident happens at 3 PM the next day, you stand to lose 16 hours of writes. RTO (recovery time objective) is how long you can afford to be down or degraded while restoring.
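The exposure figure is just the gap between two timestamps. A minimal sketch, assuming both times are already Unix epoch seconds; rpo_exposure_hours is a hypothetical helper name, not part of any tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: hours of writes at risk between the last good
# backup and the incident, both given as Unix epoch seconds.
rpo_exposure_hours() {
  local last_backup_epoch=$1 incident_epoch=$2
  # Integer division truncates any partial hour.
  echo $(( (incident_epoch - last_backup_epoch) / 3600 ))
}

# Example (GNU date), the 11 PM to 3 PM case; prints 16:
#   rpo_exposure_hours "$(date -u -d '2024-03-04 23:00' +%s)" "$(date -u -d '2024-03-05 15:00' +%s)"
```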

For most production applications, the honest targets are an RPO of one hour or less and an RTO measured in minutes to a couple of hours. Hold each strategy below against those numbers and the decision mostly makes itself.

Strategy 1: mongodump, the logical backup workhorse

mongodump connects to your deployment and exports collections as BSON, the same binary format MongoDB uses internally. It is the MongoDB analog of pg_dump or mysqldump: a logical backup that is portable, granular, and easy to automate.

A production-shaped invocation looks like this:

mongodump --uri="mongodb://backup_user:$PASS@db1.internal:27017/?authSource=admin&replicaSet=rs0&readPreference=secondary" \
  --gzip --archive=/backups/myapp-$(date +%Y%m%d-%H00).archive

Three details in that command carry most of the weight. readPreference=secondary sends the read load to a secondary member so the primary never feels the backup. --gzip compresses on the fly, typically shrinking BSON by 60 to 85 percent. And --archive writes a single streamable file instead of a directory tree, which pipes cleanly to off-site storage.

Consistency: the flag people forget

By default, mongodump reads collections one after another, so a write that lands mid-dump can leave your backup internally inconsistent: an order document without its corresponding inventory decrement, for example. On a replica set, fix this with --oplog:

mongodump --uri="..." --oplog --gzip --archive=/backups/myapp.archive

--oplog captures the operations log alongside the data, and mongorestore --oplogReplay replays those operations to produce a snapshot consistent to the moment the dump finished. Two caveats: it works on full-instance dumps only (not with --db or --collection filters), and on sharded clusters it does not provide cluster-wide consistency, a problem we come back to below.
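The restore side of a consistent dump pairs the same archive flags with --oplogReplay. A minimal sketch; the function name and the target URI are assumptions, and --drop replaces existing collections of the same name, so point it only at an instance you intend to overwrite:

```shell
#!/usr/bin/env bash
# Restore a full-instance archive that was taken with `mongodump --oplog`.
# $1: target connection string (assumed scratch or replacement instance)
# $2: path to the .archive file written by the backup job
restore_consistent() {
  local target_uri=$1 archive=$2
  # --gzip and --archive must match how the dump was written.
  # --oplogReplay applies the oplog entries captured during the dump,
  # so the result is consistent as of the moment the dump finished.
  # --drop removes each collection before restoring it.
  mongorestore --uri="$target_uri" --gzip --archive="$archive" \
    --oplogReplay --drop
}
```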

Where mongodump shines and where it strains

For databases up to roughly 100 GB, mongodump on an hourly or daily schedule is simple, reliable, and restores granularly: you can pull back one collection, or even filter documents with --query on the way out. That granularity is what makes logical backups work as checkpoints you can roll back to when a bad deploy corrupts one collection, an increasingly common incident as teams ship AI-assisted code against production data.
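Pulling one collection back out of a full-instance archive uses mongorestore's namespace options, and restoring it under a new name lets you compare it with the live data before swapping anything. A minimal sketch; the function names, the myapp.orders namespace, and the status filter are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Restore a single collection from a full archive under a new name,
# leaving the live collection untouched.
# $1: target URI, $2: archive path, $3: namespace (e.g. myapp.orders),
# $4: namespace to restore into (e.g. myapp.orders_restored)
restore_collection_as() {
  local target_uri=$1 archive=$2 ns=$3 new_ns=$4
  # --nsInclude skips everything else in the archive;
  # --nsFrom/--nsTo rename the namespace on the way in.
  mongorestore --uri="$target_uri" --gzip --archive="$archive" \
    --nsInclude="$ns" --nsFrom="$ns" --nsTo="$new_ns"
}

# Dump only the documents of one collection that match a filter.
# The filter is Extended JSON; the "status" field is illustrative.
dump_filtered_orders() {
  local source_uri=$1 out=$2
  mongodump --uri="$source_uri" --db=myapp --collection=orders \
    --query='{"status": "pending"}' --gzip --archive="$out"
}
```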

The strain shows at scale. mongodump reads every document through normal queries, which means it competes for cache and takes hours on terabyte-class data. Restores are slower still, because mongorestore must rebuild every index after loading data. Past a few hundred gigabytes, you graduate to snapshots for full-instance recovery and keep mongodump for granular per-collection work.

Strategy 2: filesystem and volume snapshots

A snapshot captures the database's underlying files at a block level: EBS snapshots on AWS, droplet or volume snapshots on DigitalOcean, LVM snapshots on bare metal. Speed is the headline: snapshotting a terabyte takes minutes, and restoring means attaching a volume rather than replaying documents.

The requirements are strict, though. With WiredTiger and journaling enabled (the default), a snapshot of a single volume containing both data and journal is crash-consistent: MongoDB will recover it the way it recovers from a power cut. If data and journal live on separate volumes, you must stop writes first with db.fsyncLock(), snapshot, then db.fsyncUnlock(), and run this against a secondary so production never blocks.
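The lock, snapshot, unlock sequence is worth scripting so the unlock always runs. A minimal sketch, assuming mongosh is installed and the URI connects directly to the secondary (for example with directConnection=true; a replica-set connection would send the lock to the primary). take_volume_snapshot is a hypothetical placeholder for your provider's snapshot command:

```shell
#!/usr/bin/env bash
# Snapshot a secondary whose data files and journal are on separate volumes.
# take_volume_snapshot is a HYPOTHETICAL placeholder: replace it with your
# EBS, DigitalOcean volume, or LVM snapshot command.
snapshot_secondary() {
  local secondary_uri=$1
  # Flush pending writes to disk and block new writes on this member.
  mongosh "$secondary_uri" --quiet --eval 'db.fsyncLock()' || return 1
  local status=0
  take_volume_snapshot || status=$?
  # Unlock even if the snapshot failed, so the member resumes replicating.
  mongosh "$secondary_uri" --quiet --eval 'db.fsyncUnlock()' || status=1
  return "$status"
}
```

While the lock is held the secondary stops applying replicated writes, so keep the window as short as your snapshot tooling allows.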

Snapshots are all-or-nothing. You cannot restore one collection from a volume snapshot without standing up a whole instance from it and dumping the collection out, the same limitation RDS users hit with instance snapshots. This is precisely why mature setups run snapshots and logical dumps, not one or the other.
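Standing up an instance from a snapshot can be scripted too. A minimal sketch, assuming a volume created from the snapshot is already mounted at the path you pass in (mongod runs journal recovery on startup and modifies the files, so never point it at your only copy) and that port 27099 and the log path are free. The temporary mongod runs standalone (no --replSet), bound to localhost, and without your production config file, so it does not enforce authorization:

```shell
#!/usr/bin/env bash
# Pull one collection out of a volume snapshot via a throwaway mongod.
# $1: mounted snapshot dbPath, $2: database, $3: collection, $4: output archive
extract_collection_from_snapshot() {
  local dbpath=$1 db=$2 coll=$3 out=$4
  local port=27099   # assumed free; any unused port works
  # Standalone, localhost-only, forked into the background.
  mongod --dbpath "$dbpath" --port "$port" --bind_ip 127.0.0.1 \
    --fork --logpath /tmp/snapshot-mongod.log || return 1
  local status=0
  mongodump --host=127.0.0.1 --port="$port" --db="$db" --collection="$coll" \
    --gzip --archive="$out" || status=$?
  # Cleanly stop the temporary instance (Linux/macOS).
  mongod --dbpath "$dbpath" --shutdown || status=1
  return "$status"
}
```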

Strategy 3: oplog tailing for point-in-time recovery

Every replica set maintains an oplog: a capped collection recording every write. Continuous backup tools tail this oplog and archive the operations stream, which buys the gold standard of recovery: restore the last full backup, then replay the oplog up to 14:31:59, one second before the bad deploy ran its destructive update.

You can assemble this yourself: a periodic mongodump baseline plus a process that dumps oplog segments every few minutes (mongodump -d local -c oplog.rs with a timestamp query) to object storage. It works, and it is also a distributed system you now own: the tailer must survive restarts, elections, and oplog rollover, and your restore procedure becomes a multi-stage replay that needs rehearsal. Be honest about whether your team will maintain that. For most teams without a dedicated platform group, hourly dumps deliver most of the benefit at a small fraction of the operational cost, and the remaining gap is what managed continuous backup (Atlas's PIT restore, for instance) is for.
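The segment dump hinges on the timestamp query. A minimal sketch, assuming the tailer persists the last oplog timestamp it archived (seconds plus ordinal) between runs; oplog_after_query and dump_oplog_segment are hypothetical names, and the filter uses MongoDB Extended JSON:

```shell
#!/usr/bin/env bash
# Extended JSON filter for oplog entries strictly newer than the last
# archived timestamp (seconds since epoch plus ordinal within that second).
oplog_after_query() {
  local secs=$1 ordinal=${2:-0}
  printf '{"ts": {"$gt": {"$timestamp": {"t": %d, "i": %d}}}}' "$secs" "$ordinal"
}

# Dump one oplog segment to its own archive. Run it against a replica set
# member: local.oplog.rs exists only on replica set members.
dump_oplog_segment() {
  local uri=$1 since_secs=$2 since_ordinal=$3 out=$4
  mongodump --uri="$uri" --db=local --collection=oplog.rs \
    --query="$(oplog_after_query "$since_secs" "$since_ordinal")" \
    --gzip --archive="$out"
}
```

On the restore side, mongorestore's --oplogReplay combined with --oplogLimit (a seconds[:ordinal] timestamp) stops replay before a given point, such as the moment of the destructive update; stitching the segments into that replay is the multi-stage procedure that needs rehearsal.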

Strategy 4: Atlas backups, and what they do not cover

If you run MongoDB Atlas, cloud backups with point-in-time restore are a checkbox, and they are genuinely good: snapshot schedules, oplog-based PIT within your retention window, restores to new clusters. Turn them on.

Then notice what they do not cover. Atlas backups live inside Atlas: they restore to Atlas clusters, they are governed by your Atlas retention settings, and if the account is compromised or a billing mishap suspends the project, your backups share the blast radius. Restoring a single collection still means restoring to a temporary cluster first. And lower-tier clusters have meaningful limits on snapshot frequency and retention.

The standard belt-and-suspenders move is keeping Atlas backups on while also exporting regular mongodump archives to object storage you control: S3, Spaces, B2, Azure. That gives you provider independence, audit-friendly custody of your own data, and fast granular restores without spinning up scratch clusters.

Sharded clusters: the special case

Everything above assumes a replica set. Sharded clusters add a hard problem: a consistent cluster-wide backup requires coordinating across all shards plus the config servers, and naive per-shard mongodumps taken at slightly different moments produce a backup where cross-shard data disagrees with itself. Practical guidance: stop the balancer before any cluster-wide backup window (sh.stopBalancer()), back up each shard and the config server replica set, and restart the balancer afterward (sh.startBalancer()). If cluster-wide point-in-time consistency is a real requirement, that is the strongest argument for Atlas or for purpose-built tooling rather than scripts; this is the one scenario where DIY honestly struggles.
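The balancer bracket around a sharded backup can be sketched like this. A minimal sketch under stated assumptions: you pass the mongos URI, the config server replica set URI, and one URI per shard replica set; archives land in /backups; and because the dumps still run at slightly different moments, it does not produce cluster-wide point-in-time consistency, which is exactly the limitation described above:

```shell
#!/usr/bin/env bash
# Back up a sharded cluster with the balancer paused.
# $1: mongos URI, $2: config server replica set URI, $3...: shard replica set URIs
backup_sharded_cluster() {
  local mongos_uri=$1 config_uri=$2
  shift 2
  local stamp; stamp=$(date +%Y%m%d-%H%M)
  # Stop chunk migrations so data does not move between shards mid-backup.
  mongosh "$mongos_uri" --quiet --eval 'sh.stopBalancer()' || return 1
  local status=0 i=0 uri
  mongodump --uri="$config_uri" --oplog --gzip \
    --archive="/backups/config-$stamp.archive" || status=1
  for uri in "$@"; do
    mongodump --uri="$uri" --oplog --gzip \
      --archive="/backups/shard$i-$stamp.archive" || status=1
    i=$((i + 1))
  done
  # Always resume the balancer, even if a dump failed.
  mongosh "$mongos_uri" --quiet --eval 'sh.startBalancer()' || status=1
  return "$status"
}
```

The dumps run one after another for clarity; running them in parallel narrows the window between them but does not close it.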

The operational layer: where MongoDB backups actually fail

Across all four strategies, the failures that cause real data loss are rarely about the dump command. They are operational, and they repeat across teams with eerie consistency.

The silent stop. An auth change, a TLS certificate update, or a version upgrade breaks the backup job, and cron does not escalate. Months later someone needs a restore and the latest archive is from February. Defense: a heartbeat ping sent only after a verified successful upload, monitored by something that alerts on silence.
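The heartbeat belongs at the very end of the job, gated on success. A minimal sketch, assuming a dead-man's-switch style monitor that alerts when pings stop arriving (its URL is passed in and hypothetical) and a placeholder upload_and_verify function standing in for your own dump, upload, and verification steps:

```shell
#!/usr/bin/env bash
# Ping the monitor only after a verified upload. upload_and_verify is a
# HYPOTHETICAL placeholder for your backup, upload, and check steps.
run_with_heartbeat() {
  local heartbeat_url=$1
  if upload_and_verify; then
    # -f: treat HTTP errors as failures; -m: cap the request at 10 seconds.
    curl -fsS -m 10 --retry 3 "$heartbeat_url" > /dev/null
  else
    # Deliberately no ping: the monitor alerts on the resulting silence.
    echo "backup failed; heartbeat withheld" >&2
    return 1
  fi
}
```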

The size anomaly nobody checked. The dump "succeeds" but is a tenth of yesterday's size because a connection string now points at the wrong cluster, or a database name slipped into the URI path scoped the dump to one small database. Defense: compare each archive's size to the previous run and alert on deviation.
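The size comparison is a few lines of shell arithmetic. A minimal sketch; the 50 and 200 percent bounds are assumptions to tune against your own run-to-run variation, and size_anomaly returns success when the new archive looks wrong:

```shell
#!/usr/bin/env bash
# Size of a file in bytes, portable across GNU and BSD userlands.
file_bytes() {
  wc -c < "$1" | tr -d ' '
}

# Succeeds (returns 0) when the current archive size falls outside
# [min_pct, max_pct] percent of the previous one. Defaults are assumptions.
size_anomaly() {
  local prev=$1 curr=$2 min_pct=${3:-50} max_pct=${4:-200}
  # First run: nothing to compare against, so report no anomaly.
  (( prev > 0 )) || return 1
  (( curr * 100 < prev * min_pct || curr * 100 > prev * max_pct ))
}
```

For archives streamed straight to object storage, take both sizes from your storage provider's listing instead of the local filesystem.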

The never-tested restore. mongorestore has its own sharp edges: --drop semantics, index build time, version compatibility between the dump's source and the restore target. The middle of an incident is the wrong time to learn them. Defense: quarterly restore drills into a scratch instance, with a document-count spot check on your critical collections and a stopwatch running so your RTO is a measurement, not a guess.
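A drill script keeps the stopwatch and the spot check honest. A minimal sketch, assuming the archive was taken with --oplog, the scratch URI points at a disposable instance, and each check is passed as a hypothetical db.collection=minimum argument (a floor rather than an exact count, since production counts drift after the backup is taken):

```shell
#!/usr/bin/env bash
# Restore into a scratch instance, time it, and spot-check document counts.
# $1: scratch URI, $2: archive path, $3...: checks like myapp.orders=100000
restore_drill() {
  local scratch_uri=$1 archive=$2
  shift 2
  local start=$SECONDS   # bash's built-in seconds counter
  mongorestore --uri="$scratch_uri" --gzip --archive="$archive" \
    --oplogReplay --drop || return 1
  echo "restore took $(( SECONDS - start ))s" >&2
  local check ns min db coll actual status=0
  for check in "$@"; do
    ns=${check%%=*}; min=${check#*=}
    db=${ns%%.*}; coll=${ns#*.}
    actual=$(mongosh "$scratch_uri" --quiet --eval \
      "db.getSiblingDB('$db').getCollection('$coll').countDocuments({})")
    if ! [[ "$actual" =~ ^[0-9]+$ ]] || (( actual < min )); then
      echo "spot check failed for $ns: got '$actual', expected >= $min" >&2
      status=1
    fi
  done
  return "$status"
}
```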

The on-server archive. Backups stored on the database host share its disk, its region, and its attacker. Stream every archive off-site as part of the job that creates it: mongodump --archive with no file name writes to standard output, so it pipes directly into rclone rcat and no temp file ever touches local disk, the same pattern we walk through in our hourly snapshots guide.
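Streaming the archive off-site can be a single pipeline. A minimal sketch, assuming rclone is installed and the destination remote (the offsite:mongo-backups example in the comment is hypothetical) has already been set up with rclone config:

```shell
#!/usr/bin/env bash
# Stream a compressed, consistent dump straight to object storage.
# $1: source URI (read from a secondary)
# $2: rclone destination, e.g. offsite:mongo-backups (hypothetical remote)
stream_backup() {
  local uri=$1 remote=$2
  local name; name="myapp-$(date +%Y%m%d-%H00).archive"
  (
    # pipefail, scoped to this subshell, makes a mongodump failure fail the
    # pipeline instead of being masked by rclone's exit status.
    set -o pipefail
    # --archive with no value writes the archive to stdout.
    mongodump --uri="$uri" --oplog --gzip --archive \
      | rclone rcat "$remote/$name"
  )
}
```

A failed run can still leave a truncated object at the destination, which is one more reason to gate the heartbeat on verification and to check sizes.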

A sane default architecture

For a typical production replica set, the configuration that balances safety, cost, and operational load looks like this. Hourly mongodump --oplog --gzip --archive against a secondary, streamed to object storage you control, with tiered retention: 24 hourlies, 7 dailies, 4 weeklies, 12 monthlies. Daily or twice-daily volume snapshots for fast full-instance recovery. Alerting on failure and on size anomalies, delivered where your team actually looks, and a quarterly restore drill on the calendar. If you are on Atlas, keep its backups on and add the hourly external dumps for independence and granularity.
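The 24/7/4/12 tiering is a grandfather-father-son rotation, and the selection logic fits in a short bash function. A minimal sketch, assuming every archive name shares one prefix and ends in the YYYYmmdd-HH00.archive stamp from the mongodump command earlier, weeks start on Monday, and bash 4 or newer for associative arrays. select_keep and days_from_civil are hypothetical names, and the function only prints what to keep, leaving deletion to you:

```shell
#!/usr/bin/env bash
# Days since 1970-01-01 for a Gregorian date (port of Howard Hinnant's
# days_from_civil; positive years only).
days_from_civil() {
  local y=$1 m=$2 d=$3
  (( m <= 2 )) && y=$(( y - 1 ))
  local era=$(( y / 400 ))
  local yoe=$(( y - era * 400 ))
  local doy=$(( (153 * ((m + 9) % 12) + 2) / 5 + d - 1 ))
  local doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
  echo $(( era * 146097 + doe - 719468 ))
}

# Print the archive names to KEEP: 24 newest, newest per day for 7 days,
# per week for 4 weeks, per month for 12 months.
select_keep() {
  local -A seen_day=() seen_week=() seen_month=()
  local hourly=0 days=0 weeks=0 months=0
  local name base ymd day week month keep
  while IFS= read -r name; do
    [[ -n "$name" ]] || continue
    base=${name%.archive}
    ymd=${base: -13:8}                     # YYYYmmdd from ...-YYYYmmdd-HH00
    # 10# forces base 10 so "08" and "09" are not parsed as octal.
    day=$(days_from_civil "${ymd:0:4}" "$((10#${ymd:4:2}))" "$((10#${ymd:6:2}))")
    week=$(( (day + 3) / 7 ))              # 1970-01-01 was a Thursday
    month=${ymd:0:6}
    keep=0
    if (( hourly < 24 )); then hourly=$(( hourly + 1 )); keep=1; fi
    if [[ -z "${seen_day[$day]:-}" ]] && (( days < 7 )); then
      seen_day[$day]=1; days=$(( days + 1 )); keep=1
    fi
    if [[ -z "${seen_week[$week]:-}" ]] && (( weeks < 4 )); then
      seen_week[$week]=1; weeks=$(( weeks + 1 )); keep=1
    fi
    if [[ -z "${seen_month[$month]:-}" ]] && (( months < 12 )); then
      seen_month[$month]=1; months=$(( months + 1 )); keep=1
    fi
    if (( keep )); then echo "$name"; fi
  done < <(printf '%s\n' "$@" | sort -r)    # newest first
}
```

The hourly tier simply keeps the 24 newest archives, which equals 24 hours on an hourly schedule, and one archive can count toward several tiers at once; that overlap is normal for this kind of rotation.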

That architecture gives you an RPO under an hour for the common incidents (bad deploys, fat-fingered deletes, application bugs), a fast path for the rare catastrophic ones, and backups that exist outside any single provider's blast radius.

Frequently asked questions

Is mongodump safe to run on production?

Yes, with two qualifications: point it at a secondary with readPreference=secondary so the primary is untouched, and expect cache pressure on very large databases since mongodump reads through the normal query path. On replica sets of ordinary size, an hourly mongodump is routine.

Does mongodump include indexes?

It exports index definitions, and mongorestore rebuilds the indexes after loading data. Budget restore time accordingly: on collections with many indexes, the rebuild can take longer than the data load itself.

Can I back up a single collection?

Yes: mongodump --db myapp --collection orders, optionally with --query to filter documents. Note that --oplog consistency is unavailable for partial dumps, so reserve collection-level dumps for ad hoc copies and run full-instance dumps on your schedule.

How long should I keep MongoDB backups?

Tiered retention answers this better than a single number: dense recent history (hourly for a day, daily for a week) for incident recovery, sparse long history (weekly for a month, monthly for a year) for compliance and forensics. Adjust the monthly tail to whatever your auditors or contracts require.

mongodump vs mongoexport: which should I use for backups?

mongodump, always. mongoexport writes JSON or CSV, which is convenient for handing data to analysts but lossy as a backup format: it cannot faithfully round-trip every BSON type, so dates, decimals, and binary fields can come back subtly wrong. mongodump preserves exact BSON and is the only one of the two designed for restore fidelity. Treat mongoexport as a reporting tool, never as a safety net.

What permissions does the backup user need?

Create a dedicated user with the built-in backup role (and restore on the account you use for drills), nothing more. The backup role grants the read access mongodump needs, including the oplog, without write access to your application data. A backup pipeline holding write-capable credentials is an unnecessary risk: if the job or its host is ever compromised, a backup-only scope is the difference between an exposure and a catastrophe.
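Creating that user is a one-time mongosh call. A minimal sketch, assuming an administrator connection string and passing the new user's name, role, and password through environment variables (NEW_USER, NEW_ROLE, and NEW_USER_PASS are hypothetical names) so the password never appears on the command line; export NEW_USER_PASS first, then run it once with the backup role and once with restore for the drill account:

```shell
#!/usr/bin/env bash
# Create a least-privilege user in the admin database.
# $1: admin URI, $2: user name, $3: built-in role (backup or restore).
# The password is read from the exported NEW_USER_PASS variable.
create_scoped_user() {
  local admin_uri=$1
  NEW_USER=$2 NEW_ROLE=$3 mongosh "$admin_uri" --quiet --eval '
    db.getSiblingDB("admin").createUser({
      user: process.env.NEW_USER,
      pwd: process.env.NEW_USER_PASS,
      roles: [{ role: process.env.NEW_ROLE, db: "admin" }]
    })'
}
```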

Wrapping up

MongoDB backup strategy is a layering exercise: mongodump for granular, portable, frequent checkpoints; snapshots for fast full-instance recovery at scale; oplog machinery (yours or Atlas's) when point-in-time recovery is a genuine requirement; and in every case, off-site storage, alerting, and rehearsed restores, because the operational layer is where backups really live or die. If you would rather not own that layer, Ottomatik's automated MongoDB backups run mongodump on your schedule, stream archives to your own storage across 15+ providers, rotate retention automatically, and alert your team in Slack the moment anything fails. The first backup takes about three minutes to set up.
