Backup Preparations

Overview

Before backing up your cluster or replica set, decide how to back up the data and what data to back up. This page describes items you must consider before starting a backup.

Important

Only sharded clusters or replica sets can be backed up. To back up a standalone mongod process, you must first convert it to a single-member replica set.

For an overview of how Backup works, see Backup.

Snapshot Frequency and Retention Policy

By default, Cloud Manager takes a base snapshot of your data every 6 hours.

If desired, administrators can change the frequency of base snapshots to 6, 8, 12, or 24 hours. Cloud Manager creates snapshots automatically on a schedule. You cannot take snapshots on demand.

Cloud Manager retains snapshots for the time periods listed in the following table. If you terminate a backup, Cloud Manager immediately deletes the backup’s snapshots.

Snapshot            Default Retention Policy    Maximum Retention Policy
Base snapshot       2 days                      5 days
Daily snapshot      7 days                      1 year
Weekly snapshot     4 weeks                     1 year
Monthly snapshot    13 months                   3 years

Changes to the snapshot schedule affect your snapshot storage costs.

You can change a backed-up deployment’s schedule through its Edit Snapshot Schedule menu option, available through the Backup page. Administrators can change snapshot frequency and retention through the snapshotSchedule resource in the API. If you change the schedule to save fewer snapshots, Cloud Manager does not delete existing snapshots to conform to the new schedule. To delete unneeded snapshots, see Delete a Snapshot.
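The frequency and retention limits described above can be checked before a schedule change is submitted. The following is a minimal sketch, not Cloud Manager's own validation. The function name build_snapshot_schedule is hypothetical, and the payload keys are assumed to correspond to fields of the snapshotSchedule resource; verify them against the API reference before use. The allowed frequencies and maximum retention values come from this page, while treating 1 year as 365 days or 52 weeks, and 3 years as 36 months, is an assumption.

```python
# Allowed base snapshot frequencies in hours, as listed on this page.
ALLOWED_INTERVAL_HOURS = (6, 8, 12, 24)

# Maximum retention per snapshot type, from the retention table.
# Assumption: 1 year = 365 days = 52 weeks, 3 years = 36 months.
MAX_RETENTION = {
    "base_days": 5,
    "daily_days": 365,
    "weekly_weeks": 52,
    "monthly_months": 36,
}


def build_snapshot_schedule(interval_hours=6, base_days=2, daily_days=7,
                            weekly_weeks=4, monthly_months=13):
    """Validate a schedule and return it as a dict.

    The default arguments mirror the default retention policy.
    """
    if interval_hours not in ALLOWED_INTERVAL_HOURS:
        raise ValueError(f"interval must be one of {ALLOWED_INTERVAL_HOURS}")

    requested = {
        "base_days": base_days,
        "daily_days": daily_days,
        "weekly_weeks": weekly_weeks,
        "monthly_months": monthly_months,
    }
    for name, value in requested.items():
        # Assumption: retention must be non-negative and within the maximum.
        if not 0 <= value <= MAX_RETENTION[name]:
            raise ValueError(f"{name}={value} exceeds {MAX_RETENTION[name]}")

    # Assumed field names; check them against the snapshotSchedule reference.
    return {
        "snapshotIntervalHours": interval_hours,
        "snapshotRetentionDays": base_days,
        "dailySnapshotRetentionDays": daily_days,
        "weeklySnapshotRetentionWeeks": weekly_weeks,
        "monthlySnapshotRetentionMonths": monthly_months,
    }
```

Calling build_snapshot_schedule(interval_hours=12, daily_days=30) returns a dict that could serve as the body of a schedule update, while build_snapshot_schedule(base_days=6) raises ValueError because base snapshots can be retained for at most 5 days.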

Namespaces Filter

The namespaces filter lets you specify which databases and collections to back up. You create either a Blacklist of those to exclude or a Whitelist of those to include. You make your selections when starting a backup and can later edit them as needed. If you change the filter in a way that adds data to your backup, a resync is required.

Use the blacklist to prevent backup of collections that contain logging data, caches, or other ephemeral data. Excluding these databases and collections reduces backup time and costs. A blacklist is often preferable to a whitelist, because a whitelist requires you to explicitly opt in every namespace you want backed up.
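The difference between the two filter types can be illustrated with a short sketch. This is not how Cloud Manager implements the filter. The function name is_backed_up is hypothetical, and the entry format (either a database name such as "logs" or a full namespace such as "app.cache") is an assumption made for the example.

```python
def is_backed_up(namespace, blacklist=None, whitelist=None):
    """Return True if a "database.collection" namespace passes the filter."""
    if blacklist and whitelist:
        raise ValueError("use either a blacklist or a whitelist, not both")

    database = namespace.split(".", 1)[0]

    def matches(entries):
        # An entry matches a whole database ("logs")
        # or a single collection ("app.cache").
        return namespace in entries or database in entries

    if whitelist:
        # Whitelist: only listed namespaces are backed up.
        return matches(whitelist)
    if blacklist:
        # Blacklist: everything except listed namespaces is backed up.
        return not matches(blacklist)
    # No filter: everything is backed up.
    return True
```

With blacklist=["logs", "app.cache"], a new collection such as app.orders is backed up without further action. With whitelist=["app.users"], the same new collection is skipped until it is added to the list.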

Storage Engine

When you enable backups for a sharded cluster or a replica set that runs on MongoDB 3.0 or later, you can choose the storage engine for the backups. MongoDB provides both the MMAPv1 and WiredTiger storage engines. If you do not specify a storage engine, Cloud Manager uses MMAPv1 by default.

The MMAPv1 engine minimizes storage in the blockstore: all blocks are already compressed and padding is disabled. WiredTiger compression offers no further storage benefit.

Insert-Only MongoDB Workloads

WiredTiger may be preferred when backing up insert-only MongoDB workloads that benefit from high levels of block-level de-duplication in the blockstore. You may see a reduction in storage when running the backup database on WiredTiger.

You can choose a different storage engine for a backup than for the original data; the two do not need to match. If your original data uses MMAPv1, you can choose WiredTiger for backing up, and vice versa.

You can change the storage engine for a cluster or replica set’s backups at any time, but doing so requires an initial sync of the backup on the new engine.

Encryption

The WiredTiger encryption option is not available for storing backups. You can store backups using the WiredTiger storage engine, but you cannot enable the encryption option. If you restore from a backup, you restore unencrypted files.

WiredTiger Options

If you choose the WiredTiger engine to back up a collection that already uses WiredTiger, the initial sync replicates all the collection’s WiredTiger options. For information on these options, see the storage.wiredTiger.collectionConfig section of the Configuration File Options page in the MongoDB manual.

Index collection options are never replicated.

For more information on storage engines, see Storage in the MongoDB manual.

Resyncing Production Deployments

As a best practice for production deployments, resync all backed-up replica sets periodically (annually). When you resync, data is read from a secondary in each replica set. During resync, no new snapshots are generated.

You may also want to resync your backup after:

  • A reduction in data size, such that the size on disk of Cloud Manager’s copy of the data is also reduced. This scenario also includes if you:
  • A switch in storage engines, if you want Cloud Manager to provide snapshots in the new storage engine format.
  • A manual build of an index on a replica set in a rolling fashion (as per Build Indexes on Replica Sets in the MongoDB manual).

Checkpoints

For sharded clusters, checkpoints provide additional restore points between snapshots. With checkpoints enabled, Cloud Manager creates restoration points between snapshots at a configurable interval of 15, 30, or 60 minutes. To enable checkpoints, see Enable Checkpoints.

To create a checkpoint, Cloud Manager stops the balancer and inserts a token into the oplog of each shard and config server in the cluster. These checkpoint tokens are lightweight and do not have a consequential impact on performance or disk use.

Backup does not require checkpoints, and they are disabled by default.

Restoring from a checkpoint requires Cloud Manager to apply the oplog of each shard and config server to the last snapshot captured before the checkpoint. Restoration from a checkpoint takes longer than restoration from a snapshot.
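The relationship between snapshots, checkpoints, and oplog replay can be sketched as follows. This is an illustration only: the function names checkpoint_times and base_snapshot_for are hypothetical, and the assumption that checkpoints are spaced evenly from the preceding snapshot is made for the example, not taken from Cloud Manager's behavior.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

# Checkpoint intervals in minutes, as listed on this page.
ALLOWED_CHECKPOINT_MINUTES = (15, 30, 60)


def checkpoint_times(snapshot_start, snapshot_end, interval_minutes):
    """List checkpoint times strictly between two consecutive snapshots."""
    if interval_minutes not in ALLOWED_CHECKPOINT_MINUTES:
        raise ValueError(f"interval must be one of {ALLOWED_CHECKPOINT_MINUTES}")
    step = timedelta(minutes=interval_minutes)
    times = []
    current = snapshot_start + step
    while current < snapshot_end:
        times.append(current)
        current += step
    return times


def base_snapshot_for(checkpoint, snapshot_times):
    """Return the last snapshot taken before the checkpoint.

    snapshot_times must be sorted in ascending order.
    """
    # bisect_left counts the snapshots strictly earlier than the checkpoint.
    index = bisect_left(snapshot_times, checkpoint)
    if index == 0:
        raise ValueError("no snapshot precedes this checkpoint")
    return snapshot_times[index - 1]
```

The gap between the checkpoint and the snapshot returned by base_snapshot_for is the span of oplog that has to be applied, which is why a checkpoint restore takes longer than a snapshot restore.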

Snapshots when Agent Cannot Stop Balancer

For sharded clusters, Cloud Manager disables the balancer before taking a cluster snapshot. In certain situations, such as a long migration or no running mongos, Cloud Manager tries to disable the balancer but cannot. In such cases, Cloud Manager will continue to take cluster snapshots but will flag the snapshots with a warning that data may be incomplete and/or inconsistent. Cluster snapshots taken during an active balancing operation run the risk of data loss or orphaned data.

Snapshots when Agent Cannot Contact a mongod

For sharded clusters, if the Backup Agent cannot reach a mongod process, whether a shard or config server, then the agent cannot insert a synchronization oplog token. If this happens, Cloud Manager will not create the snapshot and will display a warning message.
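The two failure cases above lead to different outcomes, which the following sketch summarizes. The function name cluster_snapshot_outcome and the returned strings are hypothetical and do not reproduce Cloud Manager's actual messages. Checking for an unreachable mongod first, when both conditions apply, is an assumption.

```python
def cluster_snapshot_outcome(balancer_stopped, all_mongods_reachable):
    """Return (snapshot_created, warning) for a sharded cluster snapshot."""
    if not all_mongods_reachable:
        # The agent cannot insert the synchronization oplog token,
        # so no snapshot is created.
        return (False, "snapshot not created: a mongod was unreachable")
    if not balancer_stopped:
        # The snapshot is still taken, but it is flagged.
        return (True, "data may be incomplete or inconsistent")
    return (True, None)
```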