Alert Conditions

Overview

When you create an alert configuration, specify the targets and alert conditions described here.

Host Alerts

When configuring an alert that applies to hosts, you select the host type as well as the alert condition.

Host Types

For host type, you can apply the alerts to all MongoDB processes or to a specific type of process:

Host Type | Applies To
Any type | All the types described here.
Standalone | Any mongod instance that is not part of a replica set or sharded cluster and that is not used as a config server.
Primary | All replica set primaries.
Secondary | All replica set secondaries.
Mongos | All mongos instances.
Conf | All mongod instances used as config servers.

Host Alert Conditions

Status

The following conditions apply to a change in status for a MongoDB process:

Host added

Sends an alert when Cloud Manager starts monitoring or managing a mongod or mongos process for the first time.

Host removed

Sends an alert when Cloud Manager stops monitoring or managing a mongod or mongos process.

Host added to replica set

Sends an alert when the specified type of mongod process is added to a replica set.

Host removed from replica set

Sends an alert when the specified type of mongod process is removed from a replica set.

Host is down

Sends an alert when Cloud Manager does not receive a ping from a host for more than 9 minutes. Under normal operation, the Monitoring Agent connects to each monitored host about once per minute. To minimize false positives, such as those that would occur during a host restart, Cloud Manager waits 9 minutes before alerting rather than alerting immediately.
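The grace period can be pictured with a minimal Python sketch. The function and constant names are hypothetical and are not part of Cloud Manager; only the 9-minute threshold comes from the description above.

```python
from datetime import datetime, timedelta

# Threshold taken from the description above; the name is hypothetical.
HOST_DOWN_THRESHOLD = timedelta(minutes=9)


def host_is_down(last_ping: datetime, now: datetime) -> bool:
    """Return True once more than 9 minutes have passed since the last ping."""
    return now - last_ping > HOST_DOWN_THRESHOLD


now = datetime(2024, 1, 1, 12, 0, 0)
# A short gap, such as a host restart, stays below the threshold.
print(host_is_down(now - timedelta(minutes=2), now))
# A gap longer than the grace period would raise the alert.
print(host_is_down(now - timedelta(minutes=10), now))
```

Because the comparison is strict, a gap of exactly 9 minutes does not yet count as down in this sketch.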

If the host continues to be unreachable, the Monitoring Agent eventually reduces ping frequency to every 5 minutes for a mongod and every 20 minutes for a mongos. If a mongod or mongos again becomes reachable, Cloud Manager recognizes the process within 5 minutes.

If a mongos process (i.e., a MongoDB routing process) is not managed by Cloud Manager Automation and if the process remains unreachable for 30 days, Cloud Manager removes the process from the Deployment tab view. However, if you restart the mongos process, Cloud Manager will again detect it.

To resolve this alert, see Host Down.

Host has restarted

Sends an alert when Cloud Manager detects that a host has been restarted.

Host is recovering

Sends an alert when a secondary member of a replica set enters the RECOVERING state. For information on the RECOVERING state, see Replica Set Member States.

Host does not have the latest version

Sends an alert when the version of MongoDB running on a host has fallen behind the current stable release of MongoDB by two revisions or more. For example, if the current stable release is MongoDB 3.2.9, a host running MongoDB 3.2.8 would not trigger an alert but a host running MongoDB version 3.2.7 would trigger an alert.
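The two-revision rule can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes that only hosts on the same major.minor release series as the stable release are compared; how Cloud Manager treats hosts on a different series is not covered here.

```python
def parse_version(version: str) -> tuple:
    """Split a version string such as '3.2.9' into (major, minor, revision)."""
    major, minor, revision = (int(part) for part in version.split("."))
    return major, minor, revision


def is_version_behind(host_version: str, stable_version: str, max_lag: int = 2) -> bool:
    """Return True when the host trails the stable release by max_lag revisions or more.

    Assumption: both versions belong to the same major.minor release series.
    """
    host = parse_version(host_version)
    stable = parse_version(stable_version)
    if host[:2] != stable[:2]:
        raise ValueError("this sketch only compares versions in the same release series")
    return stable[2] - host[2] >= max_lag


print(is_version_behind("3.2.8", "3.2.9"))  # False: one revision behind
print(is_version_behind("3.2.7", "3.2.9"))  # True: two revisions behind
```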

For more information on MongoDB version numbering, see MongoDB Version Numbers in the MongoDB manual.

Host is exposed to the public Internet

Sends an alert when the host is exposed to the public Internet. When configured, Cloud Manager tries to make a socket connection to your hosts. If Cloud Manager is able to connect, Cloud Manager triggers the alert because the host is not behind a firewall and does not have authentication enabled.

Cloud Manager runs this check once each day.

This is a weak security validation and should not replace other auditing or intrusion detection system procedures.

Host's SSL certificate will expire within 30 days

Sends an alert when the SSL certificate for a MongoDB instance is 30 days from expiration. Cloud Manager resends the alert every 24 hours until resolution or acknowledgment. If you do not resolve or acknowledge the alert and the certificate expires, Cloud Manager continues to send the alert. If the certificate expires, the Monitoring Agent will no longer be able to reach the MongoDB instance.

Asserts

The following alert conditions measure the rate of asserts for a MongoDB process, as collected from the MongoDB serverStatus command’s asserts document. You can view asserts through deployment metrics.

Asserts: Regular is

Sends an alert if the rate of regular asserts meets the specified threshold.

Asserts: Warning is

Sends an alert if the rate of warnings meets the specified threshold.

Asserts: Msg is

Sends an alert if the rate of message asserts meets the specified threshold. Message asserts are internal server errors. Stack traces are logged for these.

Asserts: User is

Sends an alert if the rate of errors generated by users meets the specified threshold.

Average Execution Time

These alert conditions apply only to deployments running MongoDB version 3.4 or higher. These conditions measure the average execution time in milliseconds per read, write, or command operation. You can view average execution time through deployment metrics.

Average Execution Time: Commands is

Sends an alert if the average execution time for command operations meets the specified threshold.

Average Execution Time: Reads is

Sends an alert if the average execution time for read operations meets the specified threshold.

Average Execution Time: Writes is

Sends an alert if the average execution time for write operations meets the specified threshold.

Document Metrics

The following alert conditions measure the average rate per second of documents deleted, inserted, returned, or updated. You can view these metrics on the Document Metrics chart when viewing metrics.

Document Metrics: Deleted is

Sends an alert if the average rate per second of documents deleted meets the specified threshold.

Document Metrics: Inserted is

Sends an alert if the average rate per second of documents inserted meets the specified threshold.

Document Metrics: Returned is

Sends an alert if the average rate per second of documents returned meets the specified threshold.

Document Metrics: Update is

Sends an alert if the average rate per second of documents updated meets the specified threshold.

Query Executor

The following alert conditions measure query performance as derived from information collected from the explain command. You can view these metrics on the Query Executor and Query Targeting charts when viewing metrics.

Query Executor: Scanned is

Sends an alert if the average rate per second of index items scanned during queries and query-plan evaluations meets the specified threshold. This measurement is found on the host’s Query Executor chart.

Query Executor: Scanned Objects is

Sends an alert if the average rate per second of documents scanned meets the specified threshold. This measurement is found on the host’s Query Executor chart.

Query Executor: Scanned Objects / Returned is

Sends an alert if the ratio of documents scanned to documents returned meets the specified threshold. This measurement is found on the host’s Query Targeting chart.

Query Executor: Scanned / Returned is

Sends an alert if the ratio of index items scanned to documents returned meets the specified threshold. This measurement is found on the host’s Query Targeting chart.
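Both Query Targeting conditions compare a count of scanned items with a count of returned documents. A minimal Python sketch of that ratio follows. The function name is hypothetical, and returning infinity when items are scanned but nothing is returned is an assumption of this sketch, not documented Cloud Manager behavior.

```python
def targeting_ratio(scanned: float, returned: float) -> float:
    """Ratio of items scanned (documents or index items) to documents returned.

    Assumption: when no documents are returned, the ratio is 0.0 if nothing
    was scanned and infinity otherwise.
    """
    if returned == 0:
        return float("inf") if scanned > 0 else 0.0
    return scanned / returned


# A query that scans 1000 documents to return 10 has a ratio of 100.
print(targeting_ratio(1000, 10))
```

A ratio close to 1 suggests that queries are well served by indexes, while a high ratio suggests that many scanned items are discarded.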

Opcounter

The following alert conditions measure the rate of database operations on a MongoDB process since the process last started, as collected from the MongoDB serverStatus command’s opcounters document. You can view opcounters through deployment metrics.

Opcounter: Cmd is

Sends an alert if the rate of commands performed meets the specified threshold.

Opcounter: Query is

Sends an alert if the rate of queries meets the specified threshold.

Opcounter: Update is

Sends an alert if the rate of updates meets the specified threshold.

Opcounter: Delete is

Sends an alert if the rate of deletes meets the specified threshold.

Opcounter: Insert is

Sends an alert if the rate of inserts meets the specified threshold.

Opcounter: Getmores is

Sends an alert if the rate of getmore (i.e., cursor batch) operations meets the specified threshold. For more information on getmore operations, see the Cursors page in the MongoDB manual.
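The opcounters document reports cumulative counts since the process last started, so a rate has to be derived from two samples. How Cloud Manager computes the rate is not described here; the Python sketch below shows one common approach, with hypothetical function and variable names and made-up sample values.

```python
def opcounter_rates(previous: dict, current: dict, interval_seconds: float) -> dict:
    """Per-second rate for each counter between two cumulative samples."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    rates = {}
    for name, value in current.items():
        delta = value - previous.get(name, 0)
        # Assumption: a negative delta means the process restarted and the
        # counter was reset, so the current value is used as the delta.
        if delta < 0:
            delta = value
        rates[name] = delta / interval_seconds
    return rates


# Two illustrative samples taken 60 seconds apart.
earlier = {"insert": 100, "query": 500, "update": 40}
later = {"insert": 160, "query": 620, "update": 40}
print(opcounter_rates(earlier, later, 60))
```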

Opcounter - Repl

The following alert conditions measure the rate of database operations on MongoDB secondaries, as collected from the MongoDB serverStatus command’s opcountersRepl document. You can view these metrics on the Opcounters - Repl chart, accessed through deployment metrics.

Opcounter: Repl Cmd is

Sends an alert if the rate of replicated commands meets the specified threshold.

Opcounter: Repl Update is

Sends an alert if the rate of replicated updates meets the specified threshold.

Opcounter: Repl Delete is

Sends an alert if the rate of replicated deletes meets the specified threshold.

Opcounter: Repl Insert is

Sends an alert if the rate of replicated inserts meets the specified threshold.

Memory

The following alert conditions measure memory for a MongoDB process, as collected from the MongoDB serverStatus command’s mem document. You can view these metrics on the Cloud Manager Memory and Non-Mapped Virtual Memory charts, accessed through deployment metrics.

Memory: Resident is

Sends an alert if the size of the resident memory meets the specified threshold. It is typical over time, on a dedicated database server, for the size of the resident memory to approach the amount of physical RAM on the box.

Memory: Virtual is

Sends an alert if the size of virtual memory for the mongod process meets the specified threshold. You can use this alert to flag excessive memory outside of memory mapping. For more information, click the memory chart’s i icon.

Memory: Mapped is

Sends an alert if the size of mapped memory, which maps the data files, meets the specified threshold. As MongoDB memory-maps all the data files, the size of mapped memory is likely to approach total database size.

Memory: Computed is

Sends an alert if the size of virtual memory that is not accounted for by memory-mapping meets the specified threshold. If this number is very high (multiple gigabytes), it indicates that excessive memory is being used outside of memory mapping. For more information on how to use this metric, view the non-mapped virtual memory chart and click the chart’s i icon.

B-tree

These alert conditions refer to the metrics found on the host’s btree chart. To view the chart, see View Metrics.

B-tree: accesses is

Sends an alert if the number of accesses to B-tree indexes meets the specified average.

B-tree: hits is

Sends an alert if the number of times a B-tree page was in memory meets the specified average.

B-tree: misses is

Sends an alert if the number of times a B-tree page was not in memory meets the specified average.

B-tree: miss ratio is

Sends an alert if the ratio of misses to hits meets the specified threshold.

Lock %

This alert condition applies only to deployments running MongoDB versions 2.2 through 2.6.

Effective Lock % is

For hosts running MongoDB versions 2.2 through 2.6, Cloud Manager sends an alert if the amount of time the host is write locked meets the specified threshold. You can view this metric by selecting effective on a host’s Lock % chart. This chart appears only for hosts running the applicable MongoDB versions.

Background

This alert condition refers to the metric found on the host’s background flush avg chart. To view the chart, see View Metrics.

Background Flush Average is

Sends an alert if the average time for background flushes meets the specified threshold. For details on this metric, view the background flush avg chart and click the chart’s i icon.

Connections

The following alert condition measures connections to a MongoDB process, as collected from the MongoDB serverStatus command’s connections document. You can view this metric on the Cloud Manager Connections chart, accessed through deployment metrics.

Connections is

Sends an alert if the number of active connections to the host meets the specified average.

Queues

The following alert conditions measure operations waiting on locks, as collected from the MongoDB serverStatus command. Cloud Manager computes these values based on the type of storage engine. You can view queues metrics on the Cloud Manager Queues chart, accessed through deployment metrics.

Queues: Total is

Sends an alert if the number of operations waiting on a lock of any type meets the specified average.

Queues: Readers is

Sends an alert if the number of operations waiting on a read lock meets the specified average.

Queues: Writers is

Sends an alert if the number of operations waiting on a write lock meets the specified average.

Page Faults

These alert conditions refer to metrics found on the host’s Record Stats and Page Faults charts. To view the charts, see View Metrics.

Accesses Not In Memory: Total is

Sends an alert if the rate of disk accesses meets the specified threshold. MongoDB must access data on disk if your working set does not fit in memory. This metric is found on the host’s Record Stats chart.

Page Fault Exceptions Thrown: Total is

Sends an alert if the rate of page fault exceptions thrown meets the specified threshold. This metric is found on the host’s Record Stats chart.

Page Faults is

Sends an alert if the rate of page faults (whether or not an exception is thrown) meets the specified threshold. This metric is found on the host’s Page Faults chart.

Cursors

The following alert conditions measure the number of cursors for a MongoDB process, as collected from the MongoDB serverStatus command’s metrics.cursor document. You can view these metrics on the Cloud Manager Cursors chart, accessed through deployment metrics.

Cursors: Open is

Sends an alert if the number of cursors the server is maintaining for clients meets the specified average.

Cursors: Timed Out is

Sends an alert if the number of timed-out cursors the server is maintaining for clients meets the specified average.

Cursors: Client Cursors Size is

Sends an alert if the cumulative size of the cursors the server is maintaining for clients meets the specified average.

Network

The following alert conditions measure throughput for a MongoDB process, as collected from the MongoDB serverStatus command’s network document. You can view these metrics on a host’s Network chart, accessed through deployment metrics.

Network: Bytes In is

Sends an alert if the number of bytes sent to the database server meets the specified threshold.

Network: Bytes Out is

Sends an alert if the number of bytes sent from the database server meets the specified threshold.

Network: Num Requests is

Sends an alert if the number of requests sent to the database server meets the specified average.

Replication Oplog

The following alert conditions apply to the MongoDB process’s oplog. You can view these metrics on the following charts, accessed through deployment metrics:

  • Replication Oplog Window
  • Replication Lag
  • Replication Headroom
  • Oplog GB/Hour

The following alert conditions apply to the oplog:

Replica time is

Sends an alert if the approximate amount of time available in the primary’s replication oplog meets the specified threshold.

Replication Lag is

Sends an alert if the approximate amount of time that the secondary is behind the primary meets the specified threshold.

To resolve this alert, see Replication Lag.

Replication Headroom is

Sends an alert when the difference between the sync source member’s oplog window and the replication lag time on the secondary meets the specified threshold.
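Replication headroom is a simple difference, as the following Python sketch shows. The function name and the choice of seconds as the unit are assumptions made for illustration.

```python
def replication_headroom(oplog_window_seconds: float, replication_lag_seconds: float) -> float:
    """Sync source's oplog window minus the secondary's replication lag."""
    return oplog_window_seconds - replication_lag_seconds


# A 24-hour oplog window and a 30-minute lag leave 23.5 hours of headroom.
headroom = replication_headroom(24 * 3600, 30 * 60)
print(headroom / 3600)
```

When headroom approaches zero, the secondary is close to falling off the sync source's oplog.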

Oplog Data per Hour is

Sends an alert when the amount of data per hour being written to a primary’s oplog meets the specified threshold.

DB Storage

This alert condition refers to the metric displayed on the host’s db storage chart. To view the chart, see View Metrics.

DB Storage is

Sends an alert if the amount of on-disk storage space used by extents meets the specified threshold. Extents are contiguously allocated chunks of datafile space.

DB storage size is larger than DB data size because storage size measures the entirety of each extent, including space not used by documents. For more information on extents, see the collStats command.

DB Data Size is

Sends an alert if the approximate size of all documents (and their paddings) meets the specified threshold.

Journaling

These alert conditions refer to the metrics found on the host’s journal - commits in write lock chart and journal stats chart. To view the charts, see View Metrics.

Journaling Commits in Write Lock is

Sends an alert if the rate of commits that occurred while the database was in write lock meets the specified average.

Journaling MB is

Sends an alert if the average amount of data written to the recovery log meets the specified threshold.

Journaling Write Data Files MB is

Sends an alert if the average amount of data written to the data files meets the specified threshold.

WiredTiger Storage Engine

The following alert conditions apply to a MongoDB process’s WiredTiger storage engine, as collected from the MongoDB serverStatus command’s wiredTiger.cache and wiredTiger.concurrentTransactions documents.

You can view these metrics on the following charts, accessed through deployment metrics:

  • Tickets Available
  • Cache Activity
  • Cache Usage

The following are the alert conditions that apply to WiredTiger:

Tickets Available: Reads is

Sends an alert if the number of read tickets available to the WiredTiger storage engine meets the specified threshold.

Tickets Available: Writes is

Sends an alert if the number of write tickets available to the WiredTiger storage engine meets the specified threshold.

Cache: Dirty Bytes is

Sends an alert when the number of dirty bytes in the WiredTiger cache meets the specified threshold.

Cache: Used Bytes is

Sends an alert when the number of used bytes in the WiredTiger cache meets the specified threshold.

Cache: Bytes Read Into Cache is

Sends an alert when the number of bytes read into the WiredTiger cache meets the specified threshold.

Cache: Bytes Written From Cache is

Sends an alert when the number of bytes written from the WiredTiger cache meets the specified threshold.

System and Disk Alerts

If you use Automation, the following alert conditions measure usage on the servers that run your mongod and mongos processes.

System: CPU (User) % is

The normalized CPU usage of the MongoDB process, which is scaled to a range of 0-100%.

Disk space % used on Data Partition is

The percentage of disk space used on any partition that contains the MongoDB collection data.

Disk space % used on Index Partition is

The percentage of disk space used on any partition that contains the MongoDB index data.

Disk space % used on Journal Partition is

The percentage of disk space used on the partition that contains the MongoDB journal, if journaling is enabled.

Disk I/O % utilization on Data Partition is

The percentage of time during which requests are being issued to any partition that contains the MongoDB collection data. This includes requests from any process, not just MongoDB processes.

Disk I/O % utilization on Index Partition is

The percentage of time during which requests are being issued to any partition that contains the MongoDB index data. This includes requests from any process, not just MongoDB processes.

Disk I/O % utilization on Journal Partition is

The percentage of time during which requests are being issued to the partition that contains the MongoDB journal, if journaling is enabled. This includes requests from any process, not just MongoDB processes.

Replica Set Alerts

These alert conditions apply to replica sets.

Replica set elected a new primary

Sends an alert when a set elects a new primary. Each time Cloud Manager receives a ping, it inspects the output of the replica set’s rs.status() method for the status of each replica set member. From this output, Cloud Manager determines which replica set member is the primary. If the primary found in the ping data is different from the current primary known to Cloud Manager, this alert triggers.

Receiving this alert does not always mean that the set elected a new primary. This alert may also trigger when the same primary is re-elected. This can happen when Cloud Manager processes a ping in the midst of an election.

Replica set has no primary

Sends an alert when a replica set does not have a primary. Specifically, when none of the members of a replica set have a status of PRIMARY, the alert triggers. For example, this condition may arise when a set has an even number of voting members resulting in a tie.

If the Monitoring Agent collects data during an election for primary, this alert might send a false positive. To prevent such false positives, set the alert configuration’s after waiting interval (in the configuration’s Send to section).

For resolutions, see No Primary.

Has too few healthy members

Sends an alert when a replica set has fewer than the specified number of healthy members. If the replica set has the specified number of healthy members or more, Cloud Manager triggers no alert.

A replica set member is healthy if its state, as reported in the rs.status() output, is either PRIMARY or SECONDARY. Hidden secondaries and arbiters are not counted.

As an example, if you have a replica set with one member in the PRIMARY state, two members in the SECONDARY state, one hidden member in the SECONDARY state, one ARBITER, and one member in the RECOVERING state, then the healthy count is 3.
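The counting rule can be sketched in Python. The member dictionaries below use a simplified, hypothetical shape with "state" and "hidden" keys; they are not the actual structure of the rs.status() output.

```python
# States that count as healthy, per the description above.
HEALTHY_STATES = {"PRIMARY", "SECONDARY"}


def healthy_member_count(members: list) -> int:
    """Count members in the PRIMARY or SECONDARY state, excluding hidden members."""
    return sum(
        1
        for member in members
        if member["state"] in HEALTHY_STATES and not member.get("hidden", False)
    )


# The replica set from the example above.
members = [
    {"state": "PRIMARY"},
    {"state": "SECONDARY"},
    {"state": "SECONDARY"},
    {"state": "SECONDARY", "hidden": True},
    {"state": "ARBITER"},
    {"state": "RECOVERING"},
]
print(healthy_member_count(members))  # 3
```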

Has too many unhealthy members

Sends an alert when a replica set has more than the specified number of unhealthy members. If the replica set has the specified number or fewer, Cloud Manager sends no alert.

A replica set member is unhealthy when the agent cannot connect to it or when the member is in a rollback or recovering state.

Hidden secondaries are not counted.

Sharded Cluster Alerts

This alert condition applies to sharded clusters.

Cluster is missing an active mongos

Sends an alert if Cloud Manager cannot reach a mongos for the cluster.

Agent Alerts

These alert conditions apply to Monitoring Agents and Backup Agents.

Monitoring Agent is down

Sends an alert if no Monitoring Agent is detected for at least 7 minutes. Under normal operation, the Monitoring Agent sends a ping to Cloud Manager roughly once per minute. If Cloud Manager does not receive a ping for at least 7 minutes, this alert triggers. However, this alert will never trigger for a group that has no hosts configured.

Important

When the Monitoring Agent is down, Cloud Manager will trigger no other alerts. For example, if a host is down there is no Monitoring Agent to send data to Cloud Manager that could trigger new alerts.

Monitoring Agent does not have the latest version

Sends an alert when the Monitoring Agent is not running the latest version of the software.

Backup Agent is down

Sends an alert when the Backup Agent for a group with at least one active replica set or cluster is down for more than 1 hour.

To resolve this alert:

  1. To see which server hosts the Backup Agent, click Deployment, then the Servers tab.
  2. Check the Backup Agent log file on that server.

Backup Agent does not have the latest version

Sends an alert when the Backup Agent is not running the latest version of the software.

Backup Alerts

These alert conditions apply to Cloud Manager Backup.

Backup oplog is behind

Sends an alert if the most recent oplog data received by Cloud Manager is more than 75 minutes old.

To resolve this alert, see Backup Oplog is Behind.

Backup requires a resync

Sends an alert if the replication process for a backup falls too far behind the oplog to catch up. This occurs when the host overwrites oplog entries that backup has not yet replicated. When this happens, you must resync backup, as described in the procedure Resync a Backup.

Also, check the corresponding Backup Agent log. If you see a “Failed Common Points” test, one of the following may have happened.

  • A significant rollback event occurred on the backed-up replica set.
  • The oplog for the backed-up replica set was resized or deleted.
  • High oplog churn caused the agent to lose the tail of the oplog.

Inconsistent backup configuration has been detected

Sends an alert if Cloud Manager has detected that the configuration for a backup does not match the configuration of the MongoDB deployment it backs up.

To resolve this alert, see Inconsistent Backup Configuration.

Inconsistent cluster snapshot count is...

Sends an alert if Cloud Manager repeatedly fails to take a cluster snapshot. The alert triggers when the number of consecutive failed attempts exceeds the number specified in the alert configuration.
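The consecutive-failure check can be sketched in Python. The function names and the list-of-booleans history are hypothetical; the sketch only illustrates counting failures in a row against a configured number.

```python
def consecutive_failures(results: list) -> int:
    """Count failed attempts at the end of the history (most recent last).

    Each entry is True for a successful snapshot and False for a failure.
    """
    count = 0
    for succeeded in reversed(results):
        if succeeded:
            break
        count += 1
    return count


def snapshot_alert_triggers(results: list, threshold: int) -> bool:
    """Return True when the consecutive failures exceed the configured number."""
    return consecutive_failures(results) > threshold


# Three failures in a row exceed a configured count of 2.
print(snapshot_alert_triggers([True, False, False, False], threshold=2))
# A success in between resets the count.
print(snapshot_alert_triggers([False, False, True, False], threshold=2))
```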

The alert text should contain the reason for the problem. Common problems include the following:

  • There was no reachable mongos. To resolve this issue, ensure that there is at least one mongos showing on the Cloud Manager Deployment page.
  • The balancer could not be stopped. To resolve this issue, check the log files for the first config server to determine why the balancer will not stop.
  • Could not insert a token in one or more shards. To resolve this issue, ensure connectivity between the Backup Agent and all shards.

User Alerts

These alert conditions apply to Cloud Manager users.

User joined the group

Sends an alert when a new user joins the group.

User left the group

Sends an alert when a user leaves the group.

User had their role changed

Sends an alert when a user’s roles have been changed.

Group Alerts

These alert conditions apply to group membership, group security, and the group’s subscription.

Users awaiting approval to join group

Sends an alert if there are users who have asked to join the group. A user can ask to join a group when first registering for Cloud Manager.

Users do not have two-factor authentication enabled

Sends an alert if the group has users who have not set up two-factor authentication.

Billing Alert Condition

The following alert condition applies to Cloud Manager billing.

Credit card is about to expire

Sends an alert if the credit card on file is about to expire. The alert is triggered at the beginning of the month that the card expires. Cloud Manager enables this alert configuration when a credit card is added for the first time.