Troubleshooting

This document provides advice for troubleshooting problems with Cloud Manager.

For resolutions to alert conditions, see also Alert Resolutions.

Getting Started Checklist

To begin troubleshooting, complete these tasks to check for common, easily fixed problems:

  1. Authentication Errors
  2. Check Agent Output or Log
  3. Ensure Connectivity Between Agent and Monitored Hosts
  4. Ensure Connectivity Between Agent and Cloud Manager Server
  5. Allow Agent to Discover Hosts and Collect Initial Data

Authentication Errors

If your MongoDB instances run with authentication enabled, ensure Cloud Manager has the MongoDB credentials. See Configure MongoDB Authentication and Authorization.

Check Agent Output or Log

If you continue to encounter problems, check the agent’s output for errors. See Agent Logs for more information.

Ensure Connectivity Between Agent and Monitored Hosts

Ensure the system running the agent can resolve and connect to the mongod processes. If you install multiple Monitoring Agents, ensure that each Monitoring Agent can reach every mongod process in the deployment.

To confirm, log into the system where the agent is running and issue a command in the following form:

mongo [hostname]:[port]

Replace [hostname] with the hostname and [port] with the port that the database is listening on.

Cloud Manager does not support port forwarding.

Ensure Connectivity Between Agent and Cloud Manager Server

Verify that the Monitoring Agent can make outbound connections on TCP port 443 to the Cloud Manager server (that is, api-agents.mongodb.com).
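Both connectivity checks above come down to whether the agent's system can resolve a hostname and open a TCP connection to a port. The following is a minimal sketch of such a check, not part of Cloud Manager; the function name can_connect and the 5-second default timeout are assumptions made for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if host resolves and a TCP connection to host:port succeeds."""
    try:
        # create_connection performs name resolution and the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run it from the system where the agent is installed, for example can_connect("api-agents.mongodb.com", 443) for the Cloud Manager server, or once per mongod hostname and port. A successful TCP connection shows only that the port is reachable; it does not confirm that authentication will succeed.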

Allow Agent to Discover Hosts and Collect Initial Data

Let the agent run for 5 to 10 minutes so that it can discover hosts and collect initial data.

Monitoring

Alerts

For resolutions to alert conditions, see also Alert Resolutions.

For information on creating and managing alerts, see Manage Alert Configurations and Manage Alerts.

Cannot Turn Off Email Notifications

There are at least two ways to turn off alert notifications:

Receive Duplicate Alerts

If the notification email list contains multiple email groups, one or more people may receive multiple notifications of the same alert.

Receive “Host has low open file limits” or “Too many open files” error messages

These error messages appear on the Deployment page, under a host’s name. They appear if the number of available connections does not meet the Cloud Manager-defined minimum value. These errors are not generated by the mongos instance and, therefore, will not appear in mongos log files.

On a host-by-host basis, the Monitoring Agent compares the number of open file descriptors and connections to the maximum connections limit. The max open file descriptors ulimit parameter directly affects the number of available server connections. The agent calculates whether enough connections exist to meet the Cloud Manager-defined minimum value.

In each ping document, the Monitoring Agent examines the serverStatus.connections values for each node. If the sum of the current value and the available value is less than the maxConns configuration value set for a monitored host, the Monitoring Agent sends a “Host has low open file limits” or “Too many open files” message to Cloud Manager.
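The comparison described above can be written as a short function. This is a sketch of the stated rule only, not the Monitoring Agent's actual code; the function name is hypothetical, and the dictionary stands in for the connections section of a serverStatus result with its current and available fields.

```python
def is_below_max_conns(connections: dict, max_conns: int) -> bool:
    """Return True when current + available is less than the configured maxConns."""
    total = connections["current"] + connections["available"]
    return total < max_conns
```

For example, a host reporting 10 current and 809 available connections with maxConns set to 20000 would trigger the message, because 819 is less than 20000.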

Ping documents are data sent by Monitoring Agents to Cloud Manager. To view ping documents:

  1. Click the Deployment page.
  2. Click the host’s name.
  3. Click Last Ping.

To prevent this error, we recommend that you set the ulimit for open files to 64000. We also recommend setting the maxConns option to at least the recommended value.

See the MongoDB ulimit reference page and the MongoDB maxConns reference page for details.
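To see whether an environment meets the recommended open files limit, you can read the limit with Python's standard resource module (Unix only). This is a minimal sketch; the function names are hypothetical, and the result reflects the process running the script, so run it as the same user and from the same environment that starts mongod.

```python
import resource

RECOMMENDED_OPEN_FILES = 64000  # value recommended in the text above


def limit_meets_minimum(soft_limit: int, minimum: int = RECOMMENDED_OPEN_FILES) -> bool:
    """Return True if soft_limit is unlimited or at least minimum."""
    # RLIM_INFINITY means that no limit is enforced.
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum


def current_open_files_limit() -> int:
    """Return the soft open files limit of the process running this script."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

Calling limit_meets_minimum(current_open_files_limit()) returns False when the soft limit is below 64000.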

Deployments

Deployment Hangs in In Progress

If you have added or restarted a deployment and the deployment remains in the In Progress state for several minutes, click View Agent Logs and look for any errors.

If you diagnose an error and need to correct the deployment configuration:

  1. Click Edit Configuration and then click Edit Configuration again.
  2. Reconfigure the deployment.
  3. When you complete your changes, click Review Changes and then Confirm & Deploy.

If you shut down the deployment and still cannot find a solution, remove the deployment from Cloud Manager.

Projects

Additional Information on Projects

Create a project to monitor additional segregated systems or environments for servers, agents, users, and other resources. For example, your deployment might have two or more environments separated by firewalls. In this case, you would need two or more separate Cloud Manager projects.

API and shared secret keys are unique to each project. Each project requires its own agent with the appropriate API and shared secret keys. Within each project, the agent needs to be able to connect to all hosts it monitors in the project.

For information on creating and managing projects, see Projects.

Munin

Important

As of Automation Agent 2.7.0, hardware monitoring using munin-node is deprecated.

munin-node is a third-party package. For problems related to installing munin-node, see the Munin Wiki.

Install and configure the munin-node service on the MongoDB server(s) to be monitored before starting Cloud Manager monitoring. The Cloud Manager agent’s README file provides guidelines to install munin-node.

See also

See Configure Hardware Monitoring with munin-node for details about monitoring hardware with munin-node.

Red Hat Enterprise Linux (RHEL 6, 7) can generate the following error messages.

No package munin-node is available Error

To correct this error:

  1. Follow the instructions on the Extra Packages for Enterprise Linux repository wiki page to install the epel-release rpm for your version of Enterprise Linux.

  2. After the package is installed, type this command to install munin-node and all of its dependencies:

    sudo yum install munin-node
    
  3. After munin-node is installed, check whether the munin-node service is running. If it is not, type these commands to start the munin-node service:

    service munin-node status
    service munin-node start
    

Non-localhost IP Addresses are Blocked

By default, munin blocks incoming connections from non-localhost IP addresses. The /var/log/munin/munin-node.log file will display a “Denying connection” error for your non-localhost IP address.

To fix this error, open the munin-node.conf configuration file and comment out these two lines:

allow ^127\.0\.0\.1$
allow ^::1$

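Then add a cidr_allow line to the munin-node.conf configuration file with a CIDR range that matches your subnet. The following example allows connections from any address; replace 0.0.0.0/0 with your subnet’s range to restrict access: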

cidr_allow 0.0.0.0/0

Restart munin-node after editing the configuration file for changes to take effect.
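Before editing the file, it can help to confirm that the CIDR range you plan to use actually covers the address shown in the “Denying connection” log entry. The sketch below uses Python's standard ipaddress module to test general CIDR membership. It does not reproduce munin's own matching code, and the function name and sample addresses are illustrative assumptions.

```python
import ipaddress


def is_in_cidr(client_ip: str, cidr: str) -> bool:
    """Return True if client_ip falls inside the network written in CIDR notation."""
    # ip_network raises ValueError if host bits are set, for example "10.0.1.5/24".
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)
```

For example, is_in_cidr("10.0.1.25", "10.0.1.0/24") is True, while is_in_cidr("10.0.2.25", "10.0.1.0/24") is False. Every IPv4 address is inside 0.0.0.0/0.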

Verifying iostat and Other Plugins/Services Returns “# Unknown service” Error

The first step is to confirm there is a problem. Open a telnet session to the munin-node port (4949, the default and required port) and fetch iostat, iostat_ios, and cpu:

telnet HOSTNAME 4949
fetch iostat
fetch iostat_ios
fetch cpu

The iostat_ios plugin creates the iotime chart, and the cpu plugin creates the cputime chart.

If any of these telnet fetch commands returns an “# Unknown service” error, create a link to the plugin or service in /etc/munin/plugins/ by typing these commands:

cd /etc/munin/plugins/
sudo ln -s /usr/share/munin/plugins/<service> <service>

Replace <service> with the name of the service that generates the error.
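The telnet check can also be scripted when several hosts or plugins need verifying. The sketch below assumes that munin-node sends a one-line greeting on connect and ends each fetch reply with a line containing a single dot; the function names are hypothetical and the code is an illustration, not a supported tool.

```python
import socket


def fetch_plugin(host: str, plugin: str, port: int = 4949, timeout: float = 5.0) -> str:
    """Send 'fetch <plugin>' to munin-node and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("r", encoding="utf-8")
        reader.readline()  # discard the greeting line (assumed to be sent on connect)
        sock.sendall(f"fetch {plugin}\n".encode("utf-8"))
        lines = []
        for line in reader:
            if line.strip() == ".":  # a lone dot is assumed to end the reply
                break
            lines.append(line.rstrip("\n"))
        return "\n".join(lines)


def is_unknown_service(reply: str) -> bool:
    """Return True if the reply is the '# Unknown service' error."""
    return reply.lower().startswith("# unknown service")
```

Calling is_unknown_service(fetch_plugin("HOSTNAME", "iostat")) for each of iostat, iostat_ios, and cpu reports which plugins still need a link in /etc/munin/plugins/.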

Disk names are not listed by Munin

In some cases, Munin omits disk names that have a dash between the name and a numerical suffix, for example, dm-0 or dm-1. There is a documented fix for Munin’s iostat plugin.

Authentication

Two-Factor Authentication

Missed SMS Authentication Tokens

Unfortunately, SMS is not a 100% reliable delivery mechanism for messages, especially across international borders. The Google Authenticator option is 100% reliable. Unless you must use SMS for authentication, use the Google Authenticator application for two-factor authentication.

If you do not receive the SMS authentication tokens:

  1. Refer to the Manage Your Two-Factor Authentication Options page for more details about using two-factor authentication. This page includes any limitations which may affect SMS delivery times.
  2. Enter the SMS phone number with the country code first, followed by the area code and the phone number. If that does not work, try entering 011 first, followed by the country code, the area code, and the phone number.

If you do not receive the authentication token in a reasonable amount of time, contact Support to rule out SMS message delivery delays.

Delete or Reset Two-Factor Authentication

To delete or reset two-factor authentication, go to https://cloud.mongodb.com/user/resetTwoFactorAuthentication. The reset button deletes your existing two-factor authentication settings and provides the option to create new ones.

Automation Checklist

Cloud Manager Automation allows you to deploy, configure, and manage MongoDB deployments with the Cloud Manager UI. Cloud Manager Automation relies on an Automation Agent, which must be installed on every server in the deployment. The Automation Agents periodically poll the Cloud Manager service to determine the current goal, and continually report their status to Cloud Manager.

To use Automation, you must install the Automation Agent on each server that you want Cloud Manager to manage.

Automation Runs Only on 64-bit Architectures

Cloud Manager provides only 64-bit downloads of the Automation Agent.

Using Own Hardware

  • If you deploy Automation manually, ensure that you have one Automation Agent on every server.

  • If you deploy the agent manually, you must create MongoDB’s dbpath and the directory for the MongoDB binaries and ensure that the user running the agent owns these directories.

    If you install using the rpm package, the agent runs as the mongod user; if using the deb package, the agent runs as the mongodb user. If you install using the tar.gz archive file, you can run the agent as any user.
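One way to confirm the ownership requirement above is to compare each directory's owner with the user that runs the agent. This is a minimal sketch using Python's standard os and pwd modules (Unix only); the function names are hypothetical.

```python
import os
import pwd


def is_owned_by_uid(path: str, uid: int) -> bool:
    """Return True if the file or directory at path is owned by the numeric user ID."""
    return os.stat(path).st_uid == uid


def is_owned_by_user(path: str, username: str) -> bool:
    """Resolve username to a user ID and check ownership.

    Raises KeyError if the user does not exist on this system.
    """
    return is_owned_by_uid(path, pwd.getpwnam(username).pw_uid)
```

For an rpm installation you might call is_owned_by_user("/data", "mongod"), where /data is a placeholder for your actual dbpath.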

Networking

All hosts must allow communication on the MongoDB ports. The default port is 27017, but you can configure alternate port ranges in the Cloud Manager interface.

The Automation Agent must be able to connect to cloud.mongodb.com on port 443 (that is, HTTPS). For more information on access to ports and IP addresses, see Security Overview.

Automation Configuration

After completing the automation configuration, always ensure that the deployment plan satisfies the needs of your deployment. Always double-check hostnames and ports before confirming the deployment.

Sizing

  • Ensure that you provision machines with enough space to run MongoDB and support the requirements of your data set.
  • Ensure that you provision sufficient machines to run your deployment. Each mongod should run on its own host.

Frequent Connection Timeouts

The Automation Agent may frequently time out of connections for one or more of the following reasons:

  • High network latency
  • High server load
  • Large SSL keys
  • Lack of SSL accelerator
  • Insufficient CPU speed

By default, connections time out after 40 seconds. MongoDB recommends gradually increasing the value of the dialTimeoutSeconds Automation Agent configuration setting to prevent frequent premature connection timeouts. However, increasing this value also increases the time required to deploy future configuration changes. Experiment with small, incremental increases until you determine the optimum value for your deployment. See dialTimeoutSeconds in Connection Settings at Automation Agent Configuration for more information.