
FL-nagman (NETS notes)

by John Hernandez
July 2007

The machine fl-nagman.ucar.edu, located at FL2-3095/G4-E5, is a backup to nagman, the ML Nagios server.  fl-nagman monitors exactly the same set of resources as nagman.  In fact, fl-nagman is an almost exact replica of nagman, synchronized nightly by an rsync script that cron triggers on nagman.

Replication

nagman:/root/rsync-to-fl.sh contains the script used to synchronize the content of the servers.  It is an rsync-over-ssh push from nagman to fl-nagman.  A dedicated ssh key, nagman:/root/.ssh/id_dsa, is used, and the fl-nagman:/root/.ssh/authorized_keys file is further restricted to allow only nagman's IP to connect.
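
A restriction like the one described above is typically expressed with the from= option at the start of the key's line in authorized_keys. The following is a minimal sketch, not the contents of the real file: the address 192.0.2.10 and the truncated key are placeholders, and restrict_key is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: emit an authorized_keys line that only accepts
# connections from one source address.
#   $1 = allowed source IP (placeholder here, not nagman's real address)
#   $2 = the public key line as found in id_dsa.pub
restrict_key() {
    printf 'from="%s" %s\n' "$1" "$2"
}

# Placeholder key material; the real key is not reproduced here.
restrict_key 192.0.2.10 "ssh-dss AAAAB3-placeholder root@nagman"
```

The resulting line would go into fl-nagman:/root/.ssh/authorized_keys in place of the unrestricted key line.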

Everything is copied with root privileges, except those items marked as --exclude in the script.  Files that do not exist on nagman are deleted from fl-nagman.
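
The behavior described above maps onto a handful of standard rsync options. The sketch below is an illustration under assumptions, not the actual rsync-to-fl.sh: the exclude list is invented for the example, and the script only prints the command it would run.

```shell
#!/bin/sh
# Hypothetical sketch of an rsync-over-ssh push; NOT the real rsync-to-fl.sh.
DEST="fl-nagman.ucar.edu"
KEY="/root/.ssh/id_dsa"

# Build the rsync command line as a string so it can be reviewed first.
build_rsync_cmd() {
    # -a        archive mode: keeps ownership, permissions, and timestamps
    # --delete  removes destination files that are absent on the source
    printf '%s' "rsync -a --delete"
    # Assumed exclude list (illustrative only); the real script defines its own.
    for path in /proc /sys /dev /etc/hostname; do
        printf ' --exclude=%s' "$path"
    done
    # -e selects ssh with the dedicated key as the transport
    printf ' -e "ssh -i %s" / root@%s:/\n' "$KEY" "$DEST"
}

# Print the command instead of executing it.
RSYNC_CMD=$(build_rsync_cmd)
echo "$RSYNC_CMD"
```

Printing rather than executing keeps the sketch safe to run anywhere; adding rsync's -n (dry-run) option is another way to preview what a push with --delete would remove.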

The nagman cron job that triggers the copy at 5:04am nightly is:

4 5 * * * /root/rsync-to-fl.sh
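
Crontab entries list the minute first and the hour second, which is easy to misread. As a quick check, the first two fields of an entry can be read back explicitly; cron_time below is a hypothetical helper written for this note, not part of cron.

```shell
#!/bin/sh
# Hypothetical helper: show the hour and minute encoded in a crontab entry.
# Field order in a crontab line: minute hour day-of-month month day-of-week command
cron_time() {
    echo "$1" | {
        # read splits on whitespace and does not expand the * fields
        read -r minute hour _rest
        printf 'hour=%s minute=%s\n' "$hour" "$minute"
    }
}

cron_time "4 5 * * * /root/rsync-to-fl.sh"   # prints: hour=5 minute=4
```

An entry meant to fire at 4:05am would instead begin with 5 4.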

Identity

There is no real motivation for fl-nagman to assume nagman's identity in the event of a nagman failure, because the Nagios application has equal functionality under the fl-nagman identity.  For the end user, it's simply a matter of switching the browser URL.  If nagman is expected to be down for an extended period of time, DNS can be updated (by DSG) to have nagman resolve to the FL machine.

Potential Uses

fl-nagman can be used by NETS or the NCAR NOC to continue monitoring our networks if nagman is down or unreachable.  It can also be used as a development server to test major changes to Nagios, but keep in mind that any changes will be clobbered overnight when the rsync script runs.

Limitations and Wishlist

fl-nagman has the same parenting information as nagman, which causes subtle inaccuracies when an outage occurs.  This is because fl-nagman shares nagman's view that ML is the center of the network universe.  Time permitting, I hope to find an automated way to change the parenting for certain key devices each time the config is copied to fl-nagman.
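
One possible shape for such an automated step is a filter that rewrites the parents directive of selected host definitions after each copy. This is only a sketch under assumptions: the host and parent names (fl-rtr, ml-core, fl-core) are invented, reparent is a hypothetical helper, and it assumes each definition opens with "define host {" and closes with "}" on its own line.

```shell
#!/bin/sh
# Hypothetical helper: reparent HOST NEWPARENT < hosts.cfg > hosts.cfg.new
# Rewrites the parents line of one host definition; everything else passes through.
reparent() {
    awk -v host="$1" -v parent="$2" '
        # Start buffering at the top of each "define host {" block.
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { in_host = 1; name = ""; n = 0 }
        in_host {
            buf[++n] = $0; key[n] = $1
            if ($1 == "host_name") name = $2
            # At the closing brace, flush the block, swapping parents if it matches.
            if (index($0, "}") > 0) {
                for (i = 1; i <= n; i++) {
                    if (name == host && key[i] == "parents")
                        print "    parents     " parent
                    else
                        print buf[i]
                }
                in_host = 0
            }
            next
        }
        { print }
    '
}

# Example with invented device names.
reparent fl-rtr fl-core <<'EOF'
define host {
    host_name   fl-rtr
    parents     ml-core
}
EOF
```

Buffering each block means the rewrite works whether host_name appears before or after parents. A host with no parents line is left unchanged, so adding a parent where none exists would need extra handling.
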
Address comments or questions about this Web page to the Network Engineering & Telecommunications Section (NETS) at nets-www@ncar.ucar.edu. The NETS is part of the Computational & Information Systems Laboratory (CISL) of the National Center for Atmospheric Research (NCAR), which is sponsored by the National Science Foundation (NSF) and managed by the University Corporation for Atmospheric Research (UCAR). This website follows the UCAR General Privacy Policy and the NCAR/UCAR/UCP Terms of Use.