The machine fl-nagman.ucar.edu, located at FL2-3095/G4-E5, is a
backup to nagman, the ML Nagios server. fl-nagman monitors
exactly the same set of resources as nagman. In fact, fl-nagman is
almost an exact replica of nagman, synchronized nightly by an
rsync script that cron triggers on nagman.
Replication
nagman:/root/rsync-to-fl.sh is the script used to synchronize the
content of the servers. It is an rsync-over-ssh push from nagman
to fl-nagman. A dedicated ssh key, nagman:/root/.ssh/id_dsa, is
used, and the fl-nagman:/root/.ssh/authorized_keys file is further
restricted to allow only nagman's IP to connect.
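One common way to pin a key to a single source address is OpenSSH's from="..." option at the start of the authorized_keys entry. Whether fl-nagman uses exactly this option is an assumption. The helper name restrict_key, the address 192.0.2.10 (a documentation address standing in for nagman's real IP), and the key text are all placeholders.

```shell
#!/bin/sh
# Hypothetical helper: prefix a public key with OpenSSH's from="..." option
# so that sshd accepts the key only from the given source address.
restrict_key() {
    # $1 = allowed source address, $2 = public key line (type, key, comment)
    printf 'from="%s" %s\n' "$1" "$2"
}

# Placeholder address and key material; substitute the real values.
restrict_key "192.0.2.10" "ssh-dss AAAA...placeholder root@nagman"
```

The printed line is what would go into fl-nagman:/root/.ssh/authorized_keys in place of the unrestricted key entry.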
Everything is copied with root privileges, except those items marked as
--exclude in the script. Files not on the origin server are
deleted from fl-nagman.
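The behavior described above (push over ssh with a dedicated key, exclusions, deletion of files missing on the origin) maps onto standard rsync options. The following is a minimal sketch under stated assumptions, not the contents of /root/rsync-to-fl.sh: the source path, the exclude list, the root login on the destination, and the function name push_to_fl are illustrative guesses.

```shell
#!/bin/sh
# Illustrative sketch only; this is NOT the actual /root/rsync-to-fl.sh.

DEST="root@fl-nagman.ucar.edu"   # host named in the text; root login is an assumption
SSH_KEY="/root/.ssh/id_dsa"      # dedicated key named in the text
SRC="/"                          # assumption: the whole filesystem is pushed
EXCLUDES="/proc /sys /dev"       # assumption: placeholder exclude list
RSYNC="${RSYNC:-rsync}"          # set RSYNC=echo to print the arguments instead of copying

push_to_fl() {
    # -a preserves ownership, permissions, and times (when run as root);
    # --delete removes destination files that no longer exist on the origin;
    # -e selects ssh with the dedicated key as the transport.
    set -- -a --delete -e "ssh -i $SSH_KEY"
    for path in $EXCLUDES; do
        set -- "$@" "--exclude=$path"
    done
    "$RSYNC" "$@" "$SRC" "$DEST:$SRC"
}

# Copy only when explicitly asked to, so that loading this file is harmless.
if [ "${1:-}" = "--run" ]; then
    push_to_fl
fi
```

Running the sketch as `RSYNC=echo sh sketch.sh --run` (sketch.sh being whatever name the file is saved under) prints the rsync arguments without touching either server, which is a cheap way to review the exclude list before a real push.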
The nagman cron job that triggers the copy nightly at 5:04am (minute 4, hour 5) is:
4 5 * * * /root/rsync-to-fl.sh
Identity
There is no real motivation for fl-nagman to assume nagman's identity
in the event of a nagman failure, because the Nagios application has
equal functionality under the fl-nagman identity. For the end
user, it's simply a matter of switching the browser URL. If
nagman is expected to be down for an extended period of time, DNS can
be updated (by DSG) to have nagman resolve to the FL machine.
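For scripts or bookmarks that need to choose between the two servers, the "switch the URL" step can be sketched as a small helper that prints the first host that answers. This is only an illustration: the name nagman.ucar.edu is assumed by analogy with fl-nagman.ucar.edu, a single ping is a weak stand-in for a real check of the Nagios web interface, and the function names probe and pick_nagios_host are hypothetical.

```shell
#!/bin/sh
# Assumption: nagman's fully qualified name follows the same pattern as
# fl-nagman.ucar.edu; adjust if it differs.
PRIMARY="nagman.ucar.edu"
BACKUP="fl-nagman.ucar.edu"

# Decide whether a host is up. One ICMP echo is an assumption; an HTTP
# request against the Nagios URL would be a stricter test.
probe() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# Print the first host that passes the probe, preferring the primary.
pick_nagios_host() {
    for host in "$PRIMARY" "$BACKUP"; do
        if probe "$host"; then
            echo "$host"
            return 0
        fi
    done
    echo "neither $PRIMARY nor $BACKUP responded" >&2
    return 1
}
```

Calling `pick_nagios_host` prints the host name to use in the browser URL, or returns a non-zero status if neither server responds.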
Potential Uses
fl-nagman can be used by NETS or the NCAR NOC to continue monitoring
our networks if nagman is either down or unreachable. It can
also be used as a development server to test major changes to Nagios,
but keep in mind that any changes will be clobbered overnight when the
rsync script runs.