Bigbrother

Bigbrother is always watching! It uses nagios and ganglia to watch over all the other servers.

The ganglia information is accessible to the world, but nagios is restricted. Ask someone for the credentials.

Unpackaged Plugins

We run the following nagios plugins from Nagios Exchange that are not actually packaged:

We also need the python-ldap package for our LDAP base/sync checking.

Custom plugins

For checking SSHFP records, we use stump’s check_sshfp plugin.

For checking ZFS zpool status, we use some custom magic written by nwf that writes logs to AFS; nagios then reads them using /usr/local/nagios/libexecdir/check_execgrep.pl. For this plugin to work, you need to add the line “# nagios: -epn”, and I quote, “within the first ten lines of the file” to disable the embedded Perl interpreter. (The plugin itself might be a custom one, since libexecdir doesn’t sound like a Debian path, but I (TC01) was too tired to check.)

For whatever reason the embedded Perl interpreter doesn’t work with this plugin.
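
A quick way to confirm that a plugin carries the opt-out line is to look at its first ten lines. This is only a minimal sketch: the function name has_epn_opt_out is made up for illustration, and the pattern is an approximation of how nagios looks for the marker, not a copy of its logic.

```shell
# Hypothetical helper: succeed if the file opts out of the embedded Perl
# interpreter with a "# nagios: -epn" line within its first ten lines.
has_epn_opt_out() {
    head -n 10 "$1" | grep -q '^#[[:space:]]*nagios:[[:space:]]*-epn'
}

# Example use against the plugin mentioned above:
# has_epn_opt_out /usr/local/nagios/libexecdir/check_execgrep.pl && echo "ePN disabled"
```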

Web interface details

We serve nagios from bigbrother.acm.jhu.edu/ instead of the default /nagios3, so there are a few configuration tweaks:

  • Make sure that /etc/apache2/conf-available/nagios3.conf is symlinked into conf-enabled/

  • In that file, change the lines

    Alias /nagios3/stylesheets /etc/nagios3/stylesheets
    ...
    Alias /nagios3 /usr/share/nagios3/htdocs
    

    to

    Alias /stylesheets /etc/nagios3/stylesheets
    ...
    Alias / /usr/share/nagios3/htdocs/
    

    Note the slash at the end of the second Alias line!

  • Go into /etc/nagios3/cgi.cfg and change

    url_html_path=/nagios3
    

    to

    url_html_path=/
    
  • Finally, the ganglia configuration in apache must be loaded before the nagios configuration. So create a symlink at /etc/apache2/conf-enabled/000-ganglia.conf that points to /etc/apache2/conf.d/ganglia.conf; the 000- prefix makes it sort ahead of nagios3.conf.

Reload the apache configuration or restart the apache server when this is done.
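
The two file edits above can also be scripted. This is a rough sketch, not the procedure actually used on bigbrother: the function names are invented, and it assumes the stock files contain lines like the excerpts shown above. Both functions print the rewritten file to standard output instead of changing it in place, so the result can be reviewed first.

```shell
# Hypothetical helper: print nagios3.conf with the /nagios3 aliases moved to /.
# The second expression also adds the trailing slash to the target path.
rewrite_aliases() {
    sed -e 's|^\([[:space:]]*Alias\) /nagios3/stylesheets |\1 /stylesheets |' \
        -e 's|^\([[:space:]]*Alias\) /nagios3 \(.*[^/]\)/*$|\1 / \2/|' \
        "$1"
}

# Hypothetical helper: print cgi.cfg with url_html_path pointing at /.
rewrite_url_html_path() {
    sed -e 's|^url_html_path=.*$|url_html_path=/|' "$1"
}

# Example: compare the rewritten file with the original before installing it.
# rewrite_aliases /etc/apache2/conf-available/nagios3.conf \
#     | diff /etc/apache2/conf-available/nagios3.conf -
```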

SSH Host Key Checking

So that the nagios user verifies host keys against DNS (SSHFP records) and refuses unknown hosts, run this:

sudo su -s /bin/sh nagios
mkdir -p ~/.ssh
cat > ~/.ssh/config <<HERE
VerifyHostKeyDNS yes
StrictHostKeyChecking yes
HERE

LDAP Configuration

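For the LDAP checks to work, the following is needed in /etc/ldap/ldap.conf. With this setting the client still requests the server certificate, but the session proceeds even if the certificate is missing or cannot be verified: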

TLS_REQCERT     allow

NTP Configuration (on clients)

To let bigbrother speak to the NTP daemons on all client machines, you will probably want to add an explicit restrict line for it to /etc/ntp.conf on each client. This may not be necessary, but NTP is finicky, and bigbrother frequently has trouble checking the NTP status of hosts inside the cluster. This is the sledgehammer solution.

If inside the cluster, something like this will do the trick:

# Allow bigbrother to speak to this host's NTP daemon for status checks.
restrict 192.168.0.6 mask 255.255.255.255

If outside the cluster, probably this is what you want:

# Allow bigbrother to speak to this host's NTP daemon for status checks.
restrict 128.220.70.63 mask 255.255.255.255
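
When rolling this out to many clients, it helps to add the line only if it is missing. The following is a small sketch under that assumption; ensure_restrict is a made-up name, and the function only appends a restrict line, without the comment.

```shell
# Hypothetical helper: append a restrict line for the given address to an
# ntp.conf-style file unless exactly that line is already present.
ensure_restrict() {
    conf=$1
    addr=$2
    line="restrict $addr mask 255.255.255.255"
    if ! grep -qxF "$line" "$conf"; then
        printf '%s\n' "$line" >> "$conf"
    fi
}

# Example (inside the cluster); restart the NTP daemon afterwards.
# ensure_restrict /etc/ntp.conf 192.168.0.6
```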

Ceph Graphics

Logging PG States

Bigbrother knows how to generate graphs of how many Ceph Storage System placement groups (PGs) are in each state. It’s a little primitive, but:

  • It’s available at http://bigbrother.acm.jhu.edu/ceph-rrdtool
  • /etc/sv/ceph-rrdtool/run, overseen by runit, runs ceph health -w and teases apart the PG count.
  • /etc/sv/ceph-rrdtool/ceph-rrdtool-graph.pl is run by www-data’s crontab.
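
The run script itself is not reproduced here. Purely as an illustration of what teasing apart the PG count can look like, the sketch below splits a status line into per-state counts. The sample line format and the function name parse_pg_states are assumptions; the exact output of ceph varies between releases.

```shell
# Hypothetical helper: read status lines on stdin and print "state count"
# pairs, one per line. Assumes lines of the form
#   ... 192 pgs: 190 active+clean, 2 active+degraded; ...
parse_pg_states() {
    sed -n 's/.* pgs: \([^;]*\);.*/\1/p' \
        | tr ',' '\n' \
        | awk 'NF == 2 { print $2, $1 }'
}

# Example with an assumed sample line:
# echo 'pgmap v41: 192 pgs: 190 active+clean, 2 active+degraded; 8730 bytes data' \
#     | parse_pg_states
```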