Bigbrother¶
Bigbrother is always watching! It uses nagios and ganglia to watch over all the other servers.
The ganglia information is accessible to the world, but nagios is restricted. Ask someone for the credentials.
Unpackaged Plugins¶
We run the following nagios plugins from Nagios Exchange that are not actually packaged:
- https://exchange.nagios.org/directory/Plugins/Network-Protocols/HTTP/check_ssl_cert/details for checking SSL certs.
- https://exchange.nagios.org/directory/Plugins/Games/check_minecraft/details for checking the status of our minecraft server.
- https://github.com/cirrax/openstack-nagios-plugins for checking the status of OpenStack services. These plugins don’t have 100% visibility into all of OpenStack but do a pretty good job checking both the compute nodes and the controllers.
We also need the python-ldap package for our LDAP base/sync checking.
Custom plugins¶
For checking SSHFP records, we use stump’s check_sshfp plugin.
For checking ZFS zpool status, we use some custom magic written by nwf that writes logs to AFS, which nagios then reads using /usr/local/nagios/libexecdir/check_execgrep.pl.
Please note that for this plugin to work, you need to add “# nagios: -epn” (and I quote) “within the first ten lines of the file” to disable the embedded Perl interpreter. (The plugin might be a custom one, since libexecdir doesn’t sound like a Debian path, but I (TC01) was too tired to check.)
For whatever reason the embedded Perl interpreter doesn’t work with this plugin.
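As a quick sanity check for the directive described above, here is a minimal sketch that tests whether a plugin file carries the line within its first ten lines. The function name has_epn_directive is made up for this example, and it only matches the exact string quoted above at the start of a line; nagios itself may be more lenient about spacing.

```shell
# has_epn_directive FILE  (hypothetical helper, not part of nagios)
# Succeeds if a line starting with "# nagios: -epn" appears within the
# first ten lines of FILE, and fails otherwise.
has_epn_directive() {
    awk '
    NR > 10 { exit }                      # stop looking after line ten
    /^# nagios: -epn/ { found = 1; exit } # directive found in time
    END { exit !found }                   # exit 0 only when found
    ' "$1"
}

# Usage:
#   has_epn_directive /usr/local/nagios/libexecdir/check_execgrep.pl \
#       || echo "embedded Perl is NOT disabled for this plugin"
```

Using a single awk process avoids a head-into-grep pipeline, so the exit status reflects only whether the directive was found.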
Web interface details¶
We serve nagios from bigbrother.acm.jhu.edu/ instead of the default /nagios3, so there are a few configuration tweaks:
- Make sure that /etc/apache2/conf-available/nagios3.conf is symlinked into conf-enabled/.
- In that file, change the lines
  Alias /nagios3/stylesheets /etc/nagios3/stylesheets
  ...
  Alias /nagios3 /usr/share/nagios3/htdocs
  to
  Alias /stylesheets /etc/nagios3/stylesheets
  ...
  Alias / /usr/share/nagios3/htdocs/
  Note the slash at the end of the second Alias line!
- Go into /etc/nagios3/cgi.cfg and change
  url_html_path=/nagios3
  to
  url_html_path=/
- Finally, the ganglia configuration in apache must come before the nagios configuration. So create a symlink from /etc/apache2/conf.d/ganglia.conf to /etc/apache2/conf-enabled/000-ganglia.conf.
Reload the apache configuration or restart the apache server when this is done.
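The two file edits above could be scripted roughly like this. This is only a sketch: the function names are invented for the example, it assumes GNU sed (for -i) as found on Debian, and it assumes the Alias and url_html_path lines appear exactly as quoted above with no leading whitespace. Check the files by hand afterwards.

```shell
# Hypothetical helpers; the names are made up for this sketch.
# Assumes GNU sed (-i edits the file in place).

# retarget_nagios_aliases FILE
# Rewrites the two /nagios3 Alias lines so nagios is served from /.
retarget_nagios_aliases() {
    sed -i \
        -e 's|^Alias /nagios3/stylesheets |Alias /stylesheets |' \
        -e 's|^Alias /nagios3 /usr/share/nagios3/htdocs/*$|Alias / /usr/share/nagios3/htdocs/|' \
        "$1"
}

# set_url_html_path FILE
# Changes url_html_path=/nagios3 to url_html_path=/ in a cgi.cfg.
set_url_html_path() {
    sed -i -e 's|^url_html_path=/nagios3$|url_html_path=/|' "$1"
}

# Usage (as root, on bigbrother):
#   retarget_nagios_aliases /etc/apache2/conf-available/nagios3.conf
#   set_url_html_path /etc/nagios3/cgi.cfg
#   service apache2 reload
```

Both functions do nothing on a second run, because the patterns only match the original /nagios3 forms.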
SSH Host Key Checking¶
Run this:
sudo su -s /bin/sh nagios
mkdir ~/.ssh || true
cat > ~/.ssh/config <<HERE
VerifyHostKeyDNS yes
StrictHostKeyChecking yes
HERE
LDAP Configuration¶
In order to make LDAP checks do the right thing, the following is needed in /etc/ldap/ldap.conf:
TLS_REQCERT allow
NTP Configuration (on clients)¶
To let bigbrother speak to the NTP daemons on all client machines, you will probably want to explicitly add one of the restrict lines below to /etc/ntp.conf. Note that this may not be necessary, but NTP is finicky, and bigbrother frequently has trouble checking the NTP status of hosts inside the cluster. This is the sledgehammer solution.
If inside the cluster, something like this will do the trick:
# Allow bigbrother to speak to all hosts NTP for checking statuses.
restrict 192.168.0.6 mask 255.255.255.255
If outside the cluster, probably this is what you want:
# Allow bigbrother to speak to all hosts NTP for checking statuses.
restrict 128.220.70.63 mask 255.255.255.255
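If you are rolling this out to many clients, a small helper that appends the stanza only when it is missing can save some hand editing. This is a sketch, not something we actually run: the function name allow_ntp_from is hypothetical, and it only detects an existing restrict line that matches character for character.

```shell
# allow_ntp_from CONF IP  (hypothetical helper)
# Appends the comment and restrict line for IP to CONF, unless the exact
# restrict line is already present.
allow_ntp_from() {
    grep -qxF -e "restrict $2 mask 255.255.255.255" "$1" 2>/dev/null ||
        printf '%s\n%s\n' \
            '# Allow bigbrother to speak to all hosts NTP for checking statuses.' \
            "restrict $2 mask 255.255.255.255" >> "$1"
}

# Usage (as root, on a client inside the cluster):
#   allow_ntp_from /etc/ntp.conf 192.168.0.6
# Outside the cluster, pass 128.220.70.63 instead. The NTP daemon
# has to be restarted afterwards to pick up the change.
```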
Ceph Graphics¶
Logging PG States¶
Bigbrother knows how to generate graphs of counts of Ceph Storage System PG states. It’s a little primitive, but:
- It’s available at http://bigbrother.acm.jhu.edu/ceph-rrdtool
- /etc/sv/ceph-rrdtool/run, overseen by runit, runs ceph health -w and teases apart the PG count.
- /etc/sv/ceph-rrdtool/ceph-rrdtool-graph.pl is run by www-data’s crontab.
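To give a feel for what “teasing apart the PG count” involves, here is a rough sketch of the parsing step. It is not the actual contents of /etc/sv/ceph-rrdtool/run. The function name pg_state_counts is made up, and the input format is an assumption: it expects status lines containing something like “952 pgs: 940 active+clean, 12 active+degraded; ...”, which may not match what your Ceph version prints.

```shell
# pg_state_counts  (hypothetical helper)
# Reads status lines on stdin and prints "STATE COUNT" for each PG state.
# ASSUMED input shape: "... N pgs: C1 state1, C2 state2; rest of line"
pg_state_counts() {
    awk '
    /pgs: / {
        s = $0
        sub(/^.*pgs: /, "", s)          # drop everything up to the state list
        sub(/;.*$/, "", s)              # drop the summary after the semicolon
        n = split(s, parts, /, */)      # one entry per "COUNT STATE" pair
        for (i = 1; i <= n; i++) {
            split(parts[i], kv, " ")
            print kv[2], kv[1]          # state first, then its count
        }
    }'
}

# Usage:
#   ceph health -w | pg_state_counts
```

Each output pair could then be fed to an rrdtool update, which is presumably roughly what the real run script does.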