Random Simple Things that have Worked in the Past and are Likely to Work Again
A Sunfire Goes Down
As of 6/27/2016.
Note: one of the sunfires can’t be booted with all the disks in. If this sunfire fails, pop all the disks except for the SSD (which is a pain to get back in), and the boot drives (drives 1 and 2). Then boot the machine. Then, after it has booted, put the disks back in, and follow the instructions below.
- ssh onto magellan, and run `bmc sunfire0-bmc chassis power cycle`.
- ssh onto the sunfire, probably through magellan. Run `zfs mount -a` if the zpools aren’t already mounted, and then start ceph (either through SysV init or systemd, depending on the sunfire).
- monitor `ceph health` (on any ceph monitor: crimea, magellan, or gomes) to ensure that ceph comes back up properly.
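A consolidated sketch of that sequence, assuming the failed machine is sunfire0, that `bmc` is the site-local IPMI wrapper available on magellan, and that the ceph start commands match a stock SysV/systemd install (the exact unit name may differ on our hosts):

```
# On magellan: power-cycle the failed sunfire via its BMC.
bmc sunfire0-bmc chassis power cycle

# Once it comes back, ssh onto the sunfire (probably via magellan).
ssh magellan
ssh sunfire0

# Mount the zpools if they are not already mounted.
zpool status          # sanity check first
zfs mount -a

# Start ceph -- the exact invocation depends on that sunfire's init system.
systemctl start ceph.target      # systemd sunfires (unit name may differ)
# /etc/init.d/ceph start         # SysV init sunfires

# From any ceph monitor (crimea, magellan, or gomes), watch ceph recover.
watch ceph health
```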
Mail server fails IMAP requests
As of 7/21/2016.
Run `sudo journalctl -u dovecot` on crimea.acm.jhu.edu.
If it says that a connection timed out to acmsys/Maildir, then there’s a problem with the AFS maildir servers on chicago.
First things first, check the ZFS status: run `zpool status`. If that reports
something wrong, debug the zpool.
To restart the maildir server, run `/etc/init.d/openafs-fileserver restart`. If it
takes longer than ~10 minutes, something else is wrong. Try restarting chicago.
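A sketch of the checks above; the hostnames come from this runbook, everything else is just the commands already named, run in order:

```
# On crimea.acm.jhu.edu: look for dovecot timing out against acmsys/Maildir.
sudo journalctl -u dovecot

# On chicago: check the zpool backing the AFS maildir volumes.
zpool status

# If the pool is healthy, restart the AFS fileserver. If this takes longer
# than ~10 minutes, something else is wrong -- consider restarting chicago.
sudo /etc/init.d/openafs-fileserver restart
```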
You can’t do ceph things with cinder (like create/delete volumes)
As of 9/25/2016.
Restart cinder-volume on gomes.
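A minimal sketch, assuming cinder-volume runs under systemd on gomes with the unit name `cinder-volume` (on some distros the unit is `openstack-cinder-volume` instead):

```
# On gomes: restart the volume service.
sudo systemctl restart cinder-volume

# Confirm the service reports as up again (needs admin OpenStack credentials;
# "cinder service-list" is the older equivalent).
openstack volume service list
```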
Ceph won’t start on a sunfire due to permission errors
As of 2/28/2017.
Run `chown -R ceph:ceph /var/run/ceph`, then try again.
See http://tracker.ceph.com/issues/15553 for more info.
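On the affected sunfire that looks roughly like the following; the service start command is an assumption and depends on which init system that sunfire uses:

```
# Fix ownership of ceph's runtime directory (see the tracker issue above).
sudo chown -R ceph:ceph /var/run/ceph

# Then try starting ceph again.
sudo systemctl start ceph.target   # or /etc/init.d/ceph start on SysV sunfires
```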
A ceph mon is down after a restart
As of 3/11/2017.
Run `systemctl restart ceph`. The issue is that, since our ceph config is
served out of AFS, we have an implicit dependency on AFS, but systemd doesn’t
know about it (this should be fixed at some point). Anyway, by the time you ssh into
the machine to manually restart ceph, openafs-client should be up, so simply
restarting ceph should just work.
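A sketch of the recovery; the unit names (`openafs-client`, `ceph`) are assumptions about how these hosts are set up:

```
# Make sure the AFS client is up, so /afs (and thus the ceph config) is reachable.
systemctl status openafs-client
ls /afs            # should not hang or error

# Then restart ceph and check that the mon rejoins quorum.
sudo systemctl restart ceph
ceph -s
```

If someone eventually fixes the implicit dependency, a systemd drop-in with `After=openafs-client.service` under `/etc/systemd/system/ceph.service.d/` would make it explicit, though that has not been tried here.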
You Can’t Delete OpenStack VMs (they’re stuck in the deleting state)
As of 4/12/2017.
ssh to the compute node that the instance was running on, and restart the nova compute daemon (nova-compute) on it.
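A sketch of the workflow, assuming admin OpenStack credentials and a systemd unit named `nova-compute` on the compute node (both assumptions; `<instance-uuid>` is a placeholder):

```
# Find which compute node the stuck instance lives on (admin-only field).
openstack server show <instance-uuid> -c 'OS-EXT-SRV-ATTR:host' -f value

# On that compute node: restart the nova compute service.
sudo systemctl restart nova-compute

# Then retry the delete.
openstack server delete <instance-uuid>
```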