Setting up the Sunfires, a.k.a. thumpers
Using Serial Console on a Sunfire
We have set up the ILOMs to respond on both serial and over ssh on the management network. They’re named sunfireN-bmc (for similarity with the others, rather inaccurately). Once logged in, run “start /SP/console” to connect to the serial console; to get out, feed a newline and then ESCape and open paren. (It’s the strangest escape sequence I’ve ever seen, but that’s what Sun chose.)
Drive Enumeration Order
The labels on a sunfire are not in agreement with Linux’s enumeration order, though the pointers in the bottom left corner (“ATTENTION!”) for boot disks are accurate. Linux enumerates the drives thusly:
BACK OF MACHINE
-----------------------------------------------
ab  af  t   x   ar  au  aj  an  l   p   d   h
aa  ae  s   w   aq  av  ai  am  k   o   c   g
z   ad  r   v   ap  at  ah  al  j   n   b   f
y   ac  q   u   ao  as  ag  ak  i   m   a   e
-----------------------------------------------
FRONT OF MACHINE
Please note the curious reversal of au and av. I am not sure why.
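When hunting for a particular disk's bay, the table above can be turned into a small lookup. This is only a sketch: the function names `layout` and `locate_drive` are made up here, and the embedded table assumes every suffix from a to av appears exactly once, with rows counted from the back of the machine and columns from the left.

```sh
#!/bin/sh
# Physical layout of the drive bays by Linux device suffix (the part after
# /dev/sd). The first row is the one nearest the back of the machine.
layout() {
cat <<'EOF'
ab af t x ar au aj an l p d h
aa ae s w aq av ai am k o c g
z ad r v ap at ah al j n b f
y ac q u ao as ag ak i m a e
EOF
}

# locate_drive SUFFIX: print "row R col C" for a suffix such as "y"
# (meaning /dev/sdy). Rows and columns both start at 1.
# Returns non-zero if the suffix is not in the table.
locate_drive() {
    layout | awk -v want="$1" '
        {
            for (c = 1; c <= NF; c++)
                if ($c == want) { printf "row %d col %d\n", NR, c; found = 1 }
        }
        END { exit found ? 0 : 1 }'
}

# The two boot disks:
locate_drive y
locate_drive ac
```

The boot disks sdy and sdac come out as row 4, columns 1 and 2, which agrees with the bottom-left “ATTENTION!” pointers on the chassis.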
Booting from USB Mass Storage Devices
The sunfire BIOSes are bad at what they do. They can in fact boot from USB media, but USB devices do not get a separate entry in the boot selection screen (F8). Instead, you must enter Setup (via F2), use the Boot menu’s Hard Disks option to promote the USB device over the actual hard disks, and ensure that the boot order is set to try hard disks first. It’s ugly, but there it is.
Installing a Sunfire, the Debian Way
These notes pertain to Jessie; they are probably relatively time-invariant.
Do a Debian install:
- Use eth0 as the primary interface.
- Use sdy and sdac as the root devices. Currently, we use an md mirror and LVM.
- Just before rebooting, grab a shell and:
chroot /target
grub-install /dev/sdy
grub-install /dev/sdac
Adjust networking
In /etc/network/interfaces,
allow-hotplug eth0
iface eth0 inet dhcp
    pre-up ifconfig eth0 mtu 9000
Then run:
ifdown eth0 && ifup eth0
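To confirm that the jumbo-frame setting actually took effect once the interface has bounced, something like the following may help. It is a sketch, not part of the original procedure: `iface_mtu` and `check_jumbo` are hypothetical helper names, and the optional second argument (an alternate sysfs directory) exists only so the functions can be exercised without real hardware.

```sh
#!/bin/sh
# iface_mtu IFACE [SYSFS_NET_DIR]: print the MTU the kernel reports for IFACE.
# SYSFS_NET_DIR defaults to /sys/class/net, where Linux exposes one
# directory per network interface.
iface_mtu() {
    cat "${2:-/sys/class/net}/$1/mtu"
}

# check_jumbo IFACE [SYSFS_NET_DIR]: succeed only if the MTU is 9000,
# the value set by the pre-up line in /etc/network/interfaces.
check_jumbo() {
    [ "$(iface_mtu "$1" "${2:-}")" = "9000" ]
}

# Example (run on the sunfire itself):
#   check_jumbo eth0 && echo "eth0 is using jumbo frames"
```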
Now add some packages and make some changes:
apt-get install sudo deborphan vim strace tcpdump
adduser localadmin sudo
usermod -L root
Install Ganglia reporting tool:
apt-get install ganglia-monitor
Modify /etc/ganglia/gmond.conf:
cluster {
  name = "Trinidad"
  owner = "JHU ACM"
  latlong = "unspecified"
  url = "unspecified"
}

udp_send_channel {
  port = 8649
  host = bigbrother
}
Install OpenAFS and Kerberos tools:
apt-get install openafs-client openafs-krb5 krb5-user
- Our Kerberos realm is “ACM.JHU.EDU”; note the lack of “trinidad”.
- Our AFS cell is “acm.jhu.edu”; note the lack of “trinidad”.
While that’s going, you may as well make the machine serial-friendly:
Replace the contents of /etc/default/grub with:
# Don't forget to run update-grub
GRUB_DEFAULT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=2
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX="console=tty0 console=ttyS0,9600n8 rootwait"
GRUB_TERMINAL="console serial"
GRUB_SERIAL_COMMAND="serial --speed=9600 --unit=0 --word=8 --parity=no --stop=1"
And then run update-grub as the comment says. :)
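Since a typo here only shows up when the serial console stays silent on the next boot, a quick check before rebooting can be worthwhile. The following is a rough sketch; `has_serial_console` is a hypothetical helper, and it only greps for the three settings that matter for serial access rather than parsing the file properly.

```sh
#!/bin/sh
# has_serial_console FILE: succeed if FILE (normally /etc/default/grub)
# sends the kernel console to ttyS0, includes "serial" among the GRUB
# terminals, and defines a GRUB_SERIAL_COMMAND.
has_serial_console() {
    grep -q '^GRUB_CMDLINE_LINUX=.*console=ttyS0' "$1" &&
    grep -q '^GRUB_TERMINAL=.*serial' "$1" &&
    grep -q '^GRUB_SERIAL_COMMAND=' "$1"
}

# Example:
#   has_serial_console /etc/default/grub || echo "serial console not configured"
```

The first pattern is anchored on `GRUB_CMDLINE_LINUX=` so that a setting placed only in `GRUB_CMDLINE_LINUX_DEFAULT` does not count.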
Fetch the Kerberos keytab for this machine into /etc/krb5.keytab, then restrict its permissions:
chmod 400 /etc/krb5.keytab
Install Ceph.
The Ceph debian maintainers seem to have given up on having their packages in the official repository; experimental is (as of Nov 2016) two major versions behind. Therefore, you’re going to need to add
deb https://download.ceph.com/debian-jewel/ jessie main
to /etc/apt/sources.list. You may also need to install apt-transport-https because, well, debian. In any case, once you’ve done that, you should just be able to run:
apt-get update && apt-get install ceph
Then copy the keyring and ceph.conf from an authoritative source.
Install ZFS (note: as of this writing, sunfires 1 and 2 are installed by route a, and sunfires 0 and 3 have been reinstalled via route b)
Route a: to get the packages from ZoL’s archive, run the following commands. Note that ZoL no longer supports this archive.
wget http://archive.zfsonlinux.org/debian/pool/main/z/zfsonlinux/zfsonlinux_2%7Ewheezy_all.deb
dpkg -i ./zfsonlinux_2~wheezy_all.deb
apt-get update
apt-get install spl-dkms # (It's OK not to do this first except that it wastes time below)
apt-get install zfsutils zfs-dkms zfs-initramfs
sed -i -e "s/ZFS_MOUNT='no'/ZFS_MOUNT='yes'/" /etc/default/zfs
sed -i -e '/\$remote_fs/ s/$/ +zfs-mount/' /etc/insserv.conf
Route b: the necessary packages are also available from jessie-backports. In order to effect a changeover, you’ll need to remove the current zfs packages, then enable jessie-backports [1] in /etc/apt/sources.list (you will need main and contrib), and finally:
apt-get update
apt-get install zfs-dkms zfs-initramfs # (Have patience...)
Afterwards, you’ll need to adjust /etc/default/zfs and /etc/insserv.conf with the same two sed commands given for the ZoL archive packages.
[1] Do note that there is presently (Nov 13 2016) a bug in these packages that prevents dkms from building them properly. While the bug persists, you will need to create some symlinks:
ln -s /var/lib/dkms/spl/0.6.5.8/build/spl_config.h /var/lib/dkms/spl/0.6.5.8/3.16.0-4-amd64/x86_64
ln -s /var/lib/dkms/spl/0.6.5.8/build/module/Module.symvers /var/lib/dkms/spl/0.6.5.8/3.16.0-4-amd64/x86_64/module
and then
dkms install -m zfs -v 0.6.5.8
(and have more patience).
Create some zpools:
zpool create -o ashift=12 \
  stor raidz /dev/sd{a,b,c,d,e,f,g,h,i,j,k} \
  raidz /dev/sd{l,m,n,o,p,q,r,s,t,u,v} \
  raidz /dev/sd{w,x,z,aa,ab,ad,ae,af,ag,ah,ai} \
  raidz /dev/sda{j,k,l,m,n,o,p,q,r,s,t} \
  spare /dev/sdav cache /dev/sdau
zpool export stor && zpool import -d /dev/disk/by-id stor
zfs set checksum=fletcher4 stor
zfs set compression=lz4 stor
zfs set atime=off stor
zfs set xattr=sa stor
zfs create stor/osd
zfs set recordsize=1M stor/osd
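Before running a zpool create of this size, it may be worth double-checking the disk arithmetic: four raidz vdevs of eleven disks, one spare, one cache device, and the two boot disks should account for all 48 bays with no disk named twice. The sketch below spells the brace expansions out as plain lists (the variable names and the `pool_audit` function are made up for illustration) and counts them; it never touches a device.

```sh
#!/bin/sh
# Drive suffixes (the part after /dev/sd), transcribed from the
# zpool create command.
VDEV1="a b c d e f g h i j k"
VDEV2="l m n o p q r s t u v"
VDEV3="w x z aa ab ad ae af ag ah ai"
VDEV4="aj ak al am an ao ap aq ar as at"
SPARE="av"
CACHE="au"
# Root devices from the install step; these must stay out of the pool.
BOOT="y ac"

# pool_audit: print how many suffixes are listed and how many are distinct.
# The two numbers differ if any disk is named twice.
pool_audit() {
    # Unquoted on purpose: word splitting yields one suffix per argument.
    set -- $VDEV1 $VDEV2 $VDEV3 $VDEV4 $SPARE $CACHE $BOOT
    total=$#
    unique=$(printf '%s\n' "$@" | sort -u | wc -l)
    echo "$total listed, $((unique)) distinct"
}

pool_audit
```

With the lists as given, this prints `48 listed, 48 distinct`, confirming that sdy and sdac are skipped by the third vdev and that nothing overlaps.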
Land a crontab to keep scrubs going on a regular basis. Try to balance them, temporally, across the different sunfires so that we don’t experience the whole cluster scrubbing at once:
0 0 15 * * /sbin/zpool scrub stor
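One way to spread the scrubs out is to derive the day of the month from the machine's number. The scheme below is purely an assumption for illustration (nothing here records how the days were actually chosen): with four sunfires, machine N scrubs on day 1 + 7N, so sunfire2 lands on the 15th as in the line above. `scrub_cron_line` is a hypothetical helper name.

```sh
#!/bin/sh
# scrub_cron_line INDEX [COUNT]: print a crontab line that scrubs the
# "stor" pool at midnight on a day derived from the machine's number.
# INDEX is the N in sunfireN (0 <= INDEX < COUNT); COUNT is the number
# of machines (default 4). Days are taken from the first 28 days of the
# month so that they exist in every month.
scrub_cron_line() {
    index=$1
    count=${2:-4}
    day=$(( 1 + index * (28 / count) ))
    printf '0 0 %d * * /sbin/zpool scrub stor\n' "$day"
}

# Print the line for each of sunfire0 through sunfire3.
for n in 0 1 2 3; do
    scrub_cron_line "$n"
done
```

For four machines this yields days 1, 8, 15 and 22, a week apart.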
Prevent updatedb (the worker for locate) from traversing the ceph backing stores: In /etc/updatedb.conf, add /var/lib/ceph to the excluded path list (PRUNEPATHS).
Create some OSDs:
OSDIX=`ceph osd create`; echo ${OSDIX}
zfs set mountpoint=/var/lib/ceph/osd/ceph-$OSDIX stor/osd
ceph-osd -i ${OSDIX} --mkfs --mkkey
ceph auth add osd.${OSDIX} osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-${OSDIX}/keyring
ceph osd crush add osd.${OSDIX} 10 host=`hostname`
Update /afs/acm.jhu.edu/group/admins.pub/ceph.conf and release the volume, then run
/etc/init.d/ceph start