Building a compact Pi cluster
At the start of 2021, I built my first “Raspberry Pi Cluster-in-a-Box”. It was based on a HAT (Hardware Attached on Top) expansion board for the Pi, called the Cluster HAT.
Some folks thought it was cool… and others (somewhat predictably) asked… why?
The answer then, as now, is a combination of “just because”, and “to learn more about clustering”!
This weekend, I got around to updating it. I’ve taken it from the original set of Pi Zeros, to the more recent Zero 2 Ws. It took me a while to collect enough of the newer boards to do it (due to limitations on how many you can purchase from most suppliers); and then, a surplus of other projects got in the way, before I had the time to pick it up again.
So, this is now a Raspberry Pi 4B (4GB) with a Cluster HAT and 4x Pi Zero 2 W, all running Bullseye 64-bit Raspberry Pi OS. This is much more functional and useful than the previous iteration, due to the much more capable processor on the Zero 2. At the moment this is running Slurm across all 5 nodes to run workloads, and I have plans to install Kubernetes. The case is this one, from The Pi Hut.
Configuration
I used several resources to get this rebuilt, but since I am using the new Pis, and wanted to update to the latest OS, there were a number of changes along the way. First of all, huge shout outs to Chris Burton of Cluster HAT fame, the contributors to the Cluster HAT mailing list, Garrett Mills, and (mostly) to Davin L. who wrote the excellent “The Missing Cluster HAT tutorial” that got me up-and-running both this time, and the previous time, around.
I’m not going to repeat that whole tutorial line-by-line here. It’s the main guide I used to get through the initial build, and I switched over to Garrett’s guide to building a cluster (parts 2 and 3 in particular) in order to get Slurm and Munge fully configured. I will make a few call-outs to things that were different, or choices I made.
With that, let’s go. Remember that this summarises the tutorial with changes where I made them; it’s not a replacement.
Pre-work
- I knew I was going to build a “CNAT” configuration Cluster HAT setup again. This means that only the controller is directly accessible / visible on my home network, with the 4 nodes accessible from the controller itself.
- There are 5 OS images (1 Controller and 4 Nodes) downloadable from the Cluster HAT site, which are mostly pre-configured. I chose the Bullseye 64-bit builds from the testing directory; recent mailing list conversations suggest that they are pretty solid (but I’m thinking again about this, see later).
- Install each image to a microSD card. I used Raspberry Pi Imager, but resisted the urge to use the Cmd-X advanced options to change settings, since the images were already otherwise customised.
- Before finishing with each card, I mounted it on my Mac, and ran
touch /Volumes/boot/ssh
(to enable ssh), then ejected.
- For the controller image, I also pre-configured
/Volumes/boot/wpa_supplicant.conf
so that it would pop up on the network when it was ready.
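For reference, that file follows the usual headless-Pi pattern; the country code, SSID and passphrase below are placeholders rather than my real values:
country=GB
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="MyHomeNetwork"
    psk="MyPassphrase"
}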
Controller
- boot the main Pi (note that I did this and started configuring it, while I was still imaging the other node SD cards)
- SSH in, and change the default pi user password
- Install some useful packages:
sudo apt install avahi-utils ntpdate file apt-file plocate zip mosquitto mosquitto-clients
- Using raspi-config, change the hostname
- (per the tutorial) generate a key pair and copy the public key, ready for the nodes:
ssh-keygen -t rsa -b 4096
cat ~/.ssh/id_rsa.pub
- Enable the fan and reduce GPU memory, by setting options in /boot/config.txt:
dtoverlay=gpio-fan,gpiopin=18,temp=65000
gpu_mem=16
- I chose to manually disable wifi powersave at boot, in /etc/rc.local:
iw wlan0 set power_save off
- Update packages, the apt-file/locate databases, and the bootloader EEPROM:
sudo apt update && sudo apt upgrade -y
sudo apt-file update && sudo updatedb
sudo rpi-eeprom-update -a
- reboot…
- Power the nodes on:
clusterctrl on
(wait for the nodes to start up / autosize their filesystems)
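(If I remember the tooling correctly, clusterctrl can also report status and power nodes individually, which is handy while waiting for them to come up; check the Cluster HAT docs for the full set of subcommands on your version.)
clusterctrl status      # report the HAT / node power status
clusterctrl on p1       # power up a single node
clusterctrl off         # power all of the nodes down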
Each node
- Add the controller’s public key (copied in the previous section):
mkdir -p .ssh
echo [key] >> ~/.ssh/authorized_keys
- Install packages:
sudo apt install avahi-utils ntpdate file mosquitto-clients -y
- Using raspi-config:
  - change the pi user password
  - install the legacy GL driver
  - set the country for wifi (“just in case we use wifi in future”)
- I want to save power and resources, and the nodes are attached via USB gadget / RNDIS, so edit /boot/config.txt:
dtoverlay=disable-bt
dtoverlay=disable-wifi
gpu_mem=16
- Edit /etc/rc.local to add
tvservice -o
(this disables HDMI at boot to save power, but requires the legacy GL driver)
- Finally:
sudo apt update && sudo apt upgrade -y && reboot
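(As an aside: rather than repeating that last step on each node by hand, the same thing can be scripted from the controller once the SSH keys are in place. A sketch, assuming the nodes resolve as p1.local through p4.local via avahi; substitute whatever names yours use.)
# run the update/upgrade step on every node in turn
for n in p1 p2 p3 p4; do
  ssh "pi@${n}.local" 'sudo apt update && sudo apt upgrade -y'
done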
Back to the controller
- edit /etc/hosts
- set up storage and NFS per the tutorial (see the sketch below). Initially I’m using a portable USB drive, which is an additional power drain / concern. I’ve seen voltage warnings in the log once, so I’m keeping an eye on it and may move NFS storage to another machine on the network, but that would mean rethinking the use of CNAT over CBRIDGE.
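For reference, the NFS part boils down to an export on the controller and a matching mount on each node. A minimal sketch, assuming a hypothetical mount point of /media/clusterfs; the subnet and the "cnode" hostname are placeholders for whatever your CNAT network actually uses:
# on the controller, in /etc/exports (then run: sudo exportfs -ra)
/media/clusterfs 172.19.181.0/24(rw,sync,no_subtree_check)

# on each node, in /etc/fstab ("cnode" being whatever name the nodes use for the controller)
cnode:/media/clusterfs  /media/clusterfs  nfs  defaults  0  0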
Munge setup
- per the tutorial; munge didn’t start initially, complaining of permissions issues. I needed to run
chown root / /etc /usr
to get things to where munge was happy (otherwise those directories were owned by the pi user)
- need to ensure munge.key is chmod 400 on the local nodes. I couldn’t get the munge/unmunge confirmation step to work until I worked this out, and rebuilt, copied, and set permissions for the keyfile (sketch below).
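For completeness, the copy-and-permissions dance amounts to something like the following. This is a sketch rather than a transcript: the Debian munge package generates /etc/munge/munge.key on install, the node name p1.local is an assumption (repeat for each node), and it relies on the pi user having passwordless sudo, which is the Raspberry Pi OS default.
# copy the controller’s key to a node (munge already installed there)
sudo cat /etc/munge/munge.key | ssh pi@p1.local "sudo tee /etc/munge/munge.key > /dev/null"
# fix ownership and mode, then restart munge on the node
ssh pi@p1.local "sudo chown munge:munge /etc/munge/munge.key && sudo chmod 400 /etc/munge/munge.key && sudo systemctl restart munge"
# from the controller: the node should now be able to decode a credential we generate
munge -n | ssh pi@p1.local unmunge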
Slurm setup
- per the tutorial, except on Bullseye the config files are in /etc/slurm (not /etc/slurm-llnl)
- take care with copy/paste from the tutorial — needed to fix smart quotes, and the range notation for nodes [-] (this gave me pain); see the sketch after this list
- I set CPUs=4 for each node in the config file, since the Zero 2 W is multicore; and also allocated 2 cores from the controller
- Note: the srun --ntasks hostname command errors out, as --ntasks wants a parameter; not sure if this is a change in behaviour between Slurm versions?
- To run upgrades across the cluster (this is neat; previously I’d done it all by hand):
sudo su -
srun --nodes=5 apt upgrade -y
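For anyone following along, the node and partition definitions in slurm.conf end up looking roughly like this. It’s a sketch only: the hostnames and partition name are placeholders, and the rest of the file is omitted.
# /etc/slurm/slurm.conf extract: controller contributes 2 cores, each Zero 2 W all 4
NodeName=cnode CPUs=2 State=UNKNOWN
NodeName=p[1-4] CPUs=4 State=UNKNOWN
PartitionName=picluster Nodes=cnode,p[1-4] Default=YES MaxTime=INFINITE State=UP
With that in place (and slurmctld / slurmd restarted), srun --nodes=5 hostname should come back with all five hostnames, one per node.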
So far, so good. I went on to try various things from Garrett Mills’ three part series, including building and testing with OpenMPI.
Bonus blinkenlichten!
The for-fun “bonus” is that there’s a blink(1) mk 3 attached via USB to the controller, with the blink1-tiny-server software managing it. This is set up to run as a service when the controller boots.
Another Pi on my network runs Homebridge and (more specifically for this project) Node-RED, and now has a flow that subscribes to Cheerlights on MQTT and calls blink1-tiny-server over HTTP to update the light colour. This is an over-engineered way of lighting an LED on a server, but it was fun (and very simple) to build the flow. Longer term, I’m likely to use the indicator for workload status, rather than just for Cheerlights, but in the meantime… I love that API!
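(The HTTP side of that flow is just a GET request, so something like the line below works from the command line too. The port and endpoint here follow what I believe are blink1-tiny-server defaults, so treat them as assumptions and check the todbot/blink1-tool README if yours differ.)
curl "http://localhost:8934/blink1/fadeToRGB?rgb=%23FF9900&time=0.5"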
So… what have we learned?
- The Cluster HAT is a really nice piece of kit (it worked the first time I set it up, and with the new Pis matching the form factor, it still works; happy with that)
- I learned about setting up Munge and Slurm…
- There was some debugging / trial-and-error in translating from the previous set of instructions to what works now, which is a great opportunity to learn and explore.
- Somehow, I’ve only just learned about the “Broadcast input” option in iTerm2… super cool for doing the same thing on multiple nodes at once, during the setup stage.
What’s next?
This is primarily an educational project for myself, and there are some potential things to look into already:
- sending workloads to the cluster for scheduling, over the rest of the home network. I’ve got mosquitto ready on the controller so I may do something with this.
- running something like MicroK8s or K3s (although the memory limitations may remain an issue)
- the 64-bit performance of Bullseye on the Pi Zero 2 W may not be ideal and I may need to rebuild with 32-bit on the nodes
- replacing the storage with something that is less of a power suck