Pi Network Docker

In this post, I walk through the steps I took in order to network boot a Raspberry Pi with support for running Docker. Most guides on network booting a Pi use NFS for storage and therefore don’t support running Docker because the default storage driver is overlay2, which uses overlayfs, which as of this writing does not support NFS when using multiple lower layers.

Therefore, instead of using NFS we will use iSCSI with ZFS as the backing store. While there are guides out there on network booting a Pi using iSCSI, there are certainly fewer of them and it seems to be the path less traveled. Nothing I present in this guide is particularly groundbreaking; it’s mostly a combination of the work of many other guides (which I’ve tried my best to link to in each section). However, some of the workarounds for bugs and quirks in older guides are no longer necessary because the underlying issues have since been fixed, so I thought an updated guide with a ZFS twist could be helpful!

I’m going through this exercise because I’m one of the many people who have been burned by their Pi’s SD card dying without proper backups. Well, no more! Of course, a simpler solution could be to boot via a USB hard drive, but where’s the fun in that? In all seriousness, booting from the network does give us some cool advantages: we can use the much larger storage capacity of a server, we get all the cool features and reliability of ZFS, and we can easily reimage the Pi remotely!

Overview

For this guide, I’ll be using a Raspberry Pi 3 Model B+. From what I’ve read, this should work for the Raspberry Pi 3 Model B and the Raspberry Pi 4, although I don’t have those devices so I haven’t tested it first-hand.

To support the network booting, my server runs Proxmox as a hypervisor, which hosts an Ubuntu Server VM that runs Docker containers. The Proxmox server is set up to use ZFS (and I will eventually be setting up MergerFS+SnapRAID). I will be using the Proxmox host itself to present NFS shares and iSCSI targets, but will use Docker containers to do the rest (TFTP).

When the Pi network boots, it will use DHCP to discover the IP address of a TFTP server that provides the contents necessary to bootstrap the Pi (essentially, this is bootcode.bin and the rest of the /boot partition). The TFTP server will be a Docker container that gets its content via an NFS share exposed by the host. Later, the Pi will mount this same NFS share as its /boot partition so that future kernel updates are written back to the share and picked up on the next network boot.

We will configure the contents of the TFTP share to instruct the Pi to boot using a special initramfs image that we’ll build that has iSCSI support and will instruct the Pi to mount its root partition via iSCSI. This iSCSI target will be exposed on the Proxmox host and be backed by a ZFS block device.

See? It’s almost too easy!

Security (or lack thereof)

It’s important to acknowledge that this setup is extremely insecure. I’m hardly a security expert, but despite that, I’m still able to poke enough holes in this to make it look like Swiss cheese:

  • The DHCP broadcast to discover the TFTP server could be intercepted and pointed at a malicious TFTP share.
  • The entire contents of the Pi’s /boot directory are accessible via NFS and read/write-able, and we’ve only locked that down by IP address. Using this, the iSCSI username and password can be discovered, granting read/write access to the root partition.
  • As far as I know, neither NFS nor iSCSI encrypts traffic over the network by default.

I’ve tried to mitigate this (somewhat) in this guide by attempting to lock things down as much as possible (using read-only when possible, using auth, etc.). Understand, however, that this is still the equivalent of putting the keys to your house under the front doormat – it wouldn’t take much to compromise this setup.

This being said, I weigh these risks against my threat model. To exploit any of these concerns, an attacker would already have to be in my network and able to intercept and manipulate traffic. Furthermore, I don’t intend to do anything mission-critical or sensitive on my Pi – I’m just going to be using it to run the OpenZWave Docker container for integration with Home Assistant. I’m hardly concerned about having a potential attacker be able to control my lights!

There are probably ways to lock this down that I may explore in the future. Secrets could be stored on the SD card. The Pi could be placed into its own VLAN. There is probably other stuff as well (have I mentioned I’m not a security guru?). I’d love to hear suggestions on how to improve upon this! But I’ve determined that this is good enough, for me, for my risk model, for now.

Let’s begin…

Prepare Raspberry Pi’s OS

  1. Install the Raspberry Pi OS onto an SD card using your method of choice. For this guide, I used 2021-01-11-raspios-buster-armhf-lite.zip.
  2. Boot the Pi from the SD card, logging in as the user “pi” with password “raspberry”.
  3. Immediately change the pi user’s password:
  4. Update the Pi:
  5. Configure the Pi, specifically the Locale, Timezone, and WLAN Country, all located underneath Localization options.
  6. Set the hostname:
  7. Optionally, enable and start SSH:
  8. Disable the swap file because swapping over the network seems like a pretty bad idea (although it might still behave better than many SD cards…)
  9. Verify that USB booting is enabled.
  10. Edit /boot/config.txt.
    1. If USB isn’t enabled (the output doesn’t match above), then add program_usb_boot_mode=1 to the file.
    2. Optionally, while we’re here, disable things you won’t be needing. I personally disabled audio (comment out dtparam=audio=on), WiFi (dtoverlay=disable-wifi), and Bluetooth (dtoverlay=disable-bt).
  11. Reboot.
    1. If USB booting wasn’t enabled, verify that it is now enabled and then remove the line from /boot/config.txt.
    2. Verify that swap is disabled and then remove the old swap file.
  12. Discover the Pi’s serial number, and take note of this for later. My Pi’s serial happens to be fb5d1ece. I’ve set my Pi’s hostname accordingly to pi-fb5d1ece.
  13. Lastly, while we’re doing stuff on the Pi, we’ll install open-iscsi to discover the “initiator name”. Take note of this for later.
  14. Optionally, overwrite the default iSCSI “initiator name”:
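
As a rough sketch of the USB boot check and the serial number lookup, the two helpers below parse the output of vcgencmd otp_dump and the contents of /proc/cpuinfo. The function names are my own invention, and 17:3020000a is the value the Raspberry Pi documentation lists for a Pi 3 with USB boot enabled; treat both as assumptions if your model differs.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from the original steps).

# Reads `vcgencmd otp_dump` output on stdin; succeeds if OTP register 17
# holds 3020000a, the documented value for a Pi 3 with USB boot enabled.
usb_boot_enabled() {
  grep -qx '17:3020000a'
}

# Reads /proc/cpuinfo text on stdin; prints the last 8 hex digits of the
# Serial line, which is the directory name the bootloader looks for on TFTP.
pi_serial() {
  awk '/^Serial/ { print substr($NF, length($NF) - 7) }'
}

# Usage on the Pi:
#   vcgencmd otp_dump | usb_boot_enabled && echo "USB boot enabled"
#   pi_serial < /proc/cpuinfo
```

Both functions only read from stdin, so they can be tried against saved output before being run on the Pi itself.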

Configure Networking

Configure DHCP TFTP

We need a way to respond to the Pi via DHCP with the IP of the TFTP server we’ll be setting up later that will host our /boot directory. There are a number of ways to do this that may vary based on your setup. Many of the guides out there mention having to set up a “Raspberry Pi Boot” option; however, I didn’t find this necessary anymore. I presume that this has been fixed in more recent versions of bootcode.bin.

  • Option 1 – Set up your DHCP server to respond with the IP of wherever you’ll be hosting the TFTP container.
    • My network is running Ubiquiti’s UniFi gear, so this was as easy as setting this in Settings > Networks > LAN (Edit) > Advanced DHCP Options > DHCP TFTP Server.
    • This post outlines how to configure isc-dhcp, if you’re running that on your network.
  • Option 2 – Set up dnsmasq. This should be agnostic of what DHCP server you’re running and is the “sure-fire” method. The official documentation and this post both explain how to do this, although not in Docker (I’d bet dnsmasq images exist though).

Optionally, set Static IP for the Pi

While this isn’t required, I find that I prefer to have a Static IP for the Pi. This way, it’s easier to SSH into and I’m able to lock down the NFS and iSCSI shares a little tighter in the following sections.

Note that if you don’t set a Static IP, your Pi might obtain two separate DHCP leases. This post discusses how to resolve that.

Once again, since I’m running UniFi gear, this was as easy as setting a static IP for the Pi. Yes, this means I’m using DHCP to assign an IP to the Pi, but since it’s static, I avoid the issue of duplicate leases.

Snapshot the host

Before installing a bunch of things onto the Proxmox host, since I’m using ZFS, I can take a snapshot of the host in case anything goes haywire and I need to roll back. Optionally, all the VMs on the host can be stopped prior to taking the snapshot to get a more consistent image. I didn’t bother with this.

Optionally, update the host

While I’m messing with the system install, I figured I might as well install any updates.

Create ZFS datasets

We’re going to create two datasets: one that will hold the boot files (served via TFTP for booting, and via NFS for mounting and updating once booted) and one that will serve as the root volume. The root volume can be any desired size or name (it doesn’t have to match the Pi’s serial number).

Optionally (but recommended), create a filesystem specifically for our Pi (using your Pi’s serial number). I recommend setting a quota for this filesystem as well because it will be easily writeable via NFS, so it’s nice to constrain its growth.
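
A minimal dry-run sketch of the dataset layout described above. It only prints zfs commands so they can be reviewed first. The pool name, the netboot/ and pi- naming, the quota, and the volume size are all placeholders of mine, not values from my actual setup.

```shell
#!/bin/sh
# Dry-run sketch: prints the zfs commands instead of running them.
# plan_pi_datasets is a hypothetical helper; all names and sizes are assumptions.
plan_pi_datasets() {
  pool=$1 serial=$2 boot_quota=$3 root_size=$4
  # Parent filesystem that will hold the boot files served via TFTP and NFS.
  echo "zfs create ${pool}/netboot"
  # Per-Pi filesystem, capped with a quota because it is writeable over NFS.
  echo "zfs create -o quota=${boot_quota} ${pool}/netboot/${serial}"
  # Block device (zvol) that will back the iSCSI root volume.
  echo "zfs create -V ${root_size} ${pool}/pi-${serial}"
}

# Review the output, then pipe it to `sh` as root on the host:
#   plan_pi_datasets rpool fb5d1ece 1G 32G
```

The -V flag is what makes the last dataset a block device rather than a filesystem, which is what an iSCSI backing store needs.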

Alternatively, LVM could probably be used. But I already have ZFS and I think its features are neat so I’ll be using that.

Install and setup NFS on the host

I’ll be using the kernel’s NFS implementation rather than ZFS’s NFS implementation. I don’t doubt that ZFS’s implementation is suitable for this purpose; however, most of the existing guides use the kernel’s NFS server, and I was already planning on using NFS for non-ZFS data in the future anyway.

First, we’ll make the directories that we’ll serve NFS out of. Then we’ll set up a “bind mount” from where the ZFS dataset is mounted to the directory we just created. Update January 30, 2021 – The fstab entry’s options were changed to use rbind instead of bind (in case separate boot datasets were created) and to wait for the ZFS datasets to be mounted.

Now that the directory is mounted, we’ll create the directory for our particular Pi’s boot data to live in. Whenever a Pi network boots, it loads bootcode.bin from the root of the TFTP share, then searches for the remaining files in a directory with its serial number before finally looking in the root. We’ll create a directory for our particular Pi using the serial number we noted above:

Now that our directories are in place, we can install the NFS server:

We’ll edit /etc/exports to expose the two shares. The first share gives read-only access to the entirety of the TFTP share to the IP address of the VM that’s running Docker. This will be needed when we set up the TFTP Docker container. The container only needs to serve the files, so we only give it read-only access. The second share gives read/write access to our particular Pi. Our Pi will mount this as its /boot directory so that any updates will get persisted.

This assumes that both the Docker VM and Pi have static IPs. If this isn’t the case, a simpler configuration that simply provides read/write access to the entirety of the share would suffice. It’s important to note that the no_root_squash option is extremely insecure because it allows the root user on the client to write files onto the host as root. However, we have this pretty well constrained to just this boot directory, so the risk seems minimal.

Update January 30, 2021 – Added the crossmnt option to the parent netboot share (in case separate boot datasets were created).
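
A hedged sketch of what those two export entries could look like. The export path and IP addresses are placeholders, the helper name is made up, and the exact option lists (beyond ro, rw, no_root_squash and crossmnt, which are mentioned above) are my assumption.

```shell
#!/bin/sh
# Sketch of the two /etc/exports entries. netboot_exports is a hypothetical
# helper; the path and addresses passed to it are placeholders.
netboot_exports() {
  root=$1 docker_vm_ip=$2 pi_ip=$3 serial=$4
  # Whole TFTP tree, read-only, for the Docker VM (crossmnt exposes child mounts).
  echo "${root} ${docker_vm_ip}(ro,sync,no_subtree_check,crossmnt)"
  # This Pi's boot directory, read/write, without squashing the client's root user.
  echo "${root}/${serial} ${pi_ip}(rw,sync,no_subtree_check,no_root_squash)"
}

# Example:
#   netboot_exports /srv/nfs/netboot 192.168.1.20 192.168.1.30 fb5d1ece
# Append the output to /etc/exports, then reload with `exportfs -ra`.
```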

Finally, we’ll refresh NFS with its new configuration:

Install iSCSI on the host

First we’ll install iSCSI:

Next, we’ll try to enable and start iSCSI, but this will likely fail because Debian (which Proxmox is based on) doesn’t seem to ship a systemd unit file for it.

Assuming that worked, the rest of this section can be skipped. If it didn’t work, we’ll have to create the systemd unit file.

Create the file /lib/systemd/system/target.service:

Then, copy the file, mark it as executable, and attempt to enable and start again:

Setup iSCSI on the host

Next we’ll set up the iSCSI “target” on the host. In iSCSI terms, the “client” is the “initiator” and the “server” is the “target”.

First we’ll create the backing store. This can be named whatever you’d like; I named mine consistently with my Pi’s hostname (and the name of the ZFS volume).

Next, we’ll create the target and cd into it:

We’ll map to the backing store:

Then we’ll create an ACL. We’ll be using the “initiator name” that we noted above.

Lastly, we’ll confirm the entire configuration:

You should see something like this:

Finally, save and quit:
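
For reference, the same sequence (backing store, target, LUN mapping, ACL, save) can be sketched as non-interactive targetcli invocations. This is a dry run that only prints the commands. The IQNs, the zvol path, and the helper name are placeholders I made up, and it assumes the target ends up with the default tpg1 portal group.

```shell
#!/bin/sh
# Dry-run sketch: prints targetcli commands for review instead of running them.
# plan_iscsi_target is hypothetical; every argument value is a placeholder.
plan_iscsi_target() {
  name=$1 zvol=$2 target_iqn=$3 initiator_iqn=$4
  # Backing store on top of the ZFS block device.
  echo "targetcli /backstores/block create name=${name} dev=${zvol}"
  # The target itself (assumed to get a default tpg1).
  echo "targetcli /iscsi create ${target_iqn}"
  # Map the backing store as a LUN.
  echo "targetcli /iscsi/${target_iqn}/tpg1/luns create /backstores/block/${name}"
  # Allow the Pi's initiator name to connect.
  echo "targetcli /iscsi/${target_iqn}/tpg1/acls create ${initiator_iqn}"
  echo "targetcli saveconfig"
}

# Example:
#   plan_iscsi_target pi-fb5d1ece /dev/zvol/rpool/pi-fb5d1ece \
#     iqn.2021-01.example.host:pi-fb5d1ece iqn.2021-01.example.pi:fb5d1ece
```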

Setup iSCSI “initiator” on the Pi

In order to be able to connect to your iSCSI drive during the boot, you’ll need to load an initrd image with the required module.

First, tell the initramfs tool to include the iscsi module by creating the required flag file and create the initramfs image for the current kernel:

The new initrd can be found in /boot:

We’ll need to edit the iSCSI configuration file so that the module can successfully load:

Then reboot:

After rebooting, confirm that the modules loaded successfully:

Next, discover all the iSCSI targets available:

Consider rebooting the server to ensure the iSCSI targets persist.

Next, log in to the target so that it is attached as a block device:

Confirm that the target is attached and take note of the dev entry (probably /dev/sda).
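
A rough sketch of the discovery and login steps. The small helper pulls the target names out of the discovery output, which iscsiadm prints as one "address:port,group name" pair per line. The helper name and the 192.168.1.10 host address are placeholders.

```shell
#!/bin/sh
# Reads `iscsiadm -m discovery` output on stdin ("<ip>:<port>,<tpgt> <iqn>")
# and prints only the target IQNs, one per line. Hypothetical helper.
discovered_iqns() {
  awk 'NF == 2 { print $2 }'
}

# Usage on the Pi (192.168.1.10 stands in for the host's address):
#   sudo iscsiadm -m discovery -t sendtargets -p 192.168.1.10 | discovered_iqns
#   sudo iscsiadm -m node -T <target-iqn> -p 192.168.1.10 --login
#   lsblk    # the newly attached disk should be listed here
```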

Copy Pi installation to target

On the Pi, format the iSCSI target:

Take note of the new drive’s UUID; we’ll be using it later:

Mount the iSCSI target:

Copy the Pi installation to the iSCSI target, excluding system directories, and then make new system directories:

Finally, we need to fix the fstab on the iSCSI target; otherwise, when we do finally network boot, the Pi will try to mount the SD card:
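
One way to sketch that fstab change is a small filter that swaps the SD card entries for the iSCSI root (by UUID) and the NFS-mounted /boot. The function name, the ext4 filesystem type, the mount options, and the NFS path are all assumptions of mine; adjust them to match how you formatted the target and exported the share.

```shell
#!/bin/sh
# Hypothetical filter: reads an fstab on stdin, writes the netboot version
# to stdout. $1 = UUID of the iSCSI root filesystem, $2 = NFS source for /boot.
# Assumes the root filesystem is ext4; all other lines pass through untouched.
rewrite_fstab() {
  awk -v uuid="$1" -v nfs="$2" '
    $2 == "/"     { print "UUID=" uuid, "/", "ext4", "defaults,noatime", 0, 1; next }
    $2 == "/boot" { print nfs, "/boot", "nfs", "defaults", 0, 0; next }
    { print }
  '
}

# Example, with the iSCSI target mounted at /mnt (placeholder values):
#   rewrite_fstab <uuid> 192.168.1.10:/srv/nfs/netboot/fb5d1ece \
#     < /etc/fstab | sudo tee /mnt/etc/fstab
```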

Setup TFTP Docker container

In the Docker VM, add the following to your docker-compose file:

Pull and start the container:

Verify that the NFS mount worked correctly:

Note that if you need to change the configuration of the NFS volume in the future, simply changing it in the compose file will not apply your changes. You will instead need to remove the container that uses the volume (docker rm) and then remove the volume itself (docker volume rm) so that it gets recreated with the new options.

Prepare TFTP with the Pi’s /boot

On the host, copy the /boot directory from the Pi and cd into the directory:

Modify the config.txt to use the initramfs image we prepared earlier that contains the iSCSI module:
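
A minimal sketch of the line in question, assuming the image follows the initrd.img-<kernel version> naming that update-initramfs produces. The helper name and the version string in the example are placeholders.

```shell
#!/bin/sh
# Prints the config.txt directive that loads an initramfs built for the given
# kernel version; "followkernel" loads it in memory right after the kernel.
# initramfs_directive is a hypothetical helper name.
initramfs_directive() {
  echo "initramfs initrd.img-$1 followkernel"
}

# Example, run inside the copied boot directory (version is a placeholder):
#   initramfs_directive "5.4.83-v7+" >> config.txt
```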

Now we’ll modify the cmdline.txt:
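
A hedged sketch of what the resulting single line could contain. The ISCSI_* parameter names follow the open-iscsi initramfs convention as I understand it, and the console and rootwait settings are assumptions carried over from a stock Raspberry Pi OS cmdline.txt. Every value passed in is a placeholder.

```shell
#!/bin/sh
# Builds a one-line kernel command line that points the initramfs at the
# iSCSI root. build_cmdline is hypothetical; parameter names are assumptions.
build_cmdline() {
  root_uuid=$1 initiator=$2 target=$3 target_ip=$4 user=$5 pass=$6
  # printf repeats the format for each argument, joining them with spaces.
  printf '%s ' \
    "console=tty1" \
    "ip=dhcp" \
    "root=UUID=${root_uuid}" \
    "rw" \
    "ISCSI_INITIATOR=${initiator}" \
    "ISCSI_TARGET_NAME=${target}" \
    "ISCSI_TARGET_IP=${target_ip}" \
    "ISCSI_TARGET_PORT=3260" \
    "ISCSI_USERNAME=${user}" \
    "ISCSI_PASSWORD=${pass}"
  printf 'rootwait\n'
}

# Example (all values are placeholders):
#   build_cmdline <root-uuid> <initiator-iqn> <target-iqn> 192.168.1.10 \
#     <user> <password> > cmdline.txt
```

Keep in mind that cmdline.txt must stay on a single line, and that the credentials in it are readable by anything that can read the boot share.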

Finally, remember that the Pi looks in the root for bootcode.bin. We’ll create a symbolic link from the root to our specific Pi’s bootcode.bin. This way, if the Pi updates the bootcode.bin in its /boot directory, it’ll boot with the updated file the next time. This probably isn’t ideal if you have a bunch of Pis (especially if they’re different versions), so try to keep the Pis roughly on the same versions. As far as I’m aware, the Raspberry Pi 4 doesn’t use bootcode.bin, so this is only a problem for older Pis.
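
A small sketch of creating that link, using a relative target so the link resolves no matter where the tree is mounted (an absolute target only resolves where the path matches the host’s). The helper name and the example path are placeholders.

```shell
#!/bin/sh
# Creates <tftp_root>/bootcode.bin as a RELATIVE symlink into the Pi's own
# directory. link_bootcode is a hypothetical helper name.
link_bootcode() {
  tftp_root=$1 serial=$2
  # Run in a subshell so the caller's working directory is left alone.
  ( cd "$tftp_root" && ln -sf "${serial}/bootcode.bin" bootcode.bin )
}

# Example on the host (path is a placeholder):
#   link_bootcode /srv/nfs/netboot fb5d1ece
```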

It might be wise to create a backup or a ZFS snapshot of the boot directory at this point, just in case a future update breaks things.

Sources:

  • https://stuff.drkn.ninja/post/2016/11/08/Net-boot-(PXE-iSCSI)-with-a-RaspberryPi-3

The moment of truth!

Now it’s time to boot the Pi! Shut down the Pi:

Once the Pi powers down, remove the SD card, power it back on (unplug and replug the power), and cross your fingers!

The Pi will take a while (minute-ish) to come up. If it doesn’t, proceed to the debugging section below…

Debugging

My Pi didn’t start up at first, so I have some limited experience with debugging.

If the Pi sits at a black screen (like mine did) and never shows the rainbow splash screen, this means it isn’t loading bootcode.bin properly. On any machine on your network, run tcpdump -vv -i <eth0> port 67 or port 68 or port 69, reboot the Pi, and examine the output. If you don’t see any output, then the Pi isn’t discovering the TFTP server via DHCP correctly. If you do see output, then your TFTP server likely isn’t set up properly. Try connecting to the TFTP server with a TFTP client and fetching bootcode.bin. Originally I had set up my symbolic link with an absolute path instead of a relative path, which didn’t work over NFS.

This reaches the extent of my debugging so far. For more ideas, check the sources below.

Sources:

Installing Docker on the Pi

Docker was the whole reason for using iSCSI over NFS (for me at least). Fortunately, installing Docker and docker-compose is simple!

Kernel updates

I haven’t actually done a kernel update yet, so these instructions are just borrowed from my sources. After doing an apt dist-upgrade, the initramfs image should be created. If not, this can be created similar to above except by specifying the new kernel version instead of using uname. Then, update the config.txt to point at the new initramfs image, reboot, and cross your fingers!
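
Since uname -r still reports the old kernel until the reboot, one way to find the version to pass to update-initramfs is to look at the installed module directories. This is an untested-on-hardware sketch: the helper name is made up, it relies on GNU sort’s -V option, and the -v7+ suffix is my assumption for the 32-bit kernel a Pi 3 runs.

```shell
#!/bin/sh
# Prints the highest kernel version under a modules directory whose name ends
# with the given suffix. newest_kernel_version is a hypothetical helper.
newest_kernel_version() {
  modules_dir=$1 suffix=$2
  for d in "$modules_dir"/*"$suffix"; do
    # Skip the unexpanded pattern when nothing matches.
    [ -d "$d" ] && basename "$d"
  done | sort -V | tail -n 1
}

# Example on the Pi:
#   ver=$(newest_kernel_version /lib/modules "-v7+")
#   sudo update-initramfs -c -k "$ver"
# Then point the initramfs line in config.txt at initrd.img-$ver.
```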

Updates

  • January 30, 2021 – Fixes to support multiple netboot datasets.