In which I build a custom backup solution with rsync and a LaunchDaemon
I use a Mac Mini as a file server and hypervisor. This very site is running in a Docker container inside VirtualBox on the aforementioned Mac Mini.
There's a 6 TB drive mounted to the Mac Mini on which many "important" files live. In the past I periodically copied them over, but computers are supposed to do this stuff automatically. That's like the whole point or whatever.
The tools we'll use are rsync and launchd. rsync is the classic tool, commonly found on Unix-like operating systems, for synchronizing files between two locations. It's robust, powerful, and popular. launchd is the macOS framework that starts and manages daemons, agents, and other background processes. We'll use launchd in a manner similar to cron.
First, let's get rsync going. We have three physical volumes:
For a very simple backup, you could just copy everything like this:
# cp -r /Volumes/Shared/ /Volumes/Backup/
but this kinda sucks because ALL files are copied every single time, which takes forever and will fail if you sneeze too hard.
Enter rsync. With rsync, we have a lot of interesting options. There are many other websites that probably explain rsync much better, so we'll just worry about what we need for our purposes.
Starting with pseudocode is helpful when doing anything with computers aside from checking your social media accounts for trash memes. Here's what we want to accomplish with our tools:
Every day at 3am:
The first sync is going to take forever, but subsequent syncs will be much faster because most of the files will already exist on the destination: we're only syncing the things that have changed since the last sync.
You might be thinking "Isn't this basically Dropbox or Time Machine?" Yup, it's very similar. However, Dropbox costs cash money per GB and I have a ton of data to back up. Also, I can't use Time Machine because that volume is already being used as a Time Machine destination for our laptops, and I got some weird error about the wrong filesystem when I tried to enable it. I couldn't figure out how to enable Time Machine for this server on its own attached external volume while also sharing that volume on the network.
Anyway, here are the rsync commands we'll use for our backups:
# rsync -arv /Volumes/Media/ /Volumes/Backup/Media/ --exclude=".*" --delete --ignore-errors
# rsync -arv /Volumes/Shared/ /Volumes/Backup/Shared/ --exclude=".*" --delete --ignore-errors
To break down the options we need:
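-a: Archive: Preserve permissions, timestamps, symlinks, etc. (ownership is only preserved when rsync runs as root)
-r: Recursive: Also get all the subfolders and their content (-a already implies -r, so this one is redundant but harmless)
-v: Verbose: Print each file as it's transferred
--exclude=".*": macOS uses hidden files for things like Spotlight, Trash, Time Machine, and a bunch of other crap. We don't need to back those up since macOS creates them as needed. Also we don't really have permission, cos I think the system owns it all. Hidden file and folder names begin with a ".", and this pattern matches them at every level of the tree, so it also skips any dotfiles inside your own folders (like .gitignore).
--delete: Delete items FROM the destination that DO NOT exist on the source. So if I delete "catvid.mp4" because it sucked, it will also be deleted from the backup.
--ignore-errors: This volume has files over a decade old, so there are bound to be some corrupt files. rsync already skips files it can't read and keeps going (it just finishes with exit code 23, "partial transfer"), but as a safety measure it won't perform --delete once it has hit an I/O error on the source. --ignore-errors tells it to delete anyway, so a few random unreadable files don't stop the backup from being tidied up. That safety check exists to keep a flaky source from wiping out the backup, so this is a deliberate trade-off. I'll clean this volume up eventually I swear.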
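If you end up with more than two shares, a small loop saves copy-pasting the rsync line. This is a minimal bash sketch, not what I actually run: the sync_share and rsync_status function names and the DRY_RUN switch are made up for illustration, and it assumes the /Volumes/<share>/ to /Volumes/Backup/<share>/ layout from the commands above. It drops the redundant -r, and it treats rsync's exit codes 23 and 24 (partial transfer because of errors, or because source files vanished mid-sync) as "finished with warnings" rather than a failed backup, since that's how the corrupt-file case actually shows up.

```shell
# Hypothetical helpers for syncing several shares the same way.
BACKUP_ROOT="/Volumes/Backup"   # assumed destination root, as in the commands above

# sync_share NAME: mirror /Volumes/NAME/ into $BACKUP_ROOT/NAME/.
# With DRY_RUN=1 it only prints the rsync command it would run.
sync_share() {
    local share="$1"
    local cmd=(rsync -av --exclude='.*' --delete --ignore-errors
               "/Volumes/${share}/" "${BACKUP_ROOT}/${share}/")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
        return 0
    fi
    "${cmd[@]}"
}

# rsync_status CODE: map an rsync exit code to a coarse status.
# 0 = everything copied, 23/24 = partial transfer (some files skipped), else failure.
rsync_status() {
    case "$1" in
        0)     echo ok ;;
        23|24) echo partial ;;
        *)     echo failed ;;
    esac
}
```

A run could then be a one-liner like for share in Media Shared; do sync_share "$share"; rc=$?; echo "$share: $(rsync_status "$rc")"; done, and setting DRY_RUN=1 first prints the exact commands without touching anything.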
So that's easy enough: we're just syncing files from the network share to the backup. Buuuuut I also want to back up the VM that hosts this website. My webserver is typically not very busy, so it might just work to rsync the .vdi and metadata over. However, if the VM is doing something, the .vdi is probably being modified, and so the backup will probably end up corrupted. So here's some pseudocode for what we need to do:
Every day at 3am:
This does mean that this site will be offline for about 15 minutes while the backup completes (You can check for yourself), but that's ok. The server logs show very little traffic to this site, especially at 3am. There's probably a better way to do this, like instead of saving the entire VM, use Git to grab the important stuff. Maybe I'll build that later.
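The fiddly part of that plan is waiting for the VM to actually be off without hanging forever if it never gets there. Here's a generic bash version of that wait, as a sketch: wait_for is a hypothetical helper (not something VirtualBox or macOS provides) that re-runs any command once a second until it succeeds or the timeout in seconds runs out, and returns non-zero on timeout so the caller can decide to skip the backup.

```shell
# wait_for TIMEOUT CMD [ARGS...]  (hypothetical helper)
# Re-run CMD once per second until it exits 0 or TIMEOUT seconds have passed.
# Returns 0 if CMD eventually succeeded, 1 on timeout.
wait_for() {
    local timeout="$1"; shift
    local waited=0
    until "$@"; do
        if (( waited >= timeout )); then
            return 1    # gave up: caller should skip the backup
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    return 0
}
```

Given any command that succeeds once the VM reports it's powered off, the whole countdown becomes a single wait_for 300 call, with the skip-the-backup branch running when it returns non-zero.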
Here's the entire script. You'll see the vboxmanage command, which is used to control the VM, and a for loop that checks whether the VM has safely shut down before starting the sync. The VM is simply called "web":
#!/bin/bash
# launchd starts jobs with a minimal PATH (/usr/bin:/bin:/usr/sbin:/sbin),
# so add the directory where the VirtualBox installer puts vboxmanage.
export PATH="/usr/local/bin:$PATH"

# Sync Media and Shared
rsync -arv /Volumes/Media/ /Volumes/Backup/server/Media/ --exclude=".*" --delete --ignore-errors
rsync -arv /Volumes/Shared/ /Volumes/Backup/server/Shared/ --exclude=".*" --delete --ignore-errors

# The VM part
# Timeout in seconds. If the machine doesn't shut down within 5 mins, something's wrong, so skip the backup to be safe.
TIMEOUT=300
# Boolean to tell us to back up or not
BACKUP=true

# Safely shut down the web VM with an ACPI call
vboxmanage controlvm web acpipowerbutton

# The loop that will run for 5 minutes
for ((i=TIMEOUT; i>-1; i--)); do
    echo "Waiting $i more seconds for web to shut down gracefully..."
    # Pause for 1 second
    sleep 1
    # Check the running state of the VM. If it's "poweroff", exit the loop and start the backup.
    if [[ $(vboxmanage showvminfo --machinereadable web | grep '^VMState="poweroff"') ]]; then
        BACKUP=true
        break
    fi
    # If we've hit the timeout, something's wrong: exit the loop and skip the backup.
    if [[ "$i" -eq 0 ]]; then
        BACKUP=false
        break
    fi
done

if [[ "$BACKUP" == true ]]; then
    # Start the backup
    mkdir -p "/Volumes/Backup/server/Storage/VirtualBox VMs/web/"
    rsync -arv "/Volumes/Storage/VirtualBox VMs/web/" "/Volumes/Backup/server/Storage/VirtualBox VMs/web/" --delete --ignore-errors
elif [[ "$BACKUP" == false ]]; then
    echo "Timeout, skipping this backup."
    # Should also figure out how to send an alert email.
fi

# Start the VM back up.
vboxmanage startvm web --type headless
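The grep inside the loop works, but if you'd rather have the VM's state as a value (to log it, or to tell "poweroff" apart from "running"), you can pull it out of the --machinereadable output. A bash sketch, assuming the output contains a line of the form VMState="poweroff", which is what the grep above already relies on. vm_state and vm_is_off are hypothetical helper names; vm_state reads the output on stdin so it can be tried without VirtualBox installed.

```shell
# vm_state: read `VBoxManage showvminfo --machinereadable` output on stdin
# and print the VMState value, e.g. VMState="poweroff" -> poweroff.
vm_state() {
    sed -n 's/^VMState="\(.*\)"$/\1/p'
}

# vm_is_off NAME: succeed only if the named VM reports "poweroff".
# Requires VBoxManage on PATH (see the PATH note at the top of the script).
vm_is_off() {
    [[ "$(VBoxManage showvminfo --machinereadable "$1" | vm_state)" == poweroff ]]
}
```

Inside the loop, the grep test could then become if vm_is_off web; then, and echo "web is $(VBoxManage showvminfo --machinereadable web | vm_state)" gives a readable log line.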
Running this by itself seems to work! Now, let's make it into a daemon so it happens automatically.
First, create a .plist file in /Library/LaunchDaemons. Here's a wizard: http://launched.zerowidth.com
In my .plist file, you can see that I'm running my /Volumes/Backup/backup.sh script as myself (fred) every day at 3am.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.fcm.backup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Volumes/Backup/backup.sh</string>
    </array>
    <key>UserName</key>
    <string>fred</string>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <!-- Without Minute, launchd treats the minute as a wildcard, so the job can fire every minute of the 3 o'clock hour. -->
        <key>Minute</key>
        <integer>0</integer>
    </dict>
</dict>
</plist>
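Because launchd runs the script with no terminal attached, all those echo lines go nowhere unless you tell launchd where to put them. Here's a bash sketch that writes the same job definition with launchd's StandardOutPath and StandardErrorPath keys added. The write_backup_plist function name and whatever log path you pass in are my assumptions, not part of the setup above. It runs plutil -lint when plutil is available (it ships with macOS) to catch XML typos before launchd sees the file.

```shell
# write_backup_plist OUT LOG  (hypothetical helper)
# Write the backup job definition to OUT, sending the script's stdout/stderr to LOG.
write_backup_plist() {
    local out="$1" log="$2"
    cat > "$out" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.fcm.backup</string>
    <key>ProgramArguments</key>
    <array>
        <string>/Volumes/Backup/backup.sh</string>
    </array>
    <key>UserName</key>
    <string>fred</string>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
    <key>StandardOutPath</key>
    <string>${log}</string>
    <key>StandardErrorPath</key>
    <string>${log}</string>
</dict>
</plist>
EOF
    # plutil only exists on macOS; skip the syntax check elsewhere.
    if command -v plutil >/dev/null 2>&1; then
        plutil -lint "$out"
    fi
}
```

The log file has to be writable by the user named in UserName (fred here), since that's who the script runs as.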
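Then set the ownership and permissions launchd expects for this .plist file: owned by root:wheel and writable only by its owner (the user "fred" is a member of wheel). The plist is just data, so it doesn't need to be executable:
# chown root:wheel /Library/LaunchDaemons/com.fcm.backup.plist
# chmod 644 /Library/LaunchDaemons/com.fcm.backup.plist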
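launchd is picky here: it refuses daemon plists that are group- or world-writable, or not owned by root, and launchctl complains about "dubious permissions". Here's a small, hypothetical bash pre-flight check for the mode part, assuming you want to verify it before loading. file_mode and plist_mode_ok are names I made up; file_mode uses BSD stat on macOS and GNU stat elsewhere, since the two take different flags.

```shell
# file_mode PATH: print the octal permission bits, e.g. 644.
file_mode() {
    if [[ "$(uname)" == Darwin ]]; then
        stat -f '%Lp' "$1"    # BSD stat (macOS)
    else
        stat -c '%a' "$1"     # GNU stat (Linux)
    fi
}

# plist_mode_ok PATH: succeed only if neither group nor others can write the file.
plist_mode_ok() {
    local mode
    mode=$(file_mode "$1") || return 1
    (( (8#$mode & 8#022) == 0 ))
}
```

For example, plist_mode_ok /Library/LaunchDaemons/com.fcm.backup.plist || echo "fix the plist permissions before loading it".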
Next, we need to make the script executable, fix the ownership, and set appropriate permissions:
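# chown root:wheel /Volumes/Backup/backup.sh
# chmod a+x /Volumes/Backup/backup.sh
Finally, let's enable the daemon (on recent macOS releases, launchctl bootstrap system /Library/LaunchDaemons/com.fcm.backup.plist is the newer equivalent):
# launchctl load -w /Library/LaunchDaemons/com.fcm.backup.plist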
That's it, we now have an automated task to run a backup script every day at 3am that is free and won't break our VM!
This is a decent solution for static files like media, but for this website, this is kind of a caveman way to do a backup, as it captures the entire VM. All we really care about is the site data. If the VM exploded, it would be better and faster to rebuild the VM and container and restore the site data. Maybe a post on that later.
Great off-the-shelf options do exist for backups, such as Carbon Copy Cloner, SuperDuper, Time Machine, and Dropbox. You could certainly use one of those if you wanted, but I really like the flexibility that comes with doing it manually like this.