I woke up this morning to a text from my ISP: “There is an outage in your area, we are working to resolve the issue.”
I laugh. This is what I live for! Almost all of my services are self-hosted, so I’m barely going to notice the difference!
Wrong.
When the internet went out, the power also went out for a few seconds. Four small computers host all of my services. Of those, one shut down and three rebooted. Of the three that rebooted ungracefully, some services came back online and some didn’t.
30 minutes later, ISP sends out the text that service is back online.
2 hours later I’m still finding down services on my network.
Moral of the story: A UPS has moved to the top of the shopping list! Any suggestions??
When you are bored, back up a VM, then hard-kill it and see if it manages to restart properly.
Software should be able to recover from that.
If it doesn’t, troubleshoot.
That reminds me of Netflix’s Chaos Monkey (basically, during office hours this tool will randomly kill stuff).
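For anyone who wants to try the hard-kill drill, here is a rough sketch of a tiny home-lab version. It assumes the VMs are libvirt domains managed with `virsh` (the thread doesn't say which hypervisor is in use), and the VM names are made up. It defaults to a dry run that only prints what it would do.

```python
import random
import subprocess

# Hypothetical VM names; replace with your own libvirt domain names.
VMS = ["vm-web", "vm-media", "vm-git"]


def pick_victim(vms, rng=None):
    """Pick one VM at random."""
    rng = rng or random.Random()
    return rng.choice(vms)


def hard_kill_commands(vm):
    """Commands for an ungraceful power-off followed by a start.

    Assumption: libvirt's virsh, where `destroy` is an immediate power-off
    (no guest shutdown) and `start` boots the domain again.
    """
    return [["virsh", "destroy", vm], ["virsh", "start", vm]]


def run_drill(vms, dry_run=True, rng=None):
    """Hard-kill one random VM and start it again; returns what was chosen."""
    vm = pick_victim(vms, rng)
    cmds = hard_kill_commands(vm)
    for cmd in cmds:
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return vm, cmds
```

Call `run_drill(VMS)` to see the plan, and `run_drill(VMS, dry_run=False)` only after you have a backup. Checking that the services inside the VM actually came back is still on you.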
IMHO you’re optimizing for the wrong thing. 100% availability is not something that’s attainable for a self-hoster without driving yourself crazy.
Like the other comment suggested, I’d rather invest time into having machines and services come back up smoothly after reboots.
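One way to stop hunting for dead services hours later is a quick check you run after every reboot. This is only a sketch: it assumes each service listens on a TCP port, and the names, addresses and ports below are placeholders, not anything from the original post.

```python
import socket

# Hypothetical service list; hosts and ports are placeholders.
SERVICES = {
    "web": ("192.168.1.10", 443),
    "git": ("192.168.1.11", 3000),
    "media": ("192.168.1.12", 8096),
}


def is_up(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def find_down(services, timeout=2.0):
    """Return the sorted names of services that did not accept a connection."""
    return sorted(
        name
        for name, (host, port) in services.items()
        if not is_up(host, port, timeout)
    )
```

`print(find_down(SERVICES))` gives you the list of things to troubleshoot. An open port doesn't prove the service is healthy, but it catches the "never started" cases.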
That being said, a UPS may be relevant to your setup in other ways. For example, it can allow a parity RAID array to shut down cleanly and reduce the risk of write holes. But that’s just one example, and a UPS is just one solution for that (others being ZFS, non-parity RAID, or SAS/SATA controller cards with a built-in battery and/or hardware RAID support, etc.).
I present to you the holy hardware compatibility table:
https://networkupstools.org/stable-hcl.html
Anything not listed there is not worth buying.
A lot of stuff on there isn’t worth buying either, like anything from APC. If you want good stuff, just get Eaton.
But also you have to understand that UPSes aren’t set and forget. The batteries need replacement every 3-5 years. And they’re not for extended outages, they’re mostly to bridge the gap between mains power going out and a generator starting up.
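Since a UPS only buys you minutes, the useful part is shutting down cleanly before the battery runs out. With a NUT-supported unit you can read the state with `upsc`, which prints one `key: value` pair per line. Here is a sketch of the decision logic. The UPS name `myups@localhost` and the 30% threshold are assumptions for illustration, and in practice NUT's own `upsmon` is meant to handle the shutdown for you.

```python
import subprocess


def parse_upsc(text):
    """Parse `upsc` output ("key: value" per line) into a dict of strings."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


def should_shut_down(status, min_charge=30):
    """True when on battery (OB) and either flagged low (LB) or under min_charge percent.

    The threshold is an arbitrary example value, not a recommendation.
    """
    flags = status.get("ups.status", "").split()
    if "OB" not in flags:
        return False  # on mains power, nothing to do
    if "LB" in flags:
        return True  # the UPS itself reports low battery
    try:
        charge = float(status.get("battery.charge", ""))
    except ValueError:
        return False  # charge not reported; rely on the LB flag only
    return charge < min_charge


def read_ups(ups="myups@localhost"):
    """Query a UPS through NUT's upsc client (the UPS name is hypothetical)."""
    out = subprocess.run(
        ["upsc", ups], capture_output=True, text=True, check=True
    ).stdout
    return parse_upsc(out)
```

`should_shut_down(read_ups())` is then a yes/no you could poll from a cron job or timer.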
Personally I just have everything running from docker-compose, so I run one command and everything not running gets started. I don’t worry about stuff being down for a bit.
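If your compose files are spread over several directories, the "one command" can be a small wrapper. This is a sketch that assumes each directory holds its own compose file. The paths are invented, and it only prints the plan unless you turn off the dry run.

```python
import subprocess

# Hypothetical stack directories, each containing a compose file.
STACKS = ["/srv/stacks/media", "/srv/stacks/git"]


def bring_up(stack_dirs, dry_run=True):
    """Run `docker-compose up -d` in each directory; returns the planned calls."""
    cmd = ["docker-compose", "up", "-d"]
    planned = []
    for directory in stack_dirs:
        planned.append((str(directory), list(cmd)))
        if dry_run:
            print(f"would run in {directory}: {' '.join(cmd)}")
        else:
            # up -d starts whatever is not running and detaches.
            subprocess.run(cmd, cwd=directory, check=True)
    return planned
```

Compose also has a per-service `restart` policy (for example `unless-stopped`), which brings containers back after a reboot without running anything by hand.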