Home Server Project
How I set up a 3-node High Availability (HA) cluster for my home servers
Architecture
The hypervisor is Proxmox VE, a free and open source virtualization platform that can cluster several independent Linux servers together for HA, live migration, and centralized monitoring.
```mermaid
flowchart TB
    subgraph PROXMOX["PROXMOX CLUSTER"]
        N1["Node 1<br/>Docker Containers<br/>━━━━━━━━<br/>Homebridge<br/>Scrypted (Ring)<br/>so-co (Sonos)"]
        N2["Node 2<br/>DNS Services"]
        N3["Node 3<br/>Virtual Machines"]
    end
    GPU["GPU Server<br/>Ollama API<br/>LLM Inference"]
    N1 -->|API Calls| GPU
    N1 -->|Passthrough| AppleHome[Apple Home]
    style GPU fill:#09090b,stroke:#06b6d4
    style PROXMOX fill:#09090b,stroke:#27272a
```
Setup
I have 3 modified Dell OptiPlex Micro computers with Intel i5-10500T processors and 16 GB of RAM. The “T” variant is the low-power version designed for efficiency, with a 35 W TDP versus 65 W for the standard i5-10500, so it draws significantly less power. This was a huge bonus for my use case, since raw processing is not currently a bottleneck and energy cost was a consideration when designing this system.
In testing, the minimum power draw was ~4 W on Node 1. The maximum was 24 W on Node 3 due to VM overhead; since the Node 3 VM is running Windows 10 Pro, its typical draw under load would be more than double that.
I also have a separate GPU server (not part of the Proxmox cluster) equipped with an RTX 3070 running Ollama for LLM inference. The Proxmox nodes can make API calls to it when needed, but it’s just a standalone machine on the network: no hypervisor, no clustering, just Ubuntu with Ollama installed.
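Those wattage numbers are easy to turn into a rough yearly cost estimate. A quick sketch, where the electricity rate is an assumption (adjust for your utility):

```python
HOURS_PER_YEAR = 24 * 365
RATE_PER_KWH = 0.15  # USD per kWh -- assumed rate, not my actual tariff

def annual_cost(watts: float, rate: float = RATE_PER_KWH) -> float:
    """Annual cost in dollars for a device drawing `watts` continuously."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    return kwh_per_year * rate

# Node 1 at its ~4 W minimum vs Node 3 at its 24 W maximum:
print(f"Node 1 (4 W):  ${annual_cost(4):.2f}/yr")
print(f"Node 3 (24 W): ${annual_cost(24):.2f}/yr")
```

Even the heaviest node costs only tens of dollars a year to run continuously, which is what makes the T-series CPUs worthwhile here.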
Proxmox Installation
To use Proxmox, you need to image each node with the Proxmox VE ISO. The process is straightforward:
- Download the latest Proxmox VE ISO from their website
- Flash it to a USB drive using something like Rufus or dd
- Boot each node from the USB and follow the installer
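On Linux, the dd route looks something like this (the ISO filename and /dev/sdX are placeholders; double-check the device name with lsblk first, since writing to the wrong disk is destructive):

```
# Identify the USB stick before writing anything
lsblk

# /dev/sdX is a placeholder for your USB device; the ISO version will vary
sudo dd if=proxmox-ve_8.1-1.iso of=/dev/sdX bs=4M status=progress conv=fsync
```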
The installer handles partitioning and sets up the base Debian system with the Proxmox packages. I configured static IPs for each node during installation to keep networking predictable.
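For reference, the static networking the installer sets up lands in /etc/network/interfaces, roughly like this (the addresses and NIC name here are placeholders, not my exact config):

```
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.11/24
    gateway 192.168.1.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
```

The vmbr0 bridge is what lets VMs and containers share the node's physical NIC.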
Once all three nodes were running Proxmox individually, I created the cluster from Node 1’s web interface and then joined the other nodes using their join commands. Proxmox uses Corosync for cluster communication, so all nodes need to be on the same network with low latency between them.
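The same setup can be done from the command line with a few pvecm calls (the cluster name and IP are placeholders):

```
# On Node 1: create the cluster
pvecm create homelab

# On Nodes 2 and 3: join using Node 1's IP (prompts for the root password)
pvecm add 192.168.1.11

# Verify membership and quorum from any node
pvecm status
```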
With 3 nodes, quorum is straightforward—you need 2 out of 3 nodes to maintain cluster operations. This is way cleaner than dealing with even-numbered clusters.
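The majority math is simple enough to sketch, and it shows why adding a fourth node buys nothing for failure tolerance:

```python
def quorum(nodes: int) -> int:
    """Minimum votes needed for a majority in an n-node cluster."""
    return nodes // 2 + 1

def tolerable_failures(nodes: int) -> int:
    """How many nodes can fail while the cluster keeps quorum."""
    return nodes - quorum(nodes)

# 3 nodes: quorum of 2, tolerates 1 failure.
# 4 nodes: quorum of 3 -- still only tolerates 1 failure.
for n in (2, 3, 4, 5):
    print(n, quorum(n), tolerable_failures(n))
```

A 2-node cluster is the worst case: quorum is 2, so losing either node stops the whole cluster.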
Networking
All three Proxmox nodes are connected to a single gigabit switch, which keeps things simple. I’m running everything on a flat network for now—no VLANs or separate management network. Each node has a static IP assigned in the 192.168.1.x range.
For the cluster to work properly, all nodes need to communicate over the same network interface. I made sure to use the primary ethernet port on each machine and disabled any secondary NICs to avoid confusion.
DNS is handled by Node 2 running AdGuard Home in a container, which also gives me ad-blocking across the entire network. I’m using AdGuard Sync to replicate the DNS config, so if Node 2 goes down, DNS automatically fails over without relying on Proxmox HA. It’s a nice layer of redundancy that works independently of the cluster. The smart home containers on Node 1 need stable networking for HomeKit integration, so those are bridged directly to the physical network rather than using NAT.
The GPU server is just another device on the same network. Node 1 has a so-co (Sonos CLI) container that hits the Ollama API for some LLM-powered Sonos control, but otherwise the GPU server is treated like any other network service.
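A call to the GPU server is just an HTTP POST against Ollama's /api/generate endpoint. A minimal sketch, where the host address and model name are assumptions:

```python
import json
import urllib.request

OLLAMA_URL = "http://192.168.1.50:11434"  # GPU server address is a placeholder

def build_generate_request(prompt: str, model: str = "llama3") -> dict:
    """Request body for /api/generate; stream=False returns one JSON reply."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str) -> str:
    """Send a prompt to the GPU server and return the model's text response."""
    body = json.dumps(build_generate_request(prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Because it's plain HTTP on the flat network, any container on any node can use the GPU server the same way.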
Key Features
HA Failover
High availability is one of the main reasons to run Proxmox in a cluster. If configured properly, VMs and containers can automatically restart on other nodes if their host goes down. I have HA enabled for critical services like the smart home stack.
There’s some overhead to this—you need shared storage or replication configured, and not every workload benefits from automatic failover. For example, the Windows 10 VM on Node 3 is just a legacy image I keep around for an old app, so I don’t have HA enabled for it.
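Enabling HA for a guest is a one-liner per resource via ha-manager. A sketch with example IDs (the numeric VM/CT IDs are assumptions, not my actual config):

```
# Mark containers as HA-managed so they restart on a surviving node
ha-manager add ct:101 --state started   # e.g. Homebridge
ha-manager add ct:102 --state started   # e.g. Scrypted

# The Windows 10 VM deliberately stays out of HA; check current state with:
ha-manager status
```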
Interestingly, DNS doesn’t even rely on Proxmox HA. AdGuard Sync handles automatic failover between DNS instances, so it’s redundant at the application layer rather than the hypervisor layer. Sometimes the simplest solution is the best one.
Live Migration
Being able to move running VMs between nodes without downtime is incredibly useful for maintenance. I’ve used this to patch and reboot nodes without interrupting services. The migration happens over the network, so it’s not instant for larger VMs, but for containers it’s nearly seamless.
Live migration requires shared storage or replicated storage to work. I went with local storage and replication, which adds some complexity but keeps costs down since I didn’t need to buy a NAS.
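The day-to-day commands look roughly like this (node names, guest IDs, and the replication schedule are placeholders):

```
# Live-migrate a running VM to another node (needs replicated or shared disks)
qm migrate 100 node2 --online

# Containers use pct; LXC migration is a quick stop/start rather than live
pct migrate 101 node2 --restart

# Replicate VM 100's disks to node2 every 15 minutes (ZFS-backed storage)
pvesr create-local-job 100-0 node2 --schedule "*/15"
```

The replication schedule is the trade-off: a tighter interval means less data loss on failover, at the cost of more I/O.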
Centralized Management
Having a single web interface to manage all three nodes is a game changer. I can monitor resource usage, check logs, and spin up new VMs or containers from anywhere on the network. The Proxmox web UI is surprisingly good for something that’s free and open source.
The API is also solid if you want to automate things. I haven’t gone too deep into automation yet, but it’s nice knowing the option is there.
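As a sketch of what automation looks like, here's a minimal token-authenticated request that lists the cluster nodes (the host, user, and token values are placeholders; API tokens are created under Datacenter permissions in the UI):

```python
import json
import urllib.request

PVE_HOST = "https://192.168.1.11:8006"  # any cluster node works

def token_header(user: str, token_name: str, secret: str) -> dict:
    """Proxmox API token auth header; tokens skip the ticket/CSRF dance."""
    return {"Authorization": f"PVEAPIToken={user}!{token_name}={secret}"}

def list_nodes(headers: dict) -> list:
    """Return the cluster's node list from /api2/json/nodes."""
    req = urllib.request.Request(f"{PVE_HOST}/api2/json/nodes", headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["data"]
```

The same pattern works for starting guests, reading metrics, or triggering migrations, so it's a natural base for scripts later.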
Challenges and Lessons Learned
Quorum benefits with 3 nodes: Having an odd number of nodes (3) makes quorum management way simpler. You need 2 out of 3 nodes to maintain cluster operations, which means you can lose one node and still stay online. If I’d gone with 2 or 4 nodes, split-brain scenarios would be more of a headache.
Storage replication overhead: Keeping VM disks replicated across nodes for HA adds I/O overhead and eats into storage capacity. I had to be selective about what actually needs HA versus what can tolerate some downtime. The Windows 10 VM, for instance, is single-node only.
Power consumption vs. performance trade-offs: The i5-10500T CPUs hit the sweet spot for this use case. They’re not built for heavy compute workloads, but that’s not what I need—smart home automation, DNS, and light VMs don’t hammer the CPU. What they do provide is excellent efficiency, which matters more when you’re running 24/7. The power savings add up over time.
Container vs. VM overhead: For services that don’t need full VMs, LXC containers are way more efficient. The DNS service on Node 2 runs in a container and uses a fraction of the resources a VM would need. I wish I’d started with more containers instead of defaulting to VMs.
Network dependency: The entire cluster depends on the network being stable. When I had a switch flake out once, the cluster lost quorum and things got messy. Having reliable network hardware is non-negotiable for this kind of setup.
Keeping the GPU server separate was the right call: I initially considered adding the GPU server to the Proxmox cluster, but honestly it would’ve been overkill. The machine just runs Ollama and serves API requests—it doesn’t need HA or live migration. Keeping it as a simple Ubuntu box made setup way easier.
Overall, the cluster has been running stably for months now. The flexibility and redundancy are worth the setup complexity, and I’ve learned a ton about virtualization and clustering in the process.