Homelab Evolution - Automating Infrastructure with Terraform and CloudInit
A year or so ago I dove headfirst into building a homelab powered by Proxmox; I experimented with local AI, DNS, and even a little bit of home automation. My initial excitement faded as manual configurations accumulated, dependencies drifted, and environments diverged. I found it impossible to consistently replicate anything.
To achieve reproducible, reliable deployments, I’m transitioning my homelab to use Infrastructure as Code, something I’ve been doing for nearly a decade on AWS. This shift to version-controlled workflows will prioritise innovation over maintenance.
The fragility of ClickOps
I found ClickOps unsustainable for provisioning infrastructure and configuring access control. Worse, I frequently struggled after deploying services and tweaking local configuration directly on VMs.
Every month or so I'd want to update a service. I'd update one VM, feeling confident, only to discover shortly afterwards that another VM had mysteriously stopped responding. I'd end up troubleshooting, chasing logs, and trying to remember where and how I'd configured everything. It was a constant battle between the various services and configurations spread across VMs that had lasted far longer than they probably should have.
I wanted consistency and a version-controlled source of truth for all the random services—something I’m already used to in AWS. IaC could be the solution, but it requires tooling support and a shift in mindset. I need to set aside the haphazard approach I’ve used so far and plan a bit more up front.
Why Infrastructure as Code
Manually provisioning infrastructure and services limited me, highlighting the need for a different approach. Unlike ClickOps, IaC provides a more reliable iteration cycle and removes the bottlenecks caused by missing audit trails and inconsistent environments.
While the UI could handle 80% of the work, I often needed to use the terminal and CLI commands to follow procedures or step through processes. Beyond managing infrastructure, I wanted to create a rapid and reliable platform for iteration, supported by reproducible steps.
I reviewed alternatives, from plain bash scripts and community scripts to Ansible playbooks, and decided to lean into stateful operations and immutable deployments.
IaC definitions on Proxmox
Terraform/OpenTofu or Pulumi
The first hurdle to overcome is that Proxmox doesn’t have any first-party IaC tool support. Thankfully there are some open-source Terraform / OpenTofu providers, with the most supported and up-to-date being from BPG. That same provider also underpins a community Pulumi integration, which lets you use Python/TypeScript instead of a DSL.
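As a sketch, wiring up the BPG provider looks something like this (the endpoint and token variable are placeholders for your own environment):

```hcl
terraform {
  required_providers {
    proxmox = {
      source  = "bpg/proxmox"
      version = ">= 0.50"
    }
  }
}

provider "proxmox" {
  # Hypothetical endpoint; point this at your own Proxmox host
  endpoint  = "https://pve.example.internal:8006/"
  api_token = var.proxmox_api_token

  # SSH access is needed for some operations, e.g. uploading snippets
  ssh {
    agent    = true
    username = "root"
  }
}
```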
Creating VM Templates
Creating a VM through IaC has multiple stages:
- All artefacts are imported onto the host filesystem; the cloud image is downloaded and the cloud-init templates uploaded.
- Proxmox then processes the cloud image into a VM disk image and configures network, memory, and CPU allocations.
- Once initial creation is complete, Proxmox unlocks the VM so it can be further edited or booted.
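The stages above map fairly directly onto the provider's resources. A minimal sketch, assuming a node named "pve" and an Ubuntu cloud image (names, datastores, and sizes are illustrative):

```hcl
# Stage 1: download the cloud image onto the host's datastore
resource "proxmox_virtual_environment_download_file" "cloud_image" {
  content_type = "iso"
  datastore_id = "local"
  node_name    = "pve"
  url          = "https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img"
}

# Stages 2-3: import the image as the VM's disk and let Proxmox
# finish creation, after which the VM can be edited or booted
resource "proxmox_virtual_environment_vm" "example" {
  name      = "example-vm"
  node_name = "pve"

  cpu {
    cores = 2
  }

  memory {
    dedicated = 2048
  }

  disk {
    datastore_id = "local-lvm"
    file_id      = proxmox_virtual_environment_download_file.cloud_image.id
    interface    = "virtio0"
    size         = 20
  }

  network_device {
    bridge = "vmbr0"
  }
}
```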
VM templates offer a simple way to skip the first step, which is typically the longest, and allow “default” configurations to be inherited.
I’ve personally found that some open-source image mirrors limit bandwidth per client: a VM could be created and booted within 10-20 seconds, but downloading the image took 10 minutes.
With IaC, I iterate on multiple cloud images and produce consistent templates, then catalogue them much like public AMIs, or container images in ECR. From there, I either clone templates manually for experiments or reference them from other deployments.
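Cloning from a catalogued template skips the download entirely. A sketch, assuming a template already exists with VM ID 9000 (an arbitrary example ID):

```hcl
resource "proxmox_virtual_environment_vm" "from_template" {
  name      = "service-vm"
  node_name = "pve"

  clone {
    vm_id = 9000  # the template's VM ID
    full  = true  # full copy rather than a linked clone
  }
}
```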
Cloud-Init: Configuring VM runtime
cloud-init runs on first boot and configures the running system: users, networking, packages, and more. Proxmox’s UI has basic support, such as users and IP config, but IaC can use cloud-init to its fullest extent, just like any other cloud provider. This can turn a stock cloud image of a Linux distribution into a fully operational service in a single deployment, without any manual intervention.
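With the BPG provider, a full cloud-init user-data file can be uploaded as a snippet and attached to a VM. A minimal sketch (the user, packages, and datastore are illustrative, and the datastore must have the "snippets" content type enabled):

```hcl
resource "proxmox_virtual_environment_file" "user_data" {
  content_type = "snippets"
  datastore_id = "local"
  node_name    = "pve"

  source_raw {
    file_name = "user-data.yaml"
    data      = <<-EOF
      #cloud-config
      users:
        - name: ops
          groups: [sudo]
          shell: /bin/bash
      packages:
        - docker.io
      runcmd:
        - systemctl enable --now docker
    EOF
  }
}
```

The VM then references the snippet via `initialization { user_data_file_id = proxmox_virtual_environment_file.user_data.id }`, so the whole first boot is described in version control.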
Abstracting logical groups into modules
A powerful feature of any IaC tool is the ability to extract snippets into reusable modules. Once the correct configuration has been deployed and tested, creating an abstraction reduces duplication and improves onboarding and maintainability. Rolling logical groups of infrastructure up into modules shifts the mental model from individual hardware to higher-level services: load balancers, K8s nodes, auto-scaling groups.
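As an illustration, a hypothetical local module wrapping the VM, disk, and cloud-init resources might be consumed like this (the module path and variable names are placeholders):

```hcl
module "k8s_node" {
  source = "./modules/proxmox-vm"

  name      = "k8s-node-01"
  node_name = "pve"
  cores     = 4
  memory_mb = 8192
  user_data = file("${path.module}/cloud-init/k8s-node.yaml")
}
```

The caller thinks in terms of “a K8s node”, while the module owns the details of images, disks, and first-boot configuration.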
Proxmox IaC in my Homelab
I’ve built my IaC approach around a catalog of templates that let me spin up VMs at a moment’s notice—fast boots, predictable defaults, and minimal hand-holding.
For traditional VMs, I rely on cloud images. I keep the network config and user_data in versioned files, then let cloud-init handle the first-boot setup so I don’t have to babysit installs.
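In practice that first-boot setup lives in the VM's `initialization` block. A sketch with hypothetical addresses and file IDs:

```hcl
resource "proxmox_virtual_environment_vm" "service" {
  name      = "service-vm"
  node_name = "pve"

  initialization {
    ip_config {
      ipv4 {
        address = "192.168.1.50/24"
        gateway = "192.168.1.1"
      }
    }

    # Points at a versioned cloud-init snippet on the host
    user_data_file_id = "local:snippets/user-data.yaml"
  }
}
```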
I currently run most services in Docker, but I want to explore Kubernetes as a next step for orchestration—especially as the number of containers (and hosts) grows.
A few machines still get the traditional ISO-boot treatment: wipe the disk, install manually, and move on. That’s fine for my local cloud-gaming box or a dedicated dev machine, but if I start doing it more often, I’ll look at baking images—ideally with cloud-init integrated—so it stays repeatable instead of turning into a one-off process.
LXC is the odd one out. It doesn’t give me the same “drop in user_data and go” workflow, so if I start leaning on LXC more heavily, I’ll probably revisit tools like Ansible to fill that gap.
Conclusion
I’ve already started migrating my templates and I’ve been kicking the tires on cloud images with a few small, low-risk deployments.
Long term, I want this model to run everything, but I’m not going to force IaC onto platforms that fight me or don’t support it cleanly. I’d rather automate where it pays off and stay pragmatic everywhere else.
My cloud-gaming setup is the clearest example. I barely need Windows, so I’m even testing whether I can replace it with a Linux-based system and keep the whole stack closer to the workflow I’m building.