System Architecture#
Overview#
The homelab is a multi-server setup for local AI inference, containerized services, and resilient storage. This page shows how the components connect.
Physical Infrastructure#
graph TB
subgraph Hydrogen["Hydrogen (NUC)"]
H_CPU[Intel NUC<br/>Router/DNS/Traefik]
H_NET[10GbE Switch]
H_CPU --- H_NET
end
subgraph Helium["Helium Server"]
HE_CPU[Ryzen 7 9700X<br/>8-core + 64GB RAM]
HE_GPU1[Quadro RTX 5000<br/>16GB VRAM Turing]
HE_GPU2[Quadro RTX 5000<br/>16GB VRAM Turing]
HE_NVMe[2TB NVMe]
HE_CPU --- HE_GPU1
HE_CPU --- HE_GPU2
HE_CPU --- HE_NVMe
end
subgraph Lithium["Lithium Server"]
L_CPU[M3 Ultra<br/>96GB Unified Memory]
L_NVMe[1TB NVMe]
L_CPU --- L_NVMe
end
H_NET -->|10GbE| HE_CPU
H_NET -->|10GbE| L_CPU
H_NET --> UNIFI[UniFi Gateway]
H_NET --> PIHOLE[Pi-hole DNS]
style Hydrogen fill:#e1f5ff
style Helium fill:#fff4e1
style Lithium fill:#f0f0f0
Server Roles#
| Server | Codename | Primary Role | Hardware |
|---|---|---|---|
| Hydrogen | hydrogen | Network Gateway/Router | Intel NUC (Traefik, Pi-hole, DNS) |
| Helium | helium | Container Services + Inference | Ryzen 7 9700X + 2x Quadro RTX 5000 (Turing) + 64GB RAM |
| Lithium | lithium | High-Capacity Inference | M3 Ultra + 96GB Unified Memory |
Network Architecture#
graph TB
subgraph VLAN10["VLAN 10 - Management"]
M1[10.0.10.10<br/>Pi-hole]
M2[10.0.10.20<br/>helium.mrzk.io]
M3[10.0.10.30<br/>lithium.mrzk.io]
M4[10.0.10.1<br/>hydrogen.mrzk.io/UniFi Gateway]
end
subgraph VLAN20["VLAN 20 - Workstations"]
W1[10.0.20.x<br/>Developer Machines]
end
subgraph VLAN30["VLAN 30 - IoT"]
I1[10.0.30.x<br/>Smart Home]
end
subgraph VLAN40["VLAN 40 - Guests"]
G1[10.0.40.x<br/>Guest Wi-Fi]
end
INTERNET[Internet] --> UNIFI[UniFi Gateway]
UNIFI --> VLAN10
UNIFI --> VLAN20
UNIFI --> VLAN30
UNIFI --> VLAN40
VLAN20 -.->|22,80,443| VLAN10
VLAN30 -.->|80,443| INTERNET
VLAN40 -.->|80,443| INTERNET
style VLAN10 fill:#e8f5e9
style VLAN20 fill:#e3f2fd
style VLAN30 fill:#fff3e0
style VLAN40 fill:#fce4ec
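The addressing plan in the diagram above can be expressed as a small lookup. This is a minimal sketch, not the UniFi configuration itself: the `/24` prefix length and the `vlan_for` helper are assumptions made for illustration, since the diagram only shows `10.0.N.x` ranges.

```python
import ipaddress
from typing import Optional

# VLAN subnets from the diagram; the /24 prefix length is an assumption.
VLANS = {
    "management": ipaddress.ip_network("10.0.10.0/24"),
    "workstations": ipaddress.ip_network("10.0.20.0/24"),
    "iot": ipaddress.ip_network("10.0.30.0/24"),
    "guests": ipaddress.ip_network("10.0.40.0/24"),
}


def vlan_for(address: str) -> Optional[str]:
    """Return the name of the VLAN containing `address`, or None if no VLAN matches."""
    ip = ipaddress.ip_address(address)
    for name, network in VLANS.items():
        if ip in network:
            return name
    return None
```

For example, `vlan_for("10.0.10.20")` resolves the helium address to the management VLAN.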
DNS Configuration#
mrzk.io (local domain)
├── hydrogen.mrzk.io → 10.0.10.1 (Router/Traefik)
├── helium.mrzk.io → 10.0.10.20 (Container host + RTX 5000s)
├── lithium.mrzk.io → 10.0.10.30 (M3 Ultra inference)
└── traefik.mrzk.io → External HTTPS proxy
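As an illustration, the records above can be rendered as hosts-file style `IP hostname` lines, a format that dnsmasq-based resolvers can read. This is a sketch under assumptions: `to_hosts_lines` is a hypothetical helper, and `traefik.mrzk.io` is left out because the tree does not list an address for it.

```python
# Records copied from the tree above. traefik.mrzk.io is omitted because
# no address is listed for it.
RECORDS = {
    "hydrogen.mrzk.io": "10.0.10.1",
    "helium.mrzk.io": "10.0.10.20",
    "lithium.mrzk.io": "10.0.10.30",
}


def to_hosts_lines(records: dict[str, str]) -> list[str]:
    """Render records as 'IP hostname' lines, sorted by hostname."""
    return [f"{ip} {name}" for name, ip in sorted(records.items())]


print("\n".join(to_hosts_lines(RECORDS)))
```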
Storage Architecture#
graph LR
subgraph "Real-time Sync (Syncthing)"
H[hydrogen.mrzk.io<br/>2TB NVMe]
HE[helium.mrzk.io<br/>2TB NVMe]
L[lithium.mrzk.io<br/>1TB NVMe]
H <-->|/code| HE
H <-->|/configs| L
HE <-->|/models| L
end
subgraph "Cold Backup"
TB[Thunderbolt NVMe<br/>Weekly Images]
HE -.->|dd backup| TB
end
subgraph "Future: MergerFS + SnapRAID"
P1[Drive 1]
P2[Drive 2]
P3[Drive 3]
PARITY[Parity Drive]
P1 -.-> POOL[(MergerFS Pool)]
P2 -.-> POOL
P3 -.-> POOL
POOL -.-> PARITY
end
style H fill:#e1f5ff
style HE fill:#fff4e1
style L fill:#f0f0f0
style TB fill:#ffebee
style POOL fill:#e8f5e9
Data Flow#
- Active Development: /code syncs across all 3 servers via Syncthing
- Model Storage: /models on helium/lithium for local inference
- Config Backup: /configs real-time + weekly full disk images
- Future Scaling: MergerFS pool + SnapRAID parity when 3+ drives available
Inference Stack#
graph TB
subgraph "Client Layer"
DISCORD[Discord Bot]
WEB[Web Interface]
API[API Clients]
end
subgraph "Load Balancer"
TRAEFIK[Traefik Reverse Proxy<br/>traefik.mrzk.io:443<br/>Hydrogen]
end
subgraph "Inference Layer"
LLAMA1[llama.cpp Server<br/>lithium.mrzk.io:8080<br/>Qwen3.5-122B A10B]
LLAMA2[llama.cpp Server<br/>helium.mrzk.io:8080<br/>Gemma4-26B Q4_0]
end
subgraph "GPU Resources"
GPU1[Quadro RTX 5000 #1<br/>16GB VRAM Turing]
GPU2[Quadro RTX 5000 #2<br/>16GB VRAM Turing]
GPU3[M3 Ultra<br/>96GB Unified Memory]
end
DISCORD --> TRAEFIK
WEB --> TRAEFIK
API --> TRAEFIK
TRAEFIK --> LLAMA1
TRAEFIK --> LLAMA2
LLAMA1 --- GPU3
LLAMA2 --- GPU1
LLAMA2 --- GPU2
style TRAEFIK fill:#fff3e0
style LLAMA1 fill:#e8f5e9
style LLAMA2 fill:#e8f5e9
style GPU1 fill:#fce4ec
style GPU2 fill:#fce4ec
style GPU3 fill:#f3e5f5
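A client or proxy that prefers the lithium server and falls back to helium could choose a backend roughly as follows. This is a sketch under assumptions: the `http://` scheme, the `pick_backend` and `http_health` helpers, and the 2-second timeout are illustrative. The `/health` endpoint is the one the llama.cpp server exposes.

```python
import urllib.request
from typing import Callable, Optional, Sequence

# Backends from the diagram, in priority order (primary first, fallback second).
BACKENDS = (
    "http://lithium.mrzk.io:8080",
    "http://helium.mrzk.io:8080",
)


def http_health(url: str, timeout: float = 2.0) -> bool:
    """Probe the server's /health endpoint; HTTP 200 counts as healthy."""
    with urllib.request.urlopen(f"{url}/health", timeout=timeout) as resp:
        return resp.status == 200


def pick_backend(
    backends: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first backend whose health probe succeeds, or None."""
    for url in backends:
        try:
            if is_healthy(url):
                return url
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses, so an
            # unreachable or still-loading server is treated as unhealthy.
            continue
    return None
```

Calling `pick_backend(BACKENDS, http_health)` tries lithium first and returns the helium URL only when the lithium probe fails.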
Model Registry#
| Model | Quantization | Size | Purpose | Location |
|---|---|---|---|---|
| Qwen3.5-122B-A10B | not listed | 80GB | Main inference | lithium (M3 Ultra) |
| Gemma4-26B-A4B | Q4_0 | 19GB | Fallback | helium (RTX 5000s) |
| nomic-embed-text-v1.5 | - | 274MB | Embeddings | All servers |
# High VRAM (16GB per GPU - helium)
n_gpu_layers: 50-60
n_ctx: 8192
n_batch: 512
flash_attn: true
# Unified Memory (96GB - lithium)
n_gpu_layers: 999
n_ctx: 16384
n_batch: 1024
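The settings above map onto `llama-server` command-line options. The sketch below builds an argument list from them. The model path and the `server_args` helper are hypothetical, and flash attention is left out because its flag syntax differs between llama.cpp releases, so check `llama-server --help` for the installed version.

```python
def server_args(model_path: str, n_gpu_layers: int, n_ctx: int,
                n_batch: int, port: int = 8080) -> list[str]:
    """Translate the settings above into llama-server command-line arguments."""
    return [
        "llama-server",
        "--model", model_path,
        "--n-gpu-layers", str(n_gpu_layers),
        "--ctx-size", str(n_ctx),
        "--batch-size", str(n_batch),
        "--port", str(port),
    ]


# Unified-memory profile (lithium); the model path is a placeholder.
lithium_args = server_args("/models/example.gguf", 999, 16384, 1024)

# Per-GPU VRAM profile (helium); 55 is an arbitrary pick from the 50-60 range.
helium_args = server_args("/models/example.gguf", 55, 8192, 512)
```

Either list can be passed to `subprocess.run` to launch the server.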
Container Deployment Architecture#
graph TB
subgraph "helium Docker Host"
subgraph "Network Stack"
TRAEFIK[Traefik<br/>80/443]
PIHOLE[Pi-hole<br/>53]
end
subgraph "Authentication"
AUTH[authentik<br/>9000/9443]
REDIS[Redis]
POSTGRES[PostgreSQL]
AUTH --> REDIS
AUTH --> POSTGRES
end
TRAEFIK --> DRONE[Drone CI]
TRAEFIK --> PORTAINER[Portainer]
TRAEFIK --> GRAFANA[Grafana]
TRAEFIK --> UNIFI[UniFi Controller]
end
INTERNET[External] --> TRAEFIK
LOCAL[Local Network] --> TRAEFIK
LOCAL --> PIHOLE
style TRAEFIK fill:#fff3e0
style DRONE fill:#f3e5f5
style PORTAINER fill:#e8f5e9
style GRAFANA fill:#fff8e1
style UNIFI fill:#e0f2f1
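Traefik's Docker provider can pick up routes like those in the diagram from container labels. The sketch below generates such a label set. The `traefik_labels` helper, the `websecure` entrypoint name, and the `grafana.mrzk.io` hostname are assumptions, not values taken from this setup. Port 3000 is Grafana's default container port.

```python
def traefik_labels(name: str, host: str, port: int,
                   entrypoint: str = "websecure") -> dict[str, str]:
    """Build Docker labels asking Traefik to route `host` to a container port."""
    return {
        "traefik.enable": "true",
        f"traefik.http.routers.{name}.rule": f"Host(`{host}`)",
        f"traefik.http.routers.{name}.entrypoints": entrypoint,
        f"traefik.http.routers.{name}.tls": "true",
        f"traefik.http.services.{name}.loadbalancer.server.port": str(port),
    }


# Hypothetical example: expose Grafana under an assumed hostname.
labels = traefik_labels("grafana", "grafana.mrzk.io", 3000)
for key, value in labels.items():
    print(f"{key}={value}")
```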
Service Discovery#
Security Model#
graph TB
INTERNET[Internet]
CLOUDflare[Cloudflare]
HYDROGEN[hydrogen.mrzk.io<br/>Intel NUC<br/>Router/Traefik/DNS]
subgraph "DMZ"
TRAEFIK[Traefik<br/>Public TLS]
end
subgraph "Internal VLANs"
MGMT[Management VLAN 10]
WORK[Workstation VLAN 20]
IoT[IoT VLAN 30]
end
subgraph "Security Layers"
WAF[WAF Rules]
AUTH[authentik SSO]
FW[Firewall]
end
INTERNET --> CLOUDflare
CLOUDflare --> TRAEFIK
TRAEFIK --> HYDROGEN
HYDROGEN --> MGMT
WORK -->|22,80,443| MGMT
IoT -->|80,443| INTERNET
MGMT --> AUTH
AUTH --> WAF
WAF --> FW
style INTERNET fill:#ffebee
style TRAEFIK fill:#fff3e0
style MGMT fill:#e8f5e9
style AUTH fill:#e3f2fd
style FW fill:#fce4ec
Access Control#
- External: Cloudflare → Traefik (TLS termination on hydrogen)
- Authentication: authentik SSO for all internal services
- Network: UniFi firewall denies inter-VLAN by default
- IoT: Outbound-only (no inbound connections)
- Management: SSH access from the Workstations VLAN only
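The default-deny policy can be modeled as an allow-list lookup. This is a sketch of the rule logic only, not of the UniFi firewall configuration: the `is_allowed` helper is hypothetical, and it assumes traffic inside a single VLAN is not filtered.

```python
# Allow-list taken from the diagrams: (source, destination) -> permitted TCP ports.
ALLOWED = {
    ("workstations", "management"): {22, 80, 443},
    ("iot", "internet"): {80, 443},
    ("guests", "internet"): {80, 443},
}


def is_allowed(src: str, dst: str, port: int) -> bool:
    """Default-deny check: traffic passes only if an explicit rule lists the port."""
    if src == dst:
        return True  # assumption: traffic inside one VLAN is not filtered
    return port in ALLOWED.get((src, dst), set())
```

With these rules, `is_allowed("workstations", "management", 22)` is true, while any connection from the IoT VLAN into the management VLAN is rejected.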
Backup & Recovery Strategy#
graph LR
subgraph "Real-time Protection"
SYNCTHING[Syncthing<br/>Peer-to-peer sync]
end
subgraph "Scheduled Backups"
SNAPRAID[SnapRAID<br/>Nightly sync]
COLD[Thunderbolt<br/>Weekly images]
CLOUD[Backblaze B2<br/>Continuous]
end
subgraph "Recovery Points"
RP1[Last Sync]
RP2[Last Parity]
RP3[4-week Rotation]
RP4[Unlimited]
end
SYNCTHING --> RP1
SNAPRAID --> RP2
COLD --> RP3
CLOUD --> RP4
style SYNCTHING fill:#e8f5e9
style SNAPRAID fill:#fff3e0
style COLD fill:#ffebee
style CLOUD fill:#e3f2fd
RTO/RPO Targets#
| Data Type | Recovery Time | Recovery Point | Method |
|---|---|---|---|
| Code/Configs | < 1 hour | < 5 min | Syncthing |
| Container Data | < 4 hours | < 24 hours | SnapRAID + Docker volumes |
| Full System | < 24 hours | < 7 days | Thunderbolt images |
| Critical Data | < 1 hour | < 1 hour | Backblaze B2 |
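The recovery point targets in the table lend themselves to an automated freshness check. The sketch below flags data types whose newest backup is older than its target. The dictionary keys and the `rpo_violations` helper are hypothetical names, and how backup timestamps are collected is left open.

```python
from datetime import datetime, timedelta

# Recovery point objectives from the table above.
RPO = {
    "code_configs": timedelta(minutes=5),
    "container_data": timedelta(hours=24),
    "full_system": timedelta(days=7),
    "critical_data": timedelta(hours=1),
}


def rpo_violations(last_backup: dict[str, datetime], now: datetime) -> list[str]:
    """Return the data types with no backup or a backup at least as old as its RPO."""
    return sorted(
        kind for kind, limit in RPO.items()
        if kind not in last_backup or now - last_backup[kind] >= limit
    )
```

A data type with no recorded backup counts as a violation, so a missing timestamp is reported rather than silently ignored.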
Deployment Workflow#
flowchart TD
DEV[Local Development] -->|git push| GitHub[GitHub Repository]
GitHub -->|webhook| CI[CI/CD Pipeline]
subgraph "Deployment Targets"
HELIUM[Deploy to helium]
LITHIUM[Deploy to lithium]
end
CI --> HELIUM
CI --> LITHIUM
HELIUM --> TRAEFIK[Traefik Auto-Discovery]
LITHIUM --> TRAEFIK
TRAEFIK --> MONITOR[Prometheus/Grafana]
MONITOR -->|alerts| DISCORD[Discord Notifications]
style GitHub fill:#f3e5f5
style CI fill:#fff3e0
style HELIUM fill:#e8f5e9
style LITHIUM fill:#e8f5e9
style TRAEFIK fill:#fff8e1
Monitoring Stack#
- Prometheus: Metrics collection (token generation, GPU usage, request rates)
- Grafana: Visualization dashboards
- Discord Integration: Alert notifications
- LLM Server Metrics: /metrics endpoint on port 8081
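Prometheus scrapes the `/metrics` endpoint as plain text, so the output is easy to inspect by hand. Below is a minimal parser sketch for that text format. The metric names in the sample are made up for illustration, and the parser assumes samples carry no trailing timestamp.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse samples from Prometheus text exposition format.

    Comment lines (# HELP / # TYPE) and blank lines are skipped. Samples with
    labels keep the label string as part of the key. Assumes no trailing
    timestamp after the value.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split at the last space so label values containing spaces survive.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Hypothetical sample; real metric names depend on the server.
SAMPLE = """\
# HELP example_tokens_total Tokens generated.
# TYPE example_tokens_total counter
example_tokens_total 1234
example_gpu_util{gpu="0"} 0.75
"""
metrics = parse_metrics(SAMPLE)
```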
Future Roadmap#
- Storage: Deploy MergerFS + SnapRAID when 3+ drives available
- Kubernetes: Migrate from Docker Compose to K3s for orchestration
- GPU Pooling: Implement GPU load balancing across helium and lithium
- Observability: Add distributed tracing with Jaeger
- Disaster Recovery: Automated cross-site replication
DevOps & Infrastructure Stack#
- code repo: GitHub (helium) - Git hosting with built-in CI/CD, 4GB RAM reserved, ports 8081/8444/2221
- ci/cd: GitHub CI/CD - Native pipelines, auto-deployment via webhooks, shared runners on helium
- auth: Authentik (helium) - SSO provider with Postgres+Redis backend, ports 9000/9443
- monitoring: Prometheus + Grafana (helium) - Metrics collection and visualization, ports 9090/3001
- network: UniFi Controller (helium) - Network management and VLAN configuration, port 8443
- storage: MinIO (helium) - S3-compatible object storage for backups and media, ports 9000/9001
- PaaS: Coolify (helium, optional) - App deployment automation, port 8000
- reverse proxy: Traefik (hydrogen) - HTTPS termination, auto-service discovery, ports 80/443
- dns: Pi-hole (hydrogen) - Network-wide ad blocking and custom DNS, port 53
- security: Fail2Ban (hydrogen) - Intrusion detection and IP banning
- inference: llama.cpp (helium + lithium) - Primary inference on the M3 Ultra (lithium), fallback on the RTX 5000s (helium), port 8080
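With this many services on one host, a quick check for port collisions is useful. The sketch below groups the ports listed above by host and reports any claimed by more than one service. It assumes every listed port is published on the host itself, and `port_conflicts` is a hypothetical helper.

```python
from collections import defaultdict

# (service, host, ports) as listed above; services without ports are omitted.
SERVICES = [
    ("GitHub", "helium", [8081, 8444, 2221]),
    ("Authentik", "helium", [9000, 9443]),
    ("Prometheus + Grafana", "helium", [9090, 3001]),
    ("UniFi Controller", "helium", [8443]),
    ("MinIO", "helium", [9000, 9001]),
    ("Coolify", "helium", [8000]),
    ("Traefik", "hydrogen", [80, 443]),
    ("Pi-hole", "hydrogen", [53]),
    ("llama.cpp", "helium", [8080]),
    ("llama.cpp", "lithium", [8080]),
]


def port_conflicts(services):
    """Map (host, port) to the services claiming it, keeping only shared ports."""
    claims = defaultdict(list)
    for name, host, ports in services:
        for port in ports:
            claims[(host, port)].append(name)
    return {key: names for key, names in claims.items() if len(names) > 1}


conflicts = port_conflicts(SERVICES)
```

Run against the list as written, the check reports port 9000 on helium, which both Authentik and MinIO claim, so one of the two would need remapping if both publish that port.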