System Architecture

Overview

The homelab is a multi-server infrastructure for local AI inference, containerized services, and resilient storage. This page shows how the components connect.

Physical Infrastructure

graph TB
    subgraph Hydrogen["Hydrogen (NUC)"]
        H_CPU[Intel NUC<br/>Router/DNS/Traefik]
        H_NET[10GbE Switch]
        H_CPU --- H_NET
    end

    subgraph Helium["Helium Server"]
        HE_CPU[Ryzen 7 9700X<br/>8-core + 64GB RAM]
        HE_GPU1[Quadro RTX 5000<br/>16GB VRAM Turing]
        HE_GPU2[Quadro RTX 5000<br/>16GB VRAM Turing]
        HE_NVMe[2TB NVMe]
        HE_CPU --- HE_GPU1
        HE_CPU --- HE_GPU2
        HE_CPU --- HE_NVMe
    end

    subgraph Lithium["Lithium Server"]
        L_CPU[M3 Ultra<br/>96GB Unified Memory]
        L_NVMe[1TB NVMe]
        L_CPU --- L_NVMe
    end

    H_NET -->|10GbE| HE_CPU
    H_NET -->|10GbE| L_CPU
    H_NET --> UNIFI[UniFi Gateway]
    H_NET --> PIHOLE[Pi-hole DNS]
    
    style Hydrogen fill:#e1f5ff
    style Helium fill:#fff4e1
    style Lithium fill:#f0f0f0

Server Roles

| Server | Codename | Primary Role | Hardware |
| --- | --- | --- | --- |
| Hydrogen | hydrogen | Network Gateway/Router | Intel NUC (Traefik, Pi-hole, DNS) |
| Helium | helium | Container Services + Inference | Ryzen 7 9700X + 2x Quadro RTX 5000 (Turing) + 64GB RAM |
| Lithium | lithium | High-Capacity Inference | M3 Ultra + 96GB Unified Memory |

Network Architecture

graph TB
    subgraph VLAN10["VLAN 10 - Management"]
        M1[10.0.10.10<br/>Pi-hole]
        M2[10.0.10.20<br/>helium.mrzk.io]
        M3[10.0.10.30<br/>lithium.mrzk.io]
        M4[10.0.10.1<br/>hydrogen.mrzk.io/UniFi Gateway]
    end

    subgraph VLAN20["VLAN 20 - Workstations"]
        W1[10.0.20.x<br/>Developer Machines]
    end

    subgraph VLAN30["VLAN 30 - IoT"]
        I1[10.0.30.x<br/>Smart Home]
    end

    subgraph VLAN40["VLAN 40 - Guests"]
        G1[10.0.40.x<br/>Guest Wi-Fi]
    end

    INTERNET[Internet] --> UNIFI[UniFi Gateway]
    UNIFI --> VLAN10
    UNIFI --> VLAN20
    UNIFI --> VLAN30
    UNIFI --> VLAN40

    VLAN20 -.->|22,80,443| VLAN10
    VLAN30 -.->|80,443| INTERNET
    VLAN40 -.->|80,443| INTERNET

    style VLAN10 fill:#e8f5e9
    style VLAN20 fill:#e3f2fd
    style VLAN30 fill:#fff3e0
    style VLAN40 fill:#fce4ec

DNS Configuration

mrzk.io (local domain)
├── hydrogen.mrzk.io → 10.0.10.1 (Router/Traefik)
├── helium.mrzk.io   → 10.0.10.20 (Container host + RTX 5000s)
├── lithium.mrzk.io  → 10.0.10.30 (M3 Ultra inference)
└── traefik.mrzk.io  → External HTTPS proxy
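
The records above can be written as a small lookup table and sanity-checked. This is only a sketch: it assumes VLAN 10 is a /24 (the source shows 10.0.10.x addresses but no prefix length), and the names `RECORDS`, `in_management_vlan`, and `duplicate_addresses` are illustrative, not part of any existing tooling.

```python
import ipaddress

# A records copied from the DNS tree above.
RECORDS = {
    "hydrogen.mrzk.io": "10.0.10.1",
    "helium.mrzk.io": "10.0.10.20",
    "lithium.mrzk.io": "10.0.10.30",
}

# Assumption: VLAN 10 is a /24; the source does not state the prefix length.
MGMT_NET = ipaddress.ip_network("10.0.10.0/24")


def in_management_vlan(hostname: str) -> bool:
    """Return True if the host's A record falls inside the management subnet."""
    return ipaddress.ip_address(RECORDS[hostname]) in MGMT_NET


def duplicate_addresses(records: dict[str, str]) -> list[str]:
    """Return IPs that are assigned to more than one hostname."""
    seen: dict[str, int] = {}
    for ip in records.values():
        seen[ip] = seen.get(ip, 0) + 1
    return sorted(ip for ip, count in seen.items() if count > 1)
```

A check like this could run before a Pi-hole custom DNS list is updated, so that a mistyped address is caught before it reaches the resolver.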

Storage Architecture

graph LR
    subgraph "Real-time Sync (Syncthing)"
        H[hydrogen.mrzk.io<br/>2TB NVMe]
        HE[helium.mrzk.io<br/>2TB NVMe]
        L[lithium.mrzk.io<br/>1TB NVMe]
        
        H <-->|/code| HE
        H <-->|/configs| L
        HE <-->|/models| L
    end

    subgraph "Cold Backup"
        TB[Thunderbolt NVMe<br/>Weekly Images]
        HE -.->|dd backup| TB
    end

    subgraph "Future: MergerFS + SnapRAID"
        P1[Drive 1]
        P2[Drive 2]
        P3[Drive 3]
        PARITY[Parity Drive]
        
        P1 -.-> POOL[(MergerFS Pool)]
        P2 -.-> POOL
        P3 -.-> POOL
        POOL -.-> PARITY
    end

    style H fill:#e1f5ff
    style HE fill:#fff4e1
    style L fill:#f0f0f0
    style TB fill:#ffebee
    style POOL fill:#e8f5e9

Data Flow

  1. Active Development: /code syncs across all 3 servers via Syncthing
  2. Model Storage: /models on helium/lithium for local inference
  3. Config Backup: /configs real-time + weekly full disk images
  4. Future Scaling: MergerFS pool + SnapRAID parity when 3+ drives available

Inference Stack

graph TB
    subgraph "Client Layer"
        DISCORD[Discord Bot]
        WEB[Web Interface]
        API[API Clients]
    end

    subgraph "Load Balancer"
        TRAEFIK[Traefik Reverse Proxy<br/>traefik.mrzk.io:443<br/>Hydrogen]
    end

    subgraph "Inference Layer"
        LLAMA1[llama.cpp Server<br/>lithium.mrzk.io:8080<br/>Qwen3.5-122B A10B]
        LLAMA2[llama.cpp Server<br/>helium.mrzk.io:8080<br/>Gemma4-26B Q4_0]
    end

    subgraph "GPU Resources"
        GPU1[Quadro RTX 5000 #1<br/>16GB VRAM Turing]
        GPU2[Quadro RTX 5000 #2<br/>16GB VRAM Turing]
        GPU3[M3 Ultra<br/>96GB Unified Memory]
    end

    DISCORD --> TRAEFIK
    WEB --> TRAEFIK
    API --> TRAEFIK
    
    TRAEFIK --> LLAMA1
    TRAEFIK --> LLAMA2

    LLAMA1 --- GPU3
    LLAMA2 --- GPU1
    LLAMA2 --- GPU2

    style TRAEFIK fill:#fff3e0
    style LLAMA1 fill:#e8f5e9
    style LLAMA2 fill:#e8f5e9
    style GPU1 fill:#fce4ec
    style GPU2 fill:#fce4ec
    style GPU3 fill:#f3e5f5

Model Registry

| Model | Quantization | Size | Purpose | Location |
| --- | --- | --- | --- | --- |
| Qwen3.5-122B | A10B | 80GB | Main inference | lithium (M3 Ultra) |
| Gemma4-26B-A4B | Q4_0 | 19GB | Fallback | helium (RTX 5000s) |
| nomic-embed-text-v1.5 | - | 274MB | Embeddings | All servers |
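
The sizes in the registry can be checked against the memory of each host. The sketch below is plain arithmetic on the figures from this page; `MODELS`, `MEMORY_GB`, and `headroom_gb` are illustrative names, and the result ignores KV cache, context buffers, and operating system overhead.

```python
# Sizes in GB, taken from the Model Registry and Server Roles tables.
MODELS = {
    "Qwen3.5-122B": {"size_gb": 80, "host": "lithium"},
    "Gemma4-26B-A4B": {"size_gb": 19, "host": "helium"},
}
MEMORY_GB = {
    "lithium": 96,     # M3 Ultra unified memory
    "helium": 2 * 16,  # two Quadro RTX 5000 cards, 16GB VRAM each
}


def headroom_gb(model: str) -> int:
    """Memory left on the host after loading the model weights."""
    entry = MODELS[model]
    return MEMORY_GB[entry["host"]] - entry["size_gb"]


print(headroom_gb("Qwen3.5-122B"))    # 96 - 80
print(headroom_gb("Gemma4-26B-A4B"))  # 32 - 19
```

Note that the 19GB Gemma weights exceed a single 16GB card, so on helium the model only fits when its layers are split across both GPUs.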

Performance Tuning

# High VRAM (16GB per GPU - helium)
n_gpu_layers: 50-60
n_ctx: 8192
n_batch: 512
flash_attn: true

# Unified Memory (96GB - lithium)
n_gpu_layers: 999
n_ctx: 16384
n_batch: 1024
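
One way to apply these values is to translate them into a llama.cpp `llama-server` command line. The following is a sketch under assumptions: the model file names are hypothetical (the source gives only the /models directory), 60 is taken as the upper end of the 50-60 range, and flag spellings should be checked against the installed llama.cpp build, since the flash attention option in particular has changed form between versions.

```python
def server_args(model_path: str, n_gpu_layers: int, n_ctx: int,
                n_batch: int, flash_attn: bool = False,
                port: int = 8080) -> list[str]:
    """Build an argv list for llama.cpp's llama-server from tuning values."""
    args = [
        "llama-server",
        "--model", model_path,
        "--n-gpu-layers", str(n_gpu_layers),
        "--ctx-size", str(n_ctx),
        "--batch-size", str(n_batch),
        "--port", str(port),
    ]
    if flash_attn:
        # Assumption: bare flag form; some newer builds expect a value here.
        args.append("--flash-attn")
    return args


# Hypothetical model file names; only the tuning numbers come from this page.
helium = server_args("/models/gemma.gguf", 60, 8192, 512, flash_attn=True)
lithium = server_args("/models/qwen.gguf", 999, 16384, 1024)
print(" ".join(helium))
print(" ".join(lithium))
```

Keeping the values in one function makes it easier to see that the two hosts differ only in offload depth, context size, batch size, and flash attention.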

Container Deployment Architecture

graph TB
    subgraph "helium Docker Host"
        subgraph "Network Stack"
            TRAEFIK[Traefik<br/>80/443]
            PIHOLE[Pi-hole<br/>53]
        end

        subgraph "Authentication"
            AUTH[authentik<br/>9000/9443]
            REDIS[Redis]
            POSTGRES[PostgreSQL]
            AUTH --> REDIS
            AUTH --> POSTGRES
        end
        TRAEFIK --> DRONE[Drone CI]
        TRAEFIK --> PORTAINER[Portainer]
        TRAEFIK --> GRAFANA
        TRAEFIK --> UNIFI
    end

    INTERNET[External] --> TRAEFIK
    LOCAL[Local Network] --> TRAEFIK
    LOCAL --> PIHOLE

    style TRAEFIK fill:#fff3e0
    style DRONE fill:#f3e5f5
    style PORTAINER fill:#e8f5e9

    style UNIFI fill:#e0f2f1

Service Discovery

| Service | Internal URL | External URL | Port |
| --- | --- | --- | --- |
| Drone CI | http://drone.mrzk.io | https://drone.mrzk.io | 8000 |
| Portainer | http://portainer.mrzk.io | https://portainer.mrzk.io | 9000 |
| authentik | http://authentik.mrzk.io | https://authentik.mrzk.io | 9000/9443 |
| GitHub | http://github.mrzk.io | https://github.mrzk.io | 8081 |
| Grafana | http://grafana.mrzk.io | https://grafana.mrzk.io | 8081 |
| UniFi | https://unifi.mrzk.io:8443 | https://unifi.mrzk.io:8443 | 8443 |
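
Several services in the table list the same port number. A short script can surface such overlaps; this is a sketch with illustrative names (`SERVICE_PORTS`, `port_collisions`). Whether an overlap is a real conflict depends on whether the ports are published on the host or only reached through Traefik, which this page does not state.

```python
from collections import defaultdict

# Service -> ports, copied from the Service Discovery table.
SERVICE_PORTS = {
    "Drone CI": [8000],
    "Portainer": [9000],
    "authentik": [9000, 9443],
    "GitHub": [8081],
    "Grafana": [8081],
    "UniFi": [8443],
}


def port_collisions(services: dict[str, list[int]]) -> dict[int, list[str]]:
    """Map each port claimed by more than one service to those services."""
    owners: dict[int, list[str]] = defaultdict(list)
    for name, ports in services.items():
        for port in ports:
            owners[port].append(name)
    return {port: names for port, names in sorted(owners.items())
            if len(names) > 1}


for port, names in port_collisions(SERVICE_PORTS).items():
    print(port, names)
```

With the table as written, this reports ports 8081 and 9000 as shared.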

Security Model

graph TB
    INTERNET[Internet]
    CLOUDflare[Cloudflare]
    HYDROGEN[hydrogen.mrzk.io<br/>Intel NUC<br/>Router/Traefik/DNS]
    
    subgraph "DMZ"
        TRAEFIK[Traefik<br/>Public TLS]
    end

    subgraph "Internal VLANs"
        MGMT[Management VLAN 10]
        WORK[Workstation VLAN 20]
        IoT[IoT VLAN 30]
    end

    subgraph "Security Layers"
        WAF[WAF Rules]
        AUTH[authentik SSO]
        FW[Firewall]
    end

    INTERNET --> CLOUDflare
    CLOUDflare --> TRAEFIK
    TRAEFIK --> HYDROGEN
    HYDROGEN --> MGMT
    
    WORK -->|22,80,443| MGMT
    IoT -->|80,443| INTERNET
    
    MGMT --> AUTH
    AUTH --> WAF
    WAF --> FW

    style INTERNET fill:#ffebee
    style TRAEFIK fill:#fff3e0
    style MGMT fill:#e8f5e9
    style AUTH fill:#e3f2fd
    style FW fill:#fce4ec

Access Control

  1. External: Cloudflare → Traefik (TLS termination on hydrogen)
  2. Authentication: authentik SSO for all internal services
  3. Network: UniFi firewall denies inter-VLAN by default
  4. IoT: Outbound-only (no inbound connections)
  5. Management: SSH access from workstations VLAN only
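
The default-deny policy and the allowed flows from the network diagram can be modelled as an explicit allow list. This is a simplified sketch, not the UniFi rule format: `ALLOW` and `is_allowed` are illustrative names, `"internet"` is a placeholder label for any non-local destination, and stateful return traffic is not modelled.

```python
# Allowed flows as (source VLAN, destination, port); anything else is denied.
ALLOW = {
    (20, 10, 22), (20, 10, 80), (20, 10, 443),    # workstations -> management
    (30, "internet", 80), (30, "internet", 443),  # IoT, outbound only
    (40, "internet", 80), (40, "internet", 443),  # guests, outbound only
}


def is_allowed(src_vlan: int, dst, port: int) -> bool:
    """Default-deny check: a flow passes only if it is explicitly listed."""
    return (src_vlan, dst, port) in ALLOW


print(is_allowed(20, 10, 22))            # workstation SSH to management
print(is_allowed(30, 10, 22))            # IoT SSH to management
print(is_allowed(30, "internet", 443))   # IoT HTTPS outbound
```

Writing the rules as data makes it easy to review that SSH to the management VLAN is reachable only from VLAN 20.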

Backup & Recovery Strategy

graph LR
    subgraph "Real-time Protection"
        SYNCTHING[Syncthing<br/>Peer-to-peer sync]
    end

    subgraph "Scheduled Backups"
        SNAPRAID[SnapRAID<br/>Nightly sync]
        COLD[Thunderbolt<br/>Weekly images]
        CLOUD[Backblaze B2<br/>Continuous]
    end

    subgraph "Recovery Points"
        RP1[Last Sync]
        RP2[Last Parity]
        RP3[4-week Rotation]
        RP4[Unlimited]
    end

    SYNCTHING --> RP1
    SNAPRAID --> RP2
    COLD --> RP3
    CLOUD --> RP4

    style SYNCTHING fill:#e8f5e9
    style SNAPRAID fill:#fff3e0
    style COLD fill:#ffebee
    style CLOUD fill:#e3f2fd

RTO/RPO Targets

| Data Type | Recovery Time (RTO) | Recovery Point (RPO) | Method |
| --- | --- | --- | --- |
| Code/Configs | < 1 hour | < 5 min | Syncthing |
| Container Data | < 4 hours | < 24 hours | SnapRAID + Docker volumes |
| Full System | < 24 hours | < 7 days | Thunderbolt images |
| Critical Data | < 1 hour | < 1 hour | Backblaze B2 |
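
The recovery point targets can be turned into a simple freshness check: if the newest backup of a data type is as old as or older than its RPO, the target is missed. The sketch below uses illustrative names (`RPO`, `rpo_breached`) and leaves open how the timestamp of the last backup is obtained.

```python
from datetime import datetime, timedelta

# Recovery point objectives from the table above.
RPO = {
    "Code/Configs": timedelta(minutes=5),
    "Container Data": timedelta(hours=24),
    "Full System": timedelta(days=7),
    "Critical Data": timedelta(hours=1),
}


def rpo_breached(data_type: str, last_backup: datetime, now: datetime) -> bool:
    """True if the newest backup is not strictly younger than the RPO."""
    return now - last_backup >= RPO[data_type]


now = datetime(2025, 1, 8, 12, 0)
print(rpo_breached("Full System", datetime(2025, 1, 2, 12, 0), now))  # 6 days old
print(rpo_breached("Full System", datetime(2025, 1, 1, 12, 0), now))  # 7 days old
```

A check of this kind could feed the existing alerting path, so that a stalled weekly image is noticed before the next one is due.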

Deployment Workflow

flowchart TD
    DEV[Local Development] -->|git push| GitHub[GitHub Repository]
    GitHub -->|webhook| CI[CI/CD Pipeline]
    
    subgraph "Deployment Targets"
        HELIUM[Deploy to helium]
        LITHIUM[Deploy to lithium]
    end

    CI --> HELIUM
    CI --> LITHIUM

    HELIUM --> TRAEFIK[Traefik Auto-Discovery]
    LITHIUM --> TRAEFIK

    TRAEFIK --> MONITOR[Prometheus/Grafana]
    MONITOR -->|alerts| DISCORD[Discord Notifications]

    style GitHub fill:#f3e5f5
    style CI fill:#fff3e0
    style HELIUM fill:#e8f5e9
    style LITHIUM fill:#e8f5e9
    style TRAEFIK fill:#fff8e1

Monitoring Stack

  • Prometheus: Metrics collection (token generation, GPU usage, request rates)
  • Grafana: Visualization dashboards
  • Discord Integration: Alert notifications
  • LLM Server Metrics: /metrics endpoint on port 8081

Future Roadmap

  1. Storage: Deploy MergerFS + SnapRAID when 3+ drives available
  2. Kubernetes: Migrate from Docker Compose to K3s for orchestration
  3. GPU Pooling: Implement GPU load balancing across helium and lithium
  4. Observability: Add distributed tracing with Jaeger
  5. Disaster Recovery: Automated cross-site replication

DevOps & Infrastructure Stack

  • code repo: GitHub (helium) - Git hosting with built-in CI/CD, 4GB RAM reserved, ports 8081/8444/2221
  • ci/cd: GitHub CI/CD - Native pipelines, auto-deployment via webhooks, shared runners on helium
  • auth: authentik (helium) - SSO provider with Postgres+Redis backend, ports 9000/9443
  • monitoring: Prometheus + Grafana (helium) - Metrics collection and visualization, ports 9090/3001
  • network: UniFi Controller (helium) - Network management and VLAN configuration, port 8443
  • storage: MinIO (helium) - S3-compatible object storage for backups and media, ports 9000/9001
  • paas: Coolify (helium, optional) - App deployment automation, port 8000
  • reverse proxy: Traefik (hydrogen) - HTTPS termination, auto-service discovery, ports 80/443
  • dns: Pi-hole (hydrogen) - Network-wide ad blocking and custom DNS, port 53
  • security: Fail2Ban (hydrogen) - Intrusion detection and IP banning
  • inference: llama.cpp (helium + lithium) - Main inference on the M3 Ultra (lithium), fallback on the RTX 5000s (helium), port 8080