
Reference architectures for private AI

Validated hardware blueprints you can budget from: an inference pod, a fine-tuning cluster and a starter stack. Each comes with a bill of materials, GPU options and indicative pricing. We tailor each design to your workload and screen every order for compliance.

Private AI Inference Pod

4 nodes · 8× H200 NVL · 400G fabric · Serves 70–180B models

The workhorse for serving large models privately: four PCIe GPU nodes on a 400G fabric with shared flash storage. PCIe H200 NVL avoids HGX allocation queues, so this deploys in weeks. Ideal for production inference and high-concurrency serving.

[Diagram: GPU nodes 1–4 (2× H200 NVL each) on a 400G fabric (QSFP-DD / IB), with AI storage (NetApp / PowerScale) and a head/service node (R660 class)]

| Component | Specification | Notes |
| --- | --- | --- |
| GPU nodes | 4× server, 2× H200 NVL 141 GB each | Dell R760xa / HPE DL380a class |
| Fabric | 400G QSFP-DD or InfiniBand | cut-length schedule included |
| Storage | NetApp AFF / Dell PowerScale | sized to model + cache |
| Head/service | R660 / DL360 class | scheduling, login, monitoring |
| Power/cooling | intelligent PDUs, rear-door HX | per-rack budget calculated |

From ~$350,000 all-in, ex-works hub; GPU pricing moves with allocation.
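
As a rough illustration of how the "70–180B" range relates to the pod's GPU memory, the sketch below estimates the weight footprint of a model and the memory left over on one node (2 GPUs) or the full pod (8 GPUs). The bytes-per-parameter values and the helper names `weights_gb` and `headroom_gb` are assumptions for illustration; real sizing also has to budget for KV cache, activations and runtime overhead, which this does not model.

```python
# Rough weight-memory fit check for the inference pod (a sketch, not a sizing tool).
GPU_MEMORY_GB = 141   # per H200 NVL, from the specification table
GPUS_PER_NODE = 2
NODES = 4

# Assumption: bytes per parameter at two common weight precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0}


def weights_gb(params_billion: float, precision: str) -> float:
    """Weight footprint in GB: 1e9 parameters x N bytes = N GB per billion."""
    return params_billion * BYTES_PER_PARAM[precision]


def headroom_gb(params_billion: float, precision: str, gpus: int) -> float:
    """Memory left after loading weights; negative means it does not fit."""
    return gpus * GPU_MEMORY_GB - weights_gb(params_billion, precision)


if __name__ == "__main__":
    pod_gpus = GPUS_PER_NODE * NODES
    for size in (70, 180):
        for precision in ("fp16", "fp8"):
            node = headroom_gb(size, precision, GPUS_PER_NODE)
            pod = headroom_gb(size, precision, pod_gpus)
            print(f"{size}B {precision}: node headroom {node:.0f} GB, "
                  f"pod headroom {pod:.0f} GB")
```

Under these assumptions a 70B model at FP16 fits on a single node with room to spare, while a 180B model at FP16 exceeds one node's 282 GB and would need either lower precision or sharding across nodes.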

Fine-Tuning & Training Cluster

8 nodes · HGX-class · InfiniBand NDR · DLC-ready

For fine-tuning and pre-training: eight HGX 8-GPU systems (64 GPUs in total) on a non-blocking InfiniBand NDR spine, with high-throughput flash for datasets and checkpoints, and direct liquid cooling for racks of 60 kW and above. HGX supply is allocation-bound, so we quote realistic lead times and can bridge with H200 NVL nodes in the meantime.

[Diagram: InfiniBand NDR spine (non-blocking fabric) connecting nodes 1–8 (HGX 8-GPU each), with flash storage (checkpoints + datasets) and DLC cooling (CDU + manifolds)]

| Component | Specification | Notes |
| --- | --- | --- |
| Compute | 8× HGX 8-GPU systems (B200/B300) | on allocation, quoted realistically |
| Fabric | InfiniBand NDR, ConnectX-7/8 | non-blocking topology |
| Storage | parallel flash, multi-GB/s | datasets + fast checkpoints |
| Cooling | DLC manifolds + CDUs | 60 kW+ rack density |
| Power | high-density PDUs, UPS | N+1 options |

From ~$700,000; scales with node count and GPU allocation.
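
To make "non-blocking" concrete, the sketch below counts fabric endpoints for the cluster and plans a two-tier leaf/spine layout in which every leaf switch dedicates as many ports to uplinks as to downlinks. The one-NIC-per-GPU figure and the switch radix values are assumptions for illustration, not part of the blueprint, and the helper `leaf_spine_plan` is hypothetical; a real design would also consider rail alignment, storage ports and growth.

```python
import math

NODES = 8
GPUS_PER_NODE = 8
NICS_PER_GPU = 1          # assumption: one NDR port per GPU
NDR_GBPS = 400            # NDR InfiniBand line rate per port


def fabric_endpoints() -> int:
    """Number of compute-side NDR ports the fabric must serve."""
    return NODES * GPUS_PER_NODE * NICS_PER_GPU


def leaf_spine_plan(endpoints: int, radix: int) -> dict:
    """Plan a non-blocking fabric: uplink capacity equals downlink capacity."""
    if endpoints <= radix:
        # A single switch is non-blocking by construction.
        return {"leaves": 1, "spines": 0}
    down_per_leaf = radix // 2            # half the ports face the nodes
    leaves = math.ceil(endpoints / down_per_leaf)
    uplinks = leaves * down_per_leaf      # the other half face the spines
    spines = math.ceil(uplinks / radix)
    return {"leaves": leaves, "spines": spines}


if __name__ == "__main__":
    n = fabric_endpoints()
    print("endpoints:", n)
    print("aggregate injection bandwidth (Tb/s):", n * NDR_GBPS / 1000)
    for radix in (64, 32):                # assumed switch port counts
        print(f"radix {radix}:", leaf_spine_plan(n, radix))
```

With these assumed numbers the cluster presents 64 endpoints; whether that needs a full leaf/spine tier or fits on fewer switches depends on the switch radix actually quoted.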

Private AI Starter Stack

Desk to rack · DGX Spark + 1 node · RAG-ready

The smallest credible private-AI footprint: a DGX Spark for development, one H200 NVL inference node for launch, and an NVMe-heavy node for RAG and vector search. The same software stack scales to the pod and cluster above when you grow.

[Diagram: DGX Spark (dev / prototyping), inference node (2× H200 NVL), RAG / vector node (NVMe CPU node)]

| Component | Specification | Notes |
| --- | --- | --- |
| Development | NVIDIA DGX Spark | 128 GB unified, desktop |
| Inference | 1× node, 2× H200 NVL | 70B-class production serving |
| RAG / vector | NVMe CPU node | embeddings, retrieval, cache |
| Networking | 25/100G switching | Catalyst / Nexus |

From ~$95,000 ex-works hub; DGX Spark from ~$4,000 standalone.

Need this sized to your models?

Send your workloads and destination — we return a tailored BOM, firm pricing and compliant lead times within one business day.

sales@haink.org