Reference Architectures — Private AI Infrastructure

Private AI Inference Pod

4 nodes · 8× H200 NVL · 400G fabric · Serves 70–180B models

The workhorse for serving large models privately: four PCIe GPU nodes on a 400G fabric with shared flash storage. PCIe H200 NVL avoids HGX allocation queues, so this deploys in weeks. Ideal for production inference and high-concurrency serving.

Component	Specification	Notes
GPU nodes	4× server, 2× H200 NVL 141 GB each	Dell R760xa / HPE DL380a class
Fabric	400G Ethernet (QSFP-DD) or InfiniBand	cable cut-length schedule included
Storage	NetApp AFF / Dell PowerScale	sized to model + cache
Head/service	R660 / DL360 class	scheduling, login, monitoring
Power/cooling	intelligent PDUs, rear-door heat exchanger	per-rack budget calculated

from ~$350,000 all-in, ex-works hub; GPU pricing moves with allocation
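As a rough sanity check on the 70–180B sizing, the sketch below compares weight footprint against the pod's aggregate GPU memory. The node count and 141 GB figure come from the table above; the bytes-per-parameter values and the function names are illustrative assumptions, and the estimate ignores KV cache, activations and runtime overhead, which need headroom of their own.

```python
# Minimal sizing sketch, not a capacity guarantee.
GPUS = 4 * 2          # 4 nodes x 2 H200 NVL each (from the table above)
GPU_MEM_GB = 141      # per-GPU memory in GB (from the table above)

# Assumption: typical bytes per parameter for common weight precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_gb(params_billion: float, precision: str) -> float:
    """Weight footprint in GB (1 GB = 1e9 bytes); weights only."""
    return params_billion * BYTES_PER_PARAM[precision]


def headroom_gb(params_billion: float, precision: str, replicas: int = 1) -> float:
    """Aggregate pod memory left after loading `replicas` copies of the weights."""
    total = GPUS * GPU_MEM_GB
    return total - replicas * weight_gb(params_billion, precision)


if __name__ == "__main__":
    for size in (70, 180):
        print(size, "B fp16 weights:", weight_gb(size, "fp16"), "GB,",
              "headroom:", headroom_gb(size, "fp16"), "GB")
```

Whatever headroom remains is what the KV cache and concurrent requests draw on, so high-concurrency serving generally wants a comfortable margin rather than a tight fit.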

Fine-Tuning & Training Cluster

8 nodes · HGX-class · InfiniBand NDR · DLC-ready

For fine-tuning and pre-training: eight HGX 8-GPU systems (64 GPUs in total) on a non-blocking InfiniBand NDR spine, with high-throughput flash for datasets and checkpoints, and direct liquid cooling for 60 kW+ racks. HGX is allocation-bound, so we quote honest lead times and can bridge with H200 NVL nodes in the meantime.

Component	Specification	Notes
Compute	8× HGX 8-GPU systems (B200/B300)	on allocation, quoted realistically
Fabric	InfiniBand NDR, ConnectX-7/8	non-blocking topology
Storage	parallel flash, multi-GB/s	datasets + fast checkpoints
Cooling	DLC manifolds + CDUs	60 kW+ rack density
Power	high-density PDUs, UPS	N+1 options

from ~$700,000; scales with node count and GPU allocation
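To show why checkpoint throughput matters when sizing the storage tier, here is a minimal sketch of checkpoint size and write time. The 14 bytes per parameter (16-bit weights, FP32 master copy and two FP32 Adam moments) and the 10 GB/s throughput are assumptions for illustration only; the spec above says only "multi-GB/s", and real figures depend on the optimizer, sharding and filesystem.

```python
def checkpoint_gb(params_billion: float, bytes_per_param: float = 14.0) -> float:
    """Approximate full training checkpoint size in GB (1 GB = 1e9 bytes).

    Assumption: 2 (16-bit weights) + 4 (FP32 master) + 8 (Adam moments) bytes/param.
    """
    return params_billion * bytes_per_param


def write_seconds(size_gb: float, throughput_gb_s: float) -> float:
    """Time to write a checkpoint at a sustained aggregate throughput."""
    if throughput_gb_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb / throughput_gb_s


if __name__ == "__main__":
    size = checkpoint_gb(70)            # hypothetical 70B-parameter run
    print(size, "GB per checkpoint")
    print(write_seconds(size, 10.0), "s at an assumed 10 GB/s")
```

If checkpoints are written synchronously, that write time is GPU idle time across all 64 GPUs, which is the argument for sizing storage throughput alongside the compute.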

Private AI Starter Stack

Desk to rack · DGX Spark + 1 node · RAG-ready

The smallest credible private-AI footprint: a DGX Spark for development, one H200 NVL inference node for launch, and an NVMe-heavy node for RAG and vector search. The same software stack scales to the pod and cluster above as you grow.

Component	Specification	Notes
Development	NVIDIA DGX Spark	128 GB unified memory, desktop form factor
Inference	1× node, 2× H200 NVL	70B-class production serving
RAG / vector	NVMe CPU node	embeddings, retrieval, cache
Networking	25/100G switching	Catalyst / Nexus

from ~$95,000, ex-works hub; DGX Spark from ~$4,000 standalone
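For the RAG / vector node, a quick estimate of raw embedding storage helps decide how much NVMe to specify. The sketch below assumes uncompressed float32 vectors; the chunk count and embedding dimension are hypothetical inputs, and index structures, metadata and replicas add to the raw figure.

```python
def vector_store_gb(num_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw embedding storage in GB (1 GB = 1e9 bytes).

    Assumption: dense vectors stored uncompressed, float32 by default.
    """
    if num_vectors < 0 or dim <= 0:
        raise ValueError("num_vectors must be >= 0 and dim must be > 0")
    return num_vectors * dim * bytes_per_value / 1e9


if __name__ == "__main__":
    # Hypothetical corpus: 10 million chunks, 1024-dimensional embeddings.
    print(vector_store_gb(10_000_000, 1024), "GB of raw float32 embeddings")
```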

Running AI on this infrastructure? Haink also builds the LLM & ML software that runs on it — model, pipeline and GPUs under one contract.

Need this sized to your models?

Send your workloads and destination — we return a tailored BOM, firm pricing and compliant lead times within one business day.