AI Training Infrastructure — HGX, Fabric & Storage

Training infrastructure with honest lead times

HGX-class systems on realistically quoted allocation, H200 NVL nodes from stock for fine-tuning, plus the fabric, storage and cooling that training actually requires.

Training hardware is allocation-constrained everywhere — anyone promising 8-GPU B300 systems 'from stock' deserves skepticism. We quote real lead times, and bridge the wait with stocked H200 NVL fine-tuning capacity so your team ships while the big iron is inbound.

What a training stack needs

Component	Spec	Availability
HGX B200 / B300 systems	8× GPU, NVLink, 6U+	On allocation — quoted honestly
Fine-tuning nodes	2–4× H200 NVL 141 GB per node	Often from stock
InfiniBand fabric	NDR switches, ConnectX-7/8 adapters	Short lead times
Dataset storage	PowerScale / AFF — sustained multi-GB/s reads	Sized to GPU count
Checkpoint flash	NVMe tiers for fast save/restore	Drives typically from stock
Liquid cooling	DLC manifolds, CDUs for 60 kW+ racks	Project quote

How we de-risk training projects

LoRA first: fine-tune on stocked H200 NVL while HGX is on allocation

Multi-GB/s: storage throughput sized per GPU

60 kW+: rack densities planned with DLC

Staged: delivery in deployment order

Stock rotates daily. Positions are "typically available" and confirmed per request, usually within one business day.

Export compliance. NVIDIA H200/H100/B-series GPUs are US export-controlled dual-use items (ECCN 3A090). Haink supplies them only after end-user and destination screening under US EAR and OFAC rules, and declines any order to a restricted destination or end use. Hong Kong and Mainland China destinations are treated as controlled under current US rules; orders are quoted accordingly.

Frequently asked questions

What lead time should we expect for B300 systems?

Allocation-dependent — typically quoted in months, not weeks, and we say so upfront. We will not promise stock that does not exist; we will bridge with H200 NVL fine-tuning nodes that do.

Can we start training before the HGX systems arrive?

Yes — that is the standard play: LoRA/QLoRA fine-tuning on stocked H200 NVL nodes now, full pre-training when allocation lands. Same fabric and storage serve both phases.
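To illustrate why LoRA-style fine-tuning fits on smaller nodes, here is a minimal Python sketch of the arithmetic. It relies on the standard LoRA formulation (two low-rank factors per adapted weight matrix) and on 141 GB per H200 NVL card. The matrix size, rank, 70-billion-parameter model and the helper names are hypothetical values chosen for illustration, and the estimate covers frozen weights only, not activations, adapter optimizer state or KV cache.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter on a d_out x d_in weight matrix.

    LoRA learns two low-rank factors: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def frozen_weight_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Memory for frozen base weights in GB (10^9 bytes).

    2 bytes/param corresponds to 16-bit weights; 4-bit quantization,
    as used by QLoRA, would be roughly 0.5 bytes/param.
    """
    return num_params * bytes_per_param / 1e9


# Hypothetical example: one 4096 x 4096 projection adapted at rank 16.
full_params = 4096 * 4096
adapter_params = lora_trainable_params(4096, 4096, rank=16)
trainable_fraction = adapter_params / full_params

# Hypothetical 70e9-parameter base model in 16-bit weights,
# compared with a node holding two 141 GB H200 NVL cards.
weights_gb = frozen_weight_gb(70e9)
node_memory_gb = 2 * 141

print(f"adapter params: {adapter_params} ({trainable_fraction:.2%} of the full matrix)")
print(f"frozen weights: {weights_gb:.0f} GB vs node memory: {node_memory_gb} GB")
```

Treat the result as a first-pass feasibility check; real headroom depends on sequence length, batch size and the training framework.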

How much storage does training need?

Rule of thumb: sustained read throughput of ~1 GB/s per high-end GPU for data loading, plus fast NVMe for checkpoints. We size from your dataset and batch profile, not generic charts.
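As a rough illustration of that rule of thumb, the Python sketch below turns a GPU count into an aggregate read target and estimates checkpoint save time. The ~1 GB/s per GPU figure is the one quoted above; the cluster size, checkpoint size, NVMe write bandwidth and function names are hypothetical placeholders, not measured values.

```python
def dataset_read_target_gbps(gpu_count: int, per_gpu_gbps: float = 1.0) -> float:
    """Aggregate sustained read throughput (GB/s) for data loading.

    Uses the ~1 GB/s per high-end GPU rule of thumb as the default.
    """
    return gpu_count * per_gpu_gbps


def checkpoint_save_seconds(checkpoint_gb: float, write_gbps: float) -> float:
    """Time to write one checkpoint at a given sustained write bandwidth."""
    if write_gbps <= 0:
        raise ValueError("write_gbps must be positive")
    return checkpoint_gb / write_gbps


# Hypothetical cluster: four 8-GPU HGX systems.
gpus = 4 * 8
read_target = dataset_read_target_gbps(gpus)

# Hypothetical checkpoint of 500 GB written to an NVMe tier sustaining 20 GB/s.
save_time = checkpoint_save_seconds(500, 20)

print(f"read target: {read_target:.0f} GB/s for {gpus} GPUs")
print(f"checkpoint save: {save_time:.0f} s")
```

The per-GPU figure is only a starting point; the dataset and batch profile decide whether the real requirement lands above or below it.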

Do you supply liquid cooling?

Yes — DLC readiness (manifolds, CDUs) and rear-door options, with the air-vs-liquid decision tree documented in our cooling guide.


Running AI on this infrastructure? Haink also builds the LLM & ML software that runs on it — model, pipeline and GPUs under one contract.

Planning a training build?

We reply with pricing, availability and delivered lead time within one business day.