A Series B AI company was spending $180K/month on cloud GPU instances. Haink supplied a configured NVIDIA H200 PCIe node cluster that broke even in under 14 months — and outperformed the cloud baseline by 3×.
The team ran LLM inference on cloud GPU instances. At Series B scale — 40M+ daily API calls — the monthly GPU bill hit $180K and was growing. Reserved instances required 12-month commitments with no hardware flexibility. The CTO wanted to understand the own-vs-rent math before the next funding round.
Haink proposed a 2U server with 8× NVIDIA H200 PCIe 96 GB GPUs, dual Xeon Platinum CPUs, 1.5 TB DDR5 ECC RAM, 4× 7.68 TB NVMe, and dual 100 GbE NICs. We handled pre-racking, firmware baseline, and burn-in testing before shipping. The client racked, cabled, and had the first model loaded in an afternoon.
Every step from first contact to first inference token.
| Component | Specification | Qty |
|---|---|---|
| GPU | NVIDIA H200 PCIe 96 GB HBM3e | 8 |
| Host server | 2U, dual Intel Xeon Platinum 8592+, 60-core | 1 |
| System RAM | DDR5-5600 ECC RDIMM, 1.5 TB (24× 64 GB) | 1 set |
| NVMe storage | 7.68 TB PCIe Gen5 NVMe (model weights cache) | 4 |
| Networking | Dual-port 100 GbE QSFP28 NIC | 1 |
| Power | Dual 3 kW redundant PSU, 80+ Titanium | 2 |
| Warranty | 12-month return-to-base, parts and labour | — |
H200 PCIe GPUs sourced from Hong Kong stock. Serial numbers verified and provided prior to payment.
Related resources
Send your monthly GPU spend and model sizes — Haink will model the own-vs-rent break-even for your workload within one business day.