
GPU Server Buying Guide 2026 — How to Choose the Right GPU Server for AI

Choosing the right GPU server is a significant capital decision with multi-year implications. The wrong choice — GPU too small for the model, insufficient memory for inference, no liquid cooling in a facility without DLC infrastructure — is expensive to correct. This guide provides a structured decision framework for selecting GPU server hardware based on the actual workload, facility, and budget.

Step 1: Define the Primary Workload

GPU server design priorities differ fundamentally between training and inference. Answer this before comparing hardware:

AI Training (including fine-tuning)

Training requires maximum compute throughput (FLOPS) and GPU-to-GPU communication bandwidth. Prioritize SXM form-factor GPUs (connected over NVLink), multi-GPU servers (8 GPUs per node), and InfiniBand networking for multi-node clusters. For most fine-tuning use cases, the binding constraints are FLOPS and inter-GPU bandwidth, not memory capacity.
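Why inter-GPU bandwidth binds during training can be sketched with the textbook bandwidth model for a ring all-reduce, which synchronizes gradients across N GPUs by moving 2(N-1)/N times the gradient size through each GPU on every step. This is a minimal Python sketch, not a benchmark: it ignores latency and compute/communication overlap, and the two bandwidth figures are illustrative assumptions rather than measured or vendor-quoted numbers.

```python
def ring_allreduce_bytes_per_gpu(payload_bytes: float, n_gpus: int) -> float:
    """Bytes each GPU sends (and receives) in a ring all-reduce.

    A ring all-reduce is a reduce-scatter followed by an all-gather; each phase
    moves (n-1)/n of the payload per GPU, so the total is 2*(n-1)/n.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be >= 1")
    return 2 * (n_gpus - 1) / n_gpus * payload_bytes


def ring_allreduce_seconds(payload_bytes: float, n_gpus: int, link_bytes_per_s: float) -> float:
    """Bandwidth-only lower bound: ignores latency and overlap with compute."""
    return ring_allreduce_bytes_per_gpu(payload_bytes, n_gpus) / link_bytes_per_s


if __name__ == "__main__":
    grads = 7e9 * 2  # 7B parameters, BF16 gradients (2 bytes each)
    # Illustrative per-direction bandwidths (assumptions, not vendor specs).
    for name, bw in [("NVLink-class, ~450 GB/s", 450e9), ("PCIe Gen5 x16, ~64 GB/s", 64e9)]:
        t = ring_allreduce_seconds(grads, 8, bw)
        print(f"{name}: {t * 1000:.0f} ms per full gradient all-reduce")
```

For BF16 gradients of a 7B model on 8 GPUs, the sketch gives about 54 ms per all-reduce at 450 GB/s versus about 383 ms at 64 GB/s, a roughly 7× difference paid on every optimizer step.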

AI Inference (production model serving)

Inference requires GPU memory capacity to hold the model and KV cache, and throughput (tokens/second) at the target concurrency. The binding constraint is memory — a model that doesn't fit in GPU memory cannot run. Prioritize memory capacity per GPU and memory bandwidth. PCIe form factor GPUs are acceptable for inference; SXM is not required.
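The KV cache is what makes inference memory grow with concurrency: each token in flight stores one key and one value vector per layer and per KV head. The Python sketch below estimates it under stated assumptions; the layer count, KV-head count, and head dimension are assumed values for a 70B-class model with grouped-query attention, and the request mix is hypothetical, so read the real figures from your model's configuration.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    tokens: int,
    bytes_per_value: int = 2,  # FP16/BF16 cache entries
) -> int:
    """KV cache size: one K and one V vector per layer, per KV head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * tokens


if __name__ == "__main__":
    # Assumed shape of a 70B-class model with grouped-query attention
    # (80 layers, 8 KV heads, head_dim 128); check your model's config.
    per_token = kv_cache_bytes(80, 8, 128, tokens=1)
    # Hypothetical target concurrency: 32 requests x 4,096 tokens of context each.
    total = kv_cache_bytes(80, 8, 128, tokens=32 * 4096)
    print(f"{per_token / 2**20:.4f} MiB per token, {total / 2**30:.1f} GiB total")
```

Under those assumptions the cache costs 0.3125 MiB per token, so 32 concurrent requests at 4,096 tokens each need 40 GiB on top of roughly 140 GB of FP16 weights.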

Mixed Training + Inference

If the same hardware will be used for both training and inference at different times (common in enterprise deployments), 8× H100 SXM5 or 8× H200 SXM5 is the most versatile configuration: NVLink and InfiniBand provide training performance, and 80 GB (H100) or 141 GB (H200) per GPU, 640 GB or 1,128 GB per 8-GPU node, provides adequate inference memory for 70B models.

Step 2: Determine Model Size

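Model size in GPU memory at FP16/BF16 precision (2 bytes per parameter): 7B ≈ 14 GB; 13B ≈ 26 GB; 34B ≈ 68 GB; 70B ≈ 140 GB; 180B ≈ 360 GB; 405B ≈ 810 GB. For inference, add 20–40% on top of the weights for KV cache and activations. Training and fine-tuning also hold gradients and optimizer states; for full fine-tuning with Adam, these typically exceed the memory of the weights themselves. Quantized inference (INT4, e.g. GPTQ) reduces weight memory by approximately 4× relative to FP16: 70B at INT4 ≈ 35 GB.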
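The memory arithmetic above reduces to parameters × bytes per parameter, plus a percentage for runtime overhead. A minimal Python sketch, assuming 2 bytes per parameter for FP16/BF16, 1 for INT8, and 0.5 for INT4, and applying the 20–40% inference overhead from the text; it does not model training's gradients or optimizer states.

```python
# Approximate bytes per parameter for model weights at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, precision: str = "bf16") -> float:
    """Weights only: billions of parameters x bytes per parameter = GB (10^9 bytes)."""
    return params_billion * BYTES_PER_PARAM[precision]


def inference_memory_range_gb(
    params_billion: float,
    precision: str = "bf16",
    overhead: tuple[float, float] = (0.20, 0.40),
) -> tuple[float, float]:
    """Weights plus the 20-40% inference overhead (KV cache, activations) from the text."""
    w = weight_memory_gb(params_billion, precision)
    return w * (1 + overhead[0]), w * (1 + overhead[1])


if __name__ == "__main__":
    for size in (7, 13, 34, 70, 180, 405):
        lo, hi = inference_memory_range_gb(size)
        print(f"{size}B: {weight_memory_gb(size):.0f} GB weights, {lo:.0f}-{hi:.0f} GB with overhead")
    print(f"70B INT4: {weight_memory_gb(70, 'int4'):.0f} GB weights")
```

The loop reproduces the FP16 figures listed above; for 70B it reports 140 GB of weights and 168–196 GB with overhead, and 35 GB at INT4.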

| Model Size | FP16 Memory | Minimum GPU (inference) | Minimum GPU (fine-tune) |
|---|---|---|---|
| 7B | 14 GB | 1× L40S (48 GB) | 2× H100 SXM5 |
| 34B | 68 GB | 1× H100 SXM5 (80 GB) | 4× H100 SXM5 |
| 70B | 140 GB | 1× H200 (141 GB) or 2× H100 | 8× H100 SXM5 |
| 405B | 810 GB | 6× H200 or 3× B300 | 32+ H100 SXM5 |
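The inference column of the table follows from dividing a model's weight memory by per-GPU memory and rounding up. The Python sketch below does that using only the per-GPU capacities quoted in the table; the usable_fraction parameter is a hypothetical headroom knob for KV cache and runtime overhead, not a vendor figure.

```python
import math

# Per-GPU memory in GB, as quoted in the table.
GPU_MEMORY_GB = {"L40S": 48, "H100 SXM5": 80, "H200": 141}


def min_gpus(model_gb: float, gpu: str, usable_fraction: float = 1.0) -> int:
    """Smallest GPU count whose combined memory holds model_gb.

    usable_fraction < 1.0 reserves headroom on each GPU for KV cache,
    activations, and runtime overhead (a hypothetical knob).
    """
    per_gpu = GPU_MEMORY_GB[gpu] * usable_fraction
    return math.ceil(model_gb / per_gpu)


if __name__ == "__main__":
    print(min_gpus(140, "H100 SXM5"))                  # 70B FP16 on H100
    print(min_gpus(140, "H200"))                       # 70B FP16 on H200
    print(min_gpus(810, "H200"))                       # 405B FP16 on H200
    print(min_gpus(140, "H200", usable_fraction=0.9))  # 70B with 10% headroom
```

With usable_fraction=1.0 the first three calls return 2, 1, and 6, matching the table's weights-only entries for 70B and 405B. Reserving 10% per GPU moves 70B from 1× H200 to 2×, because 140 GB of FP16 weights leaves only about 1 GB free on a 141 GB H200.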

Step 3: Choose the GPU Generation

NVIDIA H100 SXM5 — Buy When:

NVIDIA H200 SXM5 — Buy When:

NVIDIA B200 SXM5 — Buy When:

NVIDIA B300 SXM — Buy When:

NVIDIA L40S PCIe — Buy When:

Step 4: Choose the Server Platform

For SXM GPU servers (H100/H200/B200/B300), the server platform (the chassis, baseboard, CPU, memory, and NVLink interconnect) is as important as the GPU. Validated platforms available through Haink:

Step 5: Size the InfiniBand Network

For a single 8-GPU node: no InfiniBand needed between servers. For 2–4 nodes: 1 InfiniBand leaf switch (NVIDIA QM9700, 64-port NDR 400G) handles the full cluster with room to grow. For 8–16 nodes: 2 leaf switches + 1 spine switch in a non-blocking fat-tree. For 32+ nodes: full 3-tier fat-tree design with spine/core layers. If RoCEv2 is chosen instead of InfiniBand: Arista 7800R3 or NVIDIA Spectrum-4 400GbE switches configured for lossless PFC.
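Switch counts for a non-blocking two-tier fat-tree follow from port arithmetic. The Python sketch below assumes 64-port switches (as with the QM9700 above), one fabric port per GPU (8 per node), and leaves split evenly between node-facing and spine-facing ports; ports_per_node and the plan_fabric helper are illustrative assumptions, not a Haink or NVIDIA sizing tool.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FabricPlan:
    tiers: int           # 0 = no inter-node fabric, 1 = single switch, 2 = leaf/spine
    leaf_switches: int
    spine_switches: int


def plan_fabric(nodes: int, ports_per_node: int = 8, radix: int = 64) -> FabricPlan:
    """Non-blocking switch count for a compute fabric (hypothetical helper).

    Each leaf uses half its ports toward nodes and half toward spines,
    so uplink bandwidth equals downlink bandwidth (no oversubscription).
    """
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    if nodes == 1:
        return FabricPlan(0, 0, 0)  # single node: no switch between servers
    endpoints = nodes * ports_per_node
    if endpoints <= radix:
        return FabricPlan(1, 1, 0)  # everything fits on one switch
    down_per_leaf = radix // 2
    leaves = math.ceil(endpoints / down_per_leaf)
    if leaves > radix:  # each leaf needs at least one port on every spine
        raise ValueError("exceeds two-tier capacity; plan a three-tier fat-tree")
    spines = math.ceil(leaves * down_per_leaf / radix)
    return FabricPlan(2, leaves, spines)


if __name__ == "__main__":
    for n in (1, 4, 8, 16, 32):
        print(n, plan_fabric(n))
```

Under these assumptions 16 nodes need 4 leaves and 2 spines, and a two-tier design tops out at 2,048 node ports (256 eight-port nodes) before a third tier is required. The sketch ignores storage and management networks and growth headroom, all of which add switches in practice.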

Step 6: Assess Facility Requirements

Before finalizing the hardware order, confirm: power capacity per rack (H100 server = 10.2 kW; B200 server = 14–16 kW; 4-server rack = 40–65 kW minimum); cooling type available (air for H100/H200, DLC mandatory for B200/B300 at full load); physical space (GPU servers are deep — 900mm depth typical; verify rack depth); and network infrastructure (overhead cable trays for InfiniBand copper/fiber).
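The rack power check above is multiplication against a per-server draw. A minimal Python sketch using the per-server numbers from the text (10.2 kW for H100, 14–16 kW for B200); the headroom parameter is a hypothetical safety margin, not a facility standard.

```python
import math

# Per-server power figures quoted in the text (kW at full load).
SERVER_KW = {"H100": 10.2, "B200 (low)": 14.0, "B200 (high)": 16.0}


def rack_power_kw(servers: int, server_kw: float) -> float:
    """Total rack draw for a given number of identical servers."""
    return servers * server_kw


def servers_per_rack(rack_budget_kw: float, server_kw: float, headroom: float = 0.0) -> int:
    """Whole servers that fit a rack power budget, keeping `headroom` (0.1 = 10%) spare."""
    usable = rack_budget_kw * (1 - headroom)
    return math.floor(usable / server_kw)


if __name__ == "__main__":
    print(rack_power_kw(4, SERVER_KW["H100"]))         # 4-server H100 rack
    print(rack_power_kw(4, SERVER_KW["B200 (high)"]))  # 4-server B200 rack
    print(servers_per_rack(40, SERVER_KW["H100"]))     # H100 servers in a 40 kW rack
```

4 × 10.2 kW gives 40.8 kW and 4 × 16 kW gives 64 kW, which brackets the 40–65 kW range above; a rack capped at exactly 40 kW fits only three H100 servers.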

Step 7: Factor Lead Times into Planning

Current GPU server lead times (from Haink, mid-2026): H100 SXM5 systems — 4–10 weeks. H200 SXM5 systems — 6–12 weeks. B200 SXM5 systems — 12–20 weeks. B300 SXM systems — 14–24 weeks. L40S PCIe servers — 3–8 weeks. InfiniBand switches — 4–8 weeks. Order hardware with enough lead time that it arrives before your target deployment date, not on it.
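Working backwards from the deployment date, the latest safe order date is set by the slowest item on the bill of materials. The Python sketch below uses the upper ends of the lead-time ranges above; the component list, target date, and buffer_weeks (time reserved for installation and burn-in) are hypothetical inputs.

```python
from datetime import date, timedelta

# Upper-bound lead times in weeks, from the mid-2026 ranges in the text.
LEAD_TIME_WEEKS_MAX = {
    "H100 SXM5": 10,
    "H200 SXM5": 12,
    "B200 SXM5": 20,
    "B300 SXM": 24,
    "L40S PCIe": 8,
    "InfiniBand switches": 8,
}


def latest_order_date(deployment: date, components: list[str], buffer_weeks: int = 0) -> date:
    """Latest order date so the slowest component arrives buffer_weeks before deployment."""
    longest = max(LEAD_TIME_WEEKS_MAX[c] for c in components)
    return deployment - timedelta(weeks=longest + buffer_weeks)


if __name__ == "__main__":
    # Hypothetical project: B300 nodes plus InfiniBand, 4 weeks reserved for install and burn-in.
    print(latest_order_date(date(2026, 12, 1), ["B300 SXM", "InfiniBand switches"], buffer_weeks=4))
```

In the example, B300 systems set the pace at 24 weeks; with a 4-week buffer, a 1 December 2026 deployment means ordering by 19 May 2026.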

GPU Server Checklist

Summarized buying checklist before finalizing a GPU server order:


Haink
info@haink.org

Winning House
72–76 Wing Lok Street
Sheung Wan, Hong Kong

© 2026 Haink. All rights reserved.
Hong Kong · Dubai · Singapore · Mainland China · Delaware (USA)