Edge AI Inference for Robotics — On-Device Models

Edge AI inference runs models directly on the device — a robot, camera or controller — instead of sending data to the cloud. For physical systems that means real-time perception and control within a single-digit-millisecond budget, operation without connectivity, and data that never leaves the machine. Haink delivers both halves: the optimized model and the edge hardware it runs on.

What we build

On-device perception

Object detection, segmentation, pose estimation and scene understanding running locally on the robot — tuned to your environment and camera setup.

On-device language, VLA & reasoning VLMs

Compact language, vision-language-action (VLA) and reasoning vision-language models (such as NVIDIA Cosmos Reason) for on-board reasoning, instruction following and human interaction without a cloud round-trip.

Real-time sensor fusion

Fusing multiple cameras, depth, LiDAR and radar on-device with synchronized, low-latency perception — the input layer modern VLA models depend on.
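Synchronization is the part of fusion that is easiest to get wrong: a camera frame fused with a stale LiDAR sweep is worse than no fusion. Below is a minimal sketch of nearest-timestamp matching. It assumes every sensor is stamped from a shared clock and that timestamps arrive sorted. The 10 ms skew tolerance and the name `match_nearest` are illustrative, not part of any specific stack. ROS 2's `message_filters` package offers an approximate-time synchronizer built on a similar idea.

```python
from bisect import bisect_left


def match_nearest(camera_ts, lidar_ts, max_skew_s=0.010):
    """Pair each camera frame with the closest LiDAR sweep within max_skew_s.

    Both lists hold timestamps in seconds from a shared clock, sorted ascending.
    Returns (camera_index, lidar_index) pairs; frames with no sweep close enough
    are dropped instead of being fused with stale data.
    """
    pairs = []
    for i, t in enumerate(camera_ts):
        j = bisect_left(lidar_ts, t)
        best = None  # (lidar_index, skew)
        # Only the sweep just before t and the one at/after t can be closest.
        for k in (j - 1, j):
            if 0 <= k < len(lidar_ts):
                skew = abs(lidar_ts[k] - t)
                if skew <= max_skew_s and (best is None or skew < best[1]):
                    best = (k, skew)
        if best is not None:
            pairs.append((i, best[0]))
    return pairs


# A 30 FPS camera against a sparser LiDAR stream (illustrative timestamps).
print(match_nearest([0.0, 0.033, 0.066, 0.099], [0.001, 0.050, 0.101]))
```

Dropping unmatched frames keeps the fused output honest; a production pipeline would also log how often it happens, since a rising drop rate usually points to clock drift.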

Model optimization

Quantization, pruning and runtime tuning (TensorRT and similar) so models fit the latency, memory and power budget of the target edge platform.
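Quantization stores weights at lower precision together with a scale factor. Below is a minimal sketch of symmetric per-tensor INT8 post-training quantization on a plain list of floats. It shows the arithmetic only. The function names are hypothetical, and toolchains such as TensorRT choose scales (per tensor or per channel) from calibration data rather than from a single maximum.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8: each weight w is stored as q = round(w / scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; guard against an all-zero tensor.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Python's round() rounds halves to even; clamp to the symmetric INT8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats; per-weight error is about scale / 2 at most."""
    return [scale * v for v in q]


weights = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q, scale, restored)
```

Each weight then takes 1 byte instead of 4 (FP32), at the cost of a rounding error of up to half a quantization step. Whether that error matters is measured on the task (for example defect recall), not assumed.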

Deployment & updates

Packaging, on-device runtime and over-the-air model updates with monitoring — so inference stays reliable after it ships.
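The core of a safe over-the-air model update is to verify the payload before it goes live and to swap it in atomically, so a failed or partial download never replaces a working model. Below is a minimal sketch. It assumes the expected SHA-256 digest arrives through a trusted channel (for example a signed manifest, whose signature check is not shown) and that the runtime loads the model from a single file path. `install_model` is a hypothetical name.

```python
import hashlib
import os
import tempfile


def install_model(payload: bytes, expected_sha256: str, model_path: str) -> bool:
    """Verify an OTA model payload, then atomically replace the active model file.

    Returns False and leaves the current model untouched if the checksum fails.
    """
    if hashlib.sha256(payload).hexdigest() != expected_sha256:
        return False
    directory = os.path.dirname(os.path.abspath(model_path))
    # Temp file in the same directory, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes are on disk before the swap
        os.replace(tmp_path, model_path)  # atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # never leave a half-written file behind
        raise
    return True
```

`os.replace` is atomic on POSIX when source and destination share a filesystem. This is why the temporary file is created in the model's own directory rather than in a system temp folder.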

Where we deploy edge inference

Visual quality inspection

Detect defects on the production line in real time, on-device, with no cloud round-trip and no line data leaving the floor.

Pick-and-place & assembly

Vision-guided manipulation of varied or unfamiliar parts, where a fixed program would fail and a learned policy adapts.

Autonomous mobile robots

On-board navigation, obstacle avoidance and perception for AMRs and AGVs that must keep running when connectivity drops.

Instruction-following assistants

Vision-language-action models that turn a spoken or written instruction into action, on-device and within the control loop.

Edge hardware we supply

Chosen by compute, power and safety needs — configured and supplied alongside sensors and controllers.

Platform	Typical use
NVIDIA Jetson Orin Nano	Entry prototypes, lightweight perception, low power (up to ~67 TOPS in Super mode, 8 GB)
NVIDIA Jetson AGX Orin	Production robots, multi-camera perception + control (~275 TOPS, up to 64 GB)
NVIDIA Jetson AGX Thor	2026 flagship — humanoids and on-device VLA / reasoning VLMs (~2,070 FP4 TFLOPS, 128 GB)
NVIDIA IGX Thor / IGX Orin	Safety- and reliability-critical industrial / medical edge (ISO 26262, IEC 61508)
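Back-of-the-envelope arithmetic shows why FP4/FP8 and memory size drive the platform choice. Weight storage is roughly parameter count × bits per weight ÷ 8 bytes. Below is a minimal sketch. It counts weights only and applies a hypothetical rule that weights should take at most half of memory. The other half is left for activations, KV cache, the OS and other processes, since Jetson modules share one memory pool between CPU and GPU. Model sizes are illustrative.

```python
def weight_memory_gb(params_billion: float, bits: int) -> float:
    """Approximate weight storage in GB: parameters x bits per weight / 8 bits per byte."""
    return params_billion * bits / 8


def fits_in_memory(params_billion: float, bits: int, memory_gb: float,
                   weight_share: float = 0.5) -> bool:
    """Hypothetical rule of thumb: weights may use at most `weight_share` of memory,
    leaving the rest for activations, KV cache, the OS and other processes."""
    return weight_memory_gb(params_billion, bits) <= memory_gb * weight_share


for bits in (16, 8, 4):  # FP16, FP8, FP4
    print(bits, weight_memory_gb(70, bits), fits_in_memory(70, bits, 128))
```

By this estimate a 70-billion-parameter model needs about 140 GB for weights at FP16, more than AGX Thor's 128 GB in total, but about 35 GB at FP4. A 1-billion-parameter model at FP16 (about 2 GB) fits within the same rule on an 8 GB Orin Nano.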

Not sure which to pick? See Jetson Orin vs Thor → · Browse robotics & physical-AI hardware →

How an engagement works

1 · Scope & target

We define the task, environment and the hard constraints — latency, power budget and safety — and pick the target edge platform.

2 · Prototype the loop

We get the perception-to-action loop running on the real edge hardware, so the budget is proven on the target, not on a workstation.

3 · Optimize

Quantization, pruning and runtime tuning (TensorRT) until the model meets the latency, memory and power budget.

4 · Deploy & support

Packaging, on-device runtime, over-the-air updates and monitoring so inference stays reliable after it ships.
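Proving the budget on the target comes down to timing the real inference call on the device. Below is a minimal sketch. It assumes a hypothetical `infer(sample)` callable that blocks until the result is ready; with an asynchronous GPU runtime, synchronize the stream inside it. The warm-up and run counts are illustrative. The sketch reports p99 rather than the mean, because a control loop is limited by its slowest frames.

```python
import statistics
import time


def measure_latency_ms(infer, sample, warmup=20, runs=200):
    """Time a blocking single-frame inference callable; return p50/p99 in ms."""
    for _ in range(warmup):  # let caches, clocks and lazy initialization settle
        infer(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(sample)  # assumed to return only once the output is ready
        timings.append((time.perf_counter() - start) * 1000.0)
    # 99 cut points; "inclusive" interpolates within the observed samples.
    cuts = statistics.quantiles(timings, n=100, method="inclusive")
    return {"p50": statistics.median(timings), "p99": cuts[98]}


# Stand-in workload; on a robot this would wrap the deployed model.
print(measure_latency_ms(lambda n: sum(range(n)), 10_000))
```

Running the same harness before and after each optimization pass shows whether quantization, pruning or runtime tuning actually moved the tail latency on the target.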

What a deployed edge cell looks like

Illustrative target profile for a single-camera visual-inspection cell on Jetson AGX Orin — representative figures, not a specific client result.

Metric	Target
Inference latency	~8 ms per frame
Throughput	30+ FPS per camera
Defect recall (target classes)	≥ 99%
Cloud dependency	None — fully on-device, offline-capable
Power envelope	< 40 W

We agree these targets up front and prove them on the target hardware during the prototype stage.
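Once measurements from the target hardware exist, checking them against the agreed targets is simple arithmetic. Below is a minimal sketch that uses the illustrative figures from the table above as defaults. The `CellMeasurement` fields and the `check_targets` name are hypothetical. Treating "~8 ms" as a p99 bound is an assumption, and recall is computed as caught defects ÷ (caught + missed).

```python
from dataclasses import dataclass


@dataclass
class CellMeasurement:
    p99_latency_ms: float  # measured on the target hardware
    fps: float             # sustained frames per second per camera
    true_positives: int    # defects the model flagged correctly
    false_negatives: int   # defects the model missed
    power_w: float         # module power under load


def check_targets(m, latency_ms=8.0, min_fps=30.0, min_recall=0.99, max_power_w=40.0):
    """Compare one measurement with the agreed targets; True means the target is met."""
    positives = m.true_positives + m.false_negatives
    # With no defects in the test set, recall is unproven, so treat it as failing.
    recall = m.true_positives / positives if positives else 0.0
    return {
        "latency": m.p99_latency_ms <= latency_ms,
        "throughput": m.fps >= min_fps,
        "recall": recall >= min_recall,
        "power": m.power_w < max_power_w,
    }


print(check_targets(CellMeasurement(7.5, 31.0, 995, 5, 35.0)))
```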

Frequently asked questions

What is edge AI inference?

Edge AI inference runs machine-learning models directly on a device — a robot, camera or controller — rather than in the cloud. It delivers low latency, works without connectivity, and keeps data local, which is essential for real-time perception and control in physical systems.

What hardware is used for edge AI inference in robotics?

In 2026, the main options are NVIDIA Jetson AGX Thor and IGX Thor for advanced and safety-critical robots, and Jetson Orin Nano, AGX Orin and IGX Orin for lighter workloads, chosen by compute, power and safety requirements. Haink configures and supplies these alongside cameras, sensors and controllers.

Should I choose Jetson Orin or Thor for edge inference?

Choose Orin for prototypes, vision AI and most production robots where its compute is enough; choose Thor when running a large vision-language-action model, a reasoning VLM, or heavy multi-sensor fusion that needs its FP4/FP8 compute and 128 GB of memory. See our Jetson Orin vs Thor guide.

Can you run vision-language-action (VLA) models on the edge?

Yes. We deploy compact VLA and reasoning vision-language models on-device, typically on Jetson AGX Thor, with optimization (quantization, pruning, TensorRT) to meet the robot's latency and power budget.

Can you optimize a model to run on the edge?

Yes — our in-house AI team adapts and optimizes models for on-device inference (quantization, pruning, runtime tuning) so they meet the latency and power budget of the target edge platform.
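In practice, pruning means picking a sparsity pattern the target hardware can exploit. One common choice on NVIDIA GPUs from the Ampere generation onward, which includes Jetson Orin, is 2:4 structured sparsity: in every group of four weights, two are zero. Below is a minimal sketch of magnitude-based 2:4 pruning on a flat list. It assumes groups of four consecutive weights, and `prune_2_4` is a hypothetical name. Real pipelines prune with framework tooling and fine-tune afterwards to recover accuracy.

```python
def prune_2_4(weights):
    """2:4 structured sparsity: in each group of 4 weights, keep the 2 largest magnitudes."""
    if len(weights) % 4:
        raise ValueError("number of weights must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest-magnitude weights in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


print(prune_2_4([0.1, -0.9, 0.05, 0.4, 1.0, 2.0, -3.0, 0.5]))
```

The fixed 2-of-4 pattern is what lets sparse tensor cores skip the zeros. Zeros scattered at random (unstructured pruning) usually save little time in dense GPU kernels.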

From the knowledge base

Robotics integration → · Edge AI for robotics → · What are VLA models → · All physical-AI guides →

Building something at the edge?

Tell us the task and the constraints — we’ll propose a model approach and an edge-hardware design.

Just scoping the build? See the robotics reference architectures — blueprints with a bill of materials and indicative pricing →
