Unlocking edge GPUs for larger AI models and asynchronous execution: Roofline has built a new Vulkan 1.3 HAL for IREE

AI deployment is rapidly moving to edge devices, where dedicated AI accelerators (NPUs) are getting most of the attention for heavy AI workloads. Additionally, GPUs are present on many edge SoCs but are typically being overlooked for AI deployment. The deployment capabilities of current software stacks reflect that since edge GPUs are often lacking full AI support. This case study shows how Roofline unlocks edge GPUs for larger AI models by building a new Vulkan 1.3 HAL for IREE. We demonstrate this by running Qwen3-8B heterogeneously on an ARM Mali-G720 GPU.

Case study

runtime

sdk

Roofline and NXP enable next-generation software support for LLMs on eIQ Neutron NPU

In collaboration with NXP® Semiconductors, we built on Roofline’s scalable MLIR and IREE compiler infrastructure to extend our heterogeneous execution stack to NPUs. Starting with LLM enablement for NXP’s eIQ® Neutron NPU on the i.MX 95 applications processor, this case study showcases three key advantages of the software enablement: 1) Unlocking broad model coverage, 2) overcoming accelerator memory limitations for models above 2GB, and 3) delivering clear performance gains of up to 3.2x in LLM prefill performance over CPU-only execution.

Case study

sdk

runtime

Dynamic shape support: A key enabler for on-device LLM inference

This case study shows how Roofline achieves up to 23× higher throughput by solving one of the most fundamental bottlenecks in efficient on-device language model inference: dynamic shapes in the prefill stage.

Case study

runtime

sdk

Asynchronous Heterogeneous Execution for Edge SoCs

This case study shows how Roofline enabled asynchronous heterogeneous execution on modern edge System-on-Chips (SoCs). Our technology coordinates the SoCs’ CPU–GPU–NPU hardware for running full AI models efficiently. We unlock a long-missing piece in AI deployment software on the edge to run larger models more efficiently on the best suited device available.

Case study

sdk

runtime

Roofline x ARM: Enhancing software support for ARM SVE in MLIR and IREE

This case study showcases how Roofline and ARM enabled scalable, vector-length-agnostic ML execution on Arm CPUs by implementing data-tiled Scalable Vector Extension (SVE) support end-to-end in IREE, unlocking up to 100× speedups on real models and hardware.

Case study

Sorry, we couldn’t find anything matching that. How about browsing our latest cases?

Back to Main Go to cases

Case studies

Unlocking edge GPUs for larger AI models and asynchronous execution: Roofline has built a new Vulkan 1.3 HAL for IREE

Roofline and NXP enable next-generation software support for LLMs on eIQ Neutron NPU

Dynamic shape support: A key enabler for on-device LLM inference

Asynchronous Heterogeneous Execution for Edge SoCs

Roofline x ARM: Enhancing software support for ARM SVE in MLIR and IREE