Runtime

A lightweight runtime that executes compiled models on the device, bringing everything needed for efficient SoC orchestration.
Connect with us

in action

See the runtime in action and experience how it unlocks the full SoC for cutting-edge applications.

Key features

Heterogeneous execution

Parallel processing of models on the most suitable devices across CPUs, GPUs and NPUs.

Dynamic shapes

Full support of dynamic shapes at execution time for LLMs and other cutting-edge models.

Extendable for custom NPUs

Easily extendable to novel NPUs through hardware abstraction layers (HALs).

OS and bare-metal ready

One integrated toolchain that fits your needs, from Linux, macOS and Windows to bare-metal systems.

how it works

Easily generate the runtime executable ahead-of-time to port it to your target devices. Our lightweight C-based engine integrates seamlessly into your workflow, allowing you to execute compiled models on bare-metal or OS-based systems with minimal overhead.
view docs
import torch
from attic import create_attic_runtime  # hypothetical import path; see the docs

random_input = torch.rand(1, 3, 256, 256, dtype=torch.float32)

# compiled_model is the ahead-of-time compiled artifact produced by the SDK
rt = create_attic_runtime(compiled_model)
main_function = rt.get_function_by_name("main")
iree_output = main_function(random_input)
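To illustrate the dynamic-shape support mentioned above: in principle the same `main` entry point can be invoked with inputs of different spatial sizes, without recompilation. This is a minimal sketch; the actual runtime calls are shown as comments because they require a compiled model and the `main_function` handle from the snippet above.

```python
import torch

# Two inputs with different spatial resolutions. With dynamic-shape
# support, one compiled entry point can serve both sizes.
small = torch.rand(1, 3, 128, 128, dtype=torch.float32)
large = torch.rand(1, 3, 512, 512, dtype=torch.float32)

# Hypothetical calls, reusing `main_function` from the snippet above:
# out_small = main_function(small)
# out_large = main_function(large)

print(tuple(small.shape), tuple(large.shape))
```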

Licensing Options

Non-commercial License (free)

  • Unlimited developer access within your organization
  • Coverage for selected models
  • Guaranteed stability and documentation
Contact Sales

Commercial License

  • Everything in the Non-Commercial License
  • Commercial deployments to customers
  • Guaranteed coverage for TFLite, PyTorch and ONNX models
  • Continuous performance optimizations in Roofline’s stack
  • Premium support
Contact Sales

FAQ

What does Roofline’s runtime do?

The runtime is a small piece of software that is ported to the device to run the compiled model. It is a lightweight, fast inference engine built in C to execute the optimized kernels generated in the SDK. Our runtime supports heterogeneous execution across CPUs, GPUs, and NPUs, handles dynamic shapes for models like LLMs, and can be extended to custom NPUs through hardware abstraction layers. The runtime runs on Linux, macOS, Windows, and even bare-metal systems.

Which hardware platforms are supported?

Our runtime supports a broad range of SoCs. Thanks to an extensible architecture, new devices and custom NPUs can be added easily, and additional platforms are continuously evaluated. Check out the Main Page for our supported CPUs, GPUs and NPUs.

Can I monitor the performance of deployed models?

Yes. Our performance dashboard gives you clear insights into your models across devices, including latency, throughput, memory usage, stability over time, and multi-device comparisons. Nightly updates keep results up-to-date. For more details, see our Performance Dashboard page.

How can I get access to the runtime?

Just reach out through our Contact Us form and we will get in touch right away.