llama.cpp is often the go-to solution for running LLMs on edge devices, leveraging handwritten kernels for optimized execution. But while this approach delivers speed, it lacks flexibility when it comes to supporting new models and quantization techniques.
For instance, highly relevant models like Apple's OpenELM took months to be integrated into llama.cpp. Similarly, the GitHub issue for fixing DeepSeek AI's multimodal janus-pro-1b has been open for nearly two months, highlighting how hard it is to keep pace with emerging architectures.
That's why roofline is building an MLIR-based edge AI compiler with a model-agnostic LLM pipeline. Our approach enabled day-0 deployment of DeepSeek's janus-pro-1b and ensures rapid support for the latest quantization methods. Below, we showcase how our solution delivers ~8× performance gains over the native PyTorch compiler (TorchInductor), and why orange might just be the perfect company color!
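For context, the TorchInductor baseline corresponds to PyTorch's native torch.compile path. The snippet below is a minimal, hypothetical sketch of how such a baseline is typically measured; the model id, prompt, and iteration count are placeholders for illustration, not our actual benchmark setup.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "gpt2"  # placeholder model; stands in for the model under test

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID).eval()

# TorchInductor is the default backend of torch.compile.
compiled_model = torch.compile(model, backend="inductor")

inputs = tokenizer("Edge inference is", return_tensors="pt")
with torch.no_grad():
    compiled_model(**inputs)          # first call triggers compilation
    start = time.perf_counter()
    for _ in range(10):
        compiled_model(**inputs)      # steady-state forward passes
    avg_ms = (time.perf_counter() - start) / 10 * 1e3
print(f"avg forward latency: {avg_ms:.1f} ms")
```

Any speedup claim only makes sense against a warmed-up compiled baseline like this one, since the first call pays the compilation cost.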
Curious about how generic and scalable our approach is? Let's chat! And stay tuned: janus-pro-1b also supports image-to-text and text-to-image capabilities.