# ⚡ OptiML Demo

OptiML accelerates local inference by exploiting **activation locality**: a compact set of "hot" neurons fires frequently across inputs, while the long tail of "cold" neurons is input-dependent. OptiML places the hot subset on the GPU and schedules the cold subset on the CPU, delivering strong throughput with low VRAM usage on consumer-grade hardware.
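
The hot/cold split can be illustrated with a minimal sketch. This is not OptiML's actual code or API: the function `split_hot_cold`, the `activation_counts` input, and the `gpu_budget` parameter are hypothetical names, and the sketch assumes per-neuron activation counts have already been collected offline and that VRAM capacity can be approximated as a fixed number of neurons.

```python
from typing import Dict, List, Sequence


def split_hot_cold(activation_counts: Sequence[int], gpu_budget: int) -> Dict[str, List[int]]:
    """Assign the most frequently activated neurons to the GPU, the rest to the CPU.

    activation_counts[i]: how often neuron i fired over a profiling set
        (hypothetical input; a real profiler may record this differently).
    gpu_budget: number of neurons assumed to fit in VRAM (a simplification).
    """
    if gpu_budget < 0:
        raise ValueError("gpu_budget must be non-negative")
    # Rank neuron indices by activation frequency, most active first.
    ranked = sorted(
        range(len(activation_counts)),
        key=lambda i: activation_counts[i],
        reverse=True,
    )
    hot = sorted(ranked[:gpu_budget])   # frequently firing: keep on GPU
    cold = sorted(ranked[gpu_budget:])  # input-dependent tail: run on CPU
    return {"gpu": hot, "cpu": cold}


# Toy profile: neurons 0, 2, and 5 fire far more often than the others.
counts = [950, 12, 870, 3, 40, 910]
placement = split_hot_cold(counts, gpu_budget=3)
print(placement)  # {'gpu': [0, 2, 5], 'cpu': [1, 3, 4]}
```

A real system would presumably size the budget in bytes per layer rather than in neuron counts; the sketch only shows the ranking idea.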

**llama.cpp (left) and OptiML (right) running on the same hardware, each fully utilizing the VRAM of a single RTX 5080.**
<video src="_static/demo.mp4" controls="controls" width="800" height="400">
</video>
