# ⚡ OptiML Demo

OptiML accelerates local inference by exploiting **activation locality**: a compact set of "hot" neurons fires frequently across inputs, while the long tail of "cold" neurons is input-dependent. OptiML places the hot subset on the GPU and schedules the cold subset on the CPU, delivering strong throughput with low VRAM use on everyday hardware.

**llama.cpp (left) and OptiML (right) were run on the same hardware, a single RTX 5080, with VRAM fully utilized.**
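
The hot/cold split described above can be illustrated with a small sketch. This is not OptiML's actual code: the function `partition_neurons`, the `gpu_budget` parameter, and the toy profiling trace are all hypothetical names chosen for illustration. It assumes that "hot" simply means "fires most often in a profiling trace" and that the GPU can hold a fixed number of neurons.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def partition_neurons(
    activations: Iterable[Set[int]],
    num_neurons: int,
    gpu_budget: int,
) -> Tuple[List[int], List[int]]:
    """Split neuron indices into a hot (GPU) list and a cold (CPU) list.

    Hypothetical helper: ranks neurons by how often they fired in the
    profiling trace and assigns the top `gpu_budget` of them to the GPU.
    """
    counts: Counter = Counter()
    for active in activations:
        counts.update(active)

    # Sort by descending firing count; break ties by index for determinism.
    ranked = sorted(range(num_neurons), key=lambda i: (-counts[i], i))

    hot = sorted(ranked[:gpu_budget])   # most frequently active neurons
    cold = sorted(ranked[gpu_budget:])  # input-dependent long tail
    return hot, cold


# Toy trace: each set holds the neurons that fired for one input.
trace = [{0, 1, 5}, {0, 1, 3}, {0, 2}, {0, 1, 4}]
hot, cold = partition_neurons(trace, num_neurons=6, gpu_budget=2)
print("GPU (hot):", hot)
print("CPU (cold):", cold)
```

In this toy trace, neurons 0 and 1 fire on almost every input and land on the GPU, while the rarely active neurons stay on the CPU. A real system would also have to weigh per-neuron memory cost and transfer overhead, which this sketch ignores.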