🧩 How OptiML Works

  1. Measure activation locality – Identify neurons that are consistently active across inputs.

  2. Partition neurons – Tag a small “hot” set and a large “cold” set per layer.

  3. Place & cache – Pin hot neurons and related weights on the GPU; compute cold activations on the CPU.

  4. Hybrid scheduling – Overlap CPU/GPU compute and data movement; apply quantization to reduce memory and improve throughput.
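
Steps 1 to 3 above can be illustrated with a minimal sketch. This is not OptiML's actual API: the function names, the `threshold` and `hot_fraction` parameters, and the `"gpu"`/`"cpu"` labels are all hypothetical, chosen only to show how activation frequency might drive a hot/cold split and a placement plan for a single layer.

```python
from typing import Dict, List, Sequence, Tuple


def activation_frequency(
    activations: Sequence[Sequence[float]], threshold: float = 0.0
) -> List[float]:
    """Fraction of inputs on which each neuron's activation exceeds `threshold`.

    `activations` holds one row per input sample and one column per neuron.
    """
    if not activations:
        raise ValueError("need at least one input sample")
    counts = [0] * len(activations[0])
    for row in activations:
        for j, value in enumerate(row):
            if value > threshold:
                counts[j] += 1
    return [c / len(activations) for c in counts]


def partition_neurons(
    freq: Sequence[float], hot_fraction: float = 0.25
) -> Tuple[List[int], List[int]]:
    """Split neuron indices into a small hot set and a large cold set.

    The most frequently active neurons become hot; `hot_fraction` is an
    assumed tuning knob, not a value taken from OptiML.
    """
    n_hot = max(1, round(len(freq) * hot_fraction))
    ranked = sorted(range(len(freq)), key=lambda j: freq[j], reverse=True)
    return sorted(ranked[:n_hot]), sorted(ranked[n_hot:])


def placement(hot: Sequence[int], cold: Sequence[int]) -> Dict[int, str]:
    """Map each neuron index to the device that would compute it."""
    plan = {j: "gpu" for j in hot}
    plan.update({j: "cpu" for j in cold})
    return plan


# Four input samples, four neurons in one layer (made-up values).
acts = [
    [0.9, 0.0, 0.3, 0.0],
    [0.7, 0.0, 0.0, 0.0],
    [0.8, 0.1, 0.2, 0.0],
    [0.5, 0.0, 0.4, 0.0],
]
freq = activation_frequency(acts)
hot, cold = partition_neurons(freq, hot_fraction=0.25)
plan = placement(hot, cold)
print(freq)  # [1.0, 0.25, 0.75, 0.0]
print(plan)  # {0: 'gpu', 1: 'cpu', 2: 'cpu', 3: 'cpu'}
```

In this toy layer, neuron 0 fires on every input and is pinned to the GPU, while the other three stay on the CPU. A real system would measure frequencies over a representative profiling set and would also need to account for GPU memory limits when sizing the hot set.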