# 🧩 How OptiML Works

1. **Measure activation locality** – Identify neurons that are consistently active across inputs.
2. **Partition neurons** – Tag a small "hot" set and a large "cold" set per layer.
3. **Place & cache** – Pin hot neurons and their weights in GPU memory; compute cold neurons on the CPU.
4. **Hybrid scheduling** – Overlap CPU and GPU compute with data movement; apply quantization to reduce memory use and improve throughput.
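
Steps 1 to 3 above can be illustrated with a small sketch. This is not OptiML's actual implementation: the function names (`activation_frequency`, `partition_neurons`), the "activation > 0" firing criterion, and the `hot_fraction` parameter are all assumptions chosen for illustration, and the profile data is a toy example.

```python
from typing import Dict, List, Sequence, Tuple


def activation_frequency(activations: Sequence[Sequence[float]]) -> List[float]:
    """Fraction of inputs on which each neuron fires (assumed: activation > 0)."""
    n_inputs = len(activations)
    n_neurons = len(activations[0])
    counts = [0] * n_neurons
    for row in activations:
        for j, value in enumerate(row):
            if value > 0.0:
                counts[j] += 1
    return [c / n_inputs for c in counts]


def partition_neurons(
    freq: Sequence[float], hot_fraction: float = 0.25
) -> Tuple[List[int], List[int]]:
    """Split neuron indices into a small hot set and a large cold set."""
    n_hot = max(1, int(len(freq) * hot_fraction))
    # Rank neurons from most to least frequently active.
    ranked = sorted(range(len(freq)), key=lambda j: freq[j], reverse=True)
    return sorted(ranked[:n_hot]), sorted(ranked[n_hot:])


# Toy profile for one layer: 4 inputs x 8 neurons of post-ReLU activations.
profile = [
    [0.9, 0.0, 0.3, 0.0, 0.0, 0.0, 0.7, 0.0],
    [1.2, 0.0, 0.5, 0.0, 0.1, 0.0, 0.4, 0.0],
    [0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.6, 0.0],
    [1.1, 0.2, 0.4, 0.0, 0.0, 0.0, 0.9, 0.0],
]

freq = activation_frequency(profile)
hot, cold = partition_neurons(freq, hot_fraction=0.25)

# Hypothetical placement map: hot neurons go to the GPU, cold ones to the CPU.
placement: Dict[int, str] = {j: "gpu" for j in hot}
placement.update({j: "cpu" for j in cold})

print("hot:", hot)
print("cold:", cold)
```

In this toy profile, neurons 0 and 6 fire on every input, so they form the hot set for the layer and the remaining six neurons are treated as cold. A real profiler would collect statistics over a much larger and more representative set of inputs.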