🧩 How OptiML Works
Measure activation locality: Identify neurons that are consistently active across inputs.
Partition neurons: Tag a small "hot" set and a large "cold" set per layer.
Place & cache: Pin hot neurons and related weights on the GPU; compute cold activations on the CPU.
Hybrid scheduling: Overlap CPU/GPU compute and data movement; apply quantization to reduce memory and improve throughput.
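
The first three steps can be illustrated with a small, self-contained sketch. This is not OptiML's actual API: the function names (`activation_frequency`, `partition_neurons`, `placement`), the "fires when activation > 0" criterion, and the `hot_fraction` parameter are all assumptions made for illustration, and the profiling data is synthetic. Scheduling and quantization are left out.

```python
def activation_frequency(samples):
    """Return, per neuron, the fraction of inputs on which it fired.

    `samples` is a list of activation vectors for one layer, one per input.
    Assumption: a neuron counts as active when its activation is > 0.
    """
    n_neurons = len(samples[0])
    counts = [0] * n_neurons
    for activations in samples:
        for i, value in enumerate(activations):
            if value > 0.0:
                counts[i] += 1
    return [c / len(samples) for c in counts]


def partition_neurons(freq, hot_fraction=0.25):
    """Tag the most frequently active neurons as hot and the rest as cold.

    `hot_fraction` is a hypothetical tuning knob, not a documented setting.
    """
    n_hot = max(1, int(len(freq) * hot_fraction))
    ranked = sorted(range(len(freq)), key=lambda i: freq[i], reverse=True)
    return sorted(ranked[:n_hot]), sorted(ranked[n_hot:])


def placement(hot, cold):
    """Map each neuron index to the device that would compute it."""
    device_of = {i: "gpu" for i in hot}
    device_of.update({i: "cpu" for i in cold})
    return device_of


# Synthetic profiling run: neuron i fires on every periods[i]-th input.
periods = [1, 20, 10, 2, 50, 25, 33, 11]
samples = [
    [1.0 if t % period == 0 else 0.0 for period in periods]
    for t in range(100)
]

freq = activation_frequency(samples)
hot, cold = partition_neurons(freq, hot_fraction=0.25)
device_of = placement(hot, cold)

print("hot:", hot)    # hot: [0, 3]
print("cold:", cold)  # cold: [1, 2, 4, 5, 6, 7]
```

Here neurons 0 and 3 fire on 100% and 50% of the inputs and land in the hot set, while the remaining six, each active on at most 10% of inputs, are assigned to the CPU. A real system would profile per layer on representative inputs and size the hot set against available GPU memory.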