OptiML Overview
OptiML: Drop-in inference engine for AI agents.
OptiML is a high-speed Large Language Model (LLM) inference engine for a personal computer (PC) equipped with a single consumer-grade GPU. The key idea behind OptiML's design is to exploit the high locality inherent in LLM inference, which shows up as a power-law distribution in neuron activation.
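To make the power-law claim concrete, the sketch below assumes, hypothetically, that neuron activation frequencies follow a Zipf-like law (frequency proportional to 1/rank^α) and measures what share of all activations the most frequently firing neurons account for. The exponent, neuron count, and function names are illustrative assumptions, not measurements or APIs from OptiML.

```python
def zipf_frequencies(n_neurons: int, alpha: float) -> list[float]:
    """Hypothetical activation frequencies: the neuron of rank i fires with weight 1 / (i + 1) ** alpha."""
    return [1.0 / (i + 1) ** alpha for i in range(n_neurons)]


def coverage_of_top(freqs: list[float], top_frac: float) -> float:
    """Share of all activations contributed by the most frequently activated top_frac of neurons."""
    ranked = sorted(freqs, reverse=True)
    k = max(1, int(len(ranked) * top_frac))  # number of "hot" neurons considered
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    # alpha = 0 means uniform activation (no locality); larger alpha means stronger locality.
    for alpha in (0.0, 1.2):
        freqs = zipf_frequencies(n_neurons=10_000, alpha=alpha)
        share = coverage_of_top(freqs, top_frac=0.2)
        print(f"alpha={alpha}: top 20% of neurons cover {share:.1%} of activations")
```

With uniform activation the top 20% of neurons cover exactly 20% of the work; under a power law a small hot set covers most of it, which is what makes keeping only that set in limited VRAM worthwhile.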
Hot neurons run on the GPU, cold neurons on the CPU. Breaks the shackles of VRAM size.
Learns your model's patterns dynamically and adaptively. Ensures maximum compatibility.
Only calculates what's needed. No wasted work on useless data.
Can be applied to all transformer-based models. Accelerates your model effortlessly.
Supports major programming languages. Front-end libraries available for quick access.
Teamed up with leading tech companies. Continuously adopts bleeding-edge technology.
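The hot/cold split and "only calculate what's needed" can be illustrated with a toy sketch. It assumes a ReLU feed-forward layer, a per-neuron activation frequency profile, and an externally supplied set of active neurons (for example from a hypothetical activation predictor). The greedy placement policy and the names `place_neurons` and `sparse_ffn` are assumptions for illustration, not OptiML's actual API, and both "devices" are simulated in pure Python.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    gpu: set[int]  # "hot" neuron indices assumed resident in VRAM
    cpu: set[int]  # "cold" neuron indices assumed left in system RAM


def place_neurons(freqs: list[float], bytes_per_neuron: int, vram_budget: int) -> Placement:
    """Greedy hot/cold split: the most frequently activated neurons go to the GPU until the budget is spent."""
    order = sorted(range(len(freqs)), key=lambda i: freqs[i], reverse=True)
    capacity = vram_budget // bytes_per_neuron
    return Placement(gpu=set(order[:capacity]), cpu=set(order[capacity:]))


def dot(a: list[float], b: list[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def sparse_ffn(x: list[float], w_in: list[list[float]], w_out: list[list[float]],
               active: set[int], placement: Placement) -> list[float]:
    """Evaluate only the neurons in `active`, accumulating GPU and CPU partial sums separately."""
    d_out = len(w_out[0])
    gpu_part = [0.0] * d_out
    cpu_part = [0.0] * d_out
    for i in active:  # inactive neurons are skipped entirely
        h = max(0.0, dot(w_in[i], x))  # ReLU activation of neuron i
        target = gpu_part if i in placement.gpu else cpu_part
        for j in range(d_out):
            target[j] += h * w_out[i][j]
    # In a hybrid engine the two partial sums would come from different devices and be merged here.
    return [g + c for g, c in zip(gpu_part, cpu_part)]


if __name__ == "__main__":
    freqs = [0.9, 0.1, 0.05, 0.8]
    placement = place_neurons(freqs, bytes_per_neuron=4, vram_budget=8)
    w_in = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 1.0]]
    w_out = [[1.0], [1.0], [1.0], [1.0]]
    x = [2.0, 3.0]
    # Neuron 2 would output 0 after ReLU for this input, so skipping it does not change the result.
    print(placement, sparse_ffn(x, w_in, w_out, active={0, 1, 3}, placement=placement))
```

Because a ReLU neuron with a non-positive pre-activation contributes nothing, skipping it gives the same output as the dense computation while saving its work; the placement only decides which device does the work for the neurons that remain.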
Acknowledgements
The OptiML project was initiated at the QRG lab, Northwestern University. In the project's early stages, we received contributions from top minds at leading institutions around the world. Special thanks to everyone who made this project possible!
Gaurav Juvekar
Rajesh Gandham
Akif Corduk
Ihar Hrachyshka
Ashwin Bharambe
Dalton Flanagan
Sebastien Han
Charlie Doern
David Bernal
Jihun Hwang
Yutong Huang
Udit Gupta
Sirena Yu
Tomas Janda
Martino Mensio
Emanuel Gerber
Nour Taqatqa
Sung Min Cho
Dennis Wu
Henry Buron
Glen Koundry
Jimmy Kuhlman
Devanshu Desai
Brian Lee
Marko Sterbentz
Sam Leeman
Cameron Barrie
k8sify