(myst-sphinx)=

# 💡 FAQ

**Which models work best?**
   - Decoder-only transformer model families distributed in GGUF format, for which kernels are available, generally perform well.

**Do I need a high-end GPU?**
   - Not necessarily. The hybrid GPU-CPU layout reduces VRAM pressure by keeping frequently activated weights on the GPU and leaving the long tail of rarely activated ones on the CPU, which makes consumer GPUs practical.
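
The idea behind the hybrid layout can be sketched as a greedy placement: rank weight groups (here, FFN neurons) by how often they activate, fill a fixed VRAM budget with the most frequently activated ones, and leave the long tail in system RAM for the CPU. This is a minimal illustrative sketch, assuming per-neuron activation counts from a profiling run and a uniform size per neuron; `Placement`, `place_neurons`, and the sample counts are hypothetical and do not describe OptiML's actual API or placement policy.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    gpu: list[int]   # neuron ids kept resident in VRAM (the frequently activated "hot" set)
    cpu: list[int]   # neuron ids left in system RAM (the long tail, computed on the CPU)
    vram_bytes: int  # VRAM consumed by the GPU-resident neurons


def place_neurons(activation_counts: list[int], bytes_per_neuron: int, vram_budget: int) -> Placement:
    """Hypothetical greedy hot/cold split: fill the VRAM budget with the most active neurons."""
    # Rank neuron ids from most to least frequently activated.
    order = sorted(range(len(activation_counts)), key=lambda i: activation_counts[i], reverse=True)
    capacity = vram_budget // bytes_per_neuron  # number of neurons that fit in VRAM
    gpu = sorted(order[:capacity])
    cpu = sorted(order[capacity:])
    return Placement(gpu=gpu, cpu=cpu, vram_bytes=len(gpu) * bytes_per_neuron)


# Synthetic profile (made-up numbers): a few neurons fire often, most rarely.
counts = [900, 5, 40, 700, 1, 3, 250, 2]
p = place_neurons(counts, bytes_per_neuron=1024, vram_budget=3 * 1024)
gpu_share = sum(counts[i] for i in p.gpu) / sum(counts)
print(p.gpu, p.cpu, p.vram_bytes, round(gpu_share, 3))  # [0, 3, 6] [1, 2, 4, 5, 7] 3072 0.973
```

In this synthetic profile, three of eight neurons occupy VRAM yet account for about 97% of recorded activations; how skewed real activation counts are depends on the model and workload.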

**How is this different from pure-GPU engines?**
   - OptiML co-designs placement and scheduling around activation locality, trading a modest amount of CPU work for the ability to serve larger models efficiently on a PC.
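
The trade described above can be illustrated with a neuron-partitioned feed-forward layer: each device computes partial sums only for the neurons it holds, inactive neurons cost no output-projection work, and the two partial results are added. This is a minimal pure-Python sketch, assuming a single ReLU FFN layer stored as per-neuron weight rows; `ffn_partial` and `hybrid_ffn` are hypothetical helpers, both partitions run sequentially on the host here, and nothing in it reflects OptiML's actual kernels or scheduler.

```python
from collections.abc import Sequence


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def ffn_partial(x: Sequence[float], w_in: Sequence[Sequence[float]],
                w_out: Sequence[Sequence[float]], neuron_ids: Sequence[int]) -> tuple[list[float], int]:
    """Contribution of the selected neurons to y = sum_i relu(w_in[i] . x) * w_out[i].

    Returns the partial output and how many of the selected neurons were active.
    """
    y = [0.0] * len(w_out[0])
    active = 0
    for i in neuron_ids:
        a = relu(sum(w * xv for w, xv in zip(w_in[i], x)))
        if a == 0.0:
            continue  # inactive neuron: skip its output-projection work entirely
        active += 1
        for j, w in enumerate(w_out[i]):
            y[j] += a * w
    return y, active


def hybrid_ffn(x, w_in, w_out, gpu_ids, cpu_ids):
    """Split one FFN layer by neuron: each partition yields a partial sum, then the two are added."""
    y_gpu, n_gpu = ffn_partial(x, w_in, w_out, gpu_ids)  # stand-in for the GPU-side kernel
    y_cpu, n_cpu = ffn_partial(x, w_in, w_out, cpu_ids)  # stand-in for the CPU-side kernel
    return [a + b for a, b in zip(y_gpu, y_cpu)], n_gpu, n_cpu


x = [1.0, -2.0]
w_in = [[1.0, 0.0], [0.0, -1.0], [-1.0, 0.0], [1.0, 1.0]]
w_out = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y, n_gpu, n_cpu = hybrid_ffn(x, w_in, w_out, gpu_ids=[0, 2], cpu_ids=[1, 3])
print(y, n_gpu, n_cpu)  # [1.0, 2.0, 0.0] 1 1
```

Because the partial sums simply add, the hybrid result matches computing every neuron in one place; placement only changes which device does the work for each active neuron.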

**Does OptiML support Mistral, original Llama, GPT...?**
   - OptiML is designed to be easily integrated into any model that uses the transformer architecture, so these models can be supported in principle. However, this repository currently provides support only for Llama 2 and Llama 3; more models will follow.

**What if my question isn't answered here?**
   - Issues are welcome! Please open an issue and include details of your running environment and the parameters you used. We will do our best to help.
