Quickstart
1. Build from source
git clone https://github.com/NU-QRG/optiml.git
cd optiml
mkdir build && cd build
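# Back-end switches (toggle as needed):
#   OPTIML_CUBLAS  CUDA (NVIDIA)
#   OPTIML_METAL   Apple Silicon
#   OPTIML_OPENCL  Other GPU back-ends
# Comments cannot follow a trailing backslash, so they are listed here.
cmake .. -DCMAKE_BUILD_TYPE=Release \
  -DOPTIML_CUBLAS=ON \
  -DOPTIML_METAL=OFF \
  -DOPTIML_OPENCL=OFF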
cmake --build . -j
Tip: Enable the back-ends that match your machine (e.g., set
OPTIML_METAL=ON on Apple Silicon).
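The tip above can be scripted. The following is a minimal sketch, not something shipped with OptiML: the helper name optiml_backend_flags is hypothetical, and treating the presence of nvidia-smi as a sign of a usable CUDA setup is an assumption. Only the three -DOPTIML_* switches come from the build command above. OpenCL is never enabled automatically here; turn it on by hand for other GPUs.

```shell
# Hypothetical helper (not part of OptiML): print the back-end switches for a
# given OS, CPU architecture, and NVIDIA availability ("yes"/"no").
# Every argument is optional and defaults to what is detected on this machine.
optiml_backend_flags() {
  os=${1:-$(uname -s)}
  arch=${2:-$(uname -m)}
  if [ -n "${3:-}" ]; then
    has_nvidia=$3
  elif command -v nvidia-smi >/dev/null 2>&1; then
    has_nvidia=yes            # assumption: nvidia-smi implies a CUDA-capable setup
  else
    has_nvidia=no
  fi

  cublas=OFF; metal=OFF; opencl=OFF
  if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    metal=ON                  # Apple Silicon
  elif [ "$has_nvidia" = "yes" ]; then
    cublas=ON                 # CUDA (NVIDIA)
  fi

  printf '%s\n' "-DOPTIML_CUBLAS=$cublas -DOPTIML_METAL=$metal -DOPTIML_OPENCL=$opencl"
}

# Usage (the substitution is left unquoted on purpose so it splits into three flags):
#   cmake .. -DCMAKE_BUILD_TYPE=Release $(optiml_backend_flags)
```

If no GPU is detected, all three switches stay OFF; whether that produces a working CPU-only build is an assumption of this sketch.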
2. (Optional) Python bindings
# Run from the repository root, not from build/
cd bindings/python
pip install -e .
3. Prepare a model
OptiML works well with standard GGUF models. If you have original weights, first convert to GGUF, then optionally quantize:
# Example: quantize a GGUF model to Q4_K (paths are relative to the repository root)
./build/optiml-quantize --input <model path> --output model-q4_k.gguf --type q4_k
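Before running the model, it can help to confirm that the quantized file is at least a GGUF container. GGUF files begin with the 4-byte ASCII magic "GGUF", so a quick header check is possible from the shell. This is a minimal sketch: the helper name is_gguf is hypothetical and not part of OptiML, and the check only inspects the first four bytes; it does not prove the model will load.

```shell
# Hypothetical helper: succeed (exit status 0) if the file starts with the
# 4-byte ASCII magic "GGUF", fail otherwise or if the file does not exist.
is_gguf() {
  [ -f "$1" ] || return 1
  # Read the first 4 bytes; drop NUL bytes so the shell can hold the result.
  magic=$(head -c 4 "$1" 2>/dev/null | tr -d '\0')
  [ "$magic" = "GGUF" ]
}

# Usage:
#   is_gguf model-q4_k.gguf && echo "header looks like GGUF"
```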
4. Run text generation (CLI)
./build/optiml-cli --model model-q4_k.gguf --prompt "Explain activation locality in one paragraph." --n-predict 128
5. Start the HTTP demo server
./examples/server/optiml-server --model model-q4_k.gguf --host 127.0.0.1 --port 8080
With the flags above, the server listens on 127.0.0.1:8080. Open the provided minimal web UI to chat locally. The server also exposes a simple REST API that any HTTP client can call.
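As an illustration of calling the server from a client, here is a hedged sketch using curl. The endpoint path (/completion) and the JSON field names (prompt, n_predict) are assumptions chosen to mirror the CLI flags above; they are not confirmed by this guide, so check the server's own documentation and adjust them. The helper names build_body and post_prompt are hypothetical.

```shell
# ASSUMPTION: the endpoint path and JSON field names are placeholders,
# not a documented OptiML API. Override them to match the real server.
OPTIML_URL=${OPTIML_URL:-http://127.0.0.1:8080}
OPTIML_ENDPOINT=${OPTIML_ENDPOINT:-/completion}   # hypothetical path

# Build a JSON request body from a prompt and an optional token count.
# Backslashes and double quotes in the prompt are escaped; prompts with
# newlines or other control characters are not handled by this sketch.
build_body() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"prompt": "%s", "n_predict": %d}\n' "$escaped" "${2:-128}"
}

# POST the body to the (assumed) endpoint and print the raw response.
post_prompt() {
  build_body "$1" "${2:-128}" |
    curl -sS -X POST "$OPTIML_URL$OPTIML_ENDPOINT" \
      -H 'Content-Type: application/json' \
      --data-binary @-
}

# Usage (requires the server from this step to be running):
#   post_prompt "Explain activation locality in one paragraph." 128
```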