(intro/get-started)=
# 🚀 Quickstart

## Build from source

```bash
git clone https://github.com/NU-QRG/optiml.git
cd optiml
mkdir build && cd build
# Enable the backend that matches your hardware:
#   OPTIML_CUBLAS  CUDA (NVIDIA)
#   OPTIML_METAL   Apple Silicon
#   OPTIML_OPENCL  other GPU backends
cmake .. -DCMAKE_BUILD_TYPE=Release \
  -DOPTIML_CUBLAS=ON \
  -DOPTIML_METAL=OFF \
  -DOPTIML_OPENCL=OFF
cmake --build . -j
```

> **Tip:** Enable only the backends that match your machine (e.g., set `OPTIML_CUBLAS=OFF` and `OPTIML_METAL=ON` on Apple Silicon).
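
If you script the build, the backend choice can be derived from the platform. The following is a minimal sketch, not part of OptiML: the helper name `optiml_backend_flags` is hypothetical, and treating `nvcc` on the `PATH` as a sign of a usable CUDA toolkit is an assumption. It emits only the three `OPTIML_*` options used in the build command and always leaves OpenCL off.

```bash
# Hypothetical helper (not shipped with OptiML): print backend flags for a platform.
# Arguments: OS name (as printed by `uname -s`), CPU architecture (as printed by
# `uname -m`), and "yes" or "no" for whether a CUDA toolkit was found.
optiml_backend_flags() {
  os=$1
  arch=$2
  has_cuda=$3
  cublas=OFF
  metal=OFF
  if [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]; then
    metal=ON      # Apple Silicon
  elif [ "$has_cuda" = "yes" ]; then
    cublas=ON     # NVIDIA GPU with CUDA available
  fi
  echo "-DOPTIML_CUBLAS=$cublas -DOPTIML_METAL=$metal -DOPTIML_OPENCL=OFF"
}

# Usage from inside build/ (assumption: nvcc on PATH means CUDA is usable).
# The command substitution is left unquoted so each flag becomes its own argument.
# if command -v nvcc >/dev/null 2>&1; then has_cuda=yes; else has_cuda=no; fi
# cmake .. -DCMAKE_BUILD_TYPE=Release \
#   $(optiml_backend_flags "$(uname -s)" "$(uname -m)" "$has_cuda")
```

When neither condition matches, all three backends stay off, which should give a CPU-only build if the project supports one.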

## (Optional) Python bindings

```bash
# Run from the repository root (the build steps above leave you in build/)
cd bindings/python
pip install -e .
```

## Prepare a model

OptiML works with standard GGUF models. If you have the original weights, convert them to GGUF first, then optionally quantize. Run from the repository root:

```bash
# Example: quantize a GGUF model to Q4_K (replace <model path> with the path to your GGUF file)
./build/optiml-quantize --input <model path> --output model-q4_k.gguf --type q4_k
```
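
To quantize several models in one pass, the output name can be derived from the input name. This is a minimal sketch: `quantized_name` is a hypothetical helper and the `models/` directory is an assumed layout. Only the `--input`, `--output`, and `--type` options from the command above are used.

```bash
# Hypothetical helper: build an output name from an input path and a quantization type.
quantized_name() {
  src=$1
  qtype=$2
  base=${src%.gguf}            # strip a trailing .gguf if present
  echo "${base}-${qtype}.gguf"
}

# Usage (assumed layout: unquantized GGUF files live in models/):
# for f in models/*.gguf; do
#   ./build/optiml-quantize --input "$f" --output "$(quantized_name "$f" q4_k)" --type q4_k
# done
```

Because the outputs land next to the inputs, a second run of the loop would also pick up the already quantized files; write them to a separate directory if you rerun it.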

## Run text generation (CLI)

```bash
./build/optiml-cli --model model-q4_k.gguf --prompt "Explain activation locality in one paragraph." --n-predict 128
```

## Start the HTTP demo server

```bash
./examples/server/optiml-server --model model-q4_k.gguf --host 127.0.0.1 --port 8080
```

Open the provided minimal web UI in your browser to chat locally; with the flags above, the server listens on `127.0.0.1:8080`. The server also exposes a simple REST API that you can call from any HTTP client.
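
When the server is started from a script, it helps to wait until it accepts requests before sending any. The following is a minimal sketch using `curl`: `wait_for_server` is a hypothetical helper, and it assumes the server answers a plain `GET` on its root URL with a success status (the REST endpoints themselves are not documented here, so none are called).

```bash
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 1).
# Returns 0 once a request succeeds, 1 if every attempt fails.
wait_for_server() {
  url=$1
  attempts=${2:-30}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP error statuses
    if curl --silent --fail --output /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (assumption: the root URL serves the web UI):
# ./examples/server/optiml-server --model model-q4_k.gguf --host 127.0.0.1 --port 8080 &
# wait_for_server http://127.0.0.1:8080/ && echo "server is up"
```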