One of the quickest ways to start playing with a good local LLM on macOS (if you have ~12GB of free disk space and RAM) is to use llama-server with gpt-oss-20b:
brew install llama.cpp
llama-server -hf ggml-org/gpt-oss-20b-GGUF \
--ctx-size 0 --jinja -ub 2048 -b 2048 -ngl 99 -fa
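A quick gloss on those flags: --ctx-size 0 uses the model's full trained context length, --jinja enables the model's Jinja chat template, -b/-ub set the batch sizes, -ngl 99 offloads all layers to the GPU, and -fa turns on Flash Attention. Once the model has downloaded and the server is up, it exposes an OpenAI-compatible API plus a web UI, by default on localhost port 8080. A minimal sketch of hitting the chat completions endpoint with curl, assuming that default port:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}'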
simonwillison.net/2025/Aug/19/…
llama.cpp guide: running gpt-oss with llama.cpp
Really useful official guide to running the OpenAI gpt-oss models using llama-server from llama.cpp, which provides an OpenAI-compatible localhost API and a neat web interface for interacting with the … (Simon Willison’s Weblog)