Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs
Link: baseten.co/blog/sota-performan…
Discussion: news.ycombinator.com/item?id=4…
How we run GPT OSS 120B at 500+ tokens per second on NVIDIA GPUs
How we optimized GPT OSS 120B for state-of-the-art latency and throughput on launch day.
Amir Haghighat (Baseten)