
This week, OpenAI released its long-awaited open-weight model, gpt-oss. Part of the appeal of gpt-oss is that you can run it locally on your own hardware, including Macs with Apple silicon. Here’s how to get started and what to expect.
Models and Macs
First, gpt-oss comes in two flavors: gpt-oss-20b and gpt-oss-120b. The former is a medium-sized open-weight model with roughly 21 billion parameters, while the latter is a much heavier model with roughly 117 billion parameters.
The medium model is what Apple silicon Macs with enough resources can expect to run locally. The difference? Expect the smaller model to hallucinate more than its much larger sibling, a consequence of its far smaller parameter count. That’s the tradeoff for a faster model that’s actually capable of running on high-end Macs.
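If you’re curious about the bigger model, it’s distributed through the same Ollama tooling covered below, though the download is several times larger and realistically calls for a high-memory machine like a Mac Studio:
ollama pull gpt-oss:120b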
Still, the smaller model is a neat tool that’s freely available if you have a Mac with enough resources and a curiosity about running large language models locally.
You should also be aware of how running a local model differs from, say, ChatGPT. By default, the open-weight local model lacks a lot of the modern chatbot features that make ChatGPT useful. For example, responses can’t draw on web search results, which often helps rein in hallucinations.
OpenAI recommends at least 16GB of RAM to run gpt-oss-20b, but Macs with 32GB of RAM or more will obviously perform better. Based on early user feedback, 16GB is really the floor: enough to experiment, but not much more. (AI is a big reason that Apple stopped selling Macs with 8GB of RAM not that long ago.)
Setup and use
Preamble aside, actually getting started is super simple.
First, install Ollama on your Mac. This is basically the window for interfacing with gpt-oss-20b. You can find the app at ollama.com/download.
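If you’d rather skip the website, Ollama is also available through Homebrew. Assuming you already have Homebrew set up, this installs the command-line version, and the steps below work the same either way:
brew install ollama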
Next, open Terminal on your Mac and enter these commands:
ollama pull gpt-oss:20b
ollama run gpt-oss:20b
The first command downloads gpt-oss-20b, which uses around 15GB of disk storage; the second starts an interactive chat session with the model right in Terminal.
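To confirm the download finished, you can also ask Ollama to list every model it has stored locally, along with each one’s size:
ollama list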

Finally, you can launch Ollama and select gpt-oss-20b as your model. You can even put Ollama in airplane mode in the app’s settings panel to ensure everything is happening locally. No sign-in required.
To test gpt-oss-20b, just enter a prompt into the text field and watch the model get to work. Again, hardware resources dictate model performance here. Ollama will use every resource it can when running the model, so your Mac may slow to a crawl while the model is thinking.
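If you want hard numbers on how your Mac copes, you can pass a prompt directly to ollama run and add the --verbose flag, which prints timing and token-throughput stats after the response:
ollama run gpt-oss:20b "Who was the 13th president?" --verbose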

My best Mac is a 15-inch M4 MacBook Air with 16GB RAM. While the model functions, it’s a tall order even for experimentation on my machine. Responding to ‘hello’ took a little more than five minutes. Responding to ‘who was the 13th president’ took far longer, around 43 minutes. You really do want more RAM if you plan to spend more than a few minutes experimenting.
Decide you want to remove the local model and reclaim that disk space? Enter this terminal command:
ollama rm gpt-oss:20b
For more information on using Ollama with gpt-oss-20b on your Mac, check out Ollama’s official documentation. Alternatively, you could use LM Studio, another Mac app for working with AI models locally.
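One last note for tinkerers: the Ollama app also serves a local REST API on port 11434, so scripts and other apps on your Mac can query the model without the chat window. A minimal sketch using the default setup:
curl http://localhost:11434/api/generate -d '{"model": "gpt-oss:20b", "prompt": "Why is the sky blue?", "stream": false}'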