LM Studio Guide: Run Local AI Models With a GUI (2025)

LM Studio guide 2025 — download the desktop app (Mac/Windows/Linux), browse and install models via the built-in HuggingFace browser (Llama 3.3/Mistral/Phi-3/DeepSeek GGUF), chat with the built-in UI, configure temperature + context + GPU layers, and start a local OpenAI-compatible API server at localhost:1234. Free, private, no CLI needed.

1. What is LM Studio?

LM Studio is a free desktop GUI application for running large language models locally. Unlike Ollama (command-line), LM Studio has a visual interface for browsing, downloading, and chatting with models — no terminal commands required. Everything runs on your own hardware: no internet needed once models are downloaded, no API costs, no data leaving your machine.

Key features

Built-in model browser — search and download GGUF models from HuggingFace directly, with size and quantization filters
Built-in chat interface — ChatGPT-like UI included, no separate tool or Docker setup needed
Local API server — starts an OpenAI-compatible server at localhost:1234; drop-in replacement for OpenAI API calls
GGUF format — uses quantized GGUF models from the HuggingFace community; hundreds of models available

Platform support

Mac M1/M2/M3/M4 — fully supported via Metal/MLX; unified memory = GPU and RAM shared (16GB M1 handles 7B–13B models)
Windows — CUDA acceleration (NVIDIA GPU) or CPU inference
Linux — CPU + NVIDIA GPU (experimental)

LM Studio vs Ollama at a glance

| Feature | LM Studio | Ollama |
|---|---|---|
| Interface | GUI (visual) | CLI (terminal) |
| Model browser | ✓ Built-in HF browser | Manual pull commands |
| Chat UI | ✓ Built-in | Requires Open WebUI |
| API server | ✓ localhost:1234 | ✓ localhost:11434 |
| Easiest for | Non-technical users | Developers |
| Model format | GGUF | GGUF + Ollama native |

2. Install LM Studio (3 steps)

Installation is straightforward — LM Studio is a regular desktop application with no CLI setup required.

Step 1: Go to lmstudio.ai and download for your platform

Navigate to lmstudio.ai and click Download. Choose your platform: Mac (Apple Silicon or Intel), Windows, or Linux. The download is around 200–400MB.

Step 2: Install and open LM Studio

On Mac: drag LM Studio to your Applications folder and double-click to open. On Windows: run the downloaded installer. LM Studio opens directly as a desktop application.

macOS note: If macOS blocks the app, go to System Settings → Privacy & Security and click “Open Anyway”. This is a standard gate for apps not distributed through the Mac App Store.

Step 3: On first launch, select GPU acceleration

LM Studio will detect your hardware and suggest acceleration settings. Choose: Metal for Apple Silicon, CUDA for NVIDIA GPU (Windows/Linux), or CPU for others. You can change this later in Settings.

3. Find and download a model (4 steps)

LM Studio's built-in Discover tab connects directly to HuggingFace — browse and download models without leaving the app. Choose a model that fits your hardware (see recommendations below).

Step 1: Click the search icon (top left) or go to the Discover tab

The Discover tab is the magnifying glass icon in the left sidebar. This opens the built-in HuggingFace model browser.

Step 2: Search for a model

Try searching: llama-3.2, mistral-7b, phi-3-mini, or deepseek-r1. Results come from HuggingFace and show available GGUF variants.

Step 3: Filter by size and quantization

Use the filters to show only models that fit your VRAM or RAM. LM Studio can highlight which variants fit your hardware (shown in green). Filter by format: GGUF.

Step 4: Click Download next to a specific variant

Choose a quantization level: Q4_K_M (4-bit, recommended — best balance of quality and size), Q5_K_M (slightly better quality, larger file), or Q8_0 (near-lossless, requires more RAM). Q4_K_M is the most popular choice.
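
To get a feel for how the quantization level translates into file size, the arithmetic can be sketched as follows. The bits-per-weight figures are approximate assumptions (GGUF files mix tensor types and carry metadata), and `estimate_size_gb` is a hypothetical helper name, so treat the results as ballpark numbers and rely on the size shown in the Discover tab.

```python
# Rough GGUF size estimate: parameters x bits per weight / 8.
# The bits-per-weight values below are approximations, not exact figures.
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q8_0": 8.5,
}


def estimate_size_gb(params_billions: float, quant: str) -> float:
    """Return an approximate model file size in GB (10^9 bytes)."""
    bits = APPROX_BITS_PER_WEIGHT[quant]
    # (params_billions * 1e9 weights) * bits / 8 bytes, expressed in GB
    return params_billions * bits / 8


for quant in APPROX_BITS_PER_WEIGHT:
    print(f"8B model at {quant}: ~{estimate_size_gb(8, quant):.1f} GB")
```

Leave headroom on top of the file size for the context window and the operating system when comparing the estimate against your RAM or VRAM.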

Recommended models by hardware

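| Hardware | Recommended models |
|---|---|
| 8GB RAM/VRAM | Llama 3.2 3B Q4, Phi-3 Mini 3.8B Q4 |
| 16GB RAM/VRAM | Llama 3.1 8B Q4, Mistral 7B Q4 |
| 32GB+ (Mac M-series) | DeepSeek-R1 distills up to 32B at Q4 (the 7B leaves ample headroom) |
| 64GB+ RAM | Llama 3.3 70B Q4 (roughly 40GB of weights; 16-bit precision would need about 140GB) |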

4. Chat with a model

LM Studio includes a built-in chat interface — you don't need to install Open WebUI or any external tool. Just load a model and start typing.

Step 1: Go to the Chat tab (speech bubble icon)

Click the chat/speech bubble icon in the left sidebar to open the chat interface.

Step 2: Click “Select a model to load” → choose your downloaded model

A dropdown at the top of the chat shows all your downloaded models. Click it and select the model you want to use.

Step 3: Wait for the model to load (5–30 seconds)

The model loads into memory on first use. A progress bar shows loading status. Larger models take longer; once loaded, subsequent responses are fast.

Step 4: Type your message and press Enter

The model responds locally — streamed in real time. No internet required, no API costs, no data sent anywhere.

Step 5 (optional): Edit the System Prompt

At the top of the chat panel, you can set a System Prompt — instructions that define the model's behavior and role (e.g., “You are a helpful coding assistant. Be concise and show code examples.”)

5. Configure and optimize the model

While chatting, the right panel shows model configuration options. Adjusting these can significantly affect both quality and performance.

Context Length

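The maximum number of tokens the model can attend to at once, covering the system prompt, the conversation history, and the response being generated. Typical settings range from 4,096 to 32,768 tokens, and some models support more. A longer context uses more RAM or VRAM. For most chat sessions, 4,096–8,192 is sufficient.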
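
Why a longer context costs memory can be illustrated with a rough key/value cache estimate. This is a sketch under stated assumptions: a 16-bit cache and the published Llama 3.1 8B shape (32 layers, 8 KV heads, head size 128). The runtime inside LM Studio may store the cache differently, and `kv_cache_bytes` is a hypothetical helper name.

```python
def kv_cache_bytes(context_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV cache size in bytes for a given context length."""
    # Factor of 2: one key vector and one value vector per layer per token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


# Assumed architecture values for Llama 3.1 8B: 32 layers, 8 KV heads, head size 128.
for ctx in (4096, 8192, 32768):
    gib = kv_cache_bytes(ctx, n_layers=32, n_kv_heads=8, head_dim=128) / 2**30
    print(f"{ctx:>6} tokens: {gib:.2f} GiB")
```

Under these assumptions the cache grows linearly with context length, which is why raising the setting from 8,192 to 32,768 multiplies this part of the memory budget by four.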

Temperature

Controls the randomness of token sampling: low values make responses focused and repeatable, high values make them more varied.

0.1–0.3: coding, factual questions, structured output (more precise and deterministic)
0.7–1.0: creative writing, brainstorming, conversation (more varied and imaginative)
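
The effect of temperature can be sketched with a small softmax example. The logits are hypothetical scores for three candidate tokens, and the function illustrates the standard temperature-scaled softmax rather than LM Studio's internal sampler.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At low temperature nearly all probability lands on the top-scoring token, which is why output becomes more deterministic; at higher temperature the alternatives keep a meaningful share.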

GPU Layers

How many model layers to offload to GPU. More layers = faster inference, but needs more VRAM. Set as high as your VRAM allows. LM Studio often auto-detects the right value based on your GPU.
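
If auto-detection is not available, a starting value can be guessed with simple arithmetic. The sketch below assumes all layers are equally sized and sets aside a hypothetical `reserve_gb` allowance for the context cache and other buffers; neither assumption is exact, so adjust the result by trial.

```python
def layers_that_fit(model_size_gb, n_layers, vram_gb, reserve_gb=1.5):
    """Estimate how many layers fit in VRAM, assuming equally sized layers."""
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0)
    return min(n_layers, int(usable_gb // per_layer_gb))


# Hypothetical examples: a 4.8 GB model with 32 layers, and an 8 GB model with 40 layers.
print(layers_that_fit(4.8, 32, vram_gb=8))
print(layers_that_fit(8.0, 40, vram_gb=6))
```

If loading fails or generation slows sharply at the estimated value, lower the layer count a few steps at a time.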

Threads

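How many CPU threads to use for inference (relevant when running on CPU or with partial GPU offloading). A good default is half your CPU's logical core count, which on hyper-threaded CPUs roughly equals the number of physical cores. On Apple Silicon with the model fully offloaded to the GPU via Metal, this setting has little effect.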
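
The "half the logical cores" rule of thumb can be computed with the standard library. `suggested_threads` is a hypothetical helper name, and the result is only a starting point for the Threads setting.

```python
import os


def suggested_threads() -> int:
    """Return half the logical core count, with a floor of one thread."""
    logical = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, logical // 2)


print(f"Logical cores: {os.cpu_count()}, suggested threads: {suggested_threads()}")
```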

6. Use LM Studio as a local API server

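LM Studio includes a local server that follows the OpenAI API format. Code that calls the OpenAI chat completions API can usually be pointed at LM Studio by changing the client's base URL and API key; the request and response shapes stay the same, although not every OpenAI endpoint or parameter is guaranteed to be supported.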

Step 1: Click “Local Server” (left sidebar, server icon)

The server icon looks like a network or "</>" symbol in the sidebar. Click it to open the server panel. In recent LM Studio releases this panel may appear under a Developer tab instead.

Step 2: Select your loaded model

Choose the model you want the server to serve. If you already loaded a model in Chat, it will appear here.

Step 3: Click “Start Server” → server starts at http://localhost:1234

The green indicator confirms the server is running. The API base URL is http://localhost:1234/v1.
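
One way to confirm the server is reachable from code is to request the model list from the OpenAI-style `/v1/models` endpoint. This is a minimal sketch using only the standard library; `parse_model_ids` and `list_models` are hypothetical helper names, and the script assumes the default address shown above.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # default address from the steps above


def parse_model_ids(payload: dict) -> list:
    """Extract model identifiers from an OpenAI-style model list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str = BASE_URL, timeout: float = 3.0) -> list:
    """Return model ids from the server, or an empty list if it is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            return parse_model_ids(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        return []


models = list_models()
print(models if models else "No response; is the server started?")
```

An empty result usually means the server is not started or is listening on a different port.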

OpenAI Python SDK — change just the base URL

Use the OpenAI Python SDK — change base_url to point at LM Studio. The rest of your existing OpenAI code stays identical.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"  # required field, value doesn't matter
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; with a single model loaded, LM Studio typically serves that model
    messages=[{"role": "user", "content": "Write a Python function to calculate Fibonacci numbers"}]
)
print(response.choices[0].message.content)

curl alternative

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "local", "messages": [{"role": "user", "content": "Hello!"}]}'
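
The system prompt and temperature from the chat panel can also be set per request. The sketch below uses only the standard library and assumes the default server address; `build_payload` and `chat` are hypothetical helper names, and "local-model" is the same placeholder used in the SDK example. Chat completion requests are stateless, so earlier turns have to be sent again through `history`.

```python
import json
import urllib.error
import urllib.request

API_URL = "http://localhost:1234/v1/chat/completions"


def build_payload(user_message, system_prompt=None, temperature=0.7, history=None):
    """Assemble an OpenAI-style chat request body."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history or [])  # earlier user/assistant turns, if any
    messages.append({"role": "user", "content": user_message})
    return {"model": "local-model", "messages": messages, "temperature": temperature}


def chat(payload, url=API_URL, timeout=120.0):
    """POST the payload and return the reply text, or None if the server is unreachable."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            body = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None
    return body["choices"][0]["message"]["content"]


payload = build_payload(
    "Write a Python function to calculate Fibonacci numbers",
    system_prompt="You are a helpful coding assistant. Be concise and show code examples.",
    temperature=0.2,
)
print(chat(payload) or "No response; is the server started?")
```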

7. LM Studio vs Ollama — when to use each

Both are free, private, and offline-capable. The right choice depends on your workflow and comfort with the command line.

Choose LM Studio if…

You prefer a visual interface over the command line
You want to browse and discover models with a GUI
You don't want to set up Open WebUI separately
You're less technical or new to local LLMs

Choose Ollama if…

You're a developer who prefers CLI tools
You want to integrate local models into scripts and apps easily
You want to use it with the Continue extension in VS Code/Cursor
You prefer Docker-based deployment

Both are free, private, offline, and have no usage costs. Both also expose an OpenAI-compatible local API, so your app code can switch between them by changing the base URL (port 1234 for LM Studio, 11434 for Ollama) and the model name.
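
Switching between the two tools can be kept to a single setting in application code. This is a sketch: `BACKENDS` and `client_settings` are hypothetical names, the ports are the defaults from the comparison table, and "llama3.2" stands in for whichever model you have pulled in Ollama.

```python
# Default OpenAI-compatible base URLs for each tool.
BACKENDS = {
    "lmstudio": "http://localhost:1234/v1",
    "ollama": "http://localhost:11434/v1",
}


def client_settings(backend: str, model: str) -> dict:
    """Return the values an OpenAI-compatible client needs for the chosen backend."""
    return {"base_url": BACKENDS[backend], "api_key": "not-needed", "model": model}


print(client_settings("lmstudio", "local-model"))
print(client_settings("ollama", "llama3.2"))
```

The `base_url` and `api_key` values can be passed to the OpenAI client constructor, and `model` to each request.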

FAQ

What is LM Studio?

LM Studio is a free desktop application for running open-source LLMs locally on your Mac, Windows PC, or Linux machine. It provides a visual interface to browse and download models from HuggingFace, a built-in ChatGPT-like chat interface, and an OpenAI-compatible local API server at localhost:1234.

Is LM Studio free?

Yes. LM Studio is completely free to download and use. The models (Llama, Mistral, Phi, DeepSeek) are also free to download. The only cost is your hardware — you need enough RAM/VRAM to run the models you choose.

LM Studio vs Ollama — which should I use?

LM Studio is better for visual browsing and non-technical users — it has a built-in model browser, chat UI, and server without any terminal commands. Ollama is better for developers who want CLI integration, shell scripting, and use with tools like the Continue extension in VS Code. Both are free and provide a local OpenAI-compatible API.

What models can I run in LM Studio?

LM Studio supports GGUF format models from HuggingFace — including Llama 3.3, Mistral, Phi-3, DeepSeek-R1, Gemma 2, Qwen, and hundreds more. Model availability depends on your hardware — smaller quantized models (Q4_K_M, 4-bit) run on most modern machines with 8GB+ RAM.