Claude Computer Use: Guide to AI Desktop Automation (2025)
Claude Computer Use lets Claude control your desktop (clicking, typing, and navigating apps) by seeing the screen through screenshots. It is currently in public beta via the Anthropic API. This guide covers setup, a working Python example, real use cases, and what to watch out for.
1. What is Claude Computer Use?
Claude Computer Use is an Anthropic API feature that enables Claude to operate a computer by taking screenshots, analyzing what it sees, and executing actions — clicking, typing, scrolling, and pressing keyboard shortcuts. Unlike browser automation tools (Selenium, Playwright) that hook into the DOM, Computer Use operates at the pixel level, interacting with any visual UI the same way a human would.
How it differs from browser automation
| Feature | Claude Computer Use | Playwright / Selenium |
|---|---|---|
| Scope | Any app (desktop + web) | Web browsers only |
| Interaction model | Pixel-level screenshots | DOM selectors |
| Speed | 2-5 sec/action | <100ms/action |
| Setup | API key + Python | Driver + selectors |
2. How it works — the screenshot → action loop
Claude Computer Use operates in a tight loop. Each iteration has four steps:
Screenshot
Your code captures the current screen state as an image and sends it to Claude via the API with the task description.
Reasoning
Claude analyzes the screenshot, understands the current UI state, and decides what action would make progress toward the goal.
Tool call
Claude returns a tool call whose action is one of left_click, right_click, type, scroll, key, or screenshot. Your code executes it on the actual desktop.
Result + repeat
The result (usually another screenshot) is sent back to Claude. The loop continues until Claude stops with end_turn or the maximum number of steps is reached.
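One way to organize the tool-call step is a dispatch table that maps each action name to a handler and always produces a tool_result. The sketch below is an illustration, not Anthropic's reference implementation: make_dispatcher and the fake_* handlers are hypothetical names, and the handlers only record what they would do instead of touching the desktop. The coordinate and text fields follow the example in section 4.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical handler type: receives the tool call's input dict, returns a status string.
Handler = Callable[[Dict[str, Any]], str]


def make_dispatcher(
    handlers: Dict[str, Handler],
) -> Callable[[str, Dict[str, Any]], Dict[str, Any]]:
    """Build a function that turns one tool call into one tool_result block."""

    def dispatch(tool_use_id: str, tool_input: Dict[str, Any]) -> Dict[str, Any]:
        action = tool_input.get("action")
        handler = handlers.get(action)
        if handler is None:
            # Report unsupported actions instead of silently dropping them.
            return {
                "type": "tool_result",
                "tool_use_id": tool_use_id,
                "content": f"Unsupported action: {action}",
                "is_error": True,
            }
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": handler(tool_input),
        }

    return dispatch


# Stand-in handlers that record the requested action (no real mouse or keyboard).
performed: List[Tuple[str, Any]] = []


def fake_click(tool_input: Dict[str, Any]) -> str:
    performed.append(("left_click", tuple(tool_input["coordinate"])))
    return "Click executed"


def fake_type(tool_input: Dict[str, Any]) -> str:
    performed.append(("type", tool_input["text"]))
    return "Text typed"


dispatch = make_dispatcher({"left_click": fake_click, "type": fake_type})
ok = dispatch("toolu_1", {"action": "left_click", "coordinate": [10, 20]})
bad = dispatch("toolu_2", {"action": "scroll"})
print(ok["content"])
print(bad["content"])
```

Swapping the fake handlers for real ones (for example, functions that call pyautogui) keeps the loop body unchanged, and the fallback branch means every tool call gets an answer even when an action is not implemented yet.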
3. Setting up Claude Computer Use
Three things you need before writing a single line:
1. Anthropic API key
Get one at console.anthropic.com. Computer Use is available on all paid plans. Store it in an environment variable: ANTHROPIC_API_KEY=sk-ant-...
2. Python packages
Install the Anthropic SDK and screen control libraries:

    pip install anthropic mss pyautogui

3. Screen resolution
You must pass your actual screen dimensions to the computer_20241022 tool as display_width_px and display_height_px. Mismatched dimensions cause click coordinate errors.
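If the declared dimensions cannot match the physical screen (for instance, because you send downscaled screenshots), one option is to rescale each coordinate before executing it. The following is a minimal sketch under that assumption; scale_to_screen is a hypothetical helper, not part of the Anthropic SDK, and it assumes a simple linear mapping between the declared display and the real one.

```python
from typing import Tuple


def scale_to_screen(
    coordinate: Tuple[int, int],
    declared_size: Tuple[int, int],
    actual_size: Tuple[int, int],
) -> Tuple[int, int]:
    """Map a point in the declared display space onto the real screen."""
    declared_w, declared_h = declared_size
    actual_w, actual_h = actual_size
    if min(declared_w, declared_h, actual_w, actual_h) <= 0:
        raise ValueError("display sizes must be positive")

    x, y = coordinate
    # Scale each axis independently.
    sx = round(x * actual_w / declared_w)
    sy = round(y * actual_h / declared_h)
    # Clamp so the click always stays on the screen.
    return (min(max(sx, 0), actual_w - 1), min(max(sy, 0), actual_h - 1))


# Same size on both sides: the point is unchanged.
print(scale_to_screen((960, 540), (1920, 1080), (1920, 1080)))
# Declared 1920x1080, real screen 3840x2160: both axes double.
print(scale_to_screen((960, 540), (1920, 1080), (3840, 2160)))
```

When the two sizes are identical the function is a no-op, so it can stay in the click path permanently and only changes behavior on displays where the sizes differ.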
4. Python code example — basic screenshot + action loop
This is a minimal but complete implementation of the Computer Use loop. It handles screenshots, clicks, and typing. Extend it with additional action types (scroll, key, drag) as needed.

    import base64

    import anthropic
    import mss
    import mss.tools
    import pyautogui

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


    def get_screenshot() -> bytes:
        """Capture the primary monitor and return it as PNG-encoded bytes."""
        # In practice: use pyautogui, mss, or your platform's screenshot tool
        with mss.mss() as sct:
            # monitors[0] is the union of all displays; monitors[1] is the primary one
            screenshot = sct.grab(sct.monitors[1])
            # Encode the raw RGB pixels as PNG so the data matches media_type "image/png"
            return mss.tools.to_png(screenshot.rgb, screenshot.size)


    def computer_use_loop(task: str, max_steps: int = 10) -> None:
        messages = [{"role": "user", "content": task}]

        for step in range(max_steps):
            response = client.beta.messages.create(
                model="claude-opus-4-5",
                max_tokens=4096,
                tools=[
                    {
                        "type": "computer_20241022",
                        "name": "computer",
                        "display_width_px": 1920,
                        "display_height_px": 1080,
                        "display_number": 1,
                    }
                ],
                messages=messages,
                betas=["computer-use-2024-10-22"],
            )

            # Check if Claude is done
            if response.stop_reason == "end_turn":
                print("Task complete")
                break

            # Process tool calls
            tool_results = []
            for block in response.content:
                if block.type == "tool_use" and block.name == "computer":
                    action = block.input["action"]
                    print(f"Action: {action}")

                    if action == "screenshot":
                        # Capture screen and return to Claude
                        img_bytes = get_screenshot()
                        img_b64 = base64.b64encode(img_bytes).decode()
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": [{
                                "type": "image",
                                "source": {
                                    "type": "base64",
                                    "media_type": "image/png",
                                    "data": img_b64,
                                },
                            }],
                        })
                    elif action == "left_click":
                        # Execute the click at the coordinates Claude chose
                        x, y = block.input["coordinate"]
                        pyautogui.click(x, y)
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": "Click executed",
                        })
                    elif action == "type":
                        pyautogui.typewrite(block.input["text"], interval=0.05)
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": "Text typed",
                        })
                    else:
                        # Every tool_use block needs a matching tool_result,
                        # so report actions this example does not implement
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": f"Action not implemented: {action}",
                            "is_error": True,
                        })

            if not tool_results:
                # Claude stopped for another reason (for example max_tokens)
                print(f"Stopped without a tool call: {response.stop_reason}")
                break

            # Add Claude's response and tool results to message history
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        else:
            print("Reached max_steps before the task completed")


    # Example usage
    computer_use_loop("Open the calculator app and compute 847 * 293")

5. Use cases: what Claude Computer Use is good at
Computer Use is best for tasks that are repetitive, UI-bound, and hard to automate with traditional scripting because they lack APIs.
Form filling: Fill in web forms, government portals, or legacy enterprise systems that don't have APIs. Claude can handle dropdowns, date pickers, and multi-step forms.
Web scraping with login walls: Navigate login pages, dismiss cookie consent dialogs, and extract data from authenticated web apps that are awkward to drive with a headless browser. Logins that require 2FA still need a human to supply the code (see section 6).
UI testing: Run exploratory QA testing on desktop apps — Claude can describe bugs it encounters visually, making it useful for accessibility and visual regression testing.
Data entry workflows: Transfer data between systems that don't integrate — copy from a spreadsheet, paste into a CRM, verify each entry. Especially useful for legacy internal tools.
Desktop app automation: Control native apps (Figma, Excel, Premiere) with natural language instructions — useful for generating repetitive assets or batch operations.
6. Current limitations
Claude Computer Use is powerful but has real constraints you need to plan around in 2025:
Latency: Each screenshot-action cycle takes 2-5 seconds, so a 20-step task takes at least 40-100 seconds (20 steps × 2-5 seconds each). Not suitable for real-time interactions.
No real-time streaming: Claude sees the screen as static snapshots, not a video stream. Animations, loading spinners, and dynamic content changes can confuse the loop.
2FA challenges: SMS codes, authenticator apps, and email verification can break the automation loop since Claude can't receive or retrieve those codes independently.
Rate limits: Beta features have tighter rate limits than standard API calls. Expect 429 errors on long loops; implement exponential backoff.
Coordinate drift: If the screen resolution or DPI doesn't exactly match what you told the API, click coordinates can be off by pixels — test carefully on Retina/HiDPI displays.
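The exponential backoff mentioned under rate limits can be sketched as a small retry wrapper. This is a generic illustration under stated assumptions: call_with_backoff and FakeRateLimit are hypothetical names, the delays are arbitrary defaults, and the exception type to retry on is passed in by the caller (with the Anthropic SDK that would typically be anthropic.RateLimitError).

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay ceiling doubles each attempt: 1s, 2s, 4s, ... up to max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random amount up to the ceiling.
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


# Demo with a stand-in error type and a no-op sleep, so it runs instantly.
class FakeRateLimit(Exception):
    pass


attempts = {"count": 0}


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise FakeRateLimit("429")
    return "ok"


print(call_with_backoff(flaky_call, (FakeRateLimit,), sleep=lambda seconds: None))
```

In the loop from section 4, the API call would be wrapped as call_with_backoff(lambda: client.beta.messages.create(...), retry_on=(anthropic.RateLimitError,)). The jitter spreads retries out so that several agents hitting the limit at once do not all retry at the same moment.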
7. Safety considerations
Claude Computer Use operates on your real desktop with real permissions. Anthropic's guidelines and practical experience both point to these safety practices:
Human-in-the-loop for destructive actions: Before Claude deletes files, submits forms, or sends messages, pause and ask for user confirmation. Add a confirmation prompt in your loop before high-impact tool calls.
Run in a VM or container: For production use cases, run the Computer Use agent inside a virtual machine (VirtualBox, VMware, or a Dockerized VNC session). This limits the blast radius if something goes wrong.
Avoid banking and email apps: Anthropic recommends against using Computer Use with banking, email, or any app that could transfer money, send messages, or access sensitive credentials. The potential for prompt injection from web content is real.
Set a max step limit: Always set a hard cap on the number of loop iterations. An unbounded loop with a misconfigured task can run indefinitely and rack up significant API costs.
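The human-in-the-loop practice can be sketched as a gate that runs before each tool call is executed. This is a deliberately conservative illustration, not an Anthropic-provided mechanism: approve_action and READ_ONLY_ACTIONS are hypothetical names, and the policy assumes that only screenshot is harmless, so every other action asks for approval. At the pixel level a single click can submit a form, which is why the sketch does not try to guess which clicks are destructive.

```python
from typing import Any, Callable, Dict, FrozenSet

# Assumption: only these actions are treated as read-only; everything else needs approval.
READ_ONLY_ACTIONS: FrozenSet[str] = frozenset({"screenshot"})


def approve_action(
    tool_input: Dict[str, Any],
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True if the action may run, asking the user when it is not read-only."""
    action = tool_input.get("action")
    if action in READ_ONLY_ACTIONS:
        return True
    details = {key: value for key, value in tool_input.items() if key != "action"}
    answer = ask(f"Allow '{action}' with {details}? [y/N] ")
    # Anything other than an explicit yes counts as a refusal.
    return answer.strip().lower() in {"y", "yes"}


# Demo with canned answers instead of a real prompt.
print(approve_action({"action": "screenshot"}, ask=lambda prompt: "n"))
print(approve_action({"action": "type", "text": "hello"}, ask=lambda prompt: "y"))
print(approve_action({"action": "left_click", "coordinate": [5, 5]}, ask=lambda prompt: ""))
```

When the gate returns False, the loop can send back a tool_result with is_error set and a message such as "User declined this action" so Claude knows the step did not happen.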
Monitor Claude API status
Computer Use automation loops break silently when the Anthropic API has an incident. Track Claude API uptime at prismix.dev and get a free email alert before your scheduled automations fail.
FAQ
What is Claude Computer Use?
Claude Computer Use is an Anthropic API feature that lets Claude control a computer by taking screenshots, clicking, typing, and scrolling. It works in a loop: Claude sees the screen, decides what action to take, requests it via a tool call that your code executes, then sees the result in the next screenshot. It is currently in public beta.
Is Claude Computer Use safe?
Claude Computer Use is safe when used with human oversight and limited permissions. Anthropic recommends: running in a sandboxed VM or container, keeping a human in the loop for confirmation before destructive actions, avoiding use with banking, email, or credential-sensitive apps, and setting strict permission scopes.
What API model supports Claude Computer Use?
Claude Computer Use is supported by claude-opus-4-5 and claude-sonnet-4-5. You must include the 'computer-use-2024-10-22' beta header in your API request and add the 'computer_20241022' tool to your tools array.
What are the limitations of Claude Computer Use?
Current limitations: no real-time video streaming (screenshot-by-screenshot only), latency of 2-5 seconds per action, 2FA and captcha challenges can break loops, rate limits on beta features, no support for system-level hardware access, and difficulty with apps that use non-standard UI frameworks.