7 docs tagged with "computer-use"

Benchmarks: WebArena and OSWorld

Understanding computer use agent benchmarks - WebArena, OSWorld, ScreenSpot, Mind2Web. Current state-of-the-art (SOTA) results, what the numbers mean, and how to evaluate your own agent.

Browser Agents

Building practical browser agents using Playwright and LLMs - DOM manipulation, visual navigation, session management, anti-bot handling, and a complete Python implementation.

Computer Use Architecture

How Anthropic's Computer Use API works - the screenshot-action loop, the three tools, coordinate systems, and building a working computer use agent with Docker.

GUI Automation with Vision

Vision-based GUI automation for desktop applications - coordinate grounding, UI element detection, OCR integration, state tracking, and building a desktop automation agent.

Module 03: Computer Use Agents

How AI agents see, understand, and interact with graphical interfaces - browsers, desktops, and GUIs - using vision models and action executors.

Safety and Sandboxing

Safety architecture for computer use agents - threat models, prompt injection, Docker sandboxing, action confirmation gates, logging, and anomaly detection.

Web Scraping Agents

Agent-based web scraping - handling dynamic JavaScript rendering, login flows, multi-page pagination, structured data extraction, and anti-detection techniques.