AI, ML & Data Engineering

OpenAI Releases Operator, an AI Agent for Web-Based Tasks

Feb 18, 2025

OpenAI released a research preview of Operator, an AI agent that can use a web browser to perform tasks on a user's behalf. Operator achieves new state-of-the-art performance on the WebArena and WebVoyager benchmarks.

To build Operator, OpenAI developed a new model called Computer-Using Agent (CUA), which is derived from GPT-4o. CUA relies on GPT-4o's vision capability to understand the contents of a browser screen and is further trained to interact with GUI elements such as buttons and menus. To perform a task, it repeats a cycle of perception, reasoning, and action steps until the task is complete. OpenAI has built in several safety guardrails: for example, Operator requires the user to take over when entering passwords, and it refuses some high-risk tasks such as banking transactions. According to OpenAI:

We have made significant progress in deep reasoning through the o-model series, vision capabilities through GPT-4o, and new techniques to improve robustness through reinforcement learning and instruction hierarchy. The next challenge space we plan to explore is expanding the action space of agents. The flexibility offered by a universal interface addresses this challenge, enabling an agent that can navigate any software tool designed for humans. By moving beyond specialized agent-friendly APIs, CUA can adapt to whatever computer environment is available—truly addressing the "long tail" of digital use cases that remain out of reach for most AI.
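The perception, reasoning, and action cycle described above, together with the two guardrails mentioned (user takeover for passwords and refusal of high-risk tasks), can be illustrated with a minimal Python sketch. This is not OpenAI's implementation or API: every name here (`run_agent`, `Action`, `HIGH_RISK_KEYWORDS`, the `perceive`/`reason`/`act` callbacks) is hypothetical, and the keyword checks are a crude stand-in for whatever safety logic Operator actually uses.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List


class ActionKind(Enum):
    CLICK = auto()
    TYPE = auto()
    DONE = auto()


@dataclass
class Action:
    kind: ActionKind
    target: str = ""   # hypothetical description of the GUI element
    text: str = ""     # text to enter, for TYPE actions


@dataclass
class RunResult:
    status: str        # "done", "needs_user", "refused", or "max_steps"
    history: List[Action] = field(default_factory=list)


# Assumption: a keyword list stands in for a real high-risk task classifier.
HIGH_RISK_KEYWORDS = ("bank transfer", "wire money")


def run_agent(
    task: str,
    perceive: Callable[[], bytes],
    reason: Callable[[str, bytes, List[Action]], Action],
    act: Callable[[Action], None],
    max_steps: int = 20,
) -> RunResult:
    """Loop over perceive -> reason -> act until the task is complete."""
    # Guardrail 1: refuse tasks that look high-risk before doing anything.
    if any(keyword in task.lower() for keyword in HIGH_RISK_KEYWORDS):
        return RunResult("refused")

    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = perceive()                      # perception
        action = reason(task, screenshot, history)   # reasoning
        if action.kind is ActionKind.DONE:
            return RunResult("done", history)
        # Guardrail 2: hand control back to the user for password entry.
        if action.kind is ActionKind.TYPE and "password" in action.target.lower():
            return RunResult("needs_user", history)
        act(action)                                  # acting
        history.append(action)
    return RunResult("max_steps", history)
```

In this sketch the model is injected as the `reason` callback, so the loop can be exercised with a scripted stub in place of a real vision model; the `max_steps` cap is an added assumption to keep the loop from running indefinitely.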

In late 2024, InfoQ covered Anthropic's release of the Computer Use feature, which allows its Claude model to interact with a computer by interpreting the images on the screen, moving the mouse pointer, clicking buttons, and entering text via a virtual keyboard. Claude set records on several OS and web-use benchmarks, but Operator outperforms it on WebArena, WebVoyager, and OSWorld. However, Operator still falls short of human performance on these tasks: for example, it scores 38.1% on OSWorld vs. over 70% for humans.

CUA Benchmark Scores. Image Source: OpenAI's CUA Report

Because Operator can take actions on websites, OpenAI added several safety measures beyond those already built into GPT-4o. Particularly important are the safeguards against adversarial attacks by malicious websites, including prompt injection and phishing. OpenAI used red teams to test the safeguards and claims that its mitigation against prompt injection worked in "all but one case."

AI researcher and entrepreneur Andrej Karpathy wrote about Operator on X:

Projects like OpenAI’s Operator are to the digital world as humanoid robots are to the physical world. One general setting (monitor keyboard and mouse, or human body) that can in principle gradually perform arbitrarily general tasks, via an I/O interface originally designed for humans. In both cases, it leads to a gradually mixed-autonomy world, where humans become high-level supervisors of low-level automation. A bit like a driver monitoring the Autopilot. This will happen faster in the digital world than in the physical world because flipping bits is somewhere around 1000X less expensive than moving atoms. Though the market size and opportunity feels a lot bigger in the physical world.

Operator is only available via the web for ChatGPT Pro users. OpenAI intends to expand this to other paid ChatGPT plans "once we are confident in its safety and usability at scale," and to make the underlying CUA model available via API.

About the Author

Anthony Alford

Anthony is a Senior Director, Development at Genesys, where he is working on several AI and ML projects related to customer experience. He has over 20 years of experience in designing and building scalable software. Anthony holds a Ph.D. in Electrical Engineering with a specialization in Intelligent Robotics Software and has worked on various problems in the areas of human-AI interaction and predictive analytics for SaaS business optimization.
