Anthropic released two new models: Claude 3.5 Haiku and an upgraded version of Claude 3.5 Sonnet. The company also released a new feature for Claude 3.5 Sonnet that allows the model to interact with a computer's GUI the same way a human user does.
Claude 3.5 Haiku, the company's fastest model, outperforms larger models such as GPT-4o and the previous version of Claude 3.5 Sonnet on the SWE-bench Verified coding benchmark. The upgraded Claude 3.5 Sonnet performs even better on that benchmark, scoring "higher than all publicly available models" according to Anthropic. The model also supports a new feature, computer use, which allows it to interact with a computer by interpreting the images on the screen, moving the mouse pointer, clicking buttons, and entering text via a virtual keyboard. This allows the model to interact with virtually any program, not just ones that expose an API. According to Anthropic,
Computer use is a completely different approach to AI development. Up until now, LLM developers have made tools fit the model, producing custom environments where AIs use specially-designed tools to complete various tasks. Now, we can make the model fit the tools—Claude can fit into the computer environments we all use every day. Our goal is for Claude to take pre-existing pieces of computer software and simply use them as a person would.
The computer use feature relies on Claude's ability to interpret images. Anthropic describes it as "taking screenshots and piecing them together." One key advancement was training the model to accurately count pixels; many LLMs struggle with similar tasks such as counting the number of letters in a word. Without this skill, the model would be unable to move the computer mouse to the proper place.
Claude currently has the top spot on the OSWorld benchmark leaderboard, which tracks the ability of AI agents to interact with computers. While humans typically score higher than 70% on this benchmark, Claude's best score is 14.9%. However, GPT-4, "the next-best AI model in the same category" according to Anthropic, scores only 7.7%.
Users on Hacker News discussed the computer use feature, pointing out its potential for automating a wide range of common business processes.
This is actually a huge deal. As someone building AI SaaS products, I used to have the position that directly integrating with APIs is going to get us most of the way there in terms of complete AI automation...I started to realize that pretty much most of the real world runs on software that directly interfaces with people, without clearly defined public APIs you can integrate into...I am glad they did this, since it is a powerful connector to these types of real-world business use cases that are super-hairy, and hence very worthwhile in automating.
Anthropic notes that the feature still "remains slow and often error-prone." Alex Albert, the company's head of Claude relations, posted on X:
It's not perfect yet. The model struggles at times with basic computer actions which can lead to some amusing moments. While filming demos, Claude accidentally stopped a long-running screen recording, causing all footage to be lost. Later, Claude took a break from the coding demo and began to browse photos of Yellowstone National Park.
The computer use feature is currently in public beta. Anthropic also released example code on GitHub demonstrating how to use the feature.
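In the beta, computer use is exposed to developers as a tool definition passed to the Messages API along with a beta flag. The sketch below constructs such a request payload based on the tool schema Anthropic published at launch; treat the exact field names and version strings as a snapshot of the beta, since they may change, and note that the `computer_tool` helper is this sketch's own convenience function.

```python
# Sketch of a request payload for the computer-use beta, based on the tool
# schema Anthropic published at launch (subject to change while in beta).

def computer_tool(width: int, height: int, display: int = 1) -> dict:
    """Describe the virtual screen the model will control."""
    return {
        "type": "computer_20241022",   # versioned beta tool identifier
        "name": "computer",
        "display_width_px": width,     # screenshot resolution the model sees
        "display_height_px": height,
        "display_number": display,     # X11 display to target
    }

request = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "tools": [computer_tool(1024, 768)],
    "messages": [
        {"role": "user", "content": "Open the calculator and add 2 and 2."}
    ],
}

# With the official Python SDK, this payload would be sent roughly as
# client.beta.messages.create(**request, betas=["computer-use-2024-10-22"]),
# and the response would contain tool-use blocks (clicks, keystrokes,
# screenshot requests) for the caller's harness to execute.
print(request["tools"][0]["type"])  # → computer_20241022
```

The model never touches the machine directly: the caller's own harness takes the screenshots and performs the clicks, which is why Anthropic's GitHub examples ship with a sandboxed virtual display.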