InfoQ Software Architects' Newsletter

A monthly overview of things you need to know as an architect or aspiring architect.

Enter your e-mail address

Select your country

We protect your privacy.

InfoQ Homepage News OpenAI Releases GPT-4o mini Model with Improved Jailbreak Resistance

AI, ML & Data Engineering

OpenAI Releases GPT-4o mini Model with Improved Jailbreak Resistance

This item in japanese

Jul 23, 2024 2 min read

Write & Win: InfoQ Contest

Join the contest to:

Win a conference ticket
Boost your profile
Help the community

Send your article proposal

OpenAI released GPT-4o mini, a smaller version of their flagship GPT-4o model. GPT-4o mini outperforms GPT-3.5 Turbo on several LLM benchmarks and is OpenAI's first model trained with an instruction hierarchy method that improves the model's resistance to jailbreaks and system prompt extraction.

GPT-4o mini supports the same languages and modalities as the full GPT-4o model, although currently the OpenAI API only allows text and vision, with audio and video input/output "coming in the future." The model also has the same context window, 128k tokens, and the same October 2023 training knowledge cutoff. It has the same built-in safety mitigations as GPT-4o, and in addition was trained using OpenAI's instruction hierarchy training method which gives models up to 30% better robustness against jailbreaks and 60% improved defense against system prompt extraction. On LLM benchmarks such as MMLU and HumanEval, GPT-4o mini outperforms comparable small LLMs such as Gemini Flash and Claude Haiku as well as GPT-3.5. According to OpenAI:

Over the past few years, we’ve witnessed remarkable advancements in AI intelligence paired with substantial reductions in cost...We’re committed to continuing this trajectory of driving down costs while enhancing model capabilities. We envision a future where models become seamlessly integrated in every app and on every website. GPT-4o mini is paving the way for developers to build and scale powerful AI applications more efficiently and affordably. The future of AI is becoming more accessible, reliable, and embedded in our daily digital experiences, and we’re excited to continue to lead the way.

While OpenAI has not published many technical details of the model, the company did recently publish a research paper on training models to follow an instruction hierarchy. The key idea is that many attack vectors against LLMs use the fact that "LLMs often consider system prompts to be the same priority as text from untrusted users and third parties." To address this, OpenAI developed a training dataset that teaches LLMs to ignore "lower-privileged" instructions when they conflict with higher ones.

To evaluate this method, the researchers first fine-tuned a model on the dataset then tested it on a set of both open-source attack benchmarks and proprietary ones. The fine-tuned model showed improved robustness on all benchmarks. The team did notice, however, that the model tended to "over-refuse" on some benchmarks, but they said they do not expect this "to cause noticeable degradations in model behavior" for real-world use cases.

OpenAI CEO Sam Altman posted on X that the company's best model in 2022, text-davinci-003, was "much, much worse" than GPT-4o mini. Also on X, the LMSYS team revealed that:

GPT-4o mini's early version "upcoming-gpt-mini" was tested in Arena in the past week. With over 6K user votes, we are excited to share its early score reaching GPT-4-Turbo performance, while offering significant cost reduction.

However, Wharton professor Ethan Mollick wrote:

First impressions with GPT-4o-mini (what a name) is that it is impressive for a small model but no replacement for a frontier model. When given complex education prompts it can’t follow instructions as well & misses nuance GPT-4o nails.

GPT-4o mini is available via the OpenAI API as well as in ChatGPT.

About the Author

Anthony Alford

Anthony is a Senior Director, Development at Genesys where he is working on several AI and ML projects related to customer experience. He has over 20 years experience in designing and building scalable software. Anthony holds a Ph.D. degree in Electrical Engineering with specialization in Intelligent Robotics Software and has worked on various problems in the areas of human-AI interaction and predictive analytics for SaaS business optimization.

Show moreShow less

This content is in the AI, ML & Data Engineering topic

Write Your Way to a QCon or InfoQ Dev Summit!

Join the InfoQ article competition to win a complimentary ticket to QCon or InfoQ Dev Summit! We're seeking in-depth technical articles written by software developers for software developers.

Send your proposal