Researchers from the Berkeley Artificial Intelligence Research (BAIR) Lab have open-sourced InstructPix2Pix, a deep-learning model that follows human instructions to edit images. InstructPix2Pix was trained on synthetic data and outperforms a baseline AI image-editing model.
The BAIR team presented their work at the recent IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023. They first generated a synthetic training dataset whose examples pair two images with an editing instruction for converting the first image into the second. This dataset was used to train an image-generation diffusion model. The result is a model that accepts a source image and a text instruction describing the desired edit; for example, given an image of a person riding a horse and the prompt "Have her ride a dragon," it will output the original image with the horse replaced by a dragon. According to the BAIR researchers:
Despite being trained entirely on synthetic examples, our model achieves zero-shot generalization to both arbitrary real images and natural human-written instructions. Our model enables intuitive image editing that can follow human instructions to perform a diverse collection of edits: replacing objects, changing the style of an image, changing the setting, the artistic medium, among others.
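The released model can be run through the Hugging Face diffusers library. The following is a minimal sketch, assuming the StableDiffusionInstructPix2PixPipeline class and the timbrooks/instruct-pix2pix checkpoint; the file names and parameter values are illustrative rather than prescribed by the authors.

```python
# Minimal sketch: editing an image with InstructPix2Pix via diffusers.
# File names below are hypothetical; parameter values are illustrative.
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

# Load the released checkpoint (model id assumed: timbrooks/instruct-pix2pix)
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

# Source image plus a natural-language editing instruction
image = load_image("person_riding_horse.png")  # hypothetical local file
edited = pipe(
    "Have her ride a dragon",
    image=image,
    num_inference_steps=20,
    image_guidance_scale=1.5,  # higher values stay closer to the input image
).images[0]
edited.save("person_riding_dragon.png")
```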
Earlier efforts in AI for image editing have often been based on style transfer, and popular text-to-image generation models such as DALL-E and Stable Diffusion also support image-to-image style transfer operations; however, targeted editing with these models is challenging. More recently, InfoQ covered Microsoft's Visual ChatGPT, which can invoke external tools for editing images, given a textual description of the desired edit.
To train InstructPix2Pix, BAIR first created a synthetic dataset. To do this, the team fine-tuned GPT-3 on a small dataset of human-written examples, each consisting of an input caption, an editing instruction, and a desired output caption. This fine-tuned model was then given a large dataset of input image captions, from which it generated over 450k editing instructions and output captions. The team then applied the Prompt-to-Prompt technique with a pre-trained text-to-image diffusion model to turn each caption pair into a pair of similar images.
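The paper does not package this pipeline as a single script; the Python sketch below only illustrates the structure of the data-generation loop described above. The two generation functions and the caption file are hypothetical placeholders, not the authors' code.

```python
# Schematic sketch of the synthetic-data pipeline; the generation
# functions and the caption file are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class EditExample:
    input_caption: str   # e.g. "photograph of a girl riding a horse"
    instruction: str     # e.g. "have her ride a dragon"
    output_caption: str  # e.g. "photograph of a girl riding a dragon"

def generate_edit(input_caption: str) -> EditExample:
    """Hypothetical call to the fine-tuned GPT-3 model that proposes an
    instruction and an edited caption for a given input caption."""
    raise NotImplementedError

def generate_image_pair(example: EditExample):
    """Hypothetical call to a text-to-image diffusion model using
    Prompt-to-Prompt, producing a before/after image pair."""
    raise NotImplementedError

dataset = []
for line in open("input_captions.txt"):          # assumed caption list
    example = generate_edit(line.strip())
    before, after = generate_image_pair(example)
    dataset.append((before, after, example.instruction))  # training triple
```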
InstructPix2Pix Architecture. Image Source: https://arxiv.org/abs/2211.09800
Given this dataset, the researchers trained InstructPix2Pix, which is based on Stable Diffusion. To evaluate its performance, the team compared its output with a baseline model, SDEdit. They evaluated the tradeoff between two metrics: consistency, the cosine similarity between the CLIP embeddings of the input image and the edited image; and directional similarity, which measures how well the change between the input and edited captions agrees with the change between the input and edited images in CLIP space. In experiments, for a given value of directional similarity, InstructPix2Pix produced more consistent images than SDEdit did.
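Both metrics can be approximated with an off-the-shelf CLIP model. The sketch below assumes the Hugging Face transformers CLIP implementation and the openai/clip-vit-large-patch14 checkpoint; it is an illustration of the metrics, not the authors' evaluation code.

```python
# Sketch of the two CLIP-based metrics; checkpoint choice and variable
# names are assumptions for illustration.
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

def embed_image(img):
    inputs = processor(images=img, return_tensors="pt")
    return F.normalize(model.get_image_features(**inputs), dim=-1)

def embed_text(text):
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    return F.normalize(model.get_text_features(**inputs), dim=-1)

def consistency(source_img, edited_img):
    # cosine similarity between the CLIP embeddings of the two images
    return F.cosine_similarity(embed_image(source_img),
                               embed_image(edited_img)).item()

def directional_similarity(source_img, edited_img,
                           source_caption, edited_caption):
    # cosine similarity between the change in image embeddings and the
    # change in caption embeddings
    d_img = embed_image(edited_img) - embed_image(source_img)
    d_txt = embed_text(edited_caption) - embed_text(source_caption)
    return F.cosine_similarity(d_img, d_txt).item()
```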
In his deep-learning newsletter The Batch, AI researcher Andrew Ng commented on InstructPix2Pix:
This work simplifies — and provides more coherent results when — revising both generated and human-made images. Clever use of pre-existing models enabled the authors to train their model on a new task using a relatively small number of human-labeled examples.
The InstructPix2Pix code is available on GitHub. The model and a web-based demo are available on Hugging Face.