Microsoft Research open-sourced LMOps, a collection of tools for improving text prompts used as input to generative AI models. The toolkit includes Promptist, which optimizes a user's text input for text-to-image generation, and Structured Prompting, a technique for including more examples in a few-shot learning prompt for text generation.
Generative text-to-image models such as Stable Diffusion are often able to produce impressive results directly from user input. However, researchers have shown that prompt engineering can usually improve the results. These techniques involve modifying the text input by including suggestions for an art style or features such as lighting. For example, instead of simply passing "Cats dancing in a space club" as a prompt, an engineered prompt might be "Cats dancing in a space club, digital painting, artstation, concept art, soft light, hdri, smooth, sharp focus, illustration, fantasy."
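The kind of manual prompt engineering described above can be sketched as a simple string transformation; the helper function and modifier list below are illustrative, not part of any library:

```python
def engineer_prompt(prompt, modifiers):
    """Join the base prompt with comma-separated style modifiers."""
    return ", ".join([prompt] + modifiers)

# Common style and lighting modifiers seen in hand-engineered prompts
STYLE_MODIFIERS = ["digital painting", "artstation", "concept art", "soft light",
                   "hdri", "smooth", "sharp focus", "illustration", "fantasy"]

print(engineer_prompt("Cats dancing in a space club", STYLE_MODIFIERS))
# Cats dancing in a space club, digital painting, artstation, concept art,
# soft light, hdri, smooth, sharp focus, illustration, fantasy
```

Promptist automates exactly this step: learning which modifiers to append for a given input.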
Promptist Training Architecture. Image Source: https://github.com/microsoft/LMOps
With Promptist, Microsoft's researchers trained an additional language model (LM) that optimizes text prompts for text-to-image generation. First, the team fine-tuned a pretrained LM using supervised learning on a collection of manually optimized prompts. The model was then further trained using reinforcement learning (RL). For the RL reward function, the team used the modified prompts as input to a text-to-image generator, and evaluated the resulting images on "relevance and aesthetics" using CLIP. In experiments using the final model, human judges preferred the images generated from the optimized prompts over those from the original prompts a majority of the time.
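The shape of that reward function can be sketched as follows. All names here (`generate_image`, `clip_score`, `aesthetic_score`) are hypothetical stand-ins for a diffusion model and CLIP-based scorers, not the actual LMOps API:

```python
def promptist_reward(original_prompt, optimized_prompt,
                     generate_image, clip_score, aesthetic_score):
    # Render an image from the REWRITTEN prompt...
    image = generate_image(optimized_prompt)
    # ...but score relevance against the ORIGINAL prompt, so the policy
    # cannot improve aesthetics by drifting from the user's intent.
    relevance = clip_score(original_prompt, image)
    aesthetics = aesthetic_score(image)
    return relevance + aesthetics

# Usage with stub scorers standing in for the real models:
reward = promptist_reward(
    "Cats dancing in a space club",
    "Cats dancing in a space club, digital painting, soft light",
    generate_image=lambda p: f"<image for '{p}'>",
    clip_score=lambda p, img: 0.75,
    aesthetic_score=lambda img: 0.25,
)
print(reward)  # 1.0 with the stub scores
```

Scoring relevance against the user's original prompt is the key design choice: it anchors the RL policy to the user's intent while the aesthetic term rewards the added modifiers.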
Generative LMs such as GPT-3 can perform quite well on natural language processing (NLP) tasks such as question answering. Because these models work by predicting the next values in a sequence of text, they often perform better when given examples of the task. For example, in a translation task, the model may be prompted with the instruction "translate English to French," followed by some short translation examples followed by a final piece of English text. The model will then predict the French translation of that text.
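Assembling such a few-shot prompt is a matter of concatenating the instruction, the examples, and the final query. The exact template below is illustrative; real few-shot formats vary:

```python
def few_shot_prompt(instruction, examples, query):
    """Build a few-shot prompt: instruction, worked examples, then the query."""
    lines = [instruction]
    for english, french in examples:
        lines.append(f"English: {english}\nFrench: {french}")
    # End with an unanswered query; the LM completes the French line.
    lines.append(f"English: {query}\nFrench:")
    return "\n\n".join(lines)

prompt = few_shot_prompt(
    "Translate English to French.",
    [("Hello.", "Bonjour."), ("Thank you.", "Merci.")],
    "Good night.",
)
print(prompt)
```

Because the model simply continues the pattern, it predicts the missing French translation after the final `French:`.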
The number of examples that can be given in the input prompt is limited by the maximum input sequence length the LM can accept, usually on the order of a few thousand words. Microsoft's Structured Prompting addresses this limitation, allowing thousands of examples, by first concatenating examples into groups, then inputting each group into the LM separately. The hidden key and value vectors of the LM's attention modules are cached for each group. Finally, when the user's unaltered input prompt is passed to the LM, the cached attention vectors are injected into the hidden layers of the LM. The researchers found that this technique "outperforms the conventional approach" on several NLP tasks.
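The key/value injection step can be illustrated with a toy single-head attention computation in NumPy. This is a conceptual simplification (it omits details such as positional handling and the paper's exact weighting), not the LMOps code: keys and values cached from independently encoded example groups are concatenated ahead of the prompt's own keys and values, so the query attends over all of them at once.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention with a numerically stable softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
d = 8  # hidden dimension of the toy model

# Cached K/V from two example groups, each encoded separately (5 tokens each)
group_kv = [(rng.normal(size=(5, d)), rng.normal(size=(5, d))) for _ in range(2)]

# K/V and queries from the user's own 3-token prompt
prompt_k, prompt_v = rng.normal(size=(3, d)), rng.normal(size=(3, d))
query = rng.normal(size=(3, d))

# Inject the cached group K/V ahead of the prompt's K/V
k_all = np.concatenate([kv[0] for kv in group_kv] + [prompt_k])
v_all = np.concatenate([kv[1] for kv in group_kv] + [prompt_v])

out = attention(query, k_all, v_all)
print(out.shape)  # (3, 8): one output vector per prompt token
```

Because each group is encoded independently, the per-group cost stays within the model's sequence limit, while the concatenated cache lets the prompt condition on far more examples than would fit in one input.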
In a discussion about Structured Prompting on Twitter, one user pointed out that this technique wouldn't work with OpenAI's closed models. AI developer Jay Hack replied:
That’s right. You need to access [keys] and [values] in the transformer internals, which they do not expose. You could implement yourself on OSS ones like BLOOM or T5 though.
The code for Structured Prompting is available on GitHub. An online demo of Promptist is available on Hugging Face. The LMOps repo also states that research on Knowledge Augmentation is "TBA."