OpenAI announced several feature additions and price reductions across its platform at its first Developer Day. The updates include a new GPT-4 Turbo model, an Assistants API, and multimodal capabilities.
The GPT-4 Turbo model is an upgrade from the first version of GPT-4, which was released in March and made generally available to all developers in July. The new model is more capable, cheaper, and supports a 128K context window. It has knowledge of world events up to April 2023 and can fit the equivalent of more than 300 pages of text in a single prompt.
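As a quick illustration, a minimal Chat Completions request against the new model using the v1 openai Python library might look like the sketch below; the gpt-4-1106-preview identifier is the preview name OpenAI published for GPT-4 Turbo, and the prompt itself is illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask the GPT-4 Turbo preview a question; the 128K context window means the
# messages list can carry far more text than earlier GPT-4 models allowed.
response = client.chat.completions.create(
    model="gpt-4-1106-preview",  # preview identifier for GPT-4 Turbo
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize the key GPT-4 Turbo changes in two sentences."},
    ],
)
print(response.choices[0].message.content)
```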
OpenAI has also announced lower prices across the platform. GPT-4 Turbo input tokens are 3x cheaper than GPT-4 at $0.01 per 1,000 tokens, and output tokens are 2x cheaper at $0.03 per 1,000 tokens. GPT-3.5 Turbo input tokens are 3x cheaper than the previous 16K model at $0.001 per 1,000 tokens, and output tokens are 2x cheaper at $0.002 per 1,000 tokens.
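To make the per-token rates concrete, the back-of-the-envelope sketch below estimates the cost of a single hypothetical GPT-4 Turbo request at the announced prices; the token counts are illustrative.

```python
# Rough cost estimate for one GPT-4 Turbo request at the announced rates
# ($0.01 per 1K input tokens, $0.03 per 1K output tokens).
INPUT_PRICE_PER_1K = 0.01
OUTPUT_PRICE_PER_1K = 0.03

prompt_tokens = 10_000      # e.g. a long document pasted into the prompt
completion_tokens = 1_000   # the model's answer

cost = (prompt_tokens / 1000) * INPUT_PRICE_PER_1K \
     + (completion_tokens / 1000) * OUTPUT_PRICE_PER_1K
print(f"Estimated cost: ${cost:.2f}")  # -> Estimated cost: $0.13
```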
According to Satya Nadella:
"The first thing we have been doing in partnership with you is building the system," says Nadella. "The shape of Azure is drastically changing in support of this," he noted. "We want to build our co-pilot as developers on OpenAI API."
OpenAI has also introduced function calling updates, which allow developers to describe functions of their app or external APIs to the models. The models can then intelligently choose to output a JSON object containing arguments to call those functions. The feature has been improved so that multiple functions can be called in a single message, and GPT-4 Turbo is more likely to return the correct function parameters.
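A minimal sketch of the updated function calling flow with the v1 openai Python library is shown below; the get_weather function and its schema are hypothetical, and the model may return several tool calls for a single user message.

```python
import json
from openai import OpenAI

client = OpenAI()

# Describe an application function to the model as a "tool"; the name and
# JSON schema here (get_weather) are purely illustrative.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[{"role": "user", "content": "What's the weather in Paris and in Tokyo?"}],
    tools=tools,
)

# With the updated API the model can return several tool calls in one message.
for tool_call in response.choices[0].message.tool_calls or []:
    args = json.loads(tool_call.function.arguments)
    print(tool_call.function.name, args)  # e.g. get_weather {'city': 'Paris'}
```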
The new Assistants API is designed to help developers build agent-like experiences within their applications. An assistant is a purpose-built AI that follows specific instructions, leverages extra knowledge, and can call models and tools to perform tasks. The Assistants API provides new capabilities such as Code Interpreter, Retrieval, and function calling to handle much of the heavy lifting developers previously had to do themselves. The Assistants API is in beta and is now available to all developers.
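A minimal sketch of the beta Assistants API with the v1 openai Python library might look like the following; the assistant's name, instructions, and the question are illustrative.

```python
from openai import OpenAI

client = OpenAI()

# Create a purpose-built assistant with instructions and a built-in tool.
assistant = client.beta.assistants.create(
    name="Data helper",                    # illustrative name
    instructions="Answer questions by writing and running Python code.",
    tools=[{"type": "code_interpreter"}],  # built-in Code Interpreter tool
    model="gpt-4-1106-preview",
)

# Conversations happen on threads; add a user message and start a run.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is the standard deviation of [2, 4, 4, 4, 5, 5, 7, 9]?",
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
print(run.status)  # poll the run until it completes, then read the thread's messages
```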
Some raised concerns that OpenAI's newly announced store could allow the company to absorb promising third-party innovations into its own platform once they gain traction. The proprietary nature of the platform also raises concerns about vendor lock-in and a lack of portability to other systems.
As stated by Andrej Karpathy:
With the newly announced GPTs, I think we’re seeing a new (still a bit primordial) layer of abstraction in computing. There will be a lot more developers, and a lot more GPTs. GPTs that can read, write, hear, speak, see, paint, think, use existing computing as tools, become experts in focus areas, reference custom data, take actions in the digital world, speak or act in custom ways, and collaborate together.
OpenAI has also introduced new multimodal capabilities in the platform, including vision, image creation (DALL·E 3), and text-to-speech (TTS). GPT-4 Turbo can accept images as inputs in the Chat Completions API, enabling use cases such as generating captions, analyzing real-world images in detail, and reading documents with figures. Developers can integrate DALL·E 3 directly into their apps and products through the Images API.
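The sketch below illustrates both capabilities with the v1 openai Python library; the image URL and prompts are placeholders, and gpt-4-vision-preview is the preview identifier OpenAI published for the vision-enabled model.

```python
from openai import OpenAI

client = OpenAI()

# Vision: pass an image URL alongside text in a Chat Completions request.
vision = client.chat.completions.create(
    model="gpt-4-vision-preview",  # preview identifier for GPT-4 Turbo with vision
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is shown in this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(vision.choices[0].message.content)

# Image creation: generate a picture with DALL·E 3 through the Images API.
image = client.images.generate(
    model="dall-e-3",
    prompt="A watercolor illustration of a developer conference keynote",
    size="1024x1024",
)
print(image.data[0].url)
```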
The new TTS model offers six preset voices and two model variants, tts-1 and tts-1-hd. In a related development, OpenAI launched the next version of its open source automatic speech recognition model, Whisper large-v3. The company claims that this new version offers improved performance across languages.
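A minimal text-to-speech sketch using the v1 openai Python library might look like this; the input text and output filename are illustrative, and alloy is one of the preset voices.

```python
from openai import OpenAI

client = OpenAI()

# Text-to-speech: synthesize audio with one of the six preset voices.
speech = client.audio.speech.create(
    model="tts-1",   # or "tts-1-hd" for the higher-quality variant
    voice="alloy",   # one of the six preset voices
    input="GPT-4 Turbo supports a 128K context window.",
)
speech.stream_to_file("announcement.mp3")  # write the generated audio to disk
```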
As noted by Simon Willison:
One thing I missed from yesterday: the cost of storing data in an OpenAI assistant for retrieval question answering etc is VERY steep: $0.20/GB/assistant/day
OpenAI is also introducing Copyright Shield, a new feature that will defend customers and pay the costs incurred if they face legal claims around copyright infringement. This applies to generally available features of ChatGPT Enterprise and the developer platform.
Developers interested in learning more about the developments can watch the keynote or refer to documentation on OpenAI's website.