Voxel51 recently open-sourced VoxelGPT, an AI assistant that interfaces with GPT-3.5 to produce Python code for querying computer vision datasets. InfoQ spoke with Jason Corso, co-founder and CSO of Voxel51, who shared their lessons and insights gained while developing VoxelGPT.
In any data science project, data exploration and visualization are key early steps, but most common techniques are designed for structured or tabular data. Computer vision datasets, on the other hand, are usually unstructured: images or point clouds. Voxel51's open-source tool FiftyOne provides a query language for exploring and curating computer vision datasets. However, it can be challenging for casual users to quickly and reliably use tools like FiftyOne.
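For readers unfamiliar with FiftyOne, the following sketch shows the kind of query its Python API supports; it uses the library's bundled "quickstart" zoo dataset and is illustrative rather than taken from VoxelGPT itself.

```python
import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

# Load a small sample dataset and keep only high-confidence "cat" predictions
dataset = foz.load_zoo_dataset("quickstart")
view = dataset.filter_labels(
    "predictions", (F("label") == "cat") & (F("confidence") > 0.8)
)

print(view.count("predictions.detections"))
session = fo.launch_app(view)  # browse the filtered view in the FiftyOne App
```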
VoxelGPT is a plug-in for FiftyOne that provides a natural language interface for the tool. Users can ask questions about their data, which are translated into Python code that leverages FiftyOne to produce the answers. VoxelGPT uses LangChain and prompt engineering to interact with GPT-3.5 and output the Python code. InfoQ spoke with Jason Corso, co-founder and CSO of Voxel51, about the development of VoxelGPT.
InfoQ: It's clear that while LLMs provide a powerful natural language interface for solving problems, getting the most out of them can be a challenge. What are some tips for developers?
Jason Corso: A key learning we had while developing VoxelGPT is that expecting one interaction with the LLM to sufficiently address your task is likely a bad idea. It helps to carefully segment your interaction with the LLM so you can provide enough context per interaction, generate more useful piecemeal results, and later compose them together depending on your ultimate task.
A few other lessons learned:
- Start simple, gain intuition, and only add layers of complexity once the LLM is acting as you expect.
- LangChain is a great library, but it is not without its issues. Don’t be afraid to "go rogue" and build your own custom LLM tooling wherever existing tools aren’t getting the job done.
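To make the segmentation idea Corso describes concrete, here is a minimal sketch (not VoxelGPT's actual implementation) that splits a translation task into two narrowly scoped GPT-3.5 calls; the helper names and prompt wording are purely illustrative.

```python
from openai import OpenAI

client = OpenAI()

def ask(system: str, user: str) -> str:
    """One narrowly scoped chat completion call."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

def translate_query(nl_query: str, schema: str) -> str:
    # First interaction: only identify the dataset fields the query needs
    fields = ask(
        "Given this dataset schema, list only the fields needed to answer "
        f"the user's question:\n{schema}",
        nl_query,
    )
    # Second interaction: generate query code, seeded with the narrowed context
    return ask(
        "Write FiftyOne query code that uses only these fields:\n" + fields,
        nl_query,
    )
```

Keeping each call small makes the prompts easier to reason about and the intermediate outputs easier to inspect and test.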
InfoQ: Writing good tests is a key practice in software engineering. What are some lessons you learned when testing VoxelGPT?
Corso: Testing applications built with LLMs is challenging, and testing VoxelGPT was no different. LLMs are not nearly as predictable as traditional software components. However, we incorporated software engineering best practices into our workflows as much as possible through unit testing.
We created a unit testing framework with 60 test cases, which covered the surface area of the types of queries we’d expect from usage. Each test consisted of a prompt, a FiftyOne Dataset, and the expected subset of the dataset resulting from converting the prompt to a query in FiftyOne’s domain-specific query language. We ran these tests each time we made a substantial change to the code or example set in order to prevent regression.
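A test case of that shape might look roughly like the following pytest sketch; the `ask_voxelgpt` entry point and the specific prompt are assumptions for illustration, not the project's actual test code.

```python
import pytest
import fiftyone.zoo as foz
from fiftyone import ViewField as F

from voxelgpt import ask_voxelgpt  # hypothetical entry point, for illustration only

@pytest.fixture(scope="module")
def dataset():
    return foz.load_zoo_dataset("quickstart")

def test_high_confidence_predictions(dataset):
    # Expected result, built directly in FiftyOne's query language
    expected = dataset.filter_labels("predictions", F("confidence") > 0.95)

    # Result of translating the natural-language prompt into a query
    result = ask_voxelgpt("show me predictions with confidence above 0.95", dataset)

    # The generated query should select exactly the same samples
    assert set(result.values("id")) == set(expected.values("id"))
```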
InfoQ: AI safety is a major concern. What were some of the safety issues you confronted and how did you solve them?
Corso: Yes, indeed AI safety is a key element to consider when building these systems. When building VoxelGPT, we were intentional about addressing potential safety issues in multiple ways.
Input validation: The first stop on a prompt’s journey through VoxelGPT is OpenAI’s moderation endpoint, which ensures that every query passing through the system complies with OpenAI’s terms of service. Even beyond that, we run a custom "intent classification" routine to validate that the user’s query falls into one of the three allowed classes of query, is sensible, and is not out of scope.
Bias mitigation: Bias is another major concern with LLMs, which can form potentially unwanted or non-inclusive connections between concepts based on their training data. VoxelGPT is incentivized to infer as much as possible from the contextual backdrop of the user’s FiftyOne Dataset, so that it capitalizes on the base LLM’s inference capabilities without being mired in its biases.
Programmed limitations: We purposely limited VoxelGPT’s access to any functionality involving the permanent moving, writing, or deleting of data. We also prevent VoxelGPT from performing any computationally expensive operations. At the end of the day, the human working with VoxelGPT (and FiftyOne) is the only one with this power!
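As a rough illustration of the input-validation step: the moderation call below uses OpenAI's real endpoint, while the intent labels and classification prompt are invented for the example and are not VoxelGPT's actual classes.

```python
from openai import OpenAI

client = OpenAI()

ALLOWED_INTENTS = {"dataset_query", "docs_question", "general_cv_question"}  # illustrative labels

def validate_query(user_query: str) -> bool:
    # First stop: OpenAI's moderation endpoint
    moderation = client.moderations.create(input=user_query)
    if moderation.results[0].flagged:
        return False

    # Custom intent classification: reject anything outside the allowed classes
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": "Classify the request as one of: "
                + ", ".join(sorted(ALLOWED_INTENTS))
                + ", or out_of_scope. Respond with the label only.",
            },
            {"role": "user", "content": user_query},
        ],
    )
    return resp.choices[0].message.content.strip() in ALLOWED_INTENTS
```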
InfoQ: What was one of the most surprising things you learned when building VoxelGPT?
Corso: Building VoxelGPT was really quite fun. LLMs capture a significant amount of generalizable language-based knowledge. Their ability to leverage this generalizability in context-specific ways was very surprising. What do I mean? At the heart of FiftyOne is a domain-specific language (DSL), based in Python, for querying schema-less unstructured AI datasets. This DSL enables FiftyOne users to "semantically slice" their data and model outputs to various ends like finding mistakes in annotations, comparing two models, and so on. However, it takes some time to become an expert in that DSL. It was wildly surprising that with a fixed and somewhat limited amount of context, we could provide sufficiently rich "training material" for the LLM to actually construct executable Python code in FiftyOne's DSL.
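For a sense of what such generated code can look like, here is an illustrative (not VoxelGPT-produced) translation of a natural-language prompt into FiftyOne's DSL; it assumes a detection field named `ground_truth` and that a `uniqueness` score has already been computed with FiftyOne Brain.

```python
from fiftyone import ViewField as F

# Prompt: "show me the 10 most unique images that contain a person"
view = (
    dataset
    .filter_labels("ground_truth", F("label") == "person")
    .sort_by("uniqueness", reverse=True)
    .limit(10)
)
```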
The VoxelGPT source code is available on GitHub. There is also an online demo available on the FiftyOne website.