After it was reported that OpenAI has started rolling out its new GPT Store, it was also discovered that some of the data they’re built on is easily exposed. Multiple groups have begun finding that the system has the potential to leak otherwise sensitive information.
Researchers from Northwestern University wrote "through comprehensive testing of over 200 user-designed GPT models via adversarial prompts, we demonstrate that these systems are susceptible to prompt injections." Prompt injection attacks are a known vulnerability vector to extract sensitive information or manipulate the model's output. Prompt injection is just one at vulnerability for language models, with some others including prompt leaking and jailbreaking.
Prompt injection refers to the technique where an attacker crafts specific inputs or 'prompts' to manipulate the behavior of Large Language Models (LLMs) like GPTs. In their research, the team found that through prompt injection, "an adversary can not only extract the customized system prompts, but also access uploaded files."
Yesterday I created a custom GPT for http://Levels.fyi with a limited subset of our data as a knowledge source (RAG). The feedback was incredible and folks got super creative with some of their prompts. However, I found out shortly that the source data file was leaking. - Zuhayeer Musa
"Our findings underscore the urgent need for robust security frameworks in the design and deployment of customizable GPT models." They emphasize that while customization opens up new frontiers in AI utility by allowing individuals and organizations to create AI models tailored for specific needs without necessitating coding skills, it also introduces additional vectors for potential security vulnerabilities.
"We identified key security risks related to prompt injection and conducted an extensive evaluation... Our tests revealed that these prompts could almost entirely expose system prompts and retrieve uploaded files from most custom GPTS.” This indicates a significant vulnerability in current custom GPTs regarding system prompt extraction and file disclosure.
The researchers conclude their paper with an urgent call to action: "Our findings highlight the urgent need for enhanced security measures in the rapidly evolving domain of customizable AI, and we hope this sparks further discussion on the subject." They stress that as AI technologies continue to evolve, it is crucial to strike a balance between innovation and security.
Adversa AI also recently showed that GPTs may be tricked into leaking how they were built, including prompts, API names, and uploaded document metadata and content. OpenAI noted it had patched the vulnerabilities reported by the Adversa AI researchers.
“We’re constantly working to make our models and products safer and more robust against adversarial attacks, including prompt injections, while also maintaining the models’ usefulness and task performance.” - OpenAI to Wired
Developers interested in learning more about the work from Northwestern University may read the technical paper on Arxiv or follow reaction on social media.