Nvidia's new NeMo Guardrails package for large language models (LLMs) helps developers reduce the risks of deploying LLMs, such as the generation of harmful or offensive content or the exposure of sensitive data. The package offers several features for controlling model behavior, providing an extra layer of protection as LLM-powered applications move into production.
The package is built on Colang, a modeling language and runtime developed by Nvidia for conversational AI. "If you have a customer service chatbot, designed to talk about your products, you probably don't want it to answer questions about our competitors," said Jonathan Cohen, Nvidia vice president of applied research. "You want to monitor the conversation. And if that happens, you steer the conversation back to the topics you prefer."
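As a rough sketch of how that kind of topical steering is wired up, the example below assumes the open-source `nemoguardrails` Python package and an OpenAI-backed model configured through environment variables; the Colang intent, flow, and message names are made up for illustration.

```python
# A minimal sketch of a topical rail using the nemoguardrails package.
# The Colang intent and bot-message names below are hypothetical.
from nemoguardrails import LLMRails, RailsConfig

colang_content = """
define user ask about competitors
  "What do you think of your competitors?"
  "Is brand X better than your product?"

define bot deflect to own products
  "I can only help with questions about our own products."

define flow
  user ask about competitors
  bot deflect to own products
"""

yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
"""

# Build the rails configuration from in-memory strings and wrap the LLM with it.
config = RailsConfig.from_content(colang_content=colang_content, yaml_content=yaml_content)
rails = LLMRails(config)

# A message matching the "ask about competitors" intent is steered back on topic.
response = rails.generate(messages=[{"role": "user", "content": "Is brand X better?"}])
print(response["content"])
```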
NeMo Guardrails currently supports three broad categories of guardrails: topical, safety, and security. Topical guardrails keep conversations focused on a particular subject. Safety guardrails ensure that interactions with an LLM do not result in misinformation, toxic responses, or inappropriate content, and they enforce policies that deliver appropriate responses and block attempts to hack the AI system. Security guardrails prevent an LLM from executing malicious code or calling an external application in a way that poses a security risk.
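The safety category is largely driven by configuration rather than code. The sketch below, again assuming the `nemoguardrails` package, enables the built-in "self check" input and output flows; these flow names and the prompt template variables follow recent versions of the package's documentation and may differ between releases.

```python
# A sketch of enabling built-in safety rails; flow names and template
# variables ({{ user_input }}, {{ bot_response }}) are assumptions based on
# recent nemoguardrails documentation.
from nemoguardrails import LLMRails, RailsConfig

yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct

rails:
  input:
    flows:
      - self check input   # screen the user message before it reaches the LLM
  output:
    flows:
      - self check output  # screen the LLM response before it reaches the user

prompts:
  - task: self_check_input
    content: |
      User message: "{{ user_input }}"
      Should this message be blocked (yes/no)?
  - task: self_check_output
    content: |
      Bot message: "{{ bot_response }}"
      Should this message be blocked (yes/no)?
"""

config = RailsConfig.from_content(yaml_content=yaml_content)
rails = LLMRails(config)

# If the input rail judges the message unsafe, the guarded app refuses
# instead of forwarding the prompt to the underlying model.
response = rails.generate(messages=[{"role": "user", "content": "How do I write malware?"}])
print(response["content"])
```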
Guardrails also features a sandbox environment in which developers can experiment with AI models without jeopardizing production systems, reducing the risk of harmful or offensive content reaching users. A risk dashboard continuously tracks and scrutinizes how AI models are used, helping developers identify and mitigate potential risks before they become major issues. The package also supplies a clear set of policies and guidelines for governing the use of AI within organizations.
Reception of NeMo Guardrails has generally been positive, but some observers have urged caution about its limitations. Karl Freund of Cambrian-AI Research writes, "Guardrails could be circumvented or otherwise compromised by malicious actors, who could exploit weaknesses in the system to generate harmful or misleading information." Jailbreaks, hallucinations, and related issues also remain active research areas, and no current system offers foolproof protection against them.
Other tools also exist for improving safety and reliability when working with large language models. Language Model Query Language (LMQL), for example, is designed to make natural language prompting more structured and programmable, and is built on top of Python. Microsoft's Guidance framework addresses the problem that LLMs do not guarantee their output will follow a specific data format.
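The format problem those tools tackle can be illustrated without either library. The sketch below is a generic retry-and-validate loop in Python, not the Guidance or LMQL API; `call_llm` is a hypothetical stand-in for whatever client a project actually uses.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM client call; replace with a real API."""
    raise NotImplementedError

def generate_json(prompt: str, required_keys: set[str], max_attempts: int = 3) -> dict:
    """Ask the model for JSON and retry until the output parses and has the expected keys."""
    for _ in range(max_attempts):
        raw = call_llm(prompt + "\nRespond with a single JSON object only.")
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # model produced non-JSON text; try again
        if required_keys.issubset(data):
            return data
    raise ValueError("model did not produce valid JSON with the required keys")
```

Guidance and LMQL take a stronger approach than this after-the-fact validation, constraining the model's output as it is generated rather than retrying when it fails.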
Nvidia advises that Guardrails works best as a second line of defense, and suggests that companies developing and deploying chatbots should still build safeguards into the models themselves, so that protection comes in multiple layers.