
Distill Your LLMs and Surpass Their Performance: spaCy's Creator at InfoQ DevSummit Munich

In her presentation at the inaugural edition of InfoQ Dev Summit Munich, Ines Montani built on the talk she gave earlier this year at QCon London and offered the audience practical solutions for using the latest state-of-the-art models in real-world applications and for distilling their knowledge into smaller, faster components that can be run and maintained in-house.

She opened her presentation by stating that relying on black-box models hidden behind APIs prevents us from satisfying the properties of good software: modular, transparent, explainable, data-private, reliable, and affordable.

Further, she pointed out that GenAI can be helpful in many situations where we need to interpret human language ("often the language can be vague"), such as evaluating the comments customers leave on forums about a product. Montani stressed that you don’t need the full capability of the foundation model ("you don’t want to speak about the weather with it"), only its ability to understand the context. This can be accomplished by using transfer learning to distil task-specific information.

 

To get past the "prototype plateau" and make the system "production ready", Montani recommends the following actions:

  • Standardise inputs and outputs - the prototype and the target production system must use the same data types.
  • Start with the evaluation - this is the equivalent of tests in software development. You want examples for which you know the answer, so that accuracy scores show whether the system is improving.
  • Assess the utility, not just the accuracy - besides the accuracy scores, check whether the model is actually useful ("Is it useful in what you are doing?").
  • Work on data iteratively - as with coding practices, try different approaches and tools until you find the most suitable one.
  • Consider the structure and ambiguity of natural language - extracts of human language cannot be handled like standard data ("it doesn’t fit in neatly arranged boxes").
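The "start with the evaluation" point can be illustrated with a minimal Python sketch. It is not taken from the talk: the example texts, the label names and the `keyword_baseline` predictor are all hypothetical placeholders, and in practice the predictor would wrap the LLM or the distilled model.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical evaluation set: inputs for which the correct answer is known.
GOLD: Sequence[Tuple[str, str]] = [
    ("The battery died after two days.", "complaint"),
    ("Love the new camera!", "praise"),
    ("How do I reset the device?", "question"),
]


def accuracy(predict: Callable[[str], str],
             examples: Sequence[Tuple[str, str]]) -> float:
    """Return the fraction of examples whose prediction matches the known answer."""
    if not examples:
        raise ValueError("need at least one evaluation example")
    correct = sum(1 for text, gold in examples if predict(text) == gold)
    return correct / len(examples)


def keyword_baseline(text: str) -> str:
    """Stand-in predictor (an assumption for illustration only)."""
    lowered = text.lower()
    if "?" in lowered:
        return "question"
    if "love" in lowered or "great" in lowered:
        return "praise"
    return "complaint"


if __name__ == "__main__":
    # Re-run this after every change to the prompts, data or model
    # to see whether the system is improving.
    print(f"accuracy: {accuracy(keyword_baseline, GOLD):.2f}")
```

Because the score is computed against fixed examples, the same function can compare the out-of-the-box baseline with each later iteration. It measures accuracy only; whether the output is useful still has to be judged separately.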

When working on a prototype that involves natural language processing (NLP), a good starting point is a large language model (LLM) prompted by a tool that then parses the output and returns an object containing the structured data ("that’s why we built spaCy LLM"). Even though the system can be deployed to production like that, a better approach is to replace the LLM at runtime with a distilled, task-specific component that performs just the parts you need. This way, the system will be more modular, more transparent and (probably) faster.
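The idea of swapping the LLM for a task-specific component behind the same structured output can be sketched as follows. This is plain Python under stated assumptions, not the spaCy LLM API: `Review`, `make_llm_extractor`, `rule_based_extractor` and the `call_llm` callback are hypothetical names, and the rule-based function merely stands in for a trained, distilled model.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Review:
    """Structured output shared by the prototype and the production component."""
    product: str
    sentiment: str


# Any extractor maps raw text to the same structured type.
Extractor = Callable[[str], Review]


def make_llm_extractor(call_llm: Callable[[str], str]) -> Extractor:
    """Build an extractor that prompts a model and parses its JSON reply.

    `call_llm` is a hypothetical callback: prompt in, raw model text out.
    """
    def extract(text: str) -> Review:
        prompt = (
            'Return JSON with the keys "product" and "sentiment" '
            "(positive, negative or neutral) for this comment:\n" + text
        )
        data = json.loads(call_llm(prompt))
        return Review(product=str(data["product"]),
                      sentiment=str(data["sentiment"]))
    return extract


def rule_based_extractor(text: str) -> Review:
    """Stand-in for a distilled component: same input, same output type."""
    lowered = text.lower()
    sentiment = "negative" if "broke" in lowered or "bad" in lowered else "neutral"
    product = "phone" if "phone" in lowered else "unknown"
    return Review(product=product, sentiment=sentiment)


def summarise(extract: Extractor, comments: List[str]) -> Dict[str, int]:
    """Downstream code depends only on the Review type, not on the model behind it."""
    counts: Dict[str, int] = {}
    for comment in comments:
        sentiment = extract(comment).sentiment
        counts[sentiment] = counts.get(sentiment, 0) + 1
    return counts
```

Since `summarise` only sees `Review` objects, the LLM-backed extractor used in the prototype can later be replaced by the smaller component without touching the rest of the application.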

Further, you can surpass the quality of the LLM’s output by adding a "human in the loop" to correct its mistakes. After defining the baseline (the out-of-the-box result), fix the prompts and pass the results through an annotation tool to create a data set that is very specific to the targeted task. To "access the human" efficiently, you can make multiple passes through the data, focusing on only one aspect each time. This lowers the cognitive load and increases the annotation speed.
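One way to picture the "one aspect per pass" workflow is the sketch below. It is an illustration only and not the behaviour of any particular annotation tool: the data layout, the `annotate_in_passes` function and the `ask` callback (which represents the human decision) are all assumptions.

```python
from typing import Callable, Dict, List, Sequence, Set


def annotate_in_passes(
    examples: Sequence[Dict],
    labels: Sequence[str],
    ask: Callable[[str, str, bool], bool],
) -> List[Dict]:
    """Review model suggestions one label at a time.

    Each example is {"text": str, "suggested": set of labels from the model}.
    `ask(text, label, suggested)` returns the human's yes/no decision.
    """
    accepted: List[Set[str]] = [set() for _ in examples]
    for label in labels:                      # one pass per label
        for index, example in enumerate(examples):
            was_suggested = label in example["suggested"]
            if ask(example["text"], label, was_suggested):
                accepted[index].add(label)
    # The corrected labels form the task-specific data set.
    return [
        {"text": example["text"], "labels": sorted(labels_for_example)}
        for example, labels_for_example in zip(examples, accepted)
    ]
```

The outer loop runs over labels rather than examples, so the annotator answers the same yes/no question for the whole data set before moving on to the next aspect.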

Montani: As developers, we need to ship things and not get stuck in the Prototype Plateau. You are allowed to make your problem more manageable. This is not a competition. This is not academia, and having less complexity means less can go wrong.

The distillation process can be thought of as code refactoring. You can apply techniques such as breaking the problem down into smaller ones, reducing its complexity, and separating the business-logic specifics from the particularities of your system. During this phase, you can also reassess the dependencies and techniques in use, making sure they are the most suitable ones for the task.

To underline the multifaceted benefits of distilling the models used in NLP-based applications, Montani summarised case studies in which explosion.ai assisted customers from multiple fields. As she pointed out, the final model was usually smaller and more accurate than the initial baseline provided by the LLM. Hence, taking the time to iterate on your models will give far better results in the long run and decrease the operational cost as well.
