AI, ML & Data Engineering

Distill Your LLMs and Surpass Their Performance: spaCy's Creator at InfoQ DevSummit Munich

Oct 23, 2024

In her presentation at the inaugural InfoQ Dev Summit Munich, Ines Montani built on the talk she gave earlier this year at QCon London. She offered the audience practical solutions for using the latest state-of-the-art models in real-world applications and for distilling their knowledge into smaller, faster components that you can run and maintain in-house.

She began by arguing that relying on black-box models hidden behind APIs prevents us from meeting the properties of good software: being modular, transparent, explainable, data-private, reliable, and affordable.

Montani also pointed out that GenAI can help in many situations where we need to interpret human language ("often the language can be vague"), such as evaluating the comments customers leave about a product on forums. She stressed that you don’t need the full capability of the foundation model ("you don’t want to speak about the weather with it"), only its ability to understand the context. This can be achieved by using transfer learning to distil the task-specific information.
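The idea of distilling task-specific knowledge can be illustrated with a deliberately tiny, dependency-free sketch: a "teacher" labels unlabeled in-domain comments once, offline, and a much smaller "student" is trained on those silver labels. The `teacher_label` function is a hypothetical placeholder for a prompted LLM (a keyword stub here, so the example runs without an API key), and the naive Bayes student is our own simplification standing in for the transfer-learning-based components discussed in the talk.

```python
import math
from collections import Counter, defaultdict


def teacher_label(text: str) -> str:
    """Hypothetical stand-in for a prompted LLM; a real setup would call a model here."""
    negative_cues = ("broke", "refund", "slow", "disappointed")
    return "negative" if any(cue in text.lower() for cue in negative_cues) else "positive"


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    tokens = (tok.strip(".,!?").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


class NaiveBayesStudent:
    """Small multinomial naive Bayes classifier trained on teacher-generated labels."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesStudent":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text: str) -> str:
        total = sum(self.class_counts.values())
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus Laplace-smoothed log likelihood of each token.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Unlabeled in-domain comments: the teacher labels them once, offline.
comments = [
    "The blender broke after a week, I want a refund",
    "Shipping was slow and I am disappointed",
    "Love the design, works great",
    "Great value, works as advertised",
]
silver_labels = [teacher_label(c) for c in comments]
student = NaiveBayesStudent().fit(comments, silver_labels)
print(student.predict("works great, love it"))  # positive
```

At runtime only the student is needed, so the expensive teacher call disappears from the request path; a real project would use far more data and a stronger student model.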

According to Montani, getting past the "prototype plateau" and making the system "production ready" requires the following:

Standardise inputs and outputs: the prototype and the target production system must use the same data types.
Start with the evaluation: this is the equivalent of tests in software development. You want examples for which you already know the answer, so that accuracy scores show whether the system is improving.
Assess the utility, not just the accuracy: besides accuracy scores, you have to check whether the model is actually useful ("Is it useful in what you are doing?").
Work on data iteratively: as with code, try different approaches and tools until you find the most suitable one.
Consider the structure and ambiguity of natural language: human language cannot be handled like standard data ("it doesn’t fit in neatly arranged boxes").
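Treating evaluation like a test suite can be sketched in a few lines of Python. The example set, the `baseline` and `candidate` functions, and the labels are all hypothetical; the point is that both systems share one standardised signature (text in, label out) and are scored against answers known in advance.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Example:
    text: str
    label: str  # the answer we already know


def accuracy(predict: Callable[[str], str], examples: list[Example]) -> float:
    """Share of held-out examples where the prediction matches the known answer."""
    correct = sum(predict(ex.text) == ex.label for ex in examples)
    return correct / len(examples)


# A small, fixed evaluation set, written before any prompt or model tuning.
EVAL_SET = [
    Example("Arrived broken, want my money back", "negative"),
    Example("Does exactly what it promises", "positive"),
    Example("Battery died on day two", "negative"),
    Example("Five stars, would buy again", "positive"),
]


# Hypothetical systems sharing one standardised signature: str -> label.
def baseline(text: str) -> str:
    return "positive"  # e.g. an out-of-the-box prompt that never flags complaints


def candidate(text: str) -> str:
    cues = ("broken", "died", "refund", "money back")
    return "negative" if any(c in text.lower() for c in cues) else "positive"


base_acc = accuracy(baseline, EVAL_SET)
cand_acc = accuracy(candidate, EVAL_SET)
print(f"baseline={base_acc:.2f} candidate={cand_acc:.2f}")  # baseline=0.50 candidate=1.00
# Treat the score like a test: fail loudly if a change makes things worse.
assert cand_acc >= base_acc, "candidate regressed against the baseline"
```

Accuracy is only half of what Montani described: whether the output is useful for the task at hand still needs a judgement that no single score captures.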

When prototyping a natural language processing (NLP) system, a good starting point is a large language model (LLM) prompted by a tool that then parses the output into an object containing structured data ("that’s why we built spaCy LLM"). Although such a system can be deployed to production as is, a better approach is to replace the LLM at runtime with a distilled, task-specific component that performs only the parts you need. This makes the system more modular, more transparent and (probably) faster.
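The prompt-then-parse pattern, and the later swap to a smaller component, can be sketched in plain Python. This is not the spacy-llm API; it is a hypothetical illustration in which `fake_llm`, `PROMPT` and the lookup-based `distilled_extractor` stand in for a real model call and a trained component. What matters is that both implementations return the same standardised `Mention` type, so downstream code does not change when the LLM is replaced.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Mention:
    """Standardised output: the same type whether an LLM or a small model produced it."""
    text: str
    label: str


# Any component that maps a document to mentions can be plugged in.
Extractor = Callable[[str], list[Mention]]

PROMPT = (
    "Extract every PRODUCT mentioned in the text. "
    'Answer only with JSON like [{"text": "...", "label": "PRODUCT"}].\n\nText: '
)


def llm_extractor(complete: Callable[[str], str]) -> Extractor:
    """Prototype: prompt an LLM, then parse its raw text into structured objects."""
    def extract(doc: str) -> list[Mention]:
        try:
            items = json.loads(complete(PROMPT + doc))
        except json.JSONDecodeError:
            return []  # LLM output is not guaranteed to be valid JSON
        if not isinstance(items, list):
            return []
        return [
            Mention(item["text"], item["label"])
            for item in items
            if isinstance(item, dict) and item.get("label") == "PRODUCT" and "text" in item
        ]
    return extract


def distilled_extractor(known_products: set[str]) -> Extractor:
    """Placeholder for a trained, task-specific component (a lookup stands in here)."""
    def extract(doc: str) -> list[Mention]:
        return [Mention(p, "PRODUCT") for p in sorted(known_products) if p in doc]
    return extract


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return '[{"text": "X200 blender", "label": "PRODUCT"}]'


doc = "My X200 blender stopped working."
prototype = llm_extractor(fake_llm)
production = distilled_extractor({"X200 blender"})
# Downstream code receives the same type from either implementation.
print(prototype(doc) == production(doc))  # True
```

Because the parser defends against malformed output, the prototype degrades to "no mentions" rather than crashing, which makes its failures visible in the evaluation scores.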

You can also surpass the quality of the LLM's output by correcting its mistakes with a "human in the loop". After establishing the baseline (the out-of-the-box result), fix the prompts and pass the results through an annotation tool to create a dataset that is highly specific to the target task. To "access the human" efficiently, you can make multiple passes through the data, focusing on a single aspect each time. This lowers the cognitive load and increases annotation speed.
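A minimal sketch of correcting LLM suggestions in several focused passes, one aspect per pass, might look like this. The `review` callback represents the human (in practice an annotation tool; here a hypothetical lookup of correct answers so the example runs), and the data and aspect names are invented for illustration.

```python
from typing import Callable

# A prediction pairs a text with one suggested value per aspect (e.g. from an LLM baseline).
Prediction = tuple[str, dict[str, str]]
# The reviewer sees (text, aspect, suggestion) and returns the accepted value.
Reviewer = Callable[[str, str, str], str]


def correct_in_passes(
    preds: list[Prediction], aspects: list[str], review: Reviewer
) -> list[dict[str, str]]:
    """Review the whole dataset once per aspect, so each pass asks a single question."""
    corrected = [dict(labels) for _, labels in preds]  # copy; keep the baseline intact
    for aspect in aspects:  # one full pass over the data per aspect
        for (text, _), labels in zip(preds, corrected):
            labels[aspect] = review(text, aspect, labels[aspect])
    return corrected


preds: list[Prediction] = [
    ("Blender broke, support never replied", {"sentiment": "negative", "topic": "shipping"}),
    ("Arrived a day early, works fine", {"sentiment": "negative", "topic": "shipping"}),
]
# Hypothetical "human": an annotation UI in practice, a lookup of correct answers here.
corrections = {
    ("Blender broke, support never replied", "topic"): "support",
    ("Arrived a day early, works fine", "sentiment"): "positive",
}


def human(text: str, aspect: str, suggestion: str) -> str:
    return corrections.get((text, aspect), suggestion)  # accept unless it is wrong


dataset = correct_in_passes(preds, ["sentiment", "topic"], human)
fixes = sum(p[k] != c[k] for (_, p), c in zip(preds, dataset) for k in p)
print(fixes)  # 2
```

Keeping the baseline untouched makes it easy to count how many suggestions the human had to fix, which is a rough measure of how far the out-of-the-box result is from the target.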

Montani: As developers, we need to ship things and not get stuck in the Prototype Plateau. You are allowed to make your problem more manageable. This is not a competition. This is not academia, and having less complexity means less can go wrong.

The distillation process can be thought of as refactoring code. You can apply techniques such as breaking the problem down into smaller ones, reducing its complexity, and separating the business-logic specifics from the particularities of your system. This phase is also an opportunity to reassess your dependencies and techniques and make sure they are the most suitable for the task.
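One way to picture this refactoring: instead of asking a single model to decide whether a comment needs escalating, predict only generic linguistic facts (sentiment, product mentions) and keep the company-specific rule in ordinary code. The `analyse` stub, the product names and the escalation rule are hypothetical examples, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """Generic, reusable predictions: what a small task-specific model would output."""
    sentiment: str  # "positive" or "negative"
    products: tuple[str, ...]


def analyse(comment: str) -> Analysis:
    # Hypothetical stand-in for the distilled model(s); swap in a trained component here.
    negative = any(w in comment.lower() for w in ("broke", "refund", "useless"))
    products = tuple(p for p in ("X200", "K9") if p in comment)
    return Analysis("negative" if negative else "positive", products)


def needs_escalation(a: Analysis, flagship: set[str]) -> bool:
    # Business rule kept in plain code: easy to read, test and change without retraining.
    return a.sentiment == "negative" and any(p in flagship for p in a.products)


print(needs_escalation(analyse("My X200 broke after a week"), {"X200"}))  # True
print(needs_escalation(analyse("The K9 broke too"), {"X200"}))  # False
```

If the list of flagship products changes, only the rule's input changes; the model that produces `Analysis` does not need new training data.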

To underline the many benefits of distilling the models used in NLP-based applications, Montani summarised case studies in which explosion.ai assisted customers from multiple fields. As she pointed out, the final model was usually smaller and more accurate than the initial baseline provided by the LLM. Hence, taking the time to iterate over your models will give far better results in the long run and also lower operational costs.

About the Author

Olimpiu Pop

Tech executive and engineer focused on a holistic approach and on using technology to solve real problems with minimal impact on the environment. He has experience developing real-time applications ranging from financial software to IAM. Passionate about tooling and about optimising development flows, with or without AI. He has led and shaped technical organisations of hundreds of developers (from support engineers to architects). Tech community builder: Transylvania JUG facilitator, member of the program committee for Voxxed Romania and Devoxx UK, conference speaker, and podcaster on cybersecurity and open-source topics for 505updates.com. Main editor and troublemaker of JavaAdventCalendar.
