During her keynote at Devoxx UK, Mhairi Aitken, ethics fellow at The Alan Turing Institute, spoke about the limitations AI faces when grappling with the complexities of human language, citing as an example how AI systems mispronounce her name, which is of Gaelic origin. She began the presentation by explaining that her work at The Alan Turing Institute focuses on anticipating the social and ethical risks of large language models (LLMs) in society. Beyond that, she tries to understand how those risks can be minimized and what value data and AI can provide across society when designed and developed responsibly.
Next, Aitken provided a brief overview of ChatGPT for the very few who hadn't heard of it before:
It’s a fairly advanced, sophisticated version of a predictive text program. So like the predicted text on my phone that can’t spell my name right. It is based on a large language model [...] built on huge data sets of human language. They’re trained to recognise patterns in that language to predict what combination of words would be a convincing response, mimicking human language…
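Her description maps onto the underlying mechanism: given a sequence of tokens, the model produces a probability distribution over what token comes next. The following is a minimal sketch of that mechanism using the publicly available GPT-2 checkpoint from the Hugging Face transformers library; ChatGPT's underlying model is far larger, but the principle is the same:

```python
# A minimal sketch of next-token prediction, the mechanism Aitken describes.
# Uses the public GPT-2 checkpoint via Hugging Face transformers.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The Alan Turing Institute focuses on"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# The model's "response" is nothing more than a probability distribution
# over its vocabulary; here we print the five most likely next tokens.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for token_id, p in zip(top.indices, top.values):
    print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")
```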
ChatGPT was trained on a large portion of the Internet. And with the advent of GPT-3.5 and GPT-4, news of AI breakthroughs arrives multiple times per day. There is a rush to bring the benefits of LLMs to multiple industries, from powering customer support channels and offering legal or medical advice to writing haikus, poems or even witty, catchy lines on dating applications. Another surprising ability is working with code, both generating it and analyzing it to find faults.
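As a rough illustration of that last ability, the sketch below asks a model to review a deliberately faulty function through the chat completions endpoint. It assumes the pre-1.0 openai Python client that was current at the time of the talk; the model name and prompt are only examples:

```python
# Sketch: asking an LLM to spot a fault in code. Assumes the pre-1.0
# `openai` Python client with an API key set in OPENAI_API_KEY.
import openai

# A deliberately faulty function: it raises ZeroDivisionError on an empty list.
buggy = """
def average(xs):
    return sum(xs) / len(xs)
"""

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{
        "role": "user",
        "content": f"Review this Python function and point out any faults:\n{buggy}",
    }],
)
print(response["choices"][0]["message"]["content"])
```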
Aitken stated that regardless of the industry it generates content for, and no matter how convincing the content is, "it has absolutely no idea what those words or constructions mean [...] absolutely no way of understanding the context or the significance of those words. So it's all style, no substance".
ChatGPT is a foundation for building new systems: LLMs are foundation models in AI, meaning they are trained on huge data sets not for particular tasks or functions but for general purposes, so they can be applied in various contexts. In the end, the users interacting with software built on top of these models might not even know what’s under the hood. She underlined the importance of inspecting the foundations, as a construction inspector would, to know whether they are fit for purpose before anything is built on top of them:
What construction materials have been used?
In AI, a model trained on biased or incomplete data will generate biased or incomplete outcomes. The same applies to LLMs, and as ChatGPT was trained on the Internet, we need to ask how well the Internet mirrors society. According to Aitken, the Internet offers a very skewed view of the world: economically or politically powerful voices dominate it, and minority groups are even less represented.
Is this model suitable for the purpose? Has it been designed for these particular conditions? Will it withstand extreme events?
Making another parallel to construction, Aitken asked rhetorically whether a general-purpose building foundation can equally support houses, hospitals or prisons. As LLMs weren’t designed for particular domains, we need to consider the different sensitivities and risks of each domain they are applied to, and ask whether they are fit for purpose for the particular industry we are interested in. In the case of construction, we need to understand whether a building can withstand earthquakes, hurricanes or fires; in the case of LLMs, whether they can withstand cyberattacks and malicious actors.
Has it been built using ethical labour practices?
ChatGPT relies on a moderation API, which, to a certain extent, prevents it from producing harmful or dangerous outcomes. Time magazine reported that Kenyan workers were exposed to the "most harmful content you could think of" while building that API, labelling content for long periods without adequate protection.
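That moderation layer is also exposed as a standalone endpoint. Below is a minimal sketch of calling it with the pre-1.0 openai Python client; the input string is only an example:

```python
# Sketch: classifying text with OpenAI's moderation endpoint, the same kind
# of filter ChatGPT relies on. Assumes the pre-1.0 `openai` Python client.
import openai

result = openai.Moderation.create(input="Some user-submitted text to check")

# Each result carries an overall flag plus a boolean per harm category.
verdict = result["results"][0]
print("flagged:", verdict["flagged"])
for category, triggered in verdict["categories"].items():
    if triggered:
        print("triggered category:", category)
```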
What is the environmental impact of the model?
Even if it is not immediately visible, the environmental impact of LLMs is significant. Aitken quoted an estimate that the amount of CO2 generated by training ChatGPT is equivalent to driving a car to the moon and back. And that is just the initial training, not including refinement, maintenance or operation. She encouraged us to consider whether what we are using it for is worth that cost.
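The comparison is easy to sanity-check with back-of-envelope arithmetic; the per-kilometre emission factor below is an assumed average for a petrol car, not a figure from the talk:

```python
# Back-of-envelope check of the "car to the moon and back" comparison.
EARTH_MOON_KM = 384_400   # average Earth-Moon distance
KG_CO2_PER_KM = 0.2       # assumed average for a petrol car, not from the talk

round_trip_km = 2 * EARTH_MOON_KM
total_tonnes = round_trip_km * KG_CO2_PER_KM / 1_000

# Roughly 154 tonnes of CO2 for the round trip.
print(f"{round_trip_km:,} km driven ≈ {total_tonnes:.0f} tonnes of CO2")
```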
AI systems will always do what they are programmed to do. We are programming them to do more and more things, with greater sophistication [...] but ChatGPT can’t replace human creativity and problem-solving skills.
Moving towards the end of her keynote, Aitken noted that the enthusiasm around AI is matched by concern that it might make many jobs obsolete and, in some circles, that it might even destroy the world. Her response to both was "no", as AI remains bounded by the context it is given. Even if it generates ever more impressive content, that is because the data it was trained on was impressive, in both content and quality. AI will never be able to replace individuals' problem-solving, creativity and emotional intelligence. Nevertheless, she stressed developers' responsibility to examine the practical and technical limitations of foundation models and to be guided by ethical and social considerations in taking responsible approaches to engaging with AI. They are to inspect the foundations before building!