Teaching Machines to Understand Emotions with Sentiment Analysis


Sentiment analysis teaches computers to recognise the human emotions present in text. The fundamental trade-off in sentiment analysis is between simplicity and accuracy. Approaches vary from using a list of words associated with emotions, to deep learning with techniques like word embeddings, neural networks and attention mechanisms.

Donagh Horgan, lead data scientist at Johnson Controls, will give a talk about building a sentiment analyser for Twitter at RebelCon.io 2019. The conference will be held on June 19-20 in Cork, Ireland. RebelCon aims to bring together the Cork software engineering community over two days of workshops and talks on the "latest technology, culture and development practices in the software industry". In his talk, Horgan will show how to iteratively build an AI-powered pipeline for Twitter sentiment analysis.

Over the past few years, sentiment analysis has been used in an increasing number of applications: it’s used in customer support to sort incoming issues by the polarity of the customer’s perceived emotions, in market research and opinion polling to understand how people think about certain topics, and to detect abuse, bullying and flaming in forums and discussion boards online. Johnson Controls uses sentiment analysis to detect threats of violence to better protect customers and their employees.

According to Horgan, there are huge opportunities for sentiment analysis in life safety. There have been several mass shooting incidents over the past year where the suspects have posted warnings on public or semi-public forums online in advance. He sees opportunities to potentially save lives by getting machines to analyse this kind of content; the costs for such analysis would be very low.

InfoQ interviewed Donagh Horgan about how we can use sentiment analysis for teaching machines to understand emotions.

InfoQ: What is sentiment analysis?

Donagh Horgan: Sentiment analysis is an area of artificial intelligence concerned with teaching computers to recognise the human emotions present in text. The goal of sentiment analysis is to understand the variety and strength of emotions in written text.

Typically, this is an easy task for humans to do. For example, if I tell you "this movie is great", you’ll know without even thinking too much about it that I’m (1) positive and (2) excited about (3) a film. But it’s harder for machines to learn how to do this because languages are not straightforward. For instance, I can say, "This movie is better than that one," but not "This movie is gooder than that one", and that knowledge comes naturally to me.

But computers work via algorithms and it would be very difficult to write down a precise and maintainable set of rules for understanding all of the exceptions present in the English language. Even worse, there are lots of different languages. It's a tough problem.

InfoQ: How does sentiment analysis work?

Horgan: There are a few different approaches to it, but the fundamental trade-off is between simplicity and accuracy. One simple way is to make a list of words that are associated with each emotion you want to track. For example, you could make a list of positive words (good, great, excellent) and a list of negative words (poor, terrible, awful). You then take the piece of text you’re interested in and keep a running tally, adding one for each positive word that appears in it and subtracting one for each negative word. If the total is positive, then you can conclude that the text is positive; otherwise, you can conclude that it is negative.
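
As a rough illustration of the word-tally approach, the sketch below counts positive and negative words in a piece of text. The word lists come from the examples in the answer; the function names, the tokenisation rule and the separate "neutral" label for a zero total are assumptions made for this example, not part of Horgan's pipeline.

```python
import re

# Word lists taken from the examples in the interview; a real lexicon
# would be far larger.
POSITIVE_WORDS = {"good", "great", "excellent"}
NEGATIVE_WORDS = {"poor", "terrible", "awful"}


def tally_sentiment(text: str) -> int:
    """Return (number of positive words) - (number of negative words)."""
    # Assumed tokenisation: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for word in words:
        if word in POSITIVE_WORDS:
            score += 1
        elif word in NEGATIVE_WORDS:
            score -= 1
    return score


def label(score: int) -> str:
    """Map a tally to a label.

    The interview treats every non-positive total as negative; the
    "neutral" label for a zero total is an assumption added here.
    """
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, `label(tally_sentiment("This movie is great"))` evaluates to `"positive"`, because the only listed word in the sentence is "great".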

You can make the process more accurate by giving each word a "strength". For example, "good" might be 60% positive, but "great" sounds even more positive so you might give that an 80% score. It’s not as laborious as it sounds - there are free and open access emotion lists available online (for example, in the Python library pattern), so you often don’t have to make your own.
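
A minimal sketch of the "strength" idea follows. Only the scores for "good" (0.6) and "great" (0.8) come from the text; every other score, the dictionary name and the choice to average rather than sum are assumptions for illustration. Lexicons shipped with real libraries use their own scales.

```python
import re

# Strengths in [-1, 1]. "good" and "great" follow the interview;
# the remaining values are made up for this example.
WORD_STRENGTH = {
    "good": 0.6,
    "great": 0.8,
    "excellent": 0.9,
    "poor": -0.6,
    "terrible": -0.8,
    "awful": -0.9,
}


def weighted_sentiment(text: str) -> float:
    """Average the strengths of all lexicon words found in the text.

    Returns 0.0 when no lexicon word appears.
    """
    words = re.findall(r"[a-z']+", text.lower())
    scores = [WORD_STRENGTH[w] for w in words if w in WORD_STRENGTH]
    if not scores:
        return 0.0
    return sum(scores) / len(scores)
```

Averaging keeps the result on the same scale as the lexicon regardless of text length, which makes a short tweet and a long review comparable. Summing would be an equally reasonable choice.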

But simple word tallies often lack the nuance to be able to grasp more technical linguistic constructs. For example, I might tell you, "This movie is awfully good", which might generate a neutral or even negative score, depending on how you consider the word "awfully". While there are ways to work around this with word tallies, in general they are not sophisticated enough to fully grasp the problem.
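
The "awfully good" problem, and one possible workaround, can be sketched as below. This assumes a lexicon that scores "awfully" as negative, plus a hypothetical table of intensifiers that scale the next sentiment word instead of contributing a score of their own. The multipliers are invented for illustration.

```python
import re

# Hypothetical lexicon in which "awfully" inherits the negativity of "awful".
WORD_STRENGTH = {"good": 0.6, "great": 0.8, "awful": -0.9, "awfully": -0.9}

# Hypothetical intensifiers: each multiplies the strength of the word
# that immediately follows it.
INTENSIFIERS = {"awfully": 1.5, "very": 1.3, "really": 1.3}


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text.lower())


def naive_score(text: str) -> float:
    """Plain weighted tally: every lexicon word contributes its strength."""
    return sum(WORD_STRENGTH.get(w, 0.0) for w in tokenize(text))


def intensifier_aware_score(text: str) -> float:
    """Weighted tally where intensifiers scale the next word only."""
    total = 0.0
    boost = 1.0
    for word in tokenize(text):
        if word in INTENSIFIERS:
            boost = INTENSIFIERS[word]  # applies to the next word only
            continue
        if word in WORD_STRENGTH:
            total += boost * WORD_STRENGTH[word]
        boost = 1.0  # reset after any non-intensifier word
    return total
```

With these assumed values, `naive_score("This movie is awfully good")` comes out below zero, while `intensifier_aware_score` rates the same sentence as more positive than plain "good". Rules like this pile up quickly, which is the maintainability problem the answer describes.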

One solution to this is the use of machine learning algorithms, where the computer learns the probability of individual words or pairs of words or even longer bits of text being associated with a given emotion. I won’t get into the maths of it here, but it’s not difficult to build a reasonable Naive Bayes model (or at least one that is better than word tallying).
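
To give a feel for what such a model involves, here is a small multinomial Naive Bayes classifier written from scratch. It is a sketch under stated assumptions rather than the model from the talk: the class name, the add-one smoothing, the decision to skip unseen words and the six training sentences are all made up for illustration. In practice a library such as nltk provides a tested implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over single words, with add-one smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)      # documents per label
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_log_prob = None, -math.inf
        for label, doc_count in self.label_counts.items():
            # Start from the log prior P(label).
            log_prob = math.log(doc_count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocabulary)
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # assumption: ignore words never seen in training
                # Add-one smoothed log likelihood P(word | label).
                log_prob += math.log((self.word_counts[label][word] + 1) / denom)
            if log_prob > best_log_prob:
                best_label, best_log_prob = label, log_prob
        return best_label


# Toy training data, invented for this example.
TRAIN_TEXTS = [
    "this movie is great",
    "an excellent film",
    "what a good story",
    "this movie is terrible",
    "an awful film",
    "poor acting and a poor story",
]
TRAIN_LABELS = ["positive"] * 3 + ["negative"] * 3

model = NaiveBayesSentiment().fit(TRAIN_TEXTS, TRAIN_LABELS)
```

On this toy data, `model.predict("a great story")` returns `"positive"` and `model.predict("an awful movie")` returns `"negative"`. Six sentences are far too few for anything real. The point is only that the word probabilities are learned from labelled examples instead of being written down by hand.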

InfoQ: How reliable is sentiment analysis? Can we trust the outcomes?

Horgan: Unfortunately, all currently known approaches suffer from a lack of context of one form or another. For instance, tallies place too much emphasis on individual words, so they can’t understand phrases like "great failure". You can get around this with the Naive Bayes algorithm because it can learn to associate emotions with pairs, triplets or phrases of any length, but it might still trip up on "I can’t get no satisfaction" if you haven’t shown it enough examples of that specific double negative.
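
One common way to let a model see pairs and triplets of words is to extract n-grams and treat each one as a feature. The helper names below are hypothetical, and feeding the result into a Naive Bayes model is an assumption about how the pieces would be combined, not something the interview spells out.

```python
import re


def ngrams(text: str, n: int) -> list[str]:
    """Return all runs of n consecutive words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def features(text: str, max_n: int = 2) -> list[str]:
    """Return every n-gram of length 1 up to max_n as a flat feature list."""
    feats = []
    for n in range(1, max_n + 1):
        feats.extend(ngrams(text, n))
    return feats
```

For example, `features("a great failure")` yields `["a", "great", "failure", "a great", "great failure"]`. A classifier trained on such features can learn that "great failure" is negative even though "great" alone is positive, provided that phrase appears often enough in the training data.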

The state of the art right now is deep learning, which uses techniques like word embeddings, neural networks and attention mechanisms to learn the complex structures present in human language. But even these models can get confused with more complex linguistic constructs, like irony and sarcasm. In fact, even amongst humans, the level of agreement on exactly what emotions are present rarely exceeds 80%. I think the lesson here is that language and emotion are bound together, but the relationship is not always clear - even to us. For now, machines can only do as well as we can.

InfoQ: What have you learned from building a sentiment analyser for Twitter?

Horgan: Quite a bit, actually. It’s both easier and more difficult than it sounds. You can try off-the-shelf methods, and they will work okay, but you have to get a bit more specific if you want to build something really accurate. For instance, you could take into account the profile of the person who is tweeting. Donald Trump is a great example of someone who is only either highly positive or highly negative. Incorporating that kind of information about individual users into the algorithm could yield even better results.

InfoQ: If InfoQ readers want to learn more about sentiment analysis, where can they go?

Horgan: Towards Data Science has lots of tutorials (some more advanced than others), which are usually aimed at hackers who just want to get started quickly. Natural Language Processing with Python is a good starting point for the area in general, although it’s not free. There are lots of useful Python libraries with good documentation and tutorials out there too, such as nltk, spacy, textblob, vader and gensim.

InfoQ is covering RebelCon.io 2019 with articles, Q&As, and summaries. Previously InfoQ published the article A Different Meaning of CI - Continuous Improvement, the Heartbeat of DevOps by Sabine Wojcieszak.
