In a recent ThoughtWorks blog post, Angelica Perez shared a new open source project that grew out of an interactive film experience. The project, called EmoPy, focuses on Facial Expression Recognition (FER) and provides a toolkit that allows developers to predict emotions from facial images passed to it.
Perez defines FER as "an image classification problem located within the wider field of computer vision." Computer vision is a hot topic, garnering investment from many large cloud providers who expose their machine learning models through public APIs. The challenge, though, is that the models and algorithms behind these services are not made publicly available, and high-quality datasets are difficult to access. Perez explains how EmoPy is different:
Our aim is to widen public access to this crucial emerging technology, one for which the development usually takes place behind commercially closed doors. We welcome raised issues and contributions from the open source development community and hope you find EmoPy useful for your projects.
Having access to a FER training model is very important, and a standard set of emotion classifications is often used, including the following (a minimal usage sketch appears after the list):
- Anger
- Disgust
- Fear
- Happiness
- Sadness
- Surprise
- Neutral
Image source: https://www.thoughtworks.com/insights/blog/recognizing-human-facial-expressions-machine-learning
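For developers who want to try the toolkit, EmoPy ships with a pre-trained FERModel that can classify a single image against a chosen subset of these emotions. The sketch below follows the project's published example; the import path, the supported emotion subsets, and the image file name are assumptions and may vary between EmoPy versions.

```python
# Minimal sketch based on EmoPy's published example; the import path and the
# supported emotion subsets may differ between EmoPy versions.
from EmoPy.src.fermodel import FERModel

# Pick a subset of the standard emotion classes to predict.
target_emotions = ['calm', 'anger', 'happiness']
model = FERModel(target_emotions, verbose=True)

# Classify a single face image on disk ('my_face.jpg' is a placeholder path).
model.predict('my_face.jpg')
```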
The EmoPy toolkit was created as part of ThoughtWorks Arts, a program which incubates artists working on social and technology projects. The ThoughtWorks team supported artist-in-residence Karen Palmer in creating an interactive film experience called RIOT.
RIOT places viewers in front of a screen showing contentious video clips based upon riot situations, including looters and riot police. Viewers' facial expressions are recorded by a webcam, and the captured images are analyzed by EmoPy, as sketched below.
Image source: https://www.thoughtworks.com/insights/blog/emopy-machine-learning-toolkit-emotional-expression
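Conceptually, that pipeline grabs frames of the viewer from a webcam and passes them to a trained model. A rough illustration of the flow might look like the following, using OpenCV for the capture and the FERModel call shown earlier; the emotion subset, device index, and file name are placeholder assumptions rather than RIOT's actual implementation.

```python
# Rough illustration of a webcam-to-EmoPy flow; the FERModel API follows
# EmoPy's published example, and the emotion subset, device index, and file
# name are placeholder assumptions.
import cv2
from EmoPy.src.fermodel import FERModel

model = FERModel(['fear', 'anger', 'calm'], verbose=True)

capture = cv2.VideoCapture(0)   # default webcam
ok, frame = capture.read()      # grab a single frame of the viewer
capture.release()

if ok:
    # FERModel.predict expects an image file path, so write the frame to disk.
    cv2.imwrite('viewer_frame.jpg', frame)
    model.predict('viewer_frame.jpg')
```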
EmoPy was built from scratch, inspired by the research of Dr. Hongying Meng. The core requirements of EmoPy include the following:
- Neural network architectures consist of layers which feed their outputs to one another in sequence. The performance of an architecture is highly dependent upon the choice and ordering of the layers that make it up (a generic sketch of such a sequential network appears after this list).
- Selecting datasets is very important, as the larger the image library, the higher the accuracy and generalizability of the models, yet few public datasets are available today. EmoPy was able to take advantage of Microsoft's FER2013 dataset and the Extended Cohn-Kanade dataset. FER2013 includes over 35,000 facial images labeled with seven emotion categories: anger, disgust, fear, happiness, sadness, surprise and calm. The Extended Cohn-Kanade dataset contains 327 facial expression sequences rather than still images, with each sequence representing a transition between facial expressions.
Image Source: https://www.thoughtworks.com/insights/blog/emopy-machine-learning-toolkit-emotional-expression
- The training process was the next consideration the ThoughtWorks team addressed. It covers how the neural networks are trained against the selected datasets. The dataset is split into two parts, a training set and a validation set, and the process then works as follows (see the training sketch after this list):
- Images from the training set are fed into the neural network, which predicts an emotion based upon its current weights and parameters.
- The neural network then compares the predicted emotion against the true emotion and calculates a loss value.
- The loss value is used to adjust the weights of the neural network. Iterating over this process allows the prediction model to become more accurate.
- The validation set is used to test the neural network after it has been trained. It was very important for the ThoughtWorks team to have two separate sets: by evaluating on images that were not part of the training set, they were able to assess the model more objectively. This approach also guards against "overfitting", which is "when a neural network learns patterns from the training samples so well that it is unable to generalize when given new samples." When overfitting occurs, the training set accuracy is much higher than the validation set accuracy.
- Measuring performance was the final requirement for EmoPy. The question the ThoughtWorks team sought to answer was: how accurately does a given architecture predict emotions on the training set and the validation set? Within the results, the ConvolutionalNN model performed the best. For emotions such as disgust, happiness and surprise, the neural network was able to correctly classify images it had never seen before 9 out of 10 times. While accuracy for disgust, happiness and surprise is high, this isn't the case for every emotion; misclassifications do occur, especially for fear. The best way to deal with these misclassifications is to use the largest dataset possible (a per-emotion accuracy sketch follows the training sketch below).
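As noted in the architecture requirement above, the networks are built from layers that feed forward in sequence. The following is a generic Keras sketch of such a sequential convolutional network, not EmoPy's actual ConvolutionalNN; the 48x48 grayscale input shape (matching FER2013) and the layer choices are assumptions for illustration.

```python
# Generic sequential CNN sketch (not EmoPy's exact architecture). Layer
# choices and the 48x48 grayscale input shape are illustrative assumptions.
from tensorflow.keras import layers, models

def build_model(num_emotions=7, input_shape=(48, 48, 1)):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dense(num_emotions, activation='softmax'),  # one score per emotion
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```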
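The train/validation split and training loop described above can be sketched as follows. This is a simplification of EmoPy's actual training code and assumes the publicly distributed fer2013.csv layout (one row per 48x48 grayscale image with an integer emotion label); the file name, split ratio, and hyperparameters are illustrative.

```python
# Sketch of the train/validation split and training loop described above.
# Assumes the fer2013.csv layout (columns: emotion, pixels, Usage); file
# name, split ratio, and hyperparameters are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

data = pd.read_csv('fer2013.csv')
pixels = np.stack([np.asarray(p.split(), dtype=np.float32)
                   for p in data['pixels']])
images = pixels.reshape(-1, 48, 48, 1) / 255.0          # normalize to [0, 1]
labels = to_categorical(data['emotion'].values, num_classes=7)

# Hold out a validation set of images the network never trains on.
x_train, x_val, y_train, y_val = train_test_split(
    images, labels, test_size=0.2, random_state=42)

model = build_model()                                    # from the sketch above
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=20, batch_size=64)

# A training accuracy far above validation accuracy signals overfitting.
print('train accuracy:', history.history['accuracy'][-1])
print('val accuracy:  ', history.history['val_accuracy'][-1])
```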
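To see how accuracy varies by emotion, as in the results reported above, per-class accuracy on the validation set can be computed from the model's predictions. Continuing from the training sketch, and assuming the usual FER2013 label ordering for the class names:

```python
# Per-emotion validation accuracy, to see which classes are misclassified
# most often. The label order follows the usual FER2013 convention and is
# an assumption.
emotions = ['anger', 'disgust', 'fear', 'happiness',
            'sadness', 'surprise', 'neutral']

predicted = model.predict(x_val).argmax(axis=1)
actual = y_val.argmax(axis=1)

for index, name in enumerate(emotions):
    mask = actual == index
    accuracy = (predicted[mask] == index).mean() if mask.any() else float('nan')
    print(f'{name:>9}: {accuracy:.2f}')
```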
The EmoPy project is actively looking for contributors. Whether developers want to contribute to the project or simply use it, the project team has chosen an unrestrictive license to make EmoPy available to the broadest audience possible.