The growing number of organizations creating and deploying machine learning solutions raises concerns about their intrinsic security, argues the NCC Group in a recent whitepaper.
The NCC Group's whitepaper provides a classification of attacks that may be carried out against machine learning systems, including examples based on popular libraries and platforms such as SciKit-Learn, Keras, PyTorch, and TensorFlow.
Although the various mechanisms that allow this are to some extent documented, we contend that the security implications of this behaviour are not well-understood in the broader ML community.
According to the NCC Group, ML systems are subject to specific forms of attack in addition to more traditional attacks that may attempt to exploit infrastructure or application bugs, or other kinds of issues.
A first vector of risk stems from the fact that many ML models contain code that is executed when the model is loaded or when a particular condition is met, such as when a given output class is predicted. This means an attacker may craft a model containing malicious code and have it executed for a variety of aims, including leaking sensitive information, installing malware, producing output errors, and so on. Hence:
Downloaded models should be treated in the same way as downloaded code; the supply chain should be verified, the content should be cryptographically signed, and the models should be scanned for malware if possible.
The NCC Group claims to have successfully exploited this kind of vulnerability across many popular formats and libraries, including Python pickle files, SciKit-Learn pickles, PyTorch pickles and state dictionaries, TensorFlow Server, and several others.
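To illustrate why serialized model formats are risky, the sketch below shows how a Python pickle can run arbitrary code at load time via an object's __reduce__ hook. The payload here is a harmless echo and the file name is only illustrative; this is a generic demonstration of the technique, not code from the whitepaper.

```python
import os
import pickle

# A class whose __reduce__ hook tells pickle to call os.system on unpickling.
# Whatever command is placed here runs as soon as the "model" file is loaded.
class MaliciousModel:
    def __reduce__(self):
        return (os.system, ("echo 'code executed on model load'",))

# The attacker serializes the object and distributes it as a model file.
with open("model.pkl", "wb") as f:
    pickle.dump(MaliciousModel(), f)

# The victim simply loads the "model" -- the payload runs immediately,
# before any prediction is ever made.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
```

This is why the whitepaper recommends treating downloaded models like downloaded code: nothing in the loading step distinguishes a legitimate model from one carrying a payload.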
Another family of attacks is adversarial perturbation attacks, where an attacker crafts an input that causes the ML system to return results of their choice. Several methods for this have been described in the literature, such as crafting an input to maximize confidence in any given class or in a specific class, or to minimize confidence in any given class. This approach could be used to tamper with authentication systems, content filters, and so on.
The NCC Group's whitepaper also provides a reference implementation of a simple hill climbing algorithm to demonstrate adversarial perturbation by adding noise to the pixels of an image:
We add random noise to the image until confidence increases. We then use the perturbed image as our new base image. When we add noise, we start by adding noise to 5% of the pixels in the image, and decrease that proportion if this was unsuccessful.
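A minimal sketch of such a hill-climbing loop is shown below. It is not the whitepaper's reference implementation; it assumes a predict function that returns a vector of class confidences for an image array with pixel values in [0, 1], and the step sizes are illustrative.

```python
import numpy as np

def hill_climb(image, predict, target_class, steps=1000, noise_fraction=0.05):
    """Iteratively add random noise to an image to raise the model's
    confidence in target_class. predict(image) is assumed to return a
    vector of per-class confidences."""
    best = image.copy()
    best_conf = predict(best)[target_class]
    for _ in range(steps):
        candidate = best.copy()
        # Perturb a random subset of pixels (initially about 5% of them).
        mask = np.random.rand(*candidate.shape) < noise_fraction
        noise = np.random.uniform(-0.1, 0.1, size=candidate.shape)
        candidate = np.clip(candidate + mask * noise, 0.0, 1.0)
        conf = predict(candidate)[target_class]
        if conf > best_conf:
            # Confidence increased: keep the perturbed image as the new base.
            best, best_conf = candidate, conf
        else:
            # Unsuccessful step: try a smaller proportion of pixels next time.
            noise_fraction = max(noise_fraction * 0.9, 0.001)
    return best, best_conf
```

Each accepted perturbation becomes the new base image, mirroring the procedure quoted above.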
Other kinds of well-known attacks include membership inference attacks, which enable an attacker to tell whether a given input was part of the model's training set; model inversion attacks, which allow attackers to recover sensitive data from the training set; and data poisoning backdoor attacks, which consist of inserting specific items into a system's training data to cause it to respond in some pre-defined way.
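As a rough illustration of the first of these, a common baseline for membership inference simply thresholds the model's confidence, since models are often noticeably more confident on examples they were trained on. The predict function and the 0.9 threshold below are assumptions for this sketch, not details taken from the whitepaper.

```python
import numpy as np

def is_training_member(predict, sample, threshold=0.9):
    """Baseline confidence-threshold membership test.
    predict(sample) is assumed to return a vector of class probabilities;
    the threshold is illustrative and would be tuned in practice."""
    confidence = np.max(predict(sample))
    return confidence >= threshold  # True => likely seen during training
```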
As mentioned, the whitepaper provides a comprehensive taxonomy of machine learning attacks, including possible mitigations, as well as a review of more traditional security issues found in many machine learning systems. Make sure to read it to get the full details.