Google uses a clustering algorithm to automatically analyze Android apps and detect which ones can be considered intrusive, write Google security engineers Martin Pelikan, Giles Hogben, and Ulfar Erlingsson.
Intrusive apps are those that ask the user to grant more capabilities than they strictly need to function properly. For example, as the Google engineers explain, a coloring book app does not usually need access to geolocation data. Other capabilities that not all apps need to do their job include access to personal data, the camera, and the address book. Granting more privileges than strictly necessary is potentially harmful, since you cannot really know what the data is used for. Among the most frequent kinds of harmful app behaviour are backdoors, spyware, data collection, and denial of service.
Google's approach to detecting intrusive apps is based on the concept of a functional peer group, i.e. a group of apps that share similar features and should therefore require a similar set of permissions. Once those groups are formed, it becomes possible to detect the anomalous apps in each group, meaning those that request more privileges than similar apps do. This approach requires monitoring the Google Play Store, collecting detailed statistics, and discovering user expectations, so that app groups can be determined automatically. Indeed, according to the Google engineers, fixed categorization and manual curation would be tedious and error-prone.
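To make the peer-group idea concrete, here is a minimal sketch in Python. It is not Google's implementation: the app names, permission labels, and the 25% threshold are all hypothetical, and it simply flags any permission that only a small share of a group's apps request.

```python
from collections import Counter


def flag_unusual_permissions(peer_group, max_share=0.25):
    """Map each app to the permissions that few of its peers request.

    peer_group: dict of app name -> set of requested permissions.
    max_share:  a permission counts as unusual when at most this fraction
                of the group requests it (hypothetical threshold).
    """
    group_size = len(peer_group)
    # Count how many apps in the group request each permission.
    counts = Counter(p for perms in peer_group.values() for p in set(perms))
    flagged = {}
    for app, perms in peer_group.items():
        rare = sorted(p for p in set(perms) if counts[p] / group_size <= max_share)
        if rare:
            flagged[app] = rare
    return flagged


# Hypothetical peer group of coloring book apps (illustrative data only).
coloring_apps = {
    "color_fun": {"STORAGE"},
    "paint_kids": {"STORAGE", "CAMERA"},
    "doodle_pad": {"STORAGE", "CAMERA"},
    "happy_colors": {"STORAGE"},
    "rainbow_book": {"STORAGE", "LOCATION", "CONTACTS"},
}

flagged = flag_unusual_permissions(coloring_apps)
print(flagged)
```

In this toy group only one app asks for location and contacts, so it is the only one flagged. A flag of this kind is just a signal that the app deserves a closer look, not proof that it is intrusive.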
To make this approach more effective, Google uses deep learning to discover groups of apps that share similar characteristics, based on those apps’ metadata, which includes textual descriptions and install metrics. Once peer groups are defined, anomaly detection is applied within each group to identify anomalous apps, i.e. apps that show a mismatch between the privileges they request and their functionality. Anomalous apps are then inspected thoroughly to decide which ones are actually intrusive. That information is also used to determine which apps should be promoted, as well as to get in touch with the developers of potentially intrusive apps and help them improve the privacy and security of their apps.
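The grouping step can be illustrated with a much simpler stand-in for deep learning. The sketch below groups apps by word overlap (Jaccard similarity) between their descriptions, using a greedy single pass. The descriptions, app names, and the 0.3 similarity threshold are assumptions made for illustration only, and install metrics are ignored.

```python
import re


def tokens(description):
    """Return the set of lowercase words in a description."""
    return set(re.findall(r"[a-z]+", description.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def peer_groups(descriptions, min_similarity=0.3):
    """Greedy grouping: an app joins the first group whose seed app is
    similar enough; otherwise it starts a new group."""
    groups = []  # list of (seed word set, member app names)
    for app, text in descriptions.items():
        words = tokens(text)
        for seed, members in groups:
            if jaccard(words, seed) >= min_similarity:
                members.append(app)
                break
        else:
            groups.append((words, [app]))
    return [members for _, members in groups]


# Hypothetical app descriptions (illustrative data only).
descriptions = {
    "color_fun": "coloring book for kids with fun pictures",
    "rainbow_book": "fun coloring book with pictures for kids",
    "trail_map": "offline hiking maps and gps navigation",
    "peak_finder": "gps navigation and offline maps for hiking",
}

groups = peer_groups(descriptions)
print(groups)
```

Word overlap is far cruder than learned representations, since it misses synonyms and depends on which app happens to seed a group. It does show the principle, though: groups emerge from the apps' own metadata rather than from fixed, manually curated categories.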