Netflix has developed a fraud and abuse detection framework for streaming services, based on artificial intelligence models and data-driven anomaly detection trained on user behavior. Streaming services potentially onboard a very large number of users across many devices, so their attack surface can be very wide; this is why a machine-learning approach can be useful in helping to secure them.
There are two kinds of anomaly detection approaches: rule-based and model-based. In the rule-based approach, domain experts define what constitutes an anomaly and write a set of rules to discover such incidents. Deploying and maintaining these rules can be expensive and time consuming at scale, and the approach is not well suited to real-time analysis. Model-based anomaly detection, on the other hand, relies on models built to detect abnormal behavior automatically. This approach is more scalable and applicable to real-time analysis, but it depends heavily on the availability of context-specific, labeled data.
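As a minimal illustration of the difference, the sketch below contrasts a hand-written rule with an unsupervised model (scikit-learn's IsolationForest); the feature names and the rule threshold are hypothetical and not taken from the Netflix framework.

```python
# A minimal sketch contrasting the two approaches; the feature names and
# the rule threshold are hypothetical, not taken from the Netflix framework.
import pandas as pd
from sklearn.ensemble import IsolationForest

accounts = pd.DataFrame({
    "distinct_devices_24h": [2, 3, 41, 2],
    "licenses_acquired_24h": [5, 7, 260, 4],
})

# Rule-based: a fixed, expert-chosen threshold flags an account.
rule_flags = accounts["licenses_acquired_24h"] > 100

# Model-based: an unsupervised model learns what "normal" looks like and
# scores deviations automatically (fit_predict returns -1 for anomalies).
model = IsolationForest(contamination=0.25, random_state=0)
model_flags = model.fit_predict(accounts) == -1
```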
The fraud framework is based on both semi-supervised and supervised models. Data labeling is an important stage of the model-development process, but no labeled datasets were available for this specific domain. A set of rule-based heuristics, built on the experience of security experts, was therefore defined to identify anomalous client behavior and label it, creating a training dataset. At the labeling stage, the Synthetic Minority Over-sampling Technique (SMOTE) is used to mitigate the problems caused by an imbalanced label distribution.
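The post does not show the labeling code, but a minimal sketch of the SMOTE step with the imbalanced-learn library could look like the following, where X and y stand in for the heuristic-labeled feature matrix and fraud labels.

```python
# A minimal sketch of the oversampling step with imbalanced-learn's SMOTE;
# X and y stand in for the heuristic-labeled feature matrix and fraud labels.
import numpy as np
from imblearn.over_sampling import SMOTE

X = np.random.rand(1000, 23)          # 23 streaming-related features (placeholder)
y = np.zeros(1000, dtype=int)
y[:30] = 1                            # ~3% anomalous: heavily imbalanced labels

X_resampled, y_resampled = SMOTE(random_state=0).fit_resample(X, y)
# The minority (anomalous) class is now oversampled with synthetic points,
# so downstream classifiers do not simply learn to always predict "benign".
```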
The fraud categories considered in this framework are content fraud, service fraud, and account fraud. With the heuristics defined, the resulting dataset contains three main subsets of labels: rapid license acquisition, too many failed streaming attempts, and unusual combinations of device type and DRM.
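A sketch of how such heuristics could turn raw account activity into multi-label training data is shown below; the thresholds and column names are illustrative assumptions only, not Netflix's actual rules.

```python
# A sketch of how such heuristics could produce multi-label training data;
# the thresholds and column names are illustrative assumptions only.
import pandas as pd

def label_account(row: pd.Series) -> dict:
    return {
        "rapid_license_acquisition": row["licenses_per_hour"] > 50,
        "too_many_failed_streams": row["failed_stream_attempts"] > 30,
        "unusual_device_drm_combo": not row["device_drm_pair_seen_before"],
    }

activity = pd.DataFrame({
    "licenses_per_hour": [2, 120],
    "failed_stream_attempts": [1, 45],
    "device_drm_pair_seen_before": [True, False],
})
labels = activity.apply(lambda row: pd.Series(label_account(row)), axis=1)
```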
Over 30 days, 1,030,005 benign accounts and 28,045 anomalous accounts were gathered, the latter flagged by the heuristics described above. Of the anomalous accounts, 31% are tagged as content fraud, 47% as service fraud, and 21% as account fraud. Based on the heuristic functions, 85% of the 28,045 anomalous accounts fall into exactly one fraud category, 12% into two, and 3% into all three.
As shown in the correlation matrices for the 23 data features considered in developing this framework, there are positive correlations among the features that correspond to device signatures and among the features that refer to title-acquisition activities.
The list of streaming-related features
Correlation matrix of the features considered for (a) clean and (b) anomalous data samples
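A minimal sketch of producing the two matrices with pandas follows, assuming a DataFrame that holds the 23 streaming-related features plus a boolean is_anomalous column derived from the heuristics.

```python
# A minimal sketch of producing the two matrices with pandas, assuming a
# DataFrame `features` holding the 23 features plus a boolean `is_anomalous`
# column derived from the heuristics.
import pandas as pd

def correlation_matrices(features: pd.DataFrame):
    clean = features[~features["is_anomalous"]].drop(columns="is_anomalous")
    anomalous = features[features["is_anomalous"]].drop(columns="is_anomalous")
    return clean.corr(), anomalous.corr()
```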
The evaluation metrics used to measure the performance of the models studied, for both the one-class and the binary anomaly detection tasks, are accuracy, precision, recall, F0.5, F1, F2, and ROC AUC. For the multi-class, multi-label task, the metrics are accuracy, precision, recall, F0.5, F1, F2, exact match ratio (EMR), Hamming loss, and Hamming score.
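These metrics are all available in scikit-learn; a sketch of how they could be computed is given below, where y_true, y_pred, and y_score are placeholders and, for the multi-label task, binary indicator matrices with one column per fraud category. Here EMR is taken to be subset accuracy and Hamming score to be 1 minus Hamming loss, which are common but not the only definitions.

```python
# A sketch of computing these metrics with scikit-learn; y_true, y_pred, and
# y_score are placeholders, and for the multi-label task they are binary
# indicator matrices with one column per fraud category.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             fbeta_score, roc_auc_score, hamming_loss)

def binary_metrics(y_true, y_pred, y_score):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f0.5": fbeta_score(y_true, y_pred, beta=0.5),
        "f1": fbeta_score(y_true, y_pred, beta=1.0),
        "f2": fbeta_score(y_true, y_pred, beta=2.0),
        "roc_auc": roc_auc_score(y_true, y_score),
    }

def multilabel_metrics(y_true, y_pred):
    return {
        "exact_match_ratio": accuracy_score(y_true, y_pred),   # all labels correct
        "hamming_loss": hamming_loss(y_true, y_pred),
        "hamming_score": 1.0 - hamming_loss(y_true, y_pred),   # one common definition
        "f1_micro": fbeta_score(y_true, y_pred, beta=1.0, average="micro"),
    }
```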
The table below shows the evaluation metrics for the semi-supervised anomaly detection methods considered; the deep auto-encoder performs the best.
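The post does not include the auto-encoder architecture; a minimal semi-supervised sketch in Keras, trained only on benign accounts and flagging accounts whose reconstruction error exceeds a threshold, could look like the following. The layer sizes, threshold quantile, and placeholder data are assumptions, not the architecture used by Netflix.

```python
# A minimal semi-supervised sketch: train an auto-encoder on benign accounts
# only, then flag accounts whose reconstruction error exceeds a threshold.
import numpy as np
import tensorflow as tf

n_features = 23
autoencoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),        # bottleneck
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(n_features, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")

X_benign = np.random.rand(1000, n_features).astype("float32")  # placeholder data
autoencoder.fit(X_benign, X_benign, epochs=10, batch_size=64, verbose=0)

def anomaly_scores(X):
    reconstruction = autoencoder.predict(X, verbose=0)
    return np.mean((X - reconstruction) ** 2, axis=1)

# Accounts whose error lies above a high quantile of the benign error
# distribution are flagged as anomalous.
threshold = np.quantile(anomaly_scores(X_benign), 0.99)
is_anomalous = anomaly_scores(X_benign) > threshold
```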
For the supervised binary anomaly detection models, the metrics are:
and for the supervised multi-class and multi-label anomaly detection models, the evaluation metrics are:
From the feature point of view, the count of distinct encoding formats, the count of distinct devices, and the count of distinct DRMs are the most important features for content fraud detection. For service fraud detection, the most important features are the count of content licenses associated with an account, the count of distinct devices, and the percentage of usage of each device type by an account. Finally, the count of distinct devices is the dominant feature for detecting account fraud behavior.
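One way to obtain this kind of per-category ranking is to inspect the feature importances of a tree-based classifier trained for each fraud category; the sketch below is illustrative only, with hypothetical feature names and placeholder labels.

```python
# Inspect feature importances of a tree-based classifier trained for one
# fraud category; feature names and labels are illustrative placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

feature_names = ["distinct_encoding_formats", "distinct_devices", "distinct_drms",
                 "content_licenses", "pct_device_type_usage"]

X = pd.DataFrame(np.random.rand(500, len(feature_names)), columns=feature_names)
y_content_fraud = np.zeros(500, dtype=int)
y_content_fraud[::10] = 1                      # placeholder content-fraud labels

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y_content_fraud)
importances = pd.Series(clf.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False))
```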