Belief Propagation for Detecting Shilling Attacks

Recommender systems are increasingly employed by e-commerce websites, e.g., Amazon.com and Netflix.com, to provide personalized recommendations to users. They are efficient at retrieving information that matches user interests from a large volume of data, which is critical in today's online activities, as a human being simply cannot handle the explosive amount of information on the Internet. Collaborative Filtering (CF) is so far the most popular recommendation algorithm; it relies on historical ratings given by users to items to make recommendations. Unfortunately, it is vulnerable to so-called ``shilling'' attacks, in which a group of spam users collaborate to influence the recommendations for their own benefit, e.g., to have their products recommended more often. In an attacked recommender system, users are recommended flawed, low-quality products or products they do not like, which in turn decreases user satisfaction. Therefore, it is of practical interest and importance to protect recommender systems against shilling attacks.

One of the major approaches to protecting recommender systems against such attacks is to detect the spam users and remove them from the system. Existing works have introduced several metrics for detecting the rating patterns of spam users. However, those feature-based algorithms suffer from low accuracy, as they only look at individual user rating patterns. Another work exploited the statistical properties of spam users, e.g., covariance, to perform detection via variable selection using Principal Component Analysis (PCA-VarSel). PCA-VarSel is very effective when spam users have low covariance, e.g., when they rate items randomly selected from all items, because genuine users have high covariance since they mostly rate only the popular items. However, PCA-VarSel easily fails if spam users also selectively rate only those popular items.
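The covariance intuition behind this kind of detection can be illustrated with a minimal sketch. It is not the PCA-VarSel algorithm itself: it skips the PCA step and simply ranks users by their average covariance with all other users. The toy rating matrix, the convention that a missing rating is stored as 0, and the function names are all assumptions made for illustration.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance of two equal-length rating vectors."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def mean_covariance_scores(ratings):
    """For each user (row), average the covariance with every other user."""
    n = len(ratings)
    scores = []
    for u in range(n):
        others = [covariance(ratings[u], ratings[v]) for v in range(n) if v != u]
        scores.append(mean(others))
    return scores


def flag_low_covariance(ratings, k):
    """Return the indices of the k users with the lowest average covariance."""
    scores = mean_covariance_scores(ratings)
    ranked = sorted(range(len(ratings)), key=lambda u: scores[u])
    return sorted(ranked[:k])


# Hypothetical toy data: rows are users, columns are items, 0 means "not rated".
# Users 0-3 play the role of genuine users who rate only the popular items 0-2.
# Users 4-5 play the role of spam users who rate items scattered over the catalogue.
ratings = [
    [5, 4, 5, 0, 0, 0, 0, 0],
    [4, 5, 4, 0, 0, 0, 0, 0],
    [5, 5, 4, 0, 0, 0, 0, 0],
    [4, 4, 5, 0, 0, 0, 0, 0],
    [0, 0, 0, 5, 0, 3, 0, 1],
    [0, 0, 0, 0, 4, 0, 2, 5],
]

suspects = flag_low_covariance(ratings, k=2)
print(suspects)
```

On this toy matrix the two scattered raters receive the lowest scores. If those two users instead rated only items 0-2, their covariance with the genuine users would rise and the ranking would no longer separate them, which is the failure mode described above.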

The existing detection algorithms in the literature, such as the feature-based algorithms, focus largely on the rating patterns of individual spam users. They do not take into account the relationships among users and consequently suffer from low accuracy. In our work, we develop a probabilistic inference framework that further exploits user relationships for attack detection. In particular, we propose a factorized probabilistic model, and apply the efficient Belief Propagation (BP) algorithm for inference