Variations of Neighbor Diversity for Fraudster Detection in Online Auction

Inflated reputation fraud is a serious problem in online auctions. Recently, neighbor diversity based on Shannon entropy has been proposed as an effective feature to discern fraudsters from normal users. In the literature, there exist many different methods to quantify diversity. This raises the problem of finding the most suitable method to calculate neighbor diversity for fraudster detection. In this study, we collect four different methods of quantifying diversity, and apply them to calculate neighbor diversity. We then use these various neighbor diversities for fraudster detection. Our experimental results on a dataset collected from a real-world auction website show that, although these diversities are calculated differently, their performances on fraudster detection are similar.


Introduction
Online shopping/auction websites have gained increasing popularity over the past few years. This lucrative business opportunity has drawn not only legitimate sellers to conduct their business online but also fraudsters to commit fraudulent transactions. As a result, online shopping/auction websites often provide a reputation system to help their users distinguish legitimate sellers from fraudsters. The reputation system requests the buyer and the seller of a transaction to give each other a rating. Then, the reputation system calculates a reputation score for a user based on all the ratings the user received in his/her previous transactions. Intuitively, users with higher reputation scores are more trustworthy, and consequently are more likely to attract sales.
Because the reputation score of a user is based on all the ratings the user received in the past, a legitimate user requires time and effort to accumulate good ratings from other users. In contrast, a fraudster often commits the so-called "inflated reputation fraud" [1] to accumulate good ratings quickly, and cheats the reputation system into giving him/her a high reputation score. The inflated reputation fraud is accomplished by a group of collusive users who conduct many fake transactions for low-priced merchandise and give each other good ratings. Because the reputation score is crucial for evaluating the trustworthiness of a user, detecting the inflated reputation fraud has become a key task for online shopping/auction websites.
In the literature, many methods have been proposed to detect fraudsters with inflated reputation in online auctions. Some of them adopted the concept of a network graph to detect fraudsters who rely on their collaborators to boost their reputations [1][2][3][4][5]. With this concept, social network analysis (SNA) has been found to be an effective tool to detect fraudsters and their cohesive groups [1,4,5]. In our recent work [6], we proposed the concept of neighbor diversity to detect inflated reputation fraud. The neighbor diversity of a user quantifies the diversity of all traders that have transactions with the user. We showed that the neighbor diversity on the number of received ratings outperformed previous works that use k-core and/or center weight [1,4,5].
In [6], Shannon entropy [7] was adopted to quantify the neighbor diversity. However, different ways to define and calculate diversity exist in the literature. This motivates the idea of using various diversity definitions to calculate neighbor diversity for fraudster detection. Specifically, we adopt the four different definitions of diversity from Lin [8] to calculate the neighbor diversity. Our experimental results show that, although these diversities are calculated differently, their performances on fraudster detection are similar.
The remainder of this paper is organized as follows. Section 2 reviews previous works on fraudster detection. Section 3 applies various definitions of diversity to calculate neighbor diversity. Section 4 describes the experimental settings, and Section 5 presents the experimental results. Finally, Section 6 concludes this paper.

Related Work
Detecting fraudsters with inflated reputation is a critical issue for online shopping/auction websites. Many approaches have been proposed in the literature. Some earlier approaches used properties derived from the transaction history [2,9], e.g., the sum, average, and standard deviation of the buying or selling price of merchandise in a period of time. Most of the recent approaches used SNA to detect groups of fraudsters [1][2][3][4][5][10][11][12][13][14][15].
Fraudsters who want to increase their reputation scores quickly often have many transactions with the members of their collusive group. Consequently, many approaches applied SNA to detect fraudsters by searching for cohesive groups in the transaction network. In the SNA literature, characteristics such as k-plex, clique, betweenness, and k-core are often used to detect cohesive groups. Among them, k-core has been found to be the most effective for detecting fraudsters [1,5].
To calculate k-core, a transaction network is first created from the transaction history. In the network, each node represents a user account, and each edge connecting two nodes represents a transaction between two users. Then, SNA is applied to discover k-core components. Although fraudsters usually appear in a k-core with k ≥ 2 [1], using k-core alone results in low precision [4]. Alternatively, applying both center weight (CW) and k-core improves the precision, but the recall is reduced [4].
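The k-core step described above can be illustrated with a minimal sketch. It uses the standard peeling procedure (repeatedly removing nodes whose degree is below k); the function name `k_core` and the toy edge list are assumptions made for illustration and are not the implementation used in [1] or [4].

```python
from collections import defaultdict

def k_core(edges, k):
    """Return the set of nodes that belong to the k-core of an undirected graph.

    edges: iterable of (u, v) pairs, one per pair of users who traded.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    remaining = set(adj)
    # Start with every node whose degree is already below k.
    stack = [u for u in remaining if len(adj[u]) < k]
    while stack:
        u = stack.pop()
        if u not in remaining:
            continue  # already removed via an earlier stack entry
        remaining.discard(u)
        for v in adj[u]:
            if v in remaining:
                adj[v].discard(u)  # u no longer counts toward v's degree
                if len(adj[v]) < k:
                    stack.append(v)
    return remaining

# Hypothetical toy network: a triangle a-b-c plus a pendant user d.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(sorted(k_core(toy_edges, 2)))  # ['a', 'b', 'c']
```

In this toy network, user d trades with only one other account and therefore drops out of the 2-core, while the mutually connected accounts a, b, and c remain.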
The concept of neighbor diversity was proposed to improve both precision and recall [6]. As mentioned before, fraudsters mostly do business with their collaborators to boost their reputation. Consequently, their collaborators may share some similar characteristics, and the diversity of a fraudster's neighbors on those characteristics is likely to be small. Based on this notion, Lin and Khomnotai [6] showed that the neighbor diversity on the number of received ratings provides an effective way to discern fraudsters from normal users.

Variants of Neighbor Diversity
In this study, we use the number of received ratings as the target attribute to quantify the neighbor diversity because this attribute achieved the best performance in our previous work [6]. Specifically, the number of received ratings is first calculated for each user. Let $x$ denote a user. The neighbors of $x$ are the users who gave at least one rating to $x$. The neighbors of $x$ are partitioned into several classes based on their numbers of received ratings. Let $r$ denote the number of received ratings of a user. If $0 \le r < 50$, then the user is placed into class 1. If $50 \times 2^{i-2} \le r < 50 \times 2^{i-1}$, then the user is placed into class $i$, where $i > 1$. Let $p_i(x)$ denote the proportion of $x$'s neighbors in the $i$-th class, and $n$ denote the total number of classes. Then, the following constraints must hold.
$$0 \le p_i(x) \le 1, \quad \text{for } i = 1 \text{ to } n$$
$$\sum_{i=1}^{n} p_i(x) = 1$$
Next, we can apply various definitions of diversity to calculate neighbor diversity, as described in the following subsections.
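The partitioning rule and the class proportions can be sketched as follows. The helper names `rating_class` and `class_proportions`, and the toy list of neighbor rating counts, are assumptions introduced for illustration only.

```python
from collections import Counter

def rating_class(r, base=50):
    """Return the 1-based class index for a user with r received ratings.

    Class 1 covers 0 <= r < base; class i > 1 covers
    base * 2**(i-2) <= r < base * 2**(i-1).
    """
    if r < 0:
        raise ValueError("r must be non-negative")
    i, upper = 1, base  # upper is the exclusive upper bound of class i
    while r >= upper:
        upper *= 2
        i += 1
    return i

def class_proportions(neighbor_ratings, n):
    """Return [p_1, ..., p_n], the share of neighbors falling in each class."""
    if not neighbor_ratings:
        raise ValueError("the user has no neighbors")
    counts = Counter(rating_class(r) for r in neighbor_ratings)
    if max(counts) > n:
        raise ValueError("n is smaller than the highest observed class")
    total = len(neighbor_ratings)
    return [counts[i] / total for i in range(1, n + 1)]

# Hypothetical user with four neighbors who received 10, 30, 60 and 120 ratings.
p = class_proportions([10, 30, 60, 120], n=4)
print(p)  # [0.5, 0.25, 0.25, 0.0]
```

Two neighbors fall in class 1 (fewer than 50 ratings), one in class 2 (50 to 99), and one in class 3 (100 to 199), so the proportions sum to 1 as required by the constraints.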

Shannon Entropy Diversity
In [6], Shannon entropy [7] was adopted to calculate the neighbor diversity. The neighbor diversity of $x$ based on Shannon entropy is denoted as $D_s(x)$, and calculated as follows:
$$D_s(x) = -\sum_{i=1}^{n} p_i(x) \log p_i(x)$$
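A minimal sketch of the Shannon entropy calculation is given below. It assumes the natural logarithm and the usual convention that classes with zero proportion contribute nothing; the function name `shannon_entropy` is hypothetical.

```python
import math

def shannon_entropy(p):
    """Shannon entropy of a list of class proportions p (natural log).

    Terms with p_i == 0 are skipped, following the convention 0 * log(0) = 0.
    """
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

concentrated = [1.0, 0.0, 0.0, 0.0]   # all neighbors in one class
uniform = [0.25, 0.25, 0.25, 0.25]    # neighbors spread evenly over 4 classes

print(shannon_entropy(concentrated))  # zero: no diversity among neighbors
print(shannon_entropy(uniform))       # log(4): the maximum for 4 classes
```

Under this measure, a user whose neighbors all share the same class has entropy zero, while an even spread over $n$ classes gives the maximum value $\log n$.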

Canonical Form of Diversity
The notion of diversity is also widely used in many different areas. For example, in portfolio management, diversity is used to avoid overly concentrated portfolios. Various diversity constraints were proposed, such as the weight upper/lower bound constraint [16], the Lp-norm constraint [17], and the entropy constraint [18]. Lin [8] proposed a canonical form of these diversity constraints such that the value of diversity is restricted to the same range for all these different definitions of diversity. In this paper, we adopt these canonical forms for calculating neighbor diversity. For problems related to various diversities, please refer to [16][17][18][19].

Max Weight Diversity and Min Weight Diversity
The max weight diversity, denoted as $D_{max}(x)$, is the maximum of all $p_i(x)$ for $i = 1$ to $n$, as shown below.
$$D_{max}(x) = \max_{1 \le i \le n} p_i(x)$$
The min weight diversity, denoted as $D_{min}(x)$, is calculated using the minimum of all $p_i(x)$ for $i = 1$ to $n$, following the canonical form in [8].
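The max weight diversity is simple enough to state in one line; the sketch below mainly illustrates its range. The function name `max_weight_diversity` and the two example distributions are assumptions for illustration.

```python
def max_weight_diversity(p):
    """Largest class proportion among a user's neighbors."""
    return max(p)

concentrated = [1.0, 0.0, 0.0, 0.0]   # all neighbors in one class
uniform = [0.25, 0.25, 0.25, 0.25]    # neighbors spread evenly over 4 classes

print(max_weight_diversity(concentrated))  # 1.0
print(max_weight_diversity(uniform))       # 0.25
```

The value ranges from $1/n$ (neighbors spread evenly over all $n$ classes) to 1 (all neighbors in a single class), so a larger value indicates less diverse neighbors.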

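Canonical Lp-norm Diversity
The canonical Lp-norm diversity, denoted as $D_{pow}(x)$, is similar to the Lp-norm except that the outer exponent is $1/(pow - 1)$ instead of $1/pow$, as shown below.
$$D_{pow}(x) = \left( \sum_{i=1}^{n} p_i(x)^{pow} \right)^{\frac{1}{pow - 1}}$$
For the value of $pow$, the cases of $pow = 2$ and $3$ are commonly used [8]. Hence, we consider only $D_2(x)$ and $D_3(x)$ in this study.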
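The following sketch illustrates the canonical Lp-norm diversity under the assumption that the outer exponent is 1/(pow - 1), which is the choice that makes the value range from 1/n to 1. The function name `canonical_lp_diversity` and its parameter name `power` are hypothetical.

```python
def canonical_lp_diversity(p, power=2):
    """Canonical Lp-norm diversity of class proportions p.

    Assumes the outer exponent 1 / (power - 1); power must exceed 1.
    """
    if power <= 1:
        raise ValueError("power must be greater than 1")
    return sum(pi ** power for pi in p) ** (1.0 / (power - 1))

concentrated = [1.0, 0.0, 0.0, 0.0]   # all neighbors in one class
uniform = [0.25, 0.25, 0.25, 0.25]    # neighbors spread evenly over 4 classes

for power in (2, 3):
    print(power,
          canonical_lp_diversity(concentrated, power),
          canonical_lp_diversity(uniform, power))
```

With this exponent, both the power-2 and power-3 variants evaluate to 1 for the concentrated case and to 1/n for the uniform case, the same range as the max weight diversity.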

Canonical Shannon Entropy Diversity
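The canonical Shannon entropy diversity, denoted as $D_{cs}(x)$, is the reciprocal of the natural exponential function of Shannon entropy, as shown below, where $D_s(x)$ is the Shannon entropy diversity of $x$.
$$D_{cs}(x) = \frac{1}{e^{D_s(x)}}$$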
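As a minimal sketch, the canonical Shannon entropy diversity can be computed by exponentiating the negated entropy. The natural logarithm is assumed for the entropy so that it matches the natural exponential; the function name `canonical_shannon_diversity` is hypothetical.

```python
import math

def canonical_shannon_diversity(p):
    """Reciprocal of exp(Shannon entropy) for class proportions p."""
    # Natural-log entropy; zero proportions contribute nothing.
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return math.exp(-entropy)

concentrated = [1.0, 0.0, 0.0, 0.0]   # all neighbors in one class
uniform = [0.25, 0.25, 0.25, 0.25]    # neighbors spread evenly over 4 classes

print(canonical_shannon_diversity(concentrated))  # 1.0
print(canonical_shannon_diversity(uniform))       # approximately 0.25
```

Plain Shannon entropy ranges from 0 to $\log n$, whereas this canonical form maps the same information onto the range from 1/n to 1, with larger values indicating less diverse neighbors.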

Experimental Settings
To compare the performance of various neighbor diversities, we collected a dataset from Ruten (www.ruten.com.tw), which is one of the largest online auction websites in Taiwan [14]. Similar to the previous works [4][5][6], the dataset grows from a list of suspended users, and then conducts a level-wise expansion to include more users. The dataset consists of 4,407 users, where 1,080 are fraudsters and 3,327 are non-fraudsters (i.e., normal accounts). Notably, this dataset was also used in our previous study [6].
After collecting the dataset, we calculated the Shannon entropy diversity $D_s(x)$, the max weight diversity $D_{max}(x)$, the min weight diversity $D_{min}(x)$, the canonical Lp-norm diversities $D_2(x)$ and $D_3(x)$, and the canonical Shannon entropy diversity $D_{cs}(x)$ for each user $x$ in the dataset, as described in Section 3. Then, we used each of these neighbor diversities to build a classifier to compare their performance on detecting fraudsters. Three classification algorithms (J48 decision tree, Neural Networks (NN), and Support Vector Machine (SVM)) from Weka [20] were used to perform 10-fold cross-validation.
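The 10-fold cross-validation protocol can be pictured with the sketch below, which only shows how the users are partitioned into training and test folds. The experiments themselves relied on Weka; the helper `k_fold_indices`, the shuffling seed, and the round-robin fold assignment are illustrative assumptions and do not reproduce Weka's implementation.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds in round-robin fashion.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# 4,407 is the number of users in the dataset described above.
splits = list(k_fold_indices(4407, k=10))
print([len(test) for _, test in splits])
```

Each user appears in exactly one test fold, so every classifier is evaluated on data it did not see during training.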

Experimental Results
The experimental results include two parts. Part one uses only one of the neighbor diversities to build classifiers, and the results are shown in Tables 1, 2 and 3 for J48, NN, and SVM, respectively. The best results of each classification algorithm are shown in bold. In Tables 1, 2 and 3, the min weight diversity $D_{min}$ performs the worst. The performances of the other five diversities (Shannon entropy $D_s$, max weight $D_{max}$, canonical Lp-norm $D_2$ and $D_3$, and canonical Shannon entropy $D_{cs}$) are similar. In spite of its simplicity, the max weight diversity $D_{max}$ achieves competitive performance.
Because previous works suggest using k-core and CW for fraudster detection [4], part two of the experiment uses both k-core and CW together with one of the neighbor diversities to build classifiers, and the results are shown in Tables 4, 5 and 6 for J48, NN, and SVM, respectively. Compared to part one, the addition of k-core and CW slightly improves the classification performance. The improvement in accuracy is largest with J48 (between 1.5657% and 2.1103%), and smaller with NN and SVM (between -0.5673% and 1.4522%).
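The accuracy figures compared above, and the precision and recall discussed in the related work, follow the usual confusion-matrix definitions, sketched below. The function name `classification_metrics` is hypothetical, and the toy labels are made up for illustration; they are not taken from the dataset.

```python
def classification_metrics(actual, predicted, positive=1):
    """Return accuracy, precision and recall, treating `positive` as fraudster."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Report 0.0 when a ratio is undefined (no predicted or actual positives).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Toy labels: 1 = fraudster, 0 = normal account.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

In the toy example, two of the three fraudsters are caught and one normal account is wrongly flagged, so accuracy, precision, and recall all equal 2/3.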

Conclusions
The concept of diversity has been widely used in many domains, e.g., ecology [21][22][23][24] and portfolio management [8,18,25]. Various ways to quantify diversity exist in the literature [8,22]. In this work, we apply the diversity of the neighbors of each trader for fraudster detection in online auctions. Specifically, we use various methods to calculate diversity, and study whether these methods cause a significant difference in the classification performance of fraudster detection. Our experimental results show that the min weight diversity $D_{min}$ performs the worst. Also, the remaining five diversities (Shannon entropy $D_s$, max weight $D_{max}$, canonical Lp-norm $D_2$ and $D_3$, and canonical Shannon entropy $D_{cs}$) achieve similar performance.
The addition of k-core and CW only slightly improves the classification performance of the neighbor diversity (2.1103% in accuracy, at most). Therefore, finding new features that work better with the neighbor diversity for fraudster detection is planned for future work.

Table 2. Neural Network performance (Part one)

Table 3. Support Vector Machine performance (Part one)

Table 5. Neural Network performance (Part two)

Table 6. Support Vector Machine performance (Part two)