Date of Graduation
Summer 2019
Degree
Master of Natural and Applied Science in Computer Science
Department
Computer Science
Committee Chair
Jamil M. Saquer
Abstract
Online spam reviews are deceptive evaluations of products and services. They are often written as a deliberate manipulation strategy to deceive readers. Recognizing such reviews is an important but challenging problem. In this work, I try to solve this problem using different data mining techniques, and I explore the strengths and weaknesses of those techniques in detecting fake reviews. I start with supervised techniques such as Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), and Multilayer Perceptron. The results show that all of these supervised techniques can detect fake reviews with more than 86% accuracy. Then, I work on a semi-supervised technique that reduces the dimensionality of the input feature vector but offers performance similar to existing approaches. I implement the semi-supervised technique using a combination of topic modeling and SVM. I also compare the results with other approaches that consider all the words of a dataset as input features. I find that topic words alone are enough as input features to achieve accuracy similar to approaches that use all the words. Finally, I propose an unsupervised learning approach named Words Basket Analysis for fake review detection. I use five Amazon product review datasets for the experiments and report the performance of the proposed approach on these datasets.
Keywords
data mining; deceptive reviews; topic modeling; SVM; opinion spam; words basket analysis
Subject Categories
Other Computer Engineering
Copyright
© Md Forhad Hossain
Recommended Citation
Hossain, Md Forhad, "Fake Review Detection using Data Mining" (2019). MSU Graduate Theses. 3423.
https://bearworks.missouristate.edu/theses/3423