Online Review Content Moderation Using Natural Language Processing and Machine Learning Methods : 2021 Systems and Information Engineering Design Symposium (SIEDS)
Authors: Alicia Doan, Nathan England, Travis Vitello
DOI: 10.1109/SIEDS52267.2021.9483739 (https://doi.org/10.1109/SIEDS52267.2021.9483739)
Published: 2021-04-30
Citations: 1
Abstract
With online word of mouth now widely used to inform decisions about products and services, people have come to rely on the authenticity of website reviews. These reviews may be manually evaluated for publishability on a website; however, increasing volumes of user-submitted content may strain a website’s resources for accurate content moderation. Recognizing the importance of patients receiving authentic reviews of cosmetic surgery procedures, we considered a corpus of 523,564 user-submitted reviews to the RealSelf.com website spanning 2018-01-01 through 2020-05-31. The RealSelf content moderation team had previously applied binary classifications of "published" or "unpublished" to these reviews. Textual and behavioral machine learning models were developed in this study to predict the classification of RealSelf’s reviews. An ensemble model, constructed from the top-performing textual and behavioral models in this study, was found to have a classification accuracy of 82.9 percent.
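The abstract does not say how the ensemble combines its textual and behavioral models, so the following is only a minimal sketch under one common assumption: soft voting, where each model's probability of "published" is averaged with a weight and then thresholded. The function names, the weight, the threshold, and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def ensemble_probability(p_text: float, p_behavior: float,
                         text_weight: float = 0.5) -> float:
    """Weighted average of two models' 'published' probabilities.

    Assumption: soft voting. The paper's actual combination rule is not
    given in the abstract.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    return text_weight * p_text + (1.0 - text_weight) * p_behavior


def classify(p_text: float, p_behavior: float,
             text_weight: float = 0.5, threshold: float = 0.5) -> str:
    """Map the combined probability to one of the two moderation labels."""
    p = ensemble_probability(p_text, p_behavior, text_weight)
    return "published" if p >= threshold else "unpublished"


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that match the moderators' labels."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Made-up toy data: (textual score, behavioral score, moderator label).
toy_reviews = [
    (0.90, 0.80, "published"),
    (0.30, 0.20, "unpublished"),
    (0.80, 0.40, "unpublished"),  # the ensemble gets this one wrong
    (0.40, 0.90, "published"),
]

predictions = [classify(t, b) for t, b, _ in toy_reviews]
labels = [label for _, _, label in toy_reviews]
toy_accuracy = accuracy(predictions, labels)
print(predictions, toy_accuracy)
```

Classification accuracy here is the same kind of metric the abstract reports (82.9 percent), computed as the share of reviews whose predicted label matches the moderation team's decision. The toy figure has no relation to the paper's result.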