Identifying High-Quality User Replies Using Deep Neural Networks

2021 7th International Conference on Web Research (ICWR), Pub Date: 2021-05-19, DOI: 10.1109/ICWR51868.2021.9443143

Masoumeh Rajabi, Mohammad Ehsan Basiri, Shahla Nemati


Citations: 2

Abstract

With the significant expansion of Q&A forums and users' growing need to access useful information, identifying quality content in text forums is of particular importance. Previous studies have focused on extracting several types of quality features from text, which can be a time- and labor-intensive task. To address this problem, this paper proposes a long short-term memory (LSTM) deep neural network model that identifies high-quality user responses in text forums using only the raw text of user replies. In the proposed model, embeddings from language models (ELMo) are used to represent words as vectors. The proposed model is evaluated on two datasets: the TripAdvisor New York City (NYC) forum and the Ubuntu Linux distribution online forum. The proposed model was compared with support vector machines (SVM), linear regression (LR), artificial neural networks (ANN), and naïve Bayes (NB). Using only textual features, its accuracy was 43% and 28% higher than the highest accuracy obtained by these four traditional machine learning (ML) algorithms on the NYC and Ubuntu datasets, respectively. Compared to the best results obtained by the ML algorithms using both textual and quality dimension features, the improvement was about 17% and 16%.
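The abstract does not give the network's layer sizes, training procedure, or output layer, so the following is only a minimal sketch of the general idea: an LSTM reads one vector per word of a reply, and its final hidden state is mapped to a probability that the reply is high quality. The class name `TinyLSTMClassifier`, the dimensions, the sigmoid output layer, and the random untrained weights are all assumptions made for illustration; the input vectors are stand-ins for ELMo embeddings, which are not computed here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM with a sigmoid output (untrained)."""

    def __init__(self, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One weight matrix and bias per gate:
        # input (i), forget (f), output (o), candidate (g).
        self.W = {g: rand_matrix(hidden_dim, embed_dim + hidden_dim, rng) for g in "ifog"}
        self.b = {g: [0.0] * hidden_dim for g in "ifog"}
        # Assumed output layer: a single logistic unit on the final hidden state.
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def _gate(self, name, z, activation):
        pre = matvec(self.W[name], z)
        return [activation(p + b) for p, b in zip(pre, self.b[name])]

    def step(self, x, h, c):
        z = x + h  # concatenate current word vector and previous hidden state
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        g = self._gate("g", z, math.tanh)
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, embeddings):
        # embeddings: one vector per word, e.g. precomputed contextual embeddings.
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in embeddings:
            h, c = self.step(list(x), h, c)
        logit = sum(w * hk for w, hk in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


# Usage: a fake 5-word reply with 8-dimensional vectors (real ELMo vectors
# are much higher-dimensional and would come from a pretrained model).
rng = random.Random(42)
reply = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(5)]
model = TinyLSTMClassifier(embed_dim=8, hidden_dim=4)
prob = model.predict_proba(reply)
print(f"P(high quality) = {prob:.3f}")
```

Because the weights are random, the printed probability carries no meaning; in practice the weights would be learned from labeled replies, and a deep learning framework would replace the hand-written cell.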

