Predicting the Quality of User Contributions via LSTMs

Rakshit Agrawal, L. de Alfaro
{"title":"通过lstm预测用户贡献的质量","authors":"Rakshit Agrawal, L. D. Alfaro","doi":"10.1145/2957792.2957811","DOIUrl":null,"url":null,"abstract":"In many collaborative systems it is useful to automatically estimate the quality of new contributions; the estimates can be used for instance to flag contributions for review. To predict the quality of a contribution by a user, it is useful to take into account both the characteristics of the revision itself, and the past history of contributions by that user. In several approaches, the user's history is first summarized into a number of features, such as number of contributions, user reputation, time from previous revision, and so forth. These features are then passed along with features of the current revision to a machine-learning classifier, which outputs a prediction for the user contribution. The summarization step is used because the usual machine learning models, such as neural nets, SVMs, etc. rely on a fixed number of input features. We show in this paper that this manual selection of summarization features can be avoided by adopting machine-learning approaches that are able to cope with temporal sequences of input. In particular, we show that Long-Short Term Memory (LSTM) neural nets are able to process directly the variable-length history of a user's activity in the system, and produce an output that is highly predictive of the quality of the next contribution by the user. Our approach does not eliminate the process of feature selection, which is present in all machine learning. Rather, it eliminates the need for deciding which features from a user's past are most useful for predicting the future: we can simply pass to the machine-learning apparatus all the past, and let it come up with an estimate for the quality of the next contribution. We present models combining LSTM and NN for predicting revision quality and show that the prediction accuracy attained is far superior to the one obtained using the NN alone. More interestingly, we also show that the prediction attained is superior to the one obtained using user reputation as a feature summarizing the quality of a user's past work. This can be explained by noting that the primary function of user reputation is to provide an incentive towards performing useful contributions, rather than to be a feature optimized for prediction of future contribution quality. We also show that the LSTM output changes in a natural way in response to user behavior, increasing when the user performs a sequence of good quality contributions, and decreasing when the user performs a sequence of low-quality work. The LSTM output for a user could thus be usefully shown to other users, alongside the user's reputation and other information.","PeriodicalId":297748,"journal":{"name":"Proceedings of the 12th International Symposium on Open Collaboration","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Predicting the quality of user contributions via LSTMs\",\"authors\":\"Rakshit Agrawal, L. D. Alfaro\",\"doi\":\"10.1145/2957792.2957811\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In many collaborative systems it is useful to automatically estimate the quality of new contributions; the estimates can be used for instance to flag contributions for review. 
To predict the quality of a contribution by a user, it is useful to take into account both the characteristics of the revision itself, and the past history of contributions by that user. In several approaches, the user's history is first summarized into a number of features, such as number of contributions, user reputation, time from previous revision, and so forth. These features are then passed along with features of the current revision to a machine-learning classifier, which outputs a prediction for the user contribution. The summarization step is used because the usual machine learning models, such as neural nets, SVMs, etc. rely on a fixed number of input features. We show in this paper that this manual selection of summarization features can be avoided by adopting machine-learning approaches that are able to cope with temporal sequences of input. In particular, we show that Long-Short Term Memory (LSTM) neural nets are able to process directly the variable-length history of a user's activity in the system, and produce an output that is highly predictive of the quality of the next contribution by the user. Our approach does not eliminate the process of feature selection, which is present in all machine learning. Rather, it eliminates the need for deciding which features from a user's past are most useful for predicting the future: we can simply pass to the machine-learning apparatus all the past, and let it come up with an estimate for the quality of the next contribution. We present models combining LSTM and NN for predicting revision quality and show that the prediction accuracy attained is far superior to the one obtained using the NN alone. More interestingly, we also show that the prediction attained is superior to the one obtained using user reputation as a feature summarizing the quality of a user's past work. This can be explained by noting that the primary function of user reputation is to provide an incentive towards performing useful contributions, rather than to be a feature optimized for prediction of future contribution quality. We also show that the LSTM output changes in a natural way in response to user behavior, increasing when the user performs a sequence of good quality contributions, and decreasing when the user performs a sequence of low-quality work. 
The LSTM output for a user could thus be usefully shown to other users, alongside the user's reputation and other information.\",\"PeriodicalId\":297748,\"journal\":{\"name\":\"Proceedings of the 12th International Symposium on Open Collaboration\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-08-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 12th International Symposium on Open Collaboration\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2957792.2957811\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 12th International Symposium on Open Collaboration","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2957792.2957811","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8

Abstract

In many collaborative systems it is useful to automatically estimate the quality of new contributions; the estimates can be used for instance to flag contributions for review. To predict the quality of a contribution by a user, it is useful to take into account both the characteristics of the revision itself, and the past history of contributions by that user. In several approaches, the user's history is first summarized into a number of features, such as number of contributions, user reputation, time from previous revision, and so forth. These features are then passed along with features of the current revision to a machine-learning classifier, which outputs a prediction for the user contribution. The summarization step is used because the usual machine learning models, such as neural nets, SVMs, etc. rely on a fixed number of input features. We show in this paper that this manual selection of summarization features can be avoided by adopting machine-learning approaches that are able to cope with temporal sequences of input. In particular, we show that Long-Short Term Memory (LSTM) neural nets are able to process directly the variable-length history of a user's activity in the system, and produce an output that is highly predictive of the quality of the next contribution by the user. Our approach does not eliminate the process of feature selection, which is present in all machine learning. Rather, it eliminates the need for deciding which features from a user's past are most useful for predicting the future: we can simply pass to the machine-learning apparatus all the past, and let it come up with an estimate for the quality of the next contribution. We present models combining LSTM and NN for predicting revision quality and show that the prediction accuracy attained is far superior to the one obtained using the NN alone. More interestingly, we also show that the prediction attained is superior to the one obtained using user reputation as a feature summarizing the quality of a user's past work. This can be explained by noting that the primary function of user reputation is to provide an incentive towards performing useful contributions, rather than to be a feature optimized for prediction of future contribution quality. We also show that the LSTM output changes in a natural way in response to user behavior, increasing when the user performs a sequence of good quality contributions, and decreasing when the user performs a sequence of low-quality work. The LSTM output for a user could thus be usefully shown to other users, alongside the user's reputation and other information.
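The abstract describes an architecture that couples an LSTM over the variable-length sequence of a user's past revisions with a feed-forward network over the features of the current revision, producing a quality estimate for the new contribution. The sketch below, in PyTorch, illustrates one plausible way to wire such a model; the class name, feature dimensions, and layer sizes are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): an LSTM summarizes a user's
# variable-length history of past revision features; its final hidden
# state is concatenated with the current revision's features and fed
# to a small feed-forward head that predicts contribution quality.
import torch
import torch.nn as nn


class ContributionQualityModel(nn.Module):
    def __init__(self, past_feat_dim=8, current_feat_dim=8, hidden_dim=32):
        super().__init__()
        # LSTM over the sequence of per-revision feature vectors.
        self.lstm = nn.LSTM(past_feat_dim, hidden_dim, batch_first=True)
        # Feed-forward head combining history summary and current revision.
        self.head = nn.Sequential(
            nn.Linear(hidden_dim + current_feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # probability that the contribution is good quality
        )

    def forward(self, past_seq, current_feats):
        # past_seq: (batch, seq_len, past_feat_dim), histories padded to seq_len
        # current_feats: (batch, current_feat_dim)
        _, (h_n, _) = self.lstm(past_seq)
        history_summary = h_n[-1]  # last layer's hidden state per user
        combined = torch.cat([history_summary, current_feats], dim=1)
        return self.head(combined)


# Illustrative usage: 4 users, histories padded to 10 past revisions.
model = ContributionQualityModel()
past = torch.randn(4, 10, 8)
current = torch.randn(4, 8)
quality = model(past, current)  # shape (4, 1), values in (0, 1)
```

In practice, histories of different lengths would be handled with padding and masking (or packed sequences) rather than a fixed length, but the core idea remains: the raw sequence of past activity is passed to the model instead of hand-picked summary features.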