Preliminary Study on Applying Semi-Supervised Learning to App Store Analysis
Roger Deocadez, R. Harrison, Daniel Rodríguez
Proceedings of the 21st International Conference on Evaluation and Assessment in Software Engineering, 2017-06-15
https://doi.org/10.1145/3084226.3084285
Semi-Supervised Learning (SSL) is a data mining approach that sits between supervised and unsupervised learning. It is useful when only a small number of instances in a dataset are labelled but a large amount of unlabelled data is also available. This is the case with user reviews in application stores such as the Apple App Store or Google Play, where a vast number of reviews are available but classifying them into categories such as bug-related review or feature request is expensive, or at least labour-intensive. SSL techniques are well suited to this problem, because labelling every review by hand not only takes time and effort but may also be unnecessary. In this work, we analyse SSL techniques to show their viability and capabilities on a dataset of reviews collected from the App Store, evaluating both transductive performance (predicting the labels of the unlabelled instances available during training) and inductive performance (predicting labels on unseen future data).
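The difference between the transductive and inductive settings can be illustrated with a small sketch. The abstract does not name the SSL techniques that were evaluated, so the code below is not the authors' method: it assumes self-training (one common SSL scheme) wrapped around a hand-written multinomial naive Bayes classifier. The reviews, the `bug`/`feature` labels, the 0.6 confidence threshold, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case whitespace tokenizer (a deliberately crude assumption)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, text):
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n / total)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            log_scores[label] = score
        # Normalise the log scores into probabilities.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}

    def predict(self, text):
        proba = self.predict_proba(text)
        return max(proba, key=proba.get)


def self_train(labelled, unlabelled, threshold=0.6, max_rounds=10):
    """Repeatedly pseudo-label confident unlabelled reviews and refit."""
    texts = [t for t, _ in labelled]
    labels = [y for _, y in labelled]
    pool = list(unlabelled)
    model = NaiveBayes().fit(texts, labels)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for text in pool:
            proba = model.predict_proba(text)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                confident.append((text, label))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new to learn from
        for text, label in confident:
            texts.append(text)
            labels.append(label)
        pool = remaining
        model = NaiveBayes().fit(texts, labels)
    return model


# Hypothetical toy data: a few labelled reviews and a pool of unlabelled ones.
labelled = [
    ("app crashes on launch", "bug"),
    ("login screen freezes and crashes", "bug"),
    ("please add dark mode", "feature"),
    ("would love an option to export data", "feature"),
]
unlabelled = [
    "crashes every time i open the camera",
    "please add an option for widgets",
    "the camera freezes after update",
]

model = self_train(labelled, unlabelled)

# Transductive: label the unlabelled reviews that were available during training.
transductive = {text: model.predict(text) for text in unlabelled}
# Inductive: label a review the model has never seen.
inductive = model.predict("please add export to pdf")

print(transductive)
print(inductive)
```

In this sketch the transductive result is the labelling of the unlabelled pool itself, while the inductive result comes from applying the final model to a new review. A real study would replace the toy tokenizer and classifier with proper text features and would measure both settings on held-out data.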