{"title":"意见挖掘与主动学习:抽样策略的比较","authors":"Douglas Vitório, E. Souza, Adriano Oliveira","doi":"10.5753/brasnam.2019.6549","DOIUrl":null,"url":null,"abstract":"There are two main problems when performing Opinion Mining (OM) with data streams: the lack of labeled data and the need to update the learning model. The most used OM techniques cannot deal well with these challenges, so, an alternative is to use semi-supervised methods, such as the Active Learning, which is a method to label only selected data rather than the entire data set; however, it requires the choice of a sampling strategy to select the data to be labeled. In this paper, we evaluated eight strategies in ten data sets, in order to identify the best ones for OM with Twitter streams. According to our experiments, the Entropy strategy showed the best results, but it selects a large number of instances to be labeled, requiring further investigation.","PeriodicalId":428504,"journal":{"name":"Anais do Brazilian Workshop on Social Network Analysis and Mining (BraSNAM)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Opinion Mining and Active Learning: a Comparison of Sampling Strategies\",\"authors\":\"Douglas Vitório, E. Souza, Adriano Oliveira\",\"doi\":\"10.5753/brasnam.2019.6549\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"There are two main problems when performing Opinion Mining (OM) with data streams: the lack of labeled data and the need to update the learning model. The most used OM techniques cannot deal well with these challenges, so, an alternative is to use semi-supervised methods, such as the Active Learning, which is a method to label only selected data rather than the entire data set; however, it requires the choice of a sampling strategy to select the data to be labeled. In this paper, we evaluated eight strategies in ten data sets, in order to identify the best ones for OM with Twitter streams. According to our experiments, the Entropy strategy showed the best results, but it selects a large number of instances to be labeled, requiring further investigation.\",\"PeriodicalId\":428504,\"journal\":{\"name\":\"Anais do Brazilian Workshop on Social Network Analysis and Mining (BraSNAM)\",\"volume\":\"41 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Anais do Brazilian Workshop on Social Network Analysis and Mining (BraSNAM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5753/brasnam.2019.6549\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Anais do Brazilian Workshop on Social Network Analysis and Mining (BraSNAM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5753/brasnam.2019.6549","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Opinion Mining and Active Learning: a Comparison of Sampling Strategies
There are two main problems when performing Opinion Mining (OM) with data streams: the lack of labeled data and the need to update the learning model. The most used OM techniques cannot deal well with these challenges, so, an alternative is to use semi-supervised methods, such as the Active Learning, which is a method to label only selected data rather than the entire data set; however, it requires the choice of a sampling strategy to select the data to be labeled. In this paper, we evaluated eight strategies in ten data sets, in order to identify the best ones for OM with Twitter streams. According to our experiments, the Entropy strategy showed the best results, but it selects a large number of instances to be labeled, requiring further investigation.