Integrating Feature and Instance Selection Techniques in Opinion Mining
Authors: Zi-Hung You, Ya-Han Hu, Chih-Fong Tsai, Yen-Ming Kuo
Journal: International Journal of Data Warehousing and Mining (published 2020-07-01)
DOI: 10.4018/ijdwm.2020070109
Citations: 4
Opinion mining focuses on extracting polarity information from texts. For textual term representation, different feature selection methods, e.g. term frequency (TF) or term frequency–inverse document frequency (TF–IDF), can yield different numbers of text features. In text classification, however, a selected training set may contain noisy documents (or outliers), which can degrade classification performance. To solve this problem, instance selection can be adopted to filter out unrepresentative training documents. Therefore, this article investigates opinion mining performance when the feature selection and instance selection steps are applied together. Two combination processes, which perform feature selection and instance selection in different orders, were compared. Specifically, two feature selection methods, namely TF and TF–IDF, and two instance selection methods, namely DROP3 and IB3, were employed for comparison. Experimental results from sentiment classifiers developed on three Twitter datasets showed that TF–IDF followed by DROP3 performs best.
Keywords: Feature Selection, Instance Selection, Opinion Mining, Text Classification
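To make the "feature step, then instance step" ordering concrete, here is a minimal sketch. It is not the authors' implementation. It assumes pure-Python TF and TF–IDF weighting, using the common unsmoothed variant tf · log(N/df), and cosine similarity between documents. For the instance step it uses Wilson's Edited Nearest Neighbor (ENN) rule as a deliberately simplified stand-in for DROP3: DROP3 begins with an ENN-style noise-filtering pass and then removes further instances, and only that first pass is shown here. IB3 is not shown. The toy corpus, `k=3`, and all function names (`tf_vectors`, `tfidf_vectors`, `enn_filter`, `tfidf_then_enn`) are hypothetical.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Raw term-frequency vector (term -> count) for each tokenized document."""
    return [dict(Counter(doc)) for doc in docs]


def tfidf_vectors(docs):
    """TF-IDF using one common unsmoothed weighting: tf * log(N / df).

    Terms that occur in every document get weight 0 and are dropped.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for tf in tf_vectors(docs):
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        vectors.append({t: w for t, w in vec.items() if w > 0})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def enn_filter(vectors, labels, k=3):
    """Wilson's ENN: keep indices whose label matches the majority label
    of their k most similar (cosine) other training documents."""
    keep = []
    for i, vi in enumerate(vectors):
        sims = sorted(
            ((cosine(vi, vj), j) for j, vj in enumerate(vectors) if j != i),
            reverse=True,
        )
        votes = Counter(labels[j] for _, j in sims[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            keep.append(i)
    return keep


def tfidf_then_enn(docs, labels, k=3):
    """Feature weighting first, then instance selection on the weighted vectors."""
    vectors = tfidf_vectors(docs)
    keep = enn_filter(vectors, labels, k)
    return [vectors[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy corpus: the last document is a mislabeled (noisy) tweet.
DOCS = [
    ["good", "great"], ["great", "fun"], ["good", "fun"],
    ["bad", "awful"], ["awful", "boring"], ["bad", "boring"],
    ["good", "great", "fun"],
]
LABELS = ["pos", "pos", "pos", "neg", "neg", "neg", "neg"]

if __name__ == "__main__":
    X, y = tfidf_then_enn(DOCS, LABELS)
    print(len(X), y)  # 6 ['pos', 'pos', 'pos', 'neg', 'neg', 'neg']
```

On this toy data, the mislabeled last document sits closest to the three positive documents, so ENN removes it and keeps the other six. Swapping the order (selecting instances first and then recomputing TF–IDF on the kept documents) changes the document frequencies and therefore the term weights. This is one reason the two combination orders can behave differently.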