Optimizing XCSR for Text Classification
M. H. Arif, Jianxin Li, Muhammad Iqbal, Hao Peng
2017 IEEE Symposium on Service-Oriented System Engineering (SOSE)
Published: 2017-04-06
DOI: 10.1109/SOSE.2017.9 (https://doi.org/10.1109/SOSE.2017.9)
Citations: 9
Abstract
XCS, an evolutionary computing technique, can classify data using both bit-string and real-valued representations. Real-valued XCS (XCSR) commonly uses the min-max interval-based representation (MMR) for continuous-valued data sets. Text data sets can be given a real-valued bag-of-words representation, e.g. the term frequency-inverse document frequency (TF-IDF) of features. In this work we use XCSR, for the first time, to classify short, informal social media text messages from two major domains: spam detection and sentiment analysis. We perform spam detection on SMS and email messages, and sentiment analysis on reviews and tweets. Feature vectors extracted from short text messages are very sparse, and XCSR with MMR cannot handle sparse data sets well. We propose XCSR#, which uses MMR with explicit "don't care" intervals to handle sparse social media data sets. The experimental results indicate that introducing explicit "don't care" intervals improved performance, with a statistically significant effect, particularly on the spam detection data sets. Further, XCSR# produced more accurate and more general rules than XCSR.
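
To make the idea of interval conditions with explicit "don't care" entries concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Rule` class, the use of `None` as the "don't care" marker, the `generality` measure, and the sample feature values are all assumptions made for illustration. Each attribute of a rule is either a closed `(lower, upper)` interval or a "don't care" that matches any value, so a rule can ignore most features of a sparse vector.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# One condition entry: a (lower, upper) interval, or None for "don't care".
# Using None as the marker is an assumption of this sketch.
Interval = Optional[Tuple[float, float]]


@dataclass
class Rule:
    condition: Sequence[Interval]
    action: int  # predicted class label, e.g. 1 = spam, 0 = not spam

    def matches(self, x: Sequence[float]) -> bool:
        """True if every feature lies in its interval or is a don't care."""
        if len(x) != len(self.condition):
            raise ValueError("feature vector and condition differ in length")
        return all(
            iv is None or iv[0] <= value <= iv[1]
            for iv, value in zip(self.condition, x)
        )

    def generality(self) -> float:
        """Fraction of attributes that are don't care (hypothetical measure)."""
        return sum(iv is None for iv in self.condition) / len(self.condition)


# Hypothetical rule over four TF-IDF-like features: it constrains the first
# and last feature and ignores the two in between.
rule = Rule(condition=[(0.2, 0.9), None, None, (0.0, 0.0)], action=1)

sparse_hit = [0.5, 0.0, 0.7, 0.0]   # first feature inside [0.2, 0.9]
sparse_miss = [0.1, 0.0, 0.0, 0.0]  # first feature below the lower bound

print(rule.matches(sparse_hit), rule.matches(sparse_miss), rule.generality())
```

In this sketch a "don't care" entry removes a feature from the match test entirely, so the mostly-zero entries of a sparse vector do not each need an interval that happens to contain zero.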