Optimizing XCSR for Text Classification

M. H. Arif, Jianxin Li, Muhammad Iqbal, Hao Peng
Published in: 2017 IEEE Symposium on Service-Oriented System Engineering (SOSE)
Publication date: 2017-04-06
DOI: 10.1109/SOSE.2017.9 (https://doi.org/10.1109/SOSE.2017.9)
Citations: 9

Abstract

XCS, an evolutionary computing technique, can classify data using both bit-string and real-valued representations. "Real-valued XCS" (XCSR) commonly uses the min-max interval-based representation (MMR) for continuous-valued data sets. Text data sets can be represented with a bag-of-words-based real-valued representation, e.g. the term frequency-inverse document frequency (TF-IDF) of features. In this work we use XCSR, for the first time, to classify short informal social media text messages from two major domains: spam detection and sentiment analysis. We perform spam detection on SMS and email messages, and sentiment analysis on reviews and tweets. Feature vectors extracted from short text messages are very sparse, and XCSR with MMR does not handle sparse data sets well. We propose XCSR#, which uses MMR with explicit "don't care" intervals to handle sparse social media data sets. The experimental results indicate that introducing explicit "don't care" intervals improved performance with a statistically significant effect, particularly on the spam detection data sets. Further, XCSR# produced more accurate and more general rules than XCSR.
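
The matching idea behind min-max intervals with explicit "don't care" entries can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how XCSR# encodes or evolves its intervals, so the `None` encoding, the function names `matches` and `generality`, and the toy feature values below are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: a (lower, upper) pair is a min-max interval,
# and None marks an explicit "don't care" for that feature.
Interval = Optional[Tuple[float, float]]


def matches(condition: List[Interval], x: List[float]) -> bool:
    """Return True if every non-"don't care" interval contains its feature value."""
    if len(condition) != len(x):
        raise ValueError("condition and input must have the same length")
    return all(
        iv is None or iv[0] <= value <= iv[1]
        for iv, value in zip(condition, x)
    )


def generality(condition: List[Interval]) -> float:
    """Fraction of features the rule ignores (illustrative measure only)."""
    return sum(iv is None for iv in condition) / len(condition)


# Toy sparse TF-IDF-like vectors: most features are zero.
x = [0.0, 0.0, 0.42, 0.0, 0.0]
y = [0.0, 0.25, 0.42, 0.0, 0.0]

# Plain min-max rule: every feature needs its own interval.
mmr_rule: List[Interval] = [(0.0, 0.1), (0.0, 0.1), (0.3, 0.6), (0.0, 0.1), (0.0, 0.1)]

# Rule with explicit "don't care" entries: only feature 2 is constrained.
dc_rule: List[Interval] = [None, None, (0.3, 0.6), None, None]

print(matches(mmr_rule, x), matches(dc_rule, x))
print(matches(mmr_rule, y), matches(dc_rule, y))
print(generality(dc_rule))
```

In this toy setting the plain interval rule stops matching as soon as an unrelated feature leaves its narrow interval, while the rule with "don't care" entries keeps matching on the one feature it constrains. Whether this reflects the mechanism behind the reported results is not stated in the abstract.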