How Short is a Piece of String?: The Impact of Text Length and Text Augmentation on Short-text Classification

Austin Mccartney, Svetlana Hensman, L. Longo
Irish Conference on Artificial Intelligence and Cognitive Science
DOI: https://doi.org/10.21427/D7151M
Citations: 3

Abstract

Short messages are increasingly common and available, which creates opportunities to harvest large amounts of information through machine-based classification. However, traditional classification methods have not achieved accuracies on short texts comparable to those on longer texts. Several approaches have been used to extend traditional methods to overcome this problem, including enhancing the original texts by associating them with external supplementary data sources. The existing literature does not precisely describe the impact of text length on classification performance. This work quantitatively examines how the accuracy of a small selection of classifiers, each using a variety of enhancement methods, changes as text length progressively decreases. Findings, based on ANOVA testing at a 95% confidence level, suggest that the performance of classifiers using simple enhancements decreases with decreasing text length, but that more sophisticated enhancements risk over-supplementing the text, with consequent concept drift and a decrease in classification performance as text length increases.
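The kind of analysis the abstract describes (shortening texts step by step, then testing whether accuracy differs across lengths with ANOVA) can be sketched as follows. This is only a minimal illustration, not the authors' procedure: the helper names `truncate_words` and `one_way_anova_f`, word-level truncation as the shortening method, and the accuracy values in the usage lines are all assumptions, and the numbers are invented placeholders rather than results from the paper.

```python
from statistics import mean


def truncate_words(text, max_words):
    """Keep only the first max_words whitespace-separated tokens.

    Hypothetical helper: the abstract does not say how texts were shortened.
    """
    return " ".join(text.split()[:max_words])


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Each element of `groups` is a list of accuracy scores measured under
    one condition (for example, one text length).
    """
    k = len(groups)                          # number of conditions
    n = sum(len(g) for g in groups)          # total number of observations
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented placeholder accuracies for three text lengths (NOT from the paper).
accuracy_by_length = [
    [0.81, 0.79, 0.83],   # longest texts
    [0.74, 0.76, 0.72],   # medium texts
    [0.65, 0.68, 0.63],   # shortest texts
]
f_stat, df_b, df_w = one_way_anova_f(accuracy_by_length)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

To decide significance at the 5% level, the F statistic would be compared with the critical value of the F distribution for the two degrees of freedom; a library routine such as `scipy.stats.f_oneway` returns the corresponding p-value directly.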