Bag-of-Word approach is not dead: A performance analysis on a myriad of text classification challenges

Mario Graff , Daniela Moctezuma , Eric S. Téllez
{"title":"Bag-of-Word approach is not dead: A performance analysis on a myriad of text classification challenges","authors":"Mario Graff ,&nbsp;Daniela Moctezuma ,&nbsp;Eric S. Téllez","doi":"10.1016/j.nlp.2025.100154","DOIUrl":null,"url":null,"abstract":"<div><div>The Bag-of-Words (BoW) representation, enhanced with a classifier, was a pioneering approach to solving text classification problems. However, with the advent of transformers and, in general, deep learning architectures, the field has dynamically shifted its focus towards customizing these architectures for various natural language processing tasks, including text classification problems. For a newcomer, it might be impossible to realize that for some text classification problems, the traditional approach is still competitive. This research analyzes the competitiveness of BoW-based representations in different text-classification competitions run in English, Spanish, and Italian. To analyze the performance of these BoW-based representations, we participated in 12 text classification international competitions, summing up 24 tasks comprising five English tasks, seven in Italian, and twelve in Spanish. The results show that the proposed BoW representations have a difference of just 10% w.r.t. the competition winner and less than 2% in three tasks corresponding to author profiling. BoW outperforms BERT solutions and dominates in author profiling tasks.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100154"},"PeriodicalIF":0.0000,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Natural Language Processing Journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949719125000305","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The Bag-of-Words (BoW) representation, enhanced with a classifier, was a pioneering approach to solving text classification problems. However, with the advent of transformers and, in general, deep learning architectures, the field has dynamically shifted its focus towards customizing these architectures for various natural language processing tasks, including text classification problems. For a newcomer, it might be impossible to realize that for some text classification problems, the traditional approach is still competitive. This research analyzes the competitiveness of BoW-based representations in different text-classification competitions run in English, Spanish, and Italian. To analyze the performance of these BoW-based representations, we participated in 12 text classification international competitions, summing up 24 tasks comprising five English tasks, seven in Italian, and twelve in Spanish. The results show that the proposed BoW representations have a difference of just 10% w.r.t. the competition winner and less than 2% in three tasks corresponding to author profiling. BoW outperforms BERT solutions and dominates in author profiling tasks.
Bag-of-Word方法并没有消亡:对无数文本分类挑战的性能分析
用分类器增强的词袋(BoW)表示是解决文本分类问题的一种开创性方法。然而,随着变形器和深度学习体系结构的出现,该领域已经动态地将重点转向为各种自然语言处理任务定制这些体系结构,包括文本分类问题。对于新手来说,可能无法意识到对于某些文本分类问题,传统方法仍然具有竞争力。本研究分析了基于bow的表示在英语、西班牙语和意大利语的不同文本分类竞赛中的竞争力。为了分析这些基于bow的表示的性能,我们参加了12个文本分类国际比赛,总结了24个任务,其中包括5个英语任务,7个意大利语任务和12个西班牙语任务。结果表明,所提出的BoW表示与竞赛获胜者的w.r.t.差异仅为10%,与作者分析对应的三个任务的w.r.t.差异小于2%。BoW优于BERT解决方案,在作者分析任务中占主导地位。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信