Building for Tomorrow: Assessing the Temporal Persistence of Text Classifiers

Rabab Alkhalifa, E. Kochkina, A. Zubiaga
{"title":"Building for Tomorrow: Assessing the Temporal Persistence of Text Classifiers","authors":"Rabab Alkhalifa, E. Kochkina, A. Zubiaga","doi":"10.48550/arXiv.2205.05435","DOIUrl":null,"url":null,"abstract":"Performance of text classification models tends to drop over time due to changes in data, which limits the lifetime of a pretrained model. Therefore an ability to predict a model's ability to persist over time can help design models that can be effectively used over a longer period of time. In this paper, we provide a thorough discussion into the problem, establish an evaluation setup for the task. We look at this problem from a practical perspective by assessing the ability of a wide range of language models and classification algorithms to persist over time, as well as how dataset characteristics can help predict the temporal stability of different models. We perform longitudinal classification experiments on three datasets spanning between 6 and 19 years, and involving diverse tasks and types of data. By splitting the longitudinal datasets into years, we perform a comprehensive set of experiments by training and testing across data that are different numbers of years apart from each other, both in the past and in the future. This enables a gradual investigation into the impact of the temporal gap between training and test sets on the classification performance, as well as measuring the extent of the persistence over time.","PeriodicalId":203304,"journal":{"name":"Inf. Process. Manag.","volume":"81 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Inf. Process. Manag.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2205.05435","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

The performance of text classification models tends to drop over time due to changes in data, which limits the lifetime of a pretrained model. The ability to predict how well a model will persist over time can therefore help design models that remain effective for longer. In this paper, we provide a thorough discussion of the problem and establish an evaluation setup for the task. We look at the problem from a practical perspective by assessing the ability of a wide range of language models and classification algorithms to persist over time, as well as how dataset characteristics can help predict the temporal stability of different models. We perform longitudinal classification experiments on three datasets spanning between 6 and 19 years and involving diverse tasks and types of data. By splitting the longitudinal datasets into years, we carry out a comprehensive set of experiments in which training and test data are different numbers of years apart, both in the past and in the future. This enables a gradual investigation into the impact of the temporal gap between training and test sets on classification performance, as well as a measurement of the extent of persistence over time.
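To make the evaluation setup concrete, the sketch below illustrates the kind of year-based protocol the abstract describes: train a classifier on one year of data, test it on every other year, and group the scores by the temporal gap between training and test years. This is not the authors' implementation; it assumes a hypothetical pandas DataFrame with "year", "text", and "label" columns and uses a simple TF-IDF plus logistic regression pipeline as a stand-in for the models compared in the paper.

```python
# Illustrative sketch (not the authors' code) of a year-based
# train/test protocol: train on one year, test on every other year,
# and relate macro-F1 to the temporal gap between them.
from collections import defaultdict

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline


def temporal_gap_scores(df: pd.DataFrame) -> dict:
    """Macro-F1 for every (train year, test year) pair, keyed by the gap in years."""
    scores = defaultdict(list)
    years = sorted(df["year"].unique())
    for train_year in years:
        train = df[df["year"] == train_year]
        model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(train["text"], train["label"])
        for test_year in years:
            if test_year == train_year:
                continue
            test = df[df["year"] == test_year]
            f1 = f1_score(test["label"], model.predict(test["text"]), average="macro")
            # Negative gap = testing on the past, positive gap = testing on the future.
            scores[test_year - train_year].append(f1)
    return scores


# Averaging the scores per gap shows how quickly performance degrades
# as training and test data move further apart in time:
# gaps = temporal_gap_scores(df)
# for gap in sorted(gaps):
#     print(gap, sum(gaps[gap]) / len(gaps[gap]))
```

Averaging over all train/test pairs with the same gap, rather than over a single pair, separates the effect of the temporal distance itself from the idiosyncrasies of any one year.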