ASDE: Low-budget text classification via active semi-supervised learning with debiasing training mechanism

IF 6.9, Region 1 (Management), JCR Q1 (Computer Science, Information Systems)
Yubo Chen, Tong Zhou, Daojian Zeng, Sirui Li, Kang Liu, Jun Zhao
Journal: Information Processing & Management, vol. 63, no. 2, Article 104390
Published: 2025-09-24
Publication type: Journal Article
DOI: 10.1016/j.ipm.2025.104390
URL: https://www.sciencedirect.com/science/article/pii/S0306457325003310

Abstract

Semi-supervised learning (SSL) is widely employed in text classification to address the challenges associated with limited labeled data availability. Nevertheless, current SSL methods often exhibit two significant limitations: they typically neglect the crucial process of selecting the initial labeled data effectively, and they fail to adequately mitigate the inherent bias that accumulates due to error propagation during the semi-supervised training phase. To address these shortcomings, we introduce an Active Semi-supervised learning framework with a DEbiasing training mechanism (ASDE). Specifically, ASDE includes a novel task-aware cold-start active data selection component designed to establish a more representative and informative initial labeled set by leveraging task-specific information. Additionally, to combat the detrimental effects of error propagation, we develop a spatial interpolation debiasing mechanism integrated into the training process. Empirical results on four widely used text classification datasets demonstrate the substantial performance gains achieved by our proposed ASDE framework, particularly under low-budget conditions.
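
The abstract does not say how either component is implemented, so the following is only a rough illustration of the two general ideas it names. It assumes that documents are already embedded as fixed-length vectors, uses k-center greedy (farthest-point) selection as a stand-in for cold-start data selection, and uses a mixup-style convex combination as a stand-in for interpolation between examples. The function names `kcenter_greedy` and `interpolate` are hypothetical, and neither function should be read as the authors' task-aware selector or their spatial interpolation debiasing mechanism.

```python
import math


def kcenter_greedy(embeddings, budget):
    """Pick `budget` pool indices that spread out over the embedding space.

    Assumption: a generic diversity heuristic, not the ASDE selector.
    """
    if not 0 < budget <= len(embeddings):
        raise ValueError("budget must be between 1 and the pool size")
    dim = len(embeddings[0])
    # Seed with the point closest to the pool mean so the result is deterministic.
    mean = [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]
    first = min(range(len(embeddings)), key=lambda i: math.dist(embeddings[i], mean))
    selected = [first]
    # Distance from each pool point to its nearest already-selected point.
    nearest = [math.dist(e, embeddings[first]) for e in embeddings]
    while len(selected) < budget:
        # The next pick is the point that is currently worst covered.
        nxt = max(range(len(embeddings)), key=lambda i: nearest[i])
        selected.append(nxt)
        for i, e in enumerate(embeddings):
            nearest[i] = min(nearest[i], math.dist(e, embeddings[nxt]))
    return selected


def interpolate(x_a, y_a, x_b, y_b, lam):
    """Mixup-style convex combination of two examples and their label distributions.

    Assumption: a generic interpolation, not the paper's debiasing mechanism.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


# Toy pool with three loose groups; a budget of 3 should touch each group once.
pool = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 5.0]]
picked = kcenter_greedy(pool, 3)

# Blend a confident class-0 example with a confident class-1 example.
mixed_x, mixed_y = interpolate([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], 0.75)
```

In a low-budget setting, the indices in `picked` would be the examples sent for annotation first, and interpolated pairs such as `(mixed_x, mixed_y)` would serve as softened training targets instead of a single, possibly wrong, pseudo-label.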
Source journal

Information Processing & Management (Engineering and Technology: Computer Science, Information Systems)
CiteScore: 17.00
Self-citation rate: 11.60%
Articles published: 276
Review time: 39 days
Journal description: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Its scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. The journal aims to serve both primary researchers and practitioners by offering a platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. It places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research.