Yubo Chen, Tong Zhou, Daojian Zeng, Sirui Li, Kang Liu, Jun Zhao
Information Processing & Management, Volume 63, Issue 2, Article 104390. Published 2025-09-24. DOI: 10.1016/j.ipm.2025.104390. URL: https://www.sciencedirect.com/science/article/pii/S0306457325003310
ASDE: Low-budget text classification via active semi-supervised learning with debiasing training mechanism
Semi-supervised learning (SSL) is widely employed in text classification to address the challenges associated with limited labeled data availability. Nevertheless, current SSL methods often exhibit two significant limitations: they typically neglect the crucial process of selecting the initial labeled data effectively, and they fail to adequately mitigate the inherent bias that accumulates due to error propagation during the semi-supervised training phase. To address these shortcomings, we introduce an Active Semi-supervised learning framework with a DEbiasing training mechanism (ASDE). Specifically, ASDE includes a novel task-aware cold-start active data selection component designed to establish a more representative and informative initial labeled set by leveraging task-specific information. Additionally, to combat the detrimental effects of error propagation, we develop a spatial interpolation debiasing mechanism integrated into the training process. Empirical results on four widely used text classification datasets demonstrate the substantial performance gains achieved by our proposed ASDE framework, particularly under low-budget conditions.
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.