Zero- and few-shot Chinese cybersecurity event detection via meta-distillation learning
Han Zhang, Bingzhi Xu, Shijie Xiao, Chengfang Zhang, Lixia Ji
Information Processing & Management, Volume 63, Issue 1, Article 104344. Published 2025-08-26.
DOI: 10.1016/j.ipm.2025.104344
URL: https://www.sciencedirect.com/science/article/pii/S0306457325002857
Impact factor: 6.9 (JCR Q1, Computer Science, Information Systems)
Citation count: 0
Abstract
Traditional cybersecurity event detection has primarily focused on English corpora. Chinese corpora pose additional challenges because of linguistic complexity and a lack of annotated datasets, particularly in recognizing nested compound trigger words and in handling zero- and few-shot scenarios. To address these issues, we propose a method named zero- and few-shot Chinese cybersecurity event detection via meta-distillation learning (CCED). First, we introduce a dynamic dimension transformation mechanism that embeds geometric information into span representations for extracting nested compound trigger words from a Chinese corpus. Second, we propose meta-distillation learning, which integrates meta-learning with contrastive knowledge distillation to improve model performance. This method boosts accuracy in zero- and few-shot scenarios by facilitating knowledge transfer across tasks. Moreover, to fill the gap in datasets for Chinese cybersecurity event detection, we develop CSED, which is, to the best of our knowledge, the first publicly available annotated dataset in this domain. It includes a large collection of news articles from sources such as CNCERT and Twitter, with 17,542 event instances categorized into 2 event types and 9 sub-types. CCED achieves state-of-the-art F1 scores on CSED: 57.61%, 76.83%, and 79.14% across the zero-shot and few-shot settings. The dataset and code can be accessed on GitHub: https://github.com/vegetable-edu/CCED.
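To illustrate why span-based representations suit nested trigger words, the following is a minimal sketch, not the authors' implementation. It enumerates every character span up to a maximum width, so that a shorter trigger contained in a longer compound trigger is still a separate candidate. The function names `enumerate_spans` and `is_nested`, the `max_width` parameter, and the example phrase are all hypothetical and chosen only for illustration; the abstract does not describe CCED's dynamic dimension transformation in enough detail to reproduce it here.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, text)


def enumerate_spans(chars: List[str], max_width: int = 4) -> List[Span]:
    """Return every contiguous span of at most max_width characters."""
    spans: List[Span] = []
    for start in range(len(chars)):
        last_end = min(start + max_width, len(chars))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end, "".join(chars[start:end])))
    return spans


def is_nested(inner: Span, outer: Span) -> bool:
    """True when inner lies inside outer and is not the same span."""
    same = inner[0] == outer[0] and inner[1] == outer[1]
    return outer[0] <= inner[0] and inner[1] <= outer[1] and not same


# Hypothetical example: "勒索攻击" (ransomware attack) contains "攻击" (attack).
chars = list("勒索攻击")
spans = enumerate_spans(chars, max_width=4)
compound = (0, 4, "勒索攻击")
inner = (2, 4, "攻击")
```

Because both `compound` and `inner` appear in `spans`, a classifier that scores each span independently can label both, which a flat per-character tagging scheme cannot do without extra machinery. Span width and position are the simplest kind of geometric feature one could attach to each candidate; whether CCED uses these exact features is an assumption.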
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.