Title: Conformal prediction for labelling and updating online models in the presence of concept drift in cybersecurity
Authors: David Escudero García, Noemí DeCastro-García
Journal: Journal of Information Security and Applications, volume 93, Article 104120 (JCR Q2, Computer Science, Information Systems; impact factor 3.8)
Published: 2025-06-14
DOI: 10.1016/j.jisa.2025.104120
URL: https://www.sciencedirect.com/science/article/pii/S2214212625001577
Citation count: 0
Abstract
Machine learning is used for detecting malicious activity in cybersecurity contexts since it provides more adaptable models than signature-based solutions. One of the main challenges in applying machine learning to detect malicious activity is the presence of concept drift, which is a change in data distribution over time. Online models that are updated dynamically are usually applied to handle drift. However, these models require new labelled instances to be updated. Reliable labels are typically scarce, expensive to obtain, and not immediately available, which makes building an effective model difficult. In this work, we propose applying online models with conformal prediction, which provides statistical guarantees, to obtain reliable pseudo-labels to update the model and mitigate the absence of ground truth in new data. Although the use of conformal pseudo-labels produces significant improvements in some cases, these are inconsistent across datasets and models, which limits the applicability of the approach.
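The idea summarised in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's models, nonconformity measure, or update policy, so everything below is an assumption made for illustration: a tiny logistic regression stands in for the online model, the nonconformity score is 1 minus the predicted probability of a label (a common choice in split conformal prediction), and a stream instance is used for updating only when its conformal prediction set contains exactly one label. The names `OnlineLogReg`, `conformal_quantile`, `prediction_set`, and `pseudo_label_update` are hypothetical. Note that the coverage guarantee of conformal prediction relies on exchangeability between calibration and new data, which concept drift can violate.

```python
import math
import random


class OnlineLogReg:
    """Tiny binary logistic regression trained by SGD (stand-in for any online model)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        p1 = 1.0 / (1.0 + math.exp(-z))
        return [1.0 - p1, p1]

    def learn_one(self, x, y):
        # One SGD step on the log-loss for a single (x, y) pair.
        err = self.predict_proba(x)[1] - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def prediction_set(model, x, qhat):
    """All labels whose nonconformity score 1 - p(label | x) is at most qhat."""
    probs = model.predict_proba(x)
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]


def pseudo_label_update(model, x, qhat):
    """Update the model only when the prediction set is a single label.

    Returns the pseudo-label used, or None if the instance was skipped.
    """
    labels = prediction_set(model, x, qhat)
    if len(labels) != 1:
        return None  # empty or ambiguous set: abstain from updating
    model.learn_one(x, labels[0])
    return labels[0]


if __name__ == "__main__":
    rng = random.Random(0)

    def sample(mean_shift=0.0):
        # Synthetic 1-D data: class 0 centred at -1, class 1 centred at +1.
        y = rng.randint(0, 1)
        x = [rng.gauss(2.0 * y - 1.0 + mean_shift, 1.0)]
        return x, y

    model = OnlineLogReg(n_features=1)
    for _ in range(500):  # initial labelled training data
        model.learn_one(*sample())

    calib = [sample() for _ in range(200)]  # held-out labelled calibration data
    scores = [1.0 - model.predict_proba(x)[y] for x, y in calib]
    qhat = conformal_quantile(scores, alpha=0.1)

    used = 0
    for _ in range(1000):  # unlabelled stream with a mild shift in the inputs
        x, _hidden_label = sample(mean_shift=0.3)
        if pseudo_label_update(model, x, qhat) is not None:
            used += 1
    print(f"pseudo-labelled {used} of 1000 stream instances")
```

Abstaining on empty or multi-label prediction sets is one simple way to keep only the pseudo-labels the conformal predictor is confident about; other policies are possible, and the calibration scores would typically need refreshing as the model changes.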
About the journal:
Journal of Information Security and Applications (JISA) focuses on original research and practice-driven applications with relevance to information security and applications. JISA provides a common linkage between a vibrant scientific and research community and industry professionals by offering a clear view of modern problems and challenges in information security, as well as identifying promising scientific and "best-practice" solutions. JISA issues offer a balance between original research work and innovative industrial approaches by internationally renowned information security experts and researchers.