UCLP: Unsupervised Classification of Key Aspects in Vulnerability Descriptions Through Label Profile
Linyi Han, Hang Li, Xiaowang Zhang, Youmeng Li, Zhiyong Feng
Journal of Software: Evolution and Process, vol. 37, no. 9. Published 2025-09-13.
DOI: 10.1002/smr.70052
https://onlinelibrary.wiley.com/doi/10.1002/smr.70052
Abstract
Textual vulnerability descriptions (TVDs) in repositories like NVD and IBM X-Force Exchange are essential for security engineers managing vulnerabilities. Engineers typically search for key aspects in TVDs using specific phrases, but with multiple expressions for each aspect, retrieving all relevant records is challenging. We propose a label-based retrieval framework that classifies key aspects and retrieves TVDs by their broader categories. Given the large data volume, manual labeling is infeasible, making unsupervised classification critical. However, short labels and repeated words diminish semantic clarity, affecting classification accuracy. We introduce Unsupervised Classification through Label Profile (UCLP), which expands label semantics through label profiles inspired by recommendation systems. We construct profiles using neural network weights and apply TF-IDF to calculate similarities, smoothing distributions with an arctangent function. Results show that UCLP significantly outperforms four benchmarks, raising accuracy from 68.3% to 78.9% and improving three real-world applications.
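
The scoring step described in the abstract (TF-IDF similarity against label profiles, followed by arctangent smoothing) can be illustrated with a small sketch. The abstract gives no formulas, so everything below is an assumption for illustration only: the hand-written `LABEL_PROFILES` stand in for profiles that the paper derives from neural network weights, cosine similarity is one plausible choice of similarity measure, and the `scale` parameter and the normalization after `atan` are hypothetical. This is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical label profiles. In the paper these are built from neural
# network weights; here they are hand-written purely for illustration.
LABEL_PROFILES = {
    "remote attacker": ["remote", "attacker", "network", "unauthenticated",
                        "crafted", "request"],
    "local user": ["local", "user", "authenticated", "privileged",
                   "account", "login"],
}


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real TVDs need more care)."""
    return text.lower().split()


def idf_table(documents):
    """Smoothed inverse document frequency over the profile 'documents'."""
    n = len(documents)
    df = Counter(term for doc in documents for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf(tokens, idf):
    """TF-IDF vector as a dict; terms outside the profile vocabulary are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(phrase, profiles=LABEL_PROFILES, scale=5.0):
    """Return (best_label, distribution) for a key-aspect phrase.

    Raw similarities are passed through atan(scale * s) and then
    normalized. Both the scale and the normalization are assumptions.
    """
    labels = list(profiles)
    idf = idf_table([profiles[label] for label in labels])
    query = tfidf(tokenize(phrase), idf)
    raw = [cosine(query, tfidf(profiles[label], idf)) for label in labels]
    smoothed = [math.atan(scale * s) for s in raw]
    total = sum(smoothed)
    if total == 0.0:
        # No overlap with any profile: fall back to a uniform distribution.
        dist = {label: 1.0 / len(labels) for label in labels}
    else:
        dist = {label: s / total for label, s in zip(labels, smoothed)}
    best = max(dist, key=dist.get)
    return best, dist


if __name__ == "__main__":
    label, dist = classify("a remote attacker sends a crafted request")
    print(label, dist)
```

Because `atan` is concave on non-negative inputs, it compresses large similarity scores more than small ones, which is one way to read the "smoothing" the abstract mentions. A phrase with no vocabulary overlap falls back to a uniform distribution rather than an arbitrary winner.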