NegCPARBP: Enhancing Privacy Protection for Cross-Project Aging-Related Bug Prediction Based on Negative Database
Authors: Dongdong Zhao; Zhihui Liu; Fengji Zhang; Lei Liu; Jacky Wai Keung; Xiao Yu
Journal: IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 2, pp. 283-298 (Journal Article)
Published: 2025-03-06
DOI: 10.1109/TETC.2025.3546549
URL: https://ieeexplore.ieee.org/document/10914513/
Impact factor: 5.4; JCR: Q1 (Computer Science, Information Systems); Region: 2 (Computer Science)
Citations: 0
Abstract
Aging-Related Bugs (ARBs) pose a significant challenge to software systems, causing performance degradation and increased error rates in resource-intensive systems. Numerous ARB prediction methods have been developed to mitigate these issues, but their effectiveness is often suboptimal when training data is limited. To address this problem, Cross-Project Aging-Related Bug Prediction (CPARBP) was proposed: it uses data from other projects (i.e., source projects) to train a model that predicts potential ARBs in a target project. However, using source-project data raises privacy concerns and discourages companies from sharing their data. We therefore propose Cross-Project Aging-Related Bug Prediction based on Negative Database (NegCPARBP), a method for privacy protection. NegCPARBP first converts the feature vector of a software file into a binary string. Second, it generates the corresponding Negative DataBase (NDB) from this binary string; the NDB contains data that differs significantly from the original feature vector. Furthermore, to keep prediction of ARB-prone and ARB-free files accurate on privacy-protected data (i.e., to maintain data utility), we propose a novel negative database generation algorithm that captures more information about important features, using information gain as the measure of importance. Finally, NegCPARBP extracts a new feature vector from the NDB to represent the original feature vector, which supports both data sharing and ARB prediction. Experimental results on the Linux, MySQL, and NetBSD datasets show that NegCPARBP achieves strong defense against attacks (privacy protection performance reaching 0.97) and better data utility than existing privacy protection methods.
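The first two steps of the pipeline (binary encoding, then negative database generation) can be illustrated with a minimal Python sketch. This is not the authors' information-gain-guided algorithm, which the abstract does not specify. It is a generic negative-database construction in which every record fixes a few bit positions and disagrees with the hidden string on at least one of them, so no record ever matches the original data. The function names, the 4-bit quantization, the `[0, 1]` feature range, and the parameter `k` are all assumptions made for illustration.

```python
import random


def to_binary_string(features, bits=4, lo=0.0, hi=1.0):
    """Quantize each feature in [lo, hi] to `bits` bits and concatenate.

    The bit width and value range are illustrative assumptions.
    """
    levels = (1 << bits) - 1
    chunks = []
    for x in features:
        clipped = min(max(x, lo), hi)
        q = round((clipped - lo) / (hi - lo) * levels)
        chunks.append(format(q, f"0{bits}b"))
    return "".join(chunks)


def generate_ndb(hidden, n_records, k=3, seed=0):
    """Build a simple negative database for the binary string `hidden`.

    Each record specifies k positions and leaves the rest as '*'
    (don't care). At least one specified bit is flipped relative to
    `hidden`, so the hidden string matches no record.
    """
    rng = random.Random(seed)
    m = len(hidden)
    records = []
    for _ in range(n_records):
        positions = rng.sample(range(m), k)
        n_flip = rng.randint(1, k)  # flip at least one specified bit
        flipped = set(rng.sample(positions, n_flip))
        rec = ["*"] * m
        for p in positions:
            bit = hidden[p]
            rec[p] = ("1" if bit == "0" else "0") if p in flipped else bit
        records.append("".join(rec))
    return records


def matches(record, s):
    """True if string s agrees with every specified position of record."""
    return all(r == "*" or r == c for r, c in zip(record, s))


if __name__ == "__main__":
    hidden = to_binary_string([0.0, 1.0, 0.25])
    ndb = generate_ndb(hidden, n_records=5)
    for rec in ndb:
        print(rec, matches(rec, hidden))
```

In this sketch the shared artifact would be `ndb` rather than `hidden`. How NegCPARBP biases record generation toward high-information-gain features, and how it derives the new feature vector from the NDB, is described in the paper itself and is not reproduced here.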
About the journal:
IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Examples of emerging topics in computing include: IT for Green, Synthetic and organic computing structures and systems, Advanced analytics, Social/occupational computing, Location-based/client computer systems, Morphic computer design, Electronic game systems, and Health-care IT.