On the effectiveness of network metrics on key class prediction: An empirical study.

IF 2.6 | Zone 3 (multidisciplinary journals) | JCR Q1 (MULTIDISCIPLINARY SCIENCES)
PLoS ONE Pub Date : 2025-10-10 eCollection Date: 2025-01-01 DOI:10.1371/journal.pone.0334408
Shiyuan Zhou, Wei Wu, Jiale Wang, Hongbing Liu, Chenxiang Yuan
PLoS ONE 20(10): e0334408. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12513632/pdf/
Citations: 0

Abstract

Key classes are the most important classes in a software system, and they provide an excellent starting point for developers, especially those new to the field, to understand unfamiliar software systems. In the past decade, several key class prediction (KCP) approaches have been proposed. These approaches use design metrics extracted from source code and unweighted network metrics computed on class coupling networks as features, and build machine-learning models to predict whether a class is a key class. However, previous studies mainly focused on improving the performance of KCP models in the within-project context (i.e., KCP within the same project), and the network metrics they used are inaccurate because they are computed on unweighted and incomplete class coupling networks. As a result, the effectiveness of network metrics for KCP has not been thoroughly evaluated, especially in the cross-project context (i.e., KCP across diverse projects), which leaves it uncertain how to choose suitable metrics as features when building KCP models. To fill this gap, we thoroughly evaluate the effectiveness of network metrics for KCP. Specifically, we build weighted and more complete class coupling networks for software, and introduce a set of weighted network metrics to characterize class complexity. We then build KCP models using the Random Forest learner and the Naive Bayes model for each of the two KCP contexts (within-project and cross-project), with design metrics, unweighted network metrics, weighted network metrics, and their combinations as features. Finally, through an empirical study on 18 open-source Java projects, we investigate the relative effectiveness of network metrics over design metrics across the two KCP contexts. Our results suggest that, to achieve better performance when building KCP models, researchers and practitioners should consider using (1) unweighted (or weighted) network metrics, alone or together with design metrics, in the within-project KCP context; (2) design metrics, alone or together with unweighted (or weighted) network metrics, in the cross-project KCP context; and (3) unweighted (or weighted) network metrics together with design metrics across the two KCP contexts.
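
To make the distinction between unweighted and weighted network metrics concrete, the following is a minimal sketch rather than the authors' method. It assumes a toy class coupling network in which each directed edge carries a coupling count as its weight; the class names, the weights, and the choice of degree and strength as metrics are all hypothetical, since the abstract does not specify the weighting scheme or the metric set.

```python
from collections import defaultdict

# Hypothetical coupling edges: (source class, target class, weight).
# The weight stands for how many couplings exist from source to target.
# This is an assumption; the abstract does not give the actual weighting.
# Each (source, target) pair is assumed to appear at most once.
EDGES = [
    ("OrderService", "Order", 5),
    ("OrderService", "PaymentGateway", 2),
    ("OrderController", "OrderService", 3),
    ("ReportJob", "Order", 1),
    ("ReportJob", "OrderService", 1),
]


def degree_metrics(edges):
    """Return {class: metrics} with unweighted degrees and weighted strengths."""
    metrics = defaultdict(lambda: {
        "in_degree": 0, "out_degree": 0,      # unweighted: count of edges
        "in_strength": 0, "out_strength": 0,  # weighted: sum of edge weights
    })
    for source, target, weight in edges:
        metrics[source]["out_degree"] += 1
        metrics[source]["out_strength"] += weight
        metrics[target]["in_degree"] += 1
        metrics[target]["in_strength"] += weight
    return dict(metrics)


if __name__ == "__main__":
    for name, values in sorted(degree_metrics(EDGES).items()):
        print(name, values)
```

In this toy network, `Order` and `OrderService` have the same in-degree (2), so an unweighted metric cannot tell them apart, while their in-strengths differ (6 versus 4). Either kind of value could then serve as one feature column for a classifier.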

Source journal: PLoS ONE (category: Biology)
CiteScore: 6.20
Self-citation rate: 5.40%
Articles published: 14242
Review time: 3.7 months
Journal description: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides: open access (freely accessible online, authors retain copyright); fast publication times; peer review by expert, practicing researchers; post-publication tools to indicate quality and impact; community-based dialogue on articles; and worldwide media coverage.