On the effectiveness of network metrics on key class prediction: An empirical study.

IF 2.6 | Zone 3 (Multidisciplinary Journals) | Q1 MULTIDISCIPLINARY SCIENCES

PLoS ONE, vol. 20, no. 10, e0334408. Pub Date: 2025-10-10. eCollection Date: 2025-01-01. DOI: 10.1371/journal.pone.0334408

Shiyuan Zhou, Wei Wu, Jiale Wang, Hongbing Liu, Chenxiang Yuan


Citations: 0

Abstract

Key classes are the most important classes in a software system; they provide an excellent foundation for developers, especially those new to the field, to understand unfamiliar software systems. In the past decade, several key class prediction (KCP) approaches have been proposed. These approaches use design metrics extracted from source code and unweighted network metrics computed on class coupling networks as features, and build machine-learning models to predict whether a class is a key class. However, previous studies mainly focused on improving the performance of KCP models in the within-project context (i.e., KCP within the same project), and the network metrics they used are inaccurate because they are computed on unweighted and incomplete class coupling networks. As a result, the effectiveness of network metrics for KCP has not been thoroughly evaluated, especially in the cross-project context (i.e., KCP across diverse projects), which leaves it unclear how to choose suitable metrics as features when building KCP models. To fill this gap, we thoroughly evaluate the effectiveness of network metrics for KCP. Specifically, we build weighted and more complete class coupling networks for software and introduce a set of weighted network metrics to characterize class complexity. We then build KCP models using the Random Forest learner and the Naive Bayes model for each of the two KCP contexts (within-project and cross-project), with design metrics, unweighted network metrics, weighted network metrics, and their combinations as features. Finally, through an empirical study on 18 open-source Java projects, we investigate the effectiveness of network metrics relative to design metrics across the two KCP contexts. Our results suggest that, to achieve better performance when building KCP models, researchers and practitioners should consider using (1) unweighted (or weighted) network metrics, alone or along with design metrics, in the within-project KCP context; (2) design metrics, alone or along with unweighted (or weighted) network metrics, in the cross-project KCP context; and (3) unweighted (or weighted) network metrics along with design metrics across the two KCP contexts.
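To make the distinction between unweighted and weighted network metrics concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical list of coupling records and illustrative per-type weights (the names `COUPLING_WEIGHTS`, `build_weighted_network`, `degree_metrics`, and `leave_one_project_out` are invented here), computes degree (unweighted) and strength (weighted) for each class, and shows leave-one-project-out splitting as one common way to set up a cross-project evaluation. The abstract does not state which weights or splitting protocol the study used.

```python
from collections import defaultdict

# Hypothetical coupling types and weights: illustrative assumptions only,
# not values taken from the paper.
COUPLING_WEIGHTS = {"inheritance": 3.0, "method_call": 2.0, "field_access": 1.0}


def build_weighted_network(couplings):
    """Aggregate (source, target, kind) records into a directed weighted edge map."""
    edges = defaultdict(float)
    for source, target, kind in couplings:
        if source != target:  # ignore self-couplings
            edges[(source, target)] += COUPLING_WEIGHTS[kind]
    return dict(edges)


def degree_metrics(edges):
    """Return per-class unweighted degrees and weighted degrees (strengths)."""
    metrics = defaultdict(lambda: {"in_degree": 0, "out_degree": 0,
                                   "in_strength": 0.0, "out_strength": 0.0})
    for (source, target), weight in edges.items():
        metrics[source]["out_degree"] += 1
        metrics[source]["out_strength"] += weight
        metrics[target]["in_degree"] += 1
        metrics[target]["in_strength"] += weight
    return dict(metrics)


def leave_one_project_out(project_names):
    """Yield (training projects, held-out project) pairs for cross-project evaluation."""
    for held_out in project_names:
        yield [p for p in project_names if p != held_out], held_out


# Hypothetical classes from a single project.
couplings = [
    ("OrderService", "Order", "method_call"),
    ("OrderService", "Order", "field_access"),
    ("VipOrder", "Order", "inheritance"),
    ("OrderService", "Logger", "method_call"),
]
edges = build_weighted_network(couplings)
metrics = degree_metrics(edges)
splits = list(leave_one_project_out(["projA", "projB", "projC"]))
```

In this toy network, `Order` and `OrderService` each touch two edges, so their unweighted degrees are similar, but the weighted strengths separate them by how heavily each coupling is used. That extra signal is what a weighted metric contributes as a model feature. In a within-project setting, training and test classes would instead be drawn from the same project.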


Source journal

PLoS ONE (category: Biology)

CiteScore: 6.20
Self-citation rate: 5.40%
Articles published: 14242
Review time: 3.7 months

About the journal: PLOS ONE is an international, peer-reviewed, open-access, online publication. PLOS ONE welcomes reports on primary research from any scientific discipline. It provides:
* Open access: freely accessible online, authors retain copyright
* Fast publication times
* Peer review by expert, practicing researchers
* Post-publication tools to indicate quality and impact
* Community-based dialogue on articles
* Worldwide media coverage