Condensing class diagrams by analyzing design and network metrics using optimistic classification

2019 IEEE/ACM 27th International Conference on Program Comprehension (ICPC) | Pub Date: 2014-06-02 | DOI: 10.1145/2597008.2597157

Ferdian Thung, D. Lo, Mohd Hafeez Osman, M. Chaudron

Pages: 110-121 | https://doi.org/10.1145/2597008.2597157

Citations: 36

Abstract

A class diagram of a software system enhances our ability to understand software design. However, this diagram is often unavailable. Developers usually reconstruct the diagram by reverse engineering it from source code. Unfortunately, the resultant diagram is often very cluttered, making it difficult to learn anything valuable from it. Thus, it would be very beneficial if we were able to condense the reverse-engineered class diagram to contain only the important classes depicting the overall design of a software system. Such a diagram would make program understanding much easier. A class can be important, for example, if its removal would break many connections between classes. In our work, we estimate this kind of importance by using design metrics (e.g., number of attributes, number of dependencies) and network metrics (e.g., betweenness centrality, closeness centrality). We use these metrics as features and input their values to our optimistic classifier, which predicts whether a class is important or not. Different from standard classification, our newly proposed optimistic classification technique deals with the data scarcity problem by optimistically assigning labels to some of the unlabeled data and using them to train a better statistical model. We have evaluated our approach by condensing reverse-engineered diagrams of 9 software systems and compared it with the state-of-the-art work of Osman et al. Our experiments show that our approach can achieve an average Area Under the Receiver Operating Characteristic Curve (AUC) score of 0.825, which is a 9.1% improvement over the state-of-the-art approach.
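To make the feature idea in the abstract concrete, the following is a minimal sketch of how one design metric (number of dependencies) and one network metric (closeness centrality) might be computed per class. It is an illustration only, not the authors' implementation: the class names, the `EDGES` list, the function names, and the choice to treat dependencies as undirected are all assumptions made for this example.

```python
from collections import deque

# Hypothetical class dependency edges (treated as undirected; an assumption).
EDGES = [
    ("Order", "Customer"),
    ("Order", "Invoice"),
    ("Order", "Product"),
    ("Invoice", "Logger"),
    ("Product", "Logger"),
]


def build_graph(edges):
    """Build an adjacency map {class: set of neighbouring classes}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def closeness_centrality(graph, source):
    """Reachable nodes divided by the sum of shortest-path distances.

    Distances come from a breadth-first search; only classes reachable
    from `source` are counted, and an isolated class scores 0.0.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def feature_vectors(edges):
    """Return one small feature dict per class, ready to feed a classifier."""
    graph = build_graph(edges)
    return {
        cls: {
            "num_dependencies": len(graph[cls]),
            "closeness": closeness_centrality(graph, cls),
        }
        for cls in graph
    }


features = feature_vectors(EDGES)
for cls, values in sorted(features.items()):
    print(cls, values)
```

In this toy graph `Order` touches the most classes and sits closest to all others, so both features point to it as a candidate for the condensed diagram. A classifier such as the one described in the abstract would combine many such features rather than rely on any single one.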

