Supporting Readability by Comprehending the Hierarchical Abstraction of a Software Project

15th Innovations in Software Engineering Conference | Pub Date: 2022-02-24 | DOI: 10.1145/3511430.3511441

Avijit Bhattacharjee, B. Roy, Kevin A. Schneider



Abstract

Exploring the source code of a software system is a common task for its contributors. Practitioners often use call graphs to help understand the source code of an inadequately documented software system. When visualized, call graphs show caller and callee relationships between functions. A static call graph provides the overall structure of a software system, while dynamic call graphs generated from execution logs can be used to trace program behaviour for a particular scenario. Unfortunately, a call graph of an entire system can be very complicated and hard to understand. Hierarchically abstracting a call graph can summarize an entire system’s structure and make function calls easier to comprehend. In this work, we mine concepts from source code entities (functions) to generate a concept cluster tree with improved naming of cluster nodes, complementing existing studies and facilitating more effective program comprehension for developers. We apply three information retrieval techniques (TFIDF, LDA, and LSI) to function names and function name variants to label the nodes of a concept cluster tree generated by clustering execution paths. In an experiment comparing automatic labelling with manual labelling by participants for 12 use cases, we found that, on average, TFIDF performs best with 64% matching, while LDA and LSI had 37% and 23% matching, respectively. In addition, using the words in function name variants performed at least 5% better in participant ratings for all three techniques, on average across the use cases.
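The TFIDF labelling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify tokenization, weighting, or how many words form a label, so the helper names (`split_identifier`, `label_clusters`), the smoothed IDF formula, the `top_k` parameter, and the sample function names are all assumptions made for illustration. Each cluster of functions is treated as one document, and the highest-scoring words from its function names become the node label.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case function name into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def label_clusters(clusters, top_k=2):
    """Label each cluster with its top_k TF-IDF words.

    clusters: dict mapping a cluster id to a list of function names.
    Each cluster is treated as a single document (an assumption).
    """
    # Word counts per cluster, built from the split function names.
    docs = {
        cid: Counter(w for fn in fns for w in split_identifier(fn))
        for cid, fns in clusters.items()
    }
    n_docs = len(docs)

    # Document frequency: in how many clusters each word appears.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    labels = {}
    for cid, counts in docs.items():
        total = sum(counts.values())
        # Term frequency times a smoothed inverse document frequency.
        scores = {
            w: (c / total) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
            for w, c in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        labels[cid] = ranked[:top_k]
    return labels


# Hypothetical clusters of function names, e.g. taken from execution paths.
clusters = {
    "c1": ["getHttpRequest", "getHttpHeader", "parseHttpRequest"],
    "c2": ["getDbConnection", "getDbRows", "closeDbConnection"],
}
labels = label_clusters(clusters)
print(labels)
```

In this toy input, the word "get" occurs in both clusters, so its IDF weight is lower and it is outranked by words specific to one cluster. That is the property that makes TF-IDF a plausible choice for naming cluster nodes.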
