A search-based approach to multi-view clustering of software systems

2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER); Pub Date: 2015-03-02; DOI: 10.1109/SANER.2015.7081853

Amir Saeidi, Jurriaan Hage, R. Khadka, S. Jansen


Citations: 16

Abstract

Unsupervised software clustering is the problem of automatically decomposing a software system into meaningful units. Some approaches rely solely on the structure of the system, such as the module dependency graph, to decompose it into cohesive groups of modules. Other techniques focus on the informal knowledge hidden within the source code itself to retrieve the modular architecture of the system. However, for large systems both kinds of technique fail to produce decompositions that correspond to the actual architecture of the system. To overcome this problem, we propose a novel approach to clustering software systems that incorporates knowledge from different viewpoints of the system, such as the knowledge embedded within the source code and the structural dependencies within the system. In this setting, we adopt a search-based encoding of multi-view clustering and investigate two approaches to the problem: one that linearly combines the objectives into a single objective, and one that treats clustering as a multi-objective problem. We evaluate the two approaches on a dataset comprising 10 substantial open-source Java projects. Finally, we propose two techniques, based on interpolation and hierarchical clustering, to combine the different results obtained into a single result for the single-objective and multi-objective encodings, respectively.
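The contrast between the two encodings mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the actual objective functions, so `structural_score` and `lexical_score` below are hypothetical stand-ins (the share of intra-cluster dependency edges, and the mean Jaccard similarity of identifier terms), and the toy modules, dependencies and weights are assumptions made purely for illustration. The single-objective encoding collapses the per-view scores with a weighted sum, whereas the multi-objective encoding keeps them separate and compares candidates by Pareto dominance.

```python
from typing import Dict, List, Sequence, Set, Tuple

Clustering = Dict[str, int]  # module name -> cluster id


def structural_score(c: Clustering, deps: Set[Tuple[str, str]]) -> float:
    """Hypothetical structural view: share of dependency edges inside one cluster."""
    if not deps:
        return 0.0
    return sum(1 for a, b in deps if c[a] == c[b]) / len(deps)


def lexical_score(c: Clustering, terms: Dict[str, Set[str]]) -> float:
    """Hypothetical lexical view: mean Jaccard similarity of same-cluster module pairs."""
    modules = sorted(c)
    sims: List[float] = []
    for i, a in enumerate(modules):
        for b in modules[i + 1:]:
            if c[a] == c[b]:
                union = terms[a] | terms[b]
                sims.append(len(terms[a] & terms[b]) / len(union) if union else 0.0)
    return sum(sims) / len(sims) if sims else 0.0


def weighted_sum(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Single-objective encoding: linear combination of the per-view scores."""
    return sum(w * s for w, s in zip(weights, scores))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every view and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Multi-objective encoding: keep only the non-dominated score vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy system (assumed data): four modules, three dependencies, identifier terms.
deps = {("A", "B"), ("B", "C"), ("C", "D")}
terms = {"A": {"parse", "token"}, "B": {"parse", "ast"},
         "C": {"render", "view"}, "D": {"render", "view"}}
candidates = [
    {"A": 0, "B": 0, "C": 1, "D": 1},  # split along lexical lines
    {"A": 0, "B": 0, "C": 0, "D": 0},  # everything in one cluster
    {"A": 0, "B": 1, "C": 0, "D": 1},  # poor in both views
]
scores = [(structural_score(c, deps), lexical_score(c, terms)) for c in candidates]
best_single = max(scores, key=lambda s: weighted_sum(s, (0.5, 0.5)))
front = pareto_front(scores)
```

In this toy setting the weighted sum returns one winner that depends on the chosen weights, while the Pareto front retains every trade-off that no other candidate beats in both views. That is one way to see why a further step is needed to reduce several results to a single clustering, although the interpolation and hierarchical-clustering techniques the abstract refers to are not reproduced here.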

