Rethinking dependence clones

2017 IEEE 11th International Workshop on Software Clones (IWSC) Pub Date : 2017-02-21 DOI:10.1109/IWSC.2017.7880512

Tim A. D. Henderson, Andy Podgurski


Citations: 6

Abstract

Semantic code clones are regions of duplicated code that may appear dissimilar but compute similar functions. Since in general it is algorithmically undecidable whether two or more programs compute the same function, locating all semantic code clones is infeasible. One way to dodge the undecidability issue and find potential semantic clones, using only static information, is to search for recurring subgraphs of a program dependence graph (PDG). PDGs represent control and data dependence relationships between statements or operations in a program. PDG-based clone detection techniques, unlike syntactically-based techniques, do not distinguish between code fragments that differ only because of dependence-preserving statement re-orderings, which also preserve semantics. Consequently, they detect clones that are difficult to find by other means. Despite this very desirable property, work on PDG-based clone detection has largely stalled, apparently because of concerns about the scalability of the approach. We argue, however, that the time has come to reconsider PDG-based clone detection, as a part of a holistic strategy for clone management. We present evidence that its scalability problems are not as severe as previously thought. This suggests the possibility of developing integrated clone management systems that fuse information from multiple clone detection methods, including PDG-based ones.
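The reordering property described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it handles only straight-line code, tracks data dependences but not control dependences, and compares whole fragments instead of mining recurring subgraphs. The statement encoding and the name `dependence_signature` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: a statement is (target, operator, operands).
Stmt = Tuple[str, str, Tuple[str, ...]]


def dependence_signature(stmts: List[Stmt]) -> List[tuple]:
    """Summarize the data dependences of a straight-line fragment.

    Each statement's signature is its operator plus the signatures of the
    statements that most recently defined its operands. Variables with no
    earlier definition in the fragment are treated as inputs. The result is
    sorted, so statement order matters only through the dependences.
    """
    latest: Dict[str, tuple] = {}  # variable -> signature of its last definition
    signatures: List[tuple] = []
    for target, op, operands in stmts:
        sig = (op, tuple(latest.get(v, ("input", v)) for v in operands))
        latest[target] = sig
        signatures.append(sig)
    return sorted(signatures, key=repr)


# t1 = a + b; t2 = c * d; r = t1 - t2
frag_a: List[Stmt] = [
    ("t1", "add", ("a", "b")),
    ("t2", "mul", ("c", "d")),
    ("r", "sub", ("t1", "t2")),
]

# Same computation with the two independent statements swapped and renamed.
frag_b: List[Stmt] = [
    ("y", "mul", ("c", "d")),
    ("x", "add", ("a", "b")),
    ("r", "sub", ("x", "y")),
]

# Here y is used before it is defined, so the dependences differ.
frag_c: List[Stmt] = [
    ("x", "add", ("a", "b")),
    ("r", "sub", ("x", "y")),
    ("y", "mul", ("c", "d")),
]

print(dependence_signature(frag_a) == dependence_signature(frag_b))
print(dependence_signature(frag_a) == dependence_signature(frag_c))
```

A token-based or line-based comparison would treat `frag_a` and `frag_b` as different because the statements appear in a different order with different temporary names, whereas the dependence summary is identical. `frag_c` changes which definition reaches the subtraction, so its summary differs.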

