Rethinking dependence clones
Tim A. D. Henderson, Andy Podgurski
2017 IEEE 11th International Workshop on Software Clones (IWSC), published 2017-02-21
DOI: 10.1109/IWSC.2017.7880512 (https://doi.org/10.1109/IWSC.2017.7880512)
Citations: 6
Abstract
Semantic code clones are regions of duplicated code that may appear dissimilar but compute similar functions. Since in general it is algorithmically undecidable whether two or more programs compute the same function, locating all semantic code clones is infeasible. One way to dodge the undecidability issue and find potential semantic clones, using only static information, is to search for recurring subgraphs of a program dependence graph (PDG). PDGs represent control and data dependence relationships between statements or operations in a program. PDG-based clone detection techniques, unlike syntactically-based techniques, do not distinguish between code fragments that differ only because of dependence-preserving statement re-orderings, which also preserve semantics. Consequently, they detect clones that are difficult to find by other means. Despite this very desirable property, work on PDG-based clone detection has largely stalled, apparently because of concerns about the scalability of the approach. We argue, however, that the time has come to reconsider PDG-based clone detection, as a part of a holistic strategy for clone management. We present evidence that its scalability problems are not as severe as previously thought. This suggests the possibility of developing integrated clone management systems that fuse information from multiple clone detection methods, including PDG-based ones.
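The claim that dependence-based comparison ignores dependence-preserving statement re-orderings can be illustrated with a small sketch. It is not the authors' detector: it assumes straight-line code made only of simple `name = expr` assignments, tracks only flow (data) dependences, omits control dependences, and identifies each node by its statement text rather than searching for recurring subgraphs. The function name `data_dependence_edges` and the two fragments are hypothetical, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def data_dependence_edges(source):
    """Return flow-dependence edges (defining stmt -> using stmt).

    Illustrative only: handles straight-line `name = expr` statements
    and ignores control, anti-, and output dependences.
    """
    last_def = {}  # variable name -> text of the statement that last assigned it
    edges = set()
    for stmt in ast.parse(source).body:
        is_simple_assignment = (
            isinstance(stmt, ast.Assign)
            and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
        )
        if not is_simple_assignment:
            raise ValueError("sketch handles only simple `name = expr` statements")
        text = ast.unparse(stmt)  # normalized statement text used as the node label
        # Every variable read on the right-hand side depends on its last definition.
        used = {n.id for n in ast.walk(stmt.value) if isinstance(n, ast.Name)}
        for name in used:
            if name in last_def:
                edges.add((last_def[name], text))
        last_def[stmt.targets[0].id] = text
    return edges


# Two hypothetical fragments that differ only in the order of their
# first two statements, which do not depend on each other.
fragment_a = """
area = width * height
perimeter = 2 * (width + height)
ratio = area / perimeter
"""

fragment_b = """
perimeter = 2 * (width + height)
area = width * height
ratio = area / perimeter
"""

different_text = fragment_a != fragment_b
same_dependences = data_dependence_edges(fragment_a) == data_dependence_edges(fragment_b)
```

A line-by-line textual comparison treats the two fragments as different, while their dependence edge sets are equal. A real PDG-based detector would go further, labeling nodes by operation kind so that renamed variables still match, and mining the graph for recurring subgraphs, which is where the scalability concerns discussed in the abstract arise.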