Maximal frequent sequence mining for finding software clones
Yoshihisa Udagawa
Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services, 2016-11-28
DOI: 10.1145/3011141.3011160 (https://doi.org/10.1145/3011141.3011160)
Citations: 3
Abstract
Software clones are introduced to source code by copying and slightly modifying code fragments for reuse. Thus, the problem of finding software clones is essentially the detection of strings that partially match. This paper describes a software clone detection technique using a sequential pattern-mining algorithm. After outlining a code normalization technique that extracts code-matching statements of interest from a specific programming language, viz., Java, we discuss how to extract a set of frequent sequences with gaps from a set of sequences that correspond to methods. The proposed algorithm also deals with maximal frequent sequences to find the most compact representation of sequential patterns. We define the maximal frequent sequence in the context of a partial match of sequences or gapped sequences. The novelty of our approach includes modified longest-common-subsequence (LCS) and backtrace algorithms for handling partial matches of sequences systematically. The paper also reports on the results of a case study using Apache Struts 2.5.2 Core. The results demonstrate the ability of the proposed algorithm to find clones of Types 1, 2, and 3.
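To make the idea of a partial (gapped) match concrete, the following is a minimal sketch of the textbook longest-common-subsequence table with a backtrace step, applied to two methods represented as sequences of normalized statement tokens. It is not the modified LCS and backtrace algorithms the paper proposes, whose details the abstract does not give. The token names and the functions `lcs_table`, `backtrace`, and `gapped_match` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def lcs_table(a: Sequence[str], b: Sequence[str]) -> List[List[int]]:
    """Fill the classic LCS dynamic-programming table for sequences a and b."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def backtrace(
    a: Sequence[str], b: Sequence[str], table: List[List[int]]
) -> List[Tuple[int, int]]:
    """Walk back from the bottom-right cell, collecting matched index pairs."""
    pairs: List[Tuple[int, int]] = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def gapped_match(
    a: Sequence[str], b: Sequence[str]
) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Return the common subsequence and the aligned positions in a and b."""
    pairs = backtrace(a, b, lcs_table(a, b))
    return [a[i] for i, _ in pairs], pairs


# Hypothetical normalized statement sequences for two Java methods.
method_a = ["if", "getName", "for", "add", "return"]
method_b = ["if", "getName", "log", "for", "add", "close", "return"]

common, positions = gapped_match(method_a, method_b)
print(common)
print(positions)
```

In this sketch, the positions skipped in `method_b` (the `log` and `close` statements) are the gaps: the second method contains the first as a subsequence with two inserted statements, which is the kind of near-miss similarity associated with Type 3 clones.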