Author: Abdullah M. Sheneamer
Venue: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 701-706
Publication date: 2018-12-01
DOI: 10.1109/ICMLA.2018.00111 (https://doi.org/10.1109/ICMLA.2018.00111)
CCDLC Detection Framework-Combining Clustering with Deep Learning Classification for Semantic Clones
Code clones complicate software maintenance and propagate bugs. We propose CCDLC, a framework for detecting Java code obfuscation and both syntactic and semantic clones. CCDLC adds cluster data, obtained with the sequential information bottleneck algorithm, to convolutional neural network (CNN) deep learning classification. The framework uses a novel Java bytecode dependency graph (BDG) along with program dependency graph (PDG) and abstract syntax tree (AST) features. We validate the framework on several publicly available code clone and Java obfuscated code datasets. Our experimental results and evaluation indicate that combining clustering with deep learning classification is a viable methodology, since the combination improves the detection of clones and obfuscated code on the corpus. The key benefit of this approach is that our tool improves obfuscation detection accuracy by about 5.44% and improves accuracy in finding both syntactic and semantic clones by about 12%.
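
The idea of adding cluster data to the classifier input can be illustrated with a minimal sketch. It is not the paper's implementation: a small k-means loop stands in for the sequential information bottleneck algorithm, the CNN is omitted, and the feature vectors, function names, and the one-hot encoding of the cluster assignment are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points: List[Vector], k: int, iterations: int = 10) -> List[int]:
    """Assign each point to one of k clusters.

    Assumption: k-means is a stand-in for the sequential information
    bottleneck clustering named in the abstract. Requires k <= len(points).
    """
    # Deterministic initialisation: the first k points act as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def augment_with_clusters(points: List[Vector], k: int) -> List[Vector]:
    """Append a one-hot cluster indicator to every feature vector."""
    labels = kmeans_labels(points, k)
    return [
        p + [1.0 if labels[i] == c else 0.0 for c in range(k)]
        for i, p in enumerate(points)
    ]


# Hypothetical feature vectors (e.g. values derived from AST, PDG, or BDG features).
features = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.1, 4.9]]
augmented = augment_with_clusters(features, k=2)
```

In this sketch the augmented vectors, rather than the raw features, would be passed to whatever classifier follows, so the classifier sees both the original features and the group each sample fell into.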