An Exploratory Study of Interface Redundancy in Code Repositories

A. C. D. Paula, E. Guerra, C. Lopes, Hitesh Sajnani, Otávio Augusto Lazzarini Lemos
Published in: 2016 IEEE 16th International Working Conference on Source Code Analysis and Manipulation (SCAM), 2016-10-01
DOI: 10.1109/SCAM.2016.31 (https://doi.org/10.1109/SCAM.2016.31)
Citations: 5

Abstract

An important property of software repositories is their level of cross-project redundancy. For instance, much has been done to assess how much code cloning happens across software corpora. In this paper we study a much less targeted type of replication: Interface Redundancy (IR). IR refers to the level of repetition of whole method interfaces (return type, method name, and parameter types) across a code corpus. This type of redundancy is important because if two non-trivial methods share the same interface, it is very likely that they implement analogous functions, even though their code, structure, or vocabulary might differ. A certain level of IR is a requirement for approaches that rely on the recurrence of interfaces to fulfill a given task (e.g., interface-driven code search, IDCS). We report on an experiment to measure IR in a large-scale Java repository. Our target corpus contains more than 380,000 methods from 99 Java projects extracted randomly from an open source repository. Results are promising: they show that the chance that the interface of a non-trivial method repeats across a large repository is around 25% (i.e., approximately 1/4 of such interfaces are redundant). Also, more than 80% of the target projects contained IR, with the average percentage of redundant interfaces for these projects being above 30%. As additional analyses, we investigated the distribution of the different types of redundant interfaces (e.g., intra- vs. inter-project), characterized the redundant interfaces and showed that such knowledge can help improve IDCS, and provided evidence that only a very small part of IR refers to method cloning (around 0.002%).
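To make the definition concrete, the following is a minimal sketch of how interface redundancy could be counted over a list of method records. It is not the authors' tooling: the `Method` record, the project names, the sample methods, and the choice to report the fraction of distinct interfaces that occur more than once are all assumptions made for illustration, and the sketch does not attempt the filtering of trivial methods that the abstract mentions.

```python
from collections import defaultdict
from typing import NamedTuple


class Method(NamedTuple):
    """Hypothetical record for one method extracted from a corpus."""
    project: str        # assumed project identifier
    return_type: str
    name: str
    param_types: tuple  # parameter types in declaration order


def interface_of(method):
    """Interface as described in the abstract: return type, name, parameter types."""
    return (method.return_type, method.name, method.param_types)


def redundancy(methods):
    """Return (fraction of distinct interfaces that repeat,
    set of repeated interfaces that span more than one project)."""
    owners = defaultdict(list)
    for m in methods:
        owners[interface_of(m)].append(m.project)
    # An interface is counted as redundant when at least two methods share it.
    redundant = {i for i, projects in owners.items() if len(projects) > 1}
    # Inter-project redundancy: the sharing methods live in different projects.
    inter_project = {i for i in redundant if len(set(owners[i])) > 1}
    fraction = len(redundant) / len(owners) if owners else 0.0
    return fraction, inter_project


# Illustrative data only; none of these methods come from the study's corpus.
corpus = [
    Method("projA", "String", "capitalize", ("String",)),
    Method("projB", "String", "capitalize", ("String",)),  # inter-project repeat
    Method("projA", "int", "parsePort", ("String",)),
    Method("projA", "int", "parsePort", ("String",)),      # intra-project repeat
    Method("projC", "boolean", "isLeapYear", ("int",)),
    Method("projC", "double", "mean", ("double[]",)),
]

fraction, inter_project = redundancy(corpus)
```

In this toy corpus two of the four distinct interfaces repeat, and only `capitalize` repeats across projects, which mirrors the intra- vs. inter-project distinction drawn in the abstract.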