Apples, Oranges & Fruits – Understanding Similarity of Software Repositories Through The Lens of Dissimilar Artifacts

A. Rao, S. Chimalakonda
Published in: 2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)
Publication date: 2022-10-01
DOI: https://doi.org/10.1109/ICSME55016.2022.00044
Citations: 0

Abstract

Open-source repositories make it easier for developers to reuse existing software artifacts when developing and maintaining new or similar kinds of software. However, finding similar repositories is challenging because the notion of similarity varies with context, and most existing approaches find similar repositories by comparing artifacts of the same kind. This paper asks whether dissimilar artifacts can be used as one of the criteria for finding similar repositories. Although two similar artifacts may differ from each other, two dissimilar artifacts may also have something in common. We define the notion of similarity by defining two categories of similar repositories. Four text-based artifacts are selected for the experiment: pull requests, issues, commits, and readme files. Textual similarity is computed between the different artifact types. The results show that similarity does exist between dissimilar artifacts: we observed that 10-20% of dissimilar artifact pairs could be used when searching for similar repositories. These preliminary results suggest that dissimilar artifacts can also be considered when searching for similar repositories, which motivates further research.
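
The abstract does not say which textual similarity measure was used. As a rough illustration only, the sketch below assumes a bag-of-words cosine similarity (a common choice, not confirmed by the source) and scores every pair of different artifact types across two repositories. The function names, the dictionary layout, the toy repository texts, and the 0.3 threshold are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import product

# The four text-based artifact types named in the abstract.
ARTIFACT_TYPES = ("pull_requests", "issues", "commits", "readme")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts.

    Assumption: the paper's actual measure is not given in the abstract.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty artifact is treated as similar to nothing
    return dot / (norm_a * norm_b)


def dissimilar_artifact_scores(repo_a, repo_b):
    """Score each pair of *different* artifact types across two repositories.

    repo_a and repo_b are hypothetical dicts mapping an artifact type to the
    concatenated text of that artifact; missing types count as empty text.
    """
    scores = {}
    for type_a, type_b in product(ARTIFACT_TYPES, repeat=2):
        if type_a == type_b:
            continue  # same-type pairs are what existing approaches compare
        scores[(type_a, type_b)] = cosine_similarity(
            repo_a.get(type_a, ""), repo_b.get(type_b, "")
        )
    return scores


if __name__ == "__main__":
    # Toy, made-up repository texts for illustration.
    repo_a = {"readme": "A fast JSON parser library for streaming data"}
    repo_b = {"issues": "JSON parser crashes on streaming data input"}
    threshold = 0.3  # hypothetical cut-off, not taken from the paper
    for pair, score in dissimilar_artifact_scores(repo_a, repo_b).items():
        if score >= threshold:
            print(pair, round(score, 3))
```

With four artifact types there are 12 ordered cross-type pairs per repository pair. A pair whose score clears the chosen threshold would count as a usable dissimilar-artifact signal in this sketch.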