MapReduce作为支持挖掘软件存储库(MSR)研究的通用框架

Weiyi Shang, Z. Jiang, Bram Adams, A. Hassan
{"title":"MapReduce作为支持挖掘软件存储库(MSR)研究的通用框架","authors":"Weiyi Shang, Z. Jiang, Bram Adams, A. Hassan","doi":"10.1109/MSR.2009.5069477","DOIUrl":null,"url":null,"abstract":"Researchers continue to demonstrate the benefits of Mining Software Repositories (MSR) for supporting software development and research activities. However, as the mining process is time and resource intensive, they often create their own distributed platforms and use various optimizations to speed up and scale up their analysis. These platforms are project-specific, hard to reuse, and offer minimal debugging and deployment support. In this paper, we propose the use of MapReduce, a distributed computing platform, to support research in MSR. As a proof-of-concept, we migrate J-REX, an optimized evolutionary code extractor, to run on Hadoop, an open source implementation of MapReduce. Through a case study on the source control repositories of the Eclipse, BIRT and Datatools projects, we demonstrate that the migration effort to MapReduce is minimal and that the benefits are significant, as running time of the migrated J-REX is only 30% to 50% of the original J-REX's. This paper documents our experience with the migration, and highlights the benefits and challenges of the MapReduce framework in the MSR community.","PeriodicalId":413721,"journal":{"name":"2009 6th IEEE International Working Conference on Mining Software Repositories","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"53","resultStr":"{\"title\":\"MapReduce as a general framework to support research in Mining Software Repositories (MSR)\",\"authors\":\"Weiyi Shang, Z. Jiang, Bram Adams, A. Hassan\",\"doi\":\"10.1109/MSR.2009.5069477\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Researchers continue to demonstrate the benefits of Mining Software Repositories (MSR) for supporting software development and research activities. However, as the mining process is time and resource intensive, they often create their own distributed platforms and use various optimizations to speed up and scale up their analysis. These platforms are project-specific, hard to reuse, and offer minimal debugging and deployment support. In this paper, we propose the use of MapReduce, a distributed computing platform, to support research in MSR. As a proof-of-concept, we migrate J-REX, an optimized evolutionary code extractor, to run on Hadoop, an open source implementation of MapReduce. Through a case study on the source control repositories of the Eclipse, BIRT and Datatools projects, we demonstrate that the migration effort to MapReduce is minimal and that the benefits are significant, as running time of the migrated J-REX is only 30% to 50% of the original J-REX's. This paper documents our experience with the migration, and highlights the benefits and challenges of the MapReduce framework in the MSR community.\",\"PeriodicalId\":413721,\"journal\":{\"name\":\"2009 6th IEEE International Working Conference on Mining Software Repositories\",\"volume\":\"33 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-05-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"53\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 6th IEEE International Working Conference on Mining Software Repositories\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MSR.2009.5069477\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 6th IEEE International Working Conference on Mining Software Repositories","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MSR.2009.5069477","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 53

摘要

研究人员继续展示挖掘软件存储库(MSR)支持软件开发和研究活动的好处。然而,由于挖掘过程是时间和资源密集型的,他们经常创建自己的分布式平台,并使用各种优化来加速和扩展他们的分析。这些平台是特定于项目的,难以重用,并且只提供很少的调试和部署支持。在本文中,我们提出使用分布式计算平台MapReduce来支持MSR的研究。作为概念验证,我们将J-REX(一个优化的进化代码提取器)迁移到Hadoop (MapReduce的开源实现)上运行。通过对Eclipse、BIRT和Datatools项目的源代码控制存储库的案例研究,我们证明了迁移到MapReduce的工作量是最小的,而且好处是显著的,因为迁移后的J-REX的运行时间仅为原始J-REX的30%到50%。本文记录了我们在迁移方面的经验,并强调了MapReduce框架在MSR社区中的好处和挑战。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
MapReduce as a general framework to support research in Mining Software Repositories (MSR)
Researchers continue to demonstrate the benefits of Mining Software Repositories (MSR) for supporting software development and research activities. However, as the mining process is time and resource intensive, they often create their own distributed platforms and use various optimizations to speed up and scale up their analysis. These platforms are project-specific, hard to reuse, and offer minimal debugging and deployment support. In this paper, we propose the use of MapReduce, a distributed computing platform, to support research in MSR. As a proof-of-concept, we migrate J-REX, an optimized evolutionary code extractor, to run on Hadoop, an open source implementation of MapReduce. Through a case study on the source control repositories of the Eclipse, BIRT and Datatools projects, we demonstrate that the migration effort to MapReduce is minimal and that the benefits are significant, as running time of the migrated J-REX is only 30% to 50% of the original J-REX's. This paper documents our experience with the migration, and highlights the benefits and challenges of the MapReduce framework in the MSR community.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信