Processing Large Datasets of Fined Grained Source Code Changes

S. Levin, A. Yehudai
{"title":"Processing Large Datasets of Fined Grained Source Code Changes","authors":"S. Levin, A. Yehudai","doi":"10.1109/ICSME.2019.00064","DOIUrl":null,"url":null,"abstract":"In the era of Big Code, when researchers seek to study an increasingly large number of repositories to support their findings, the data processing stage may require manipulating millions and more of records. In this work we focus on studies involving fine-grained AST level source code changes. We present how we extended the CodeDistillery source code mining framework with data manipulation capabilities, aimed to alleviate the processing of large datasets of fine grained source code changes. The capabilities we have introduced allow researchers to highly automate their repository mining process and streamline the data acquisition and processing phases. These capabilities have been successfully used to conduct a number of studies, in the course of which dozens of millions of fine-grained source code changes have been processed.","PeriodicalId":106748,"journal":{"name":"2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSME.2019.00064","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In the era of Big Code, when researchers seek to study an increasingly large number of repositories to support their findings, the data processing stage may require manipulating millions and more of records. In this work we focus on studies involving fine-grained AST level source code changes. We present how we extended the CodeDistillery source code mining framework with data manipulation capabilities, aimed to alleviate the processing of large datasets of fine grained source code changes. The capabilities we have introduced allow researchers to highly automate their repository mining process and streamline the data acquisition and processing phases. These capabilities have been successfully used to conduct a number of studies, in the course of which dozens of millions of fine-grained source code changes have been processed.
处理细粒度源代码更改的大型数据集
在大代码时代,当研究人员试图研究越来越多的存储库来支持他们的发现时,数据处理阶段可能需要操纵数百万甚至更多的记录。在这项工作中,我们专注于涉及细粒度AST级别源代码更改的研究。我们介绍了如何用数据操作功能扩展CodeDistillery源代码挖掘框架,旨在减轻处理细粒度源代码更改的大型数据集的工作量。我们介绍的功能允许研究人员高度自动化他们的存储库挖掘过程,并简化数据获取和处理阶段。这些功能已经被成功地用于进行大量的研究,在此过程中,已经处理了数千万个细粒度的源代码更改。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信