2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)最新文献

筛选
英文 中文
Characterizing the Roles of Contributors in Open-Source Scientific Software Projects 描述开源科学软件项目中贡献者的角色
Reed Milewicz, G. Pinto, Paige Rodeghero
{"title":"Characterizing the Roles of Contributors in Open-Source Scientific Software Projects","authors":"Reed Milewicz, G. Pinto, Paige Rodeghero","doi":"10.1109/MSR.2019.00069","DOIUrl":"https://doi.org/10.1109/MSR.2019.00069","url":null,"abstract":"The development of scientific software is, more than ever, critical to the practice of science, and this is accompanied by a trend towards more open and collaborative efforts. Unfortunately, there has been little investigation into who is driving the evolution of such scientific software or how the collaboration happens. In this paper, we address this problem. We present an extensive analysis of seven open-source scientific software projects in order to develop an empirically-informed model of the development process. This analysis was complemented by a survey of 72 scientific software developers. In the majority of the projects, we found senior research staff (e.g. professors) to be responsible for half or more of commits (an average commit share of 72%) and heavily involved in architectural concerns (seniors were more likely to interact with files related to the build system, project meta-data, and developer documentation). Juniors (e.g. graduate students) also contribute substantially — in one studied project, juniors made almost 100% of its commits. Still, graduate students had the longest contribution periods among juniors (with 1.72 years of commit activity compared to 0.98 years for postdocs and 4 months for undergraduates). Moreover, we also found that third-party contributors are scarce, contributing for just one day for the project. The results from this study aim to help scientists to better understand their own projects, communities, and the contributors’ behavior, while paving the road for future software engineering research.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"29 1","pages":"421-432"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73415259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
DeepJIT: An End-to-End Deep Learning Framework for Just-in-Time Defect Prediction DeepJIT:用于即时缺陷预测的端到端深度学习框架
Thong Hoang, K. Dam, Yasutaka Kamei, D. Lo, Naoyasu Ubayashi
{"title":"DeepJIT: An End-to-End Deep Learning Framework for Just-in-Time Defect Prediction","authors":"Thong Hoang, K. Dam, Yasutaka Kamei, D. Lo, Naoyasu Ubayashi","doi":"10.1109/MSR.2019.00016","DOIUrl":"https://doi.org/10.1109/MSR.2019.00016","url":null,"abstract":"Software quality assurance efforts often focus on identifying defective code. To find likely defective code early, change-level defect prediction – aka. Just-In-Time (JIT) defect prediction – has been proposed. JIT defect prediction models identify likely defective changes and they are trained using machine learning techniques with the assumption that historical changes are similar to future ones. Most existing JIT defect prediction approaches make use of manually engineered features. Unlike those approaches, in this paper, we propose an end-to-end deep learning framework, named DeepJIT, that automatically extracts features from commit messages and code changes and use them to identify defects. Experiments on two popular software projects (i.e., QT and OPENSTACK) on three evaluation settings (i.e., cross-validation, short-period, and long-period) show that the best variant of DeepJIT (DeepJIT-Combined), compared with the best performing state-of-the-art approach, achieves improvements of 10.36-11.02% for the project QT and 9.51-13.69% for the project OPENSTACK in terms of the Area Under the Curve (AUC).","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"1 1","pages":"34-45"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79930073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 134
Message from the Chairs of MSR 2019 2019年MSR主席致辞
{"title":"Message from the Chairs of MSR 2019","authors":"","doi":"10.1109/msr.2019.00006","DOIUrl":"https://doi.org/10.1109/msr.2019.00006","url":null,"abstract":"","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90548913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Tracing Back Log Data to its Log Statement: From Research to Practice 回溯日志数据到日志声明:从研究到实践
Daan Schipper, M. Aniche, A. Deursen
{"title":"Tracing Back Log Data to its Log Statement: From Research to Practice","authors":"Daan Schipper, M. Aniche, A. Deursen","doi":"10.1109/MSR.2019.00081","DOIUrl":"https://doi.org/10.1109/MSR.2019.00081","url":null,"abstract":"Logs are widely used as a source of information to understand the activity of computer systems and to monitor their health and stability. However, most log analysis techniques require the link between the log messages in the raw log file and the log statements in the source code that produce them. Several solutions have been proposed to solve this non-trivial challenge, of which the approach based on static analysis reaches the highest accuracy. We, at Adyen, implemented the state-of-the-art research on log parsing in our logging environment and evaluated their accuracy and performance. Our results show that, with some adaptation, the current static analysis techniques are highly efficient and performant. In other words, ready for use.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"233 1","pages":"545-549"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73488058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 20
Negative Results on Mining Crypto-API Usage Rules in Android Apps 挖掘Android应用中加密api使用规则的负面结果
Jun Gao, Pingfan Kong, Li Li, Tegawendé F. Bissyandé, Jacques Klein
{"title":"Negative Results on Mining Crypto-API Usage Rules in Android Apps","authors":"Jun Gao, Pingfan Kong, Li Li, Tegawendé F. Bissyandé, Jacques Klein","doi":"10.1109/MSR.2019.00065","DOIUrl":"https://doi.org/10.1109/MSR.2019.00065","url":null,"abstract":"Android app developers recurrently use crypto-APIs to provide data security to app users. Unfortunately, misuse of APIs only creates an illusion of security and even exposes apps to systematic attacks. It is thus necessary to provide developers with a statically-enforceable list of specifications of crypto-API usage rules. On the one hand, such rules cannot be manually written as the process does not scale to all available APIs. On the other hand, a classical mining approach based on common usage patterns is not relevant in Android, given that a large share of usages include mistakes. In this work, building on the assumption that \"developers update API usage instances to fix misuses\", we propose to mine a large dataset of updates within about 40 000 real-world app lineages to infer API usage rules. Eventually, our investigations yield negative results on our assumption that API usage updates tend to correct misuses. Actually, it appears that updates that fix misuses may be unintentional: the same misuses patterns are quickly re-introduced by subsequent updates.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"34 1","pages":"388-398"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81289009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 27
Cross-Language Clone Detection by Learning Over Abstract Syntax Trees 基于抽象语法树学习的跨语言克隆检测
Daniel Perez, S. Chiba
{"title":"Cross-Language Clone Detection by Learning Over Abstract Syntax Trees","authors":"Daniel Perez, S. Chiba","doi":"10.1109/MSR.2019.00078","DOIUrl":"https://doi.org/10.1109/MSR.2019.00078","url":null,"abstract":"Clone detection across programs written in the same programming language has been studied extensively in the literature. On the contrary, the task of detecting clones across multiple programming languages has not been studied as much, and approaches based on comparison cannot be directly applied. In this paper, we present a clone detection method based on semi-supervised machine learning designed to detect clones across programming languages with similar syntax. Our method uses an unsupervised learning approach to learn token-level vector representations and an LSTM-based neural network to predict whether two code fragments are clones. To train our network, we present a cross-language code clone dataset - which is to the best of our knowledge the first of its kind - containing around 45,000 code fragments written in Java and Python. We evaluate our approach on the dataset we created and show that our method gives promising results when detecting similarities between code fragments written in Java and Python.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"21 1","pages":"518-528"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79328748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 36
Generating Commit Messages from Diffs using Pointer-Generator Network 使用指针生成器网络从Diffs生成提交消息
Qin Liu, Zihe Liu, Hongming Zhu, Hongfei Fan, Bowen Du, Yu Qian
{"title":"Generating Commit Messages from Diffs using Pointer-Generator Network","authors":"Qin Liu, Zihe Liu, Hongming Zhu, Hongfei Fan, Bowen Du, Yu Qian","doi":"10.1109/MSR.2019.00056","DOIUrl":"https://doi.org/10.1109/MSR.2019.00056","url":null,"abstract":"The commit messages in source code repositories are valuable but not easy to be generated manually in time for tracking issues, reporting bugs, and understanding codes. Recently published works indicated that the deep neural machine translation approaches have drawn considerable attentions on automatic generation of commit messages. However, they could not deal with out-of-vocabulary (OOV) words, which are essential context-specific identifiers such as class names and method names in code diffs. In this paper, we propose PtrGNCMsg, a novel approach which is based on an improved sequence-to-sequence model with the pointer-generator network to translate code diffs into commit messages. By searching the smallest identifier set with the highest probability, PtrGNCMsg outperforms recent approaches based on neural machine translation, and first enables the prediction of OOV words. The experimental results based on the corpus of diffs and manual commit messages from the top 2,000 Java projects in GitHub show that PtrGNCMsg outperforms the state-of-the-art approach with improved BLEU by 1.02, ROUGE-1 by 4.00 and ROUGE-L by 3.78, respectively.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"41 1","pages":"299-309"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83652347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 35
Semantic Source Code Models Using Identifier Embeddings 使用标识符嵌入的语义源代码模型
V. Efstathiou, D. Spinellis
{"title":"Semantic Source Code Models Using Identifier Embeddings","authors":"V. Efstathiou, D. Spinellis","doi":"10.1109/MSR.2019.00015","DOIUrl":"https://doi.org/10.1109/MSR.2019.00015","url":null,"abstract":"The emergence of online open source repositories in the recent years has led to an explosion in the volume of openly available source code, coupled with metadata that relate to a variety of software development activities. As an effect, in line with recent advances in machine learning research, software maintenance activities are switching from symbolic formal methods to data–driven methods. In this context, the rich semantics hidden in source code identifiers provide opportunities for building semantic representations of code which can assist tasks of code search and reuse. To this end, we deliver in the form of pretrained vector space models, distributed code representations for six popular programming languages, namely, Java, Python, PHP, C, C++, and C#. The models are produced using fastText, a state–of–the–art library for learning word representations. Each model is trained on data from a single programming language; the code mined for producing all models amounts to over 13.000 repositories. We indicate dissimilarities between natural language and source code, as well as variations in coding conventions in between the different programming languages we processed. We describe how these heterogeneities guided the data preprocessing decisions we took and the selection of the training parameters in the released models. Finally, we propose potential applications of the models and discuss limitations of the models.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"95 1","pages":"29-33"},"PeriodicalIF":0.0,"publicationDate":"2019-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81807736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 21
The Impact of Systematic Edits in History Slicing 系统编辑对历史切片的影响
Ryosuke Funaki, Shinpei Hayashi, M. Saeki
{"title":"The Impact of Systematic Edits in History Slicing","authors":"Ryosuke Funaki, Shinpei Hayashi, M. Saeki","doi":"10.1109/MSR.2019.00083","DOIUrl":"https://doi.org/10.1109/MSR.2019.00083","url":null,"abstract":"While extracting a subset of a commit history, specifying the necessary portion is a time-consuming task for developers. Several commit-based history slicing techniques have been proposed to identify dependencies between commits and to extract a related set of commits using a specific commit as a slicing criterion. However, the resulting subset of commits become large if commits for systematic edits whose changes do not depend on each other exist. We empirically investigated the impact of systematic edits on history slicing. In this study, commits in which systematic edits were detected are split between each file so that unnecessary dependencies between commits are eliminated. In several histories of open source systems, the size of history slices was reduced by 13.3-57.2% on average after splitting the commits for systematic edits.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"210 1","pages":"555-559"},"PeriodicalIF":0.0,"publicationDate":"2019-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76107279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Style-Analyzer: Fixing Code Style Inconsistencies with Interpretable Unsupervised Algorithms 风格分析器:用可解释的无监督算法修复代码风格不一致
Vadim Markovtsev, Waren Long, Hugo Mougard, Konstantin Slavnov, Egor Bulychev
{"title":"Style-Analyzer: Fixing Code Style Inconsistencies with Interpretable Unsupervised Algorithms","authors":"Vadim Markovtsev, Waren Long, Hugo Mougard, Konstantin Slavnov, Egor Bulychev","doi":"10.1109/MSR.2019.00073","DOIUrl":"https://doi.org/10.1109/MSR.2019.00073","url":null,"abstract":"Source code reviews are manual, time-consuming, and expensive. Human involvement should be focused on analyzing the most relevant aspects of the program, such as logic and maintainability, rather than amending style, syntax, or formatting defects. Some tools with linting capabilities can format code automatically and report various stylistic violations for supported programming languages. They are based on rules written by domain experts, hence, their configuration is often tedious, and it is impractical for the given set of rules to cover all possible corner cases. Some machine learning-based solutions exist, but they remain uninterpretable black boxes. This paper introduces style-analyzer, a new open source tool to automatically fix code formatting violations using the decision tree forest model which adapts to each codebase and is fully unsupervised. style-analyzer is built on top of our novel assisted code review framework, Lookout. It accurately mines the formatting style of each analyzed Git repository and expresses the found format patterns with compact human-readable rules. style-analyzer can then suggest style inconsistency fixes in the form of code review comments. We evaluate the output quality and practical relevance of style-analyzer by demonstrating that it can reproduce the original style with high precision, measured on 19 popular JavaScript projects, and by showing that it yields promising results in fixing real style mistakes. style-analyzer includes a web application to visualize how the rules are triggered. We release style-analyzer as a reusable and extendable open source software package on GitHub for the benefit of the community.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"87 1","pages":"468-478"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82059032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信