2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)最新文献_第7页

Characterizing the Roles of Contributors in Open-Source Scientific Software Projects 描述开源科学软件项目中贡献者的角色

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-01 DOI: 10.1109/MSR.2019.00069

Reed Milewicz, G. Pinto, Paige Rodeghero

{"title":"Characterizing the Roles of Contributors in Open-Source Scientific Software Projects","authors":"Reed Milewicz, G. Pinto, Paige Rodeghero","doi":"10.1109/MSR.2019.00069","DOIUrl":"https://doi.org/10.1109/MSR.2019.00069","url":null,"abstract":"The development of scientific software is, more than ever, critical to the practice of science, and this is accompanied by a trend towards more open and collaborative efforts. Unfortunately, there has been little investigation into who is driving the evolution of such scientific software or how the collaboration happens. In this paper, we address this problem. We present an extensive analysis of seven open-source scientific software projects in order to develop an empirically-informed model of the development process. This analysis was complemented by a survey of 72 scientific software developers. In the majority of the projects, we found senior research staff (e.g. professors) to be responsible for half or more of commits (an average commit share of 72%) and heavily involved in architectural concerns (seniors were more likely to interact with files related to the build system, project meta-data, and developer documentation). Juniors (e.g. graduate students) also contribute substantially — in one studied project, juniors made almost 100% of its commits. Still, graduate students had the longest contribution periods among juniors (with 1.72 years of commit activity compared to 0.98 years for postdocs and 4 months for undergraduates). Moreover, we also found that third-party contributors are scarce, contributing for just one day for the project. The results from this study aim to help scientists to better understand their own projects, communities, and the contributors’ behavior, while paving the road for future software engineering research.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"29 1","pages":"421-432"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73415259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 19

Message from the Chairs of MSR 2019 2019年MSR主席致辞

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-01 DOI: 10.1109/msr.2019.00006

引用次数: 0

DeepJIT: An End-to-End Deep Learning Framework for Just-in-Time Defect Prediction DeepJIT:用于即时缺陷预测的端到端深度学习框架

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-01 DOI: 10.1109/MSR.2019.00016

Thong Hoang, K. Dam, Yasutaka Kamei, D. Lo, Naoyasu Ubayashi

{"title":"DeepJIT: An End-to-End Deep Learning Framework for Just-in-Time Defect Prediction","authors":"Thong Hoang, K. Dam, Yasutaka Kamei, D. Lo, Naoyasu Ubayashi","doi":"10.1109/MSR.2019.00016","DOIUrl":"https://doi.org/10.1109/MSR.2019.00016","url":null,"abstract":"Software quality assurance efforts often focus on identifying defective code. To find likely defective code early, change-level defect prediction – aka. Just-In-Time (JIT) defect prediction – has been proposed. JIT defect prediction models identify likely defective changes and they are trained using machine learning techniques with the assumption that historical changes are similar to future ones. Most existing JIT defect prediction approaches make use of manually engineered features. Unlike those approaches, in this paper, we propose an end-to-end deep learning framework, named DeepJIT, that automatically extracts features from commit messages and code changes and use them to identify defects. Experiments on two popular software projects (i.e., QT and OPENSTACK) on three evaluation settings (i.e., cross-validation, short-period, and long-period) show that the best variant of DeepJIT (DeepJIT-Combined), compared with the best performing state-of-the-art approach, achieves improvements of 10.36-11.02% for the project QT and 9.51-13.69% for the project OPENSTACK in terms of the Area Under the Curve (AUC).","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"1 1","pages":"34-45"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79930073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 134

Tracing Back Log Data to its Log Statement: From Research to Practice 回溯日志数据到日志声明:从研究到实践

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-01 DOI: 10.1109/MSR.2019.00081

Daan Schipper, M. Aniche, A. Deursen

引用次数: 20

Negative Results on Mining Crypto-API Usage Rules in Android Apps 挖掘Android应用中加密api使用规则的负面结果

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-01 DOI: 10.1109/MSR.2019.00065

Jun Gao, Pingfan Kong, Li Li, Tegawendé F. Bissyandé, Jacques Klein

引用次数: 27

Cross-Language Clone Detection by Learning Over Abstract Syntax Trees 基于抽象语法树学习的跨语言克隆检测

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-01 DOI: 10.1109/MSR.2019.00078

Daniel Perez, S. Chiba

引用次数: 36

Generating Commit Messages from Diffs using Pointer-Generator Network 使用指针生成器网络从Diffs生成提交消息

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-01 DOI: 10.1109/MSR.2019.00056

Qin Liu, Zihe Liu, Hongming Zhu, Hongfei Fan, Bowen Du, Yu Qian

{"title":"Generating Commit Messages from Diffs using Pointer-Generator Network","authors":"Qin Liu, Zihe Liu, Hongming Zhu, Hongfei Fan, Bowen Du, Yu Qian","doi":"10.1109/MSR.2019.00056","DOIUrl":"https://doi.org/10.1109/MSR.2019.00056","url":null,"abstract":"The commit messages in source code repositories are valuable but not easy to be generated manually in time for tracking issues, reporting bugs, and understanding codes. Recently published works indicated that the deep neural machine translation approaches have drawn considerable attentions on automatic generation of commit messages. However, they could not deal with out-of-vocabulary (OOV) words, which are essential context-specific identifiers such as class names and method names in code diffs. In this paper, we propose PtrGNCMsg, a novel approach which is based on an improved sequence-to-sequence model with the pointer-generator network to translate code diffs into commit messages. By searching the smallest identifier set with the highest probability, PtrGNCMsg outperforms recent approaches based on neural machine translation, and first enables the prediction of OOV words. The experimental results based on the corpus of diffs and manual commit messages from the top 2,000 Java projects in GitHub show that PtrGNCMsg outperforms the state-of-the-art approach with improved BLEU by 1.02, ROUGE-1 by 4.00 and ROUGE-L by 3.78, respectively.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"41 1","pages":"299-309"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83652347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 35

Semantic Source Code Models Using Identifier Embeddings 使用标识符嵌入的语义源代码模型

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-04-15 DOI: 10.1109/MSR.2019.00015

V. Efstathiou, D. Spinellis

{"title":"Semantic Source Code Models Using Identifier Embeddings","authors":"V. Efstathiou, D. Spinellis","doi":"10.1109/MSR.2019.00015","DOIUrl":"https://doi.org/10.1109/MSR.2019.00015","url":null,"abstract":"The emergence of online open source repositories in the recent years has led to an explosion in the volume of openly available source code, coupled with metadata that relate to a variety of software development activities. As an effect, in line with recent advances in machine learning research, software maintenance activities are switching from symbolic formal methods to data–driven methods. In this context, the rich semantics hidden in source code identifiers provide opportunities for building semantic representations of code which can assist tasks of code search and reuse. To this end, we deliver in the form of pretrained vector space models, distributed code representations for six popular programming languages, namely, Java, Python, PHP, C, C++, and C#. The models are produced using fastText, a state–of–the–art library for learning word representations. Each model is trained on data from a single programming language; the code mined for producing all models amounts to over 13.000 repositories. We indicate dissimilarities between natural language and source code, as well as variations in coding conventions in between the different programming languages we processed. We describe how these heterogeneities guided the data preprocessing decisions we took and the selection of the training parameters in the released models. Finally, we propose potential applications of the models and discuss limitations of the models.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"95 1","pages":"29-33"},"PeriodicalIF":0.0,"publicationDate":"2019-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81807736","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 21

The Impact of Systematic Edits in History Slicing 系统编辑对历史切片的影响

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-04-02 DOI: 10.1109/MSR.2019.00083

Ryosuke Funaki, Shinpei Hayashi, M. Saeki

引用次数: 3

Style-Analyzer: Fixing Code Style Inconsistencies with Interpretable Unsupervised Algorithms 风格分析器:用可解释的无监督算法修复代码风格不一致

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-04-01 DOI: 10.1109/MSR.2019.00073

Vadim Markovtsev, Waren Long, Hugo Mougard, Konstantin Slavnov, Egor Bulychev

{"title":"Style-Analyzer: Fixing Code Style Inconsistencies with Interpretable Unsupervised Algorithms","authors":"Vadim Markovtsev, Waren Long, Hugo Mougard, Konstantin Slavnov, Egor Bulychev","doi":"10.1109/MSR.2019.00073","DOIUrl":"https://doi.org/10.1109/MSR.2019.00073","url":null,"abstract":"Source code reviews are manual, time-consuming, and expensive. Human involvement should be focused on analyzing the most relevant aspects of the program, such as logic and maintainability, rather than amending style, syntax, or formatting defects. Some tools with linting capabilities can format code automatically and report various stylistic violations for supported programming languages. They are based on rules written by domain experts, hence, their configuration is often tedious, and it is impractical for the given set of rules to cover all possible corner cases. Some machine learning-based solutions exist, but they remain uninterpretable black boxes. This paper introduces style-analyzer, a new open source tool to automatically fix code formatting violations using the decision tree forest model which adapts to each codebase and is fully unsupervised. style-analyzer is built on top of our novel assisted code review framework, Lookout. It accurately mines the formatting style of each analyzed Git repository and expresses the found format patterns with compact human-readable rules. style-analyzer can then suggest style inconsistency fixes in the form of code review comments. We evaluate the output quality and practical relevance of style-analyzer by demonstrating that it can reproduce the original style with high precision, measured on 19 popular JavaScript projects, and by showing that it yields promising results in fixing real style mistakes. style-analyzer includes a web application to visualize how the rules are triggered. We release style-analyzer as a reusable and extendable open source software package on GitHub for the benefit of the community.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"87 1","pages":"468-478"},"PeriodicalIF":0.0,"publicationDate":"2019-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82059032","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9