2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)最新文献

筛选
英文 中文
Python Coding Style Compliance on Stack Overflow Python代码风格遵从堆栈溢出
Nikolaos Bafatakis, Niels Boecker, Wenjie. Boon, Martin Cabello Salazar, J. Krinke, Gazi Oznacar, Robert White
{"title":"Python Coding Style Compliance on Stack Overflow","authors":"Nikolaos Bafatakis, Niels Boecker, Wenjie. Boon, Martin Cabello Salazar, J. Krinke, Gazi Oznacar, Robert White","doi":"10.1109/MSR.2019.00042","DOIUrl":"https://doi.org/10.1109/MSR.2019.00042","url":null,"abstract":"Software developers all over the world use Stack Overflow (SO) to interact and exchange code snippets. Research also uses SO to harvest code snippets for use with recommendation systems. However, previous work has shown that code on SO may have quality issues, such as security or license problems. We analyse Python code on SO to determine its coding style compliance. From 1,962,535 code snippets tagged with 'python', we extracted 407,097 snippets of at least 6 statements of Python code. Surprisingly, 93.87% of the extracted snippets contain style violations, with an average of 0.7 violations per statement and a huge number of snippets with a considerably higher ratio. Researchers and developers should, therefore, be aware that code snippets on SO may not representative of good coding style. Furthermore, while user reputation seems to be unrelated to coding style compliance, for posts with vote scores in the range between -10 and 20, we found a strong correlation (r = -0.87, p < 10^-7) between the vote score a post received and the average number of violations per statement for snippets in such posts.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"52 1","pages":"210-214"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78098230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 15
Empirical Study in using Version Histories for Change Risk Classification 版本历史用于变更风险分类的实证研究
Max Kiehn, Xiangyi Pan, F. Camci
{"title":"Empirical Study in using Version Histories for Change Risk Classification","authors":"Max Kiehn, Xiangyi Pan, F. Camci","doi":"10.1109/MSR.2019.00018","DOIUrl":"https://doi.org/10.1109/MSR.2019.00018","url":null,"abstract":"Many techniques have been proposed for mining software repositories, predicting code quality and evaluating code changes. Prior work has established links between code ownership and churn metrics, and software quality at file and directory level based on changes that fix bugs. Other metrics have been used to evaluate individual code changes based on preceding changes that induce fixes. This paper combines the two approaches in an empirical study of assessing risk of code changes using established code ownership and churn metrics with fix inducing changes on a large proprietary code repository. We establish a machine learning model for change risk classification which achieves average precision of 0.76 using metrics from prior works and 0.90 using a wider array of metrics. Our results suggest that code ownership metrics can be applied in change risk classification models based on fix inducing changes.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"184 1","pages":"58-62"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86086960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 5
Does UML Modeling Associate with Lower Defect Proneness?: A Preliminary Empirical Investigation UML建模是否与较低的缺陷倾向相关联?:初步实证调查
Adithya Raghuraman, Truong Ho-Quang, M. Chaudron, A. Serebrenik, Bogdan Vasilescu
{"title":"Does UML Modeling Associate with Lower Defect Proneness?: A Preliminary Empirical Investigation","authors":"Adithya Raghuraman, Truong Ho-Quang, M. Chaudron, A. Serebrenik, Bogdan Vasilescu","doi":"10.1109/MSR.2019.00024","DOIUrl":"https://doi.org/10.1109/MSR.2019.00024","url":null,"abstract":"The benefits of modeling the design to improve the quality and maintainability of software systems have long been advocated and recognized. Yet, the empirical evidence on this remains scarce. In this paper, we fill this gap by reporting on an empirical study of the relationship between UML modeling and software defect proneness in a large sample of open-source GitHub projects. Using statistical modeling, and controlling for confounding variables, we show that projects containing traces of UML models in their repositories experience, on average, a statistically minorly different number of software defects (as mined from their issue trackers) than projects without traces of UML models.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"58 1","pages":"101-104"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80461443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
What Edits are Done on the Highly Answered Questions in Stack Overflow? An Empirical Study 对堆栈溢出中高度回答的问题进行了哪些编辑?实证研究
Xianhao Jin, Francisco Servant
{"title":"What Edits are Done on the Highly Answered Questions in Stack Overflow? An Empirical Study","authors":"Xianhao Jin, Francisco Servant","doi":"10.1109/MSR.2019.00045","DOIUrl":"https://doi.org/10.1109/MSR.2019.00045","url":null,"abstract":"Stack Overflow is the most-widely-used online question-and-answer platform for software developers to solve problems and communicate experience. Stack Overflow believes in the power of community editing, which means that one is able to edit questions without the changes going through peer review. Stack Overflow users may make edits to questions for a variety of reasons, among others, to improve the question and try to obtain more answers. However, to date the relationship between edit actions on questions and the number of answers that they collect is unknown. In this paper, we perform an empirical study on Stack Overflow to understand the relationship between edit actions and number of answers obtained in different dimensions from different attributes of the edited questions. We find that questions are more commonly edited by question owners, on bodies with relatively big changes before obtaining an accepted answer. However, edited questions that obtained more answers in a shorter time, were edited by other users rather than question owners, and their edits tended to be small, focused on titles and in adding addendums.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"31 4 1","pages":"225-229"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77253625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Standing on Shoulders or Feet? The Usage of the MSR Data Papers 站在肩膀上还是脚上?MSR数据文件的使用
Zoe Kotti, D. Spinellis
{"title":"Standing on Shoulders or Feet? The Usage of the MSR Data Papers","authors":"Zoe Kotti, D. Spinellis","doi":"10.1109/MSR.2019.00085","DOIUrl":"https://doi.org/10.1109/MSR.2019.00085","url":null,"abstract":"Introduction: The establishment of the Mining Software Repositories (MSR) Data Showcase conference track has encouraged researchers to provide more data sets as a basis for further empirical studies. Objectives: Examine the usage of the data papers published in the MSR proceedings in terms of use frequency, users, and use purpose. Methods: Data track papers were collected from the MSR Data Showcase and through the manual inspection of older MSR proceedings. The use of data papers was established through citation searching followed by reading the studies that have cited them. Data papers were then clustered based on their content, whereas their citations were classified according to the knowledge areas of the Guide to the Software Engineering Body of Knowledge. Results: We found that 65% of the data papers have been used in other studies, with a long-tail distribution in the number of citations. MSR data papers are cited less than other MSR papers. A considerable number of the citations stem from the teams that authored the data papers. Publications providing repository data and metadata are the most frequent data papers and the most often cited ones. Mobile application data papers are the least common ones, but the second most frequently cited. Conclusion: Data papers have provided the foundation for a significant number of studies, but there is room for improvement in their utilization. This can be done by setting a higher bar for their publication, by encouraging their use, and by providing incentives for the enrichment of existing data collections.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"39 1","pages":"565-576"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88441585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Scalable Software Merging Studies with MERGANSER 可扩展的软件合并研究与MERGANSER
Moein Owhadi-Kareshk, Sarah Nadi
{"title":"Scalable Software Merging Studies with MERGANSER","authors":"Moein Owhadi-Kareshk, Sarah Nadi","doi":"10.1109/MSR.2019.00084","DOIUrl":"https://doi.org/10.1109/MSR.2019.00084","url":null,"abstract":"Software merging researchers constantly need empirical data of real-world merge scenarios to analyze. Such data is currently extracted through individual and isolated efforts, often with non-systematically designed scripts that may not easily scale to large studies. This hinders replication and proper comparison of results. In this paper, we introduce MERGANSER, a scalable and easy-to-use tool for extracting and analyzing merge scenarios in Git repositories. In addition to extracting basic information about merge scenarios from Git history, our tool also replays each merge to detect conflicts and stores the corresponding information of conflicting files and regions. We design a normalized and extensible SQL data schema to store the information of the analyzed repositories, merge scenarios and involved commits, and merge replays and conflicts. By running only one command, our proposed tool clones the target repositories, detects their merge scenarios, and stores their information in a SQL database. MERGANSER is written in Python and released under the MIT license. In this tool paper, we describe MERGANSER's architecture and provide guidance for its usage in practice.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"81 1","pages":"560-564"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80973613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
A Dataset of Parametric Cryptographic Misuses 参数密码误用数据集
A. Wickert, Michael Reif, Michael Eichberg, Anam Dodhy, M. Mezini
{"title":"A Dataset of Parametric Cryptographic Misuses","authors":"A. Wickert, Michael Reif, Michael Eichberg, Anam Dodhy, M. Mezini","doi":"10.1109/MSR.2019.00023","DOIUrl":"https://doi.org/10.1109/MSR.2019.00023","url":null,"abstract":"Cryptographic APIs (Crypto APIs) provide the foundations for the development of secure applications. Unfortunately, most applications do not use Crypto APIs securely and end up being insecure, e.g., by the usage of an outdated algorithm, a constant initialization vector, or an inappropriate hashing algorithm. Two different studies [1], [2] have recently shown that 88% to 95% of those applications using Crypto APIs are insecure due to misuses. To facilitate further research on these kinds of misuses, we created a collection of 201 misuses found in real-world applications along with a classification of those misuses. In the provided dataset, each misuse consists of the corresponding open-source project, the project's build information, a description of the misuse, and the misuse's location. Further, we integrated our dataset into MUBench [3], a benchmark for API misuse detection. Our dataset provides a foundation for research on Crypto API misuses. For example, it can be used to evaluate the precision and recall of detection tools, as a foundation for studies related to Crypto API misuses, or as a training set.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"9 1","pages":"96-100"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85300880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
Dependency Versioning in the Wild 野外的依赖版本控制
Jens Dietrich, David J. Pearce, Jacob Stringer, Amjed Tahir, Kelly Blincoe
{"title":"Dependency Versioning in the Wild","authors":"Jens Dietrich, David J. Pearce, Jacob Stringer, Amjed Tahir, Kelly Blincoe","doi":"10.1109/MSR.2019.00061","DOIUrl":"https://doi.org/10.1109/MSR.2019.00061","url":null,"abstract":"Many modern software systems are built on top of existing packages (modules, components, libraries). The increasing number and complexity of dependencies has given rise to automated dependency management where package managers resolve symbolic dependencies against a central repository. When declaring dependencies, developers face various choices, such as whether or not to declare a fixed version or a range of versions. The former results in runtime behaviour that is easier to predict, whilst the latter enables flexibility in resolution that can, for example, prevent different versions of the same package being included and facilitates the automated deployment of bug fixes. We study the choices developers make across 17 different package managers, investigating over 70 million dependencies. This is complemented by a survey of 170 developers. We find that many package managers support — and the respective community adapts — flexible versioning practices. This does not always work: developers struggle to find the sweet spot between the predictability of fixed version dependencies, and the agility of flexible ones, and depending on their experience, adjust practices. We see some uptake of semantic versioning in some package managers, supported by tools. However, there is no evidence that projects switch to semantic versioning on a large scale. The results of this study can guide further research into better practices for automated dependency management, and aid the adaptation of semantic versioning.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"44 1","pages":"349-359"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76464029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 50
Time Present and Time Past: Analyzing the Evolution of JavaScript Code in the Wild 现在的时间和过去的时间:分析JavaScript代码在野外的演变
Dimitris Mitropoulos, P. Louridas, Vitalis Salis, D. Spinellis
{"title":"Time Present and Time Past: Analyzing the Evolution of JavaScript Code in the Wild","authors":"Dimitris Mitropoulos, P. Louridas, Vitalis Salis, D. Spinellis","doi":"10.1109/MSR.2019.00029","DOIUrl":"https://doi.org/10.1109/MSR.2019.00029","url":null,"abstract":"JavaScript is one of the web's key building blocks. It is used by the majority of web sites and it is supported by all modern browsers. We present the first large-scale study of client-side JavaScript code over time. Specifically, we have collected and analyzed a dataset containing daily snapshots of JavaScript code coming from Alexa's Top 10000 web sites (~7.5 GB per day) for nine consecutive months, to study different temporal aspects of web client code. We found that scripts change often; typically every few days, indicating a rapid pace in web applications development. We also found that the lifetime of web sites themselves, measured as the time between JavaScript changes, is also short, in the same time scale. We then performed a qualitative analysis to investigate the nature of the changes that take place. We found that apart from standard changes such as the introduction of new functions, many changes are related to online configuration management. In addition, we examined JavaScript code reuse over time and especially the widespread reliance on third-party libraries. Furthermore, we observed how quality issues evolve by employing established static analysis tools to identify potential software bugs, whose evolution we tracked over time. Our results show that quality issues seem to persist over time, while vulnerable libraries tend to decrease.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"74 1","pages":"126-137"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88040738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 11
World of Code: An Infrastructure for Mining the Universe of Open Source VCS Data 代码世界:用于挖掘开源VCS数据世界的基础设施
Yuxing Ma, Chris Bogart, Sadika Amreen, R. Zaretzki, A. Mockus
{"title":"World of Code: An Infrastructure for Mining the Universe of Open Source VCS Data","authors":"Yuxing Ma, Chris Bogart, Sadika Amreen, R. Zaretzki, A. Mockus","doi":"10.1109/MSR.2019.00031","DOIUrl":"https://doi.org/10.1109/MSR.2019.00031","url":null,"abstract":"Open source software (OSS) is essential for modern society and, while substantial research has been done on individual (typically central) projects, only a limited understanding of the periphery of the entire OSS ecosystem exists. For example, how are tens of millions of projects in the periphery interconnected through technical dependencies, code sharing, or knowledge flows? To answer such questions we a) create a very large and frequently updated collection of version control data for FLOSS projects named World of Code (WoC) and b) provide basic tools for conducting research that depends on measuring interdependencies among all FLOSS projects. Our current WoC implementation is capable of being updated on a monthly basis and contains over 12B git objects. To evaluate its research potential and to create vignettes for its usage, we employ WoC in conducting several research tasks. In particular, we find that it is capable of supporting trend evaluation, ecosystem measurement, and the determination of package usage. We expect WoC to spur investigation into global properties of OSS development leading to increased resiliency of the entire OSS ecosystem. Our infrastructure facilitates the discovery of key technical dependencies, code flow, and social networks that provide the basis to determine the structure and evolution of the relationships that drive FLOSS activities and innovation.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"1 1","pages":"143-154"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87763930","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 73
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信