{"title":"A Panel Data Set of Cryptocurrency Development Activity on GitHub","authors":"R. V. Tonder, Asher Trockman, Claire Le Goues","doi":"10.1109/MSR.2019.00037","DOIUrl":"https://doi.org/10.1109/MSR.2019.00037","url":null,"abstract":"Cryptocurrencies are a significant development in recent years, featuring in global news, the financial sector, and academic research. They also hold a significant presence in open source development, comprising some of the most popular repositories on GitHub. Their openly developed software artifacts thus present a unique and exclusive avenue to quantitatively observe human activity, effort, and software growth for cryptocurrencies. Our data set marks the first concentrated effort toward high-fidelity panel data of cryptocurrency development for a wide range of metrics. The data set is foremost a quantitative measure of developer activity for budding open source cryptocurrency development. We collect metrics like daily commits, contributors, lines of code changes, stars, forks, and subscribers. We also include financial data for each cryptocurrency: the daily price and market capitalization. The data set includes data for 236 cryptocurrencies for 380 days (roughly January 2018 to January 2019). We discuss particularly interesting research opportunities for this combination of data, and release new tooling to enable continuing data collection for future research opportunities as development and application of cryptocurrencies mature.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"90 1","pages":"186-190"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85911406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data-Driven Solutions to Detect API Compatibility Issues in Android: An Empirical Study","authors":"Simone Scalabrino, G. Bavota, M. Linares-Vásquez, Michele Lanza, R. Oliveto","doi":"10.1109/MSR.2019.00055","DOIUrl":"https://doi.org/10.1109/MSR.2019.00055","url":null,"abstract":"Android apps are inextricably linked to the official Android APIs. Such a strong form of dependency implies that changes introduced in new versions of the Android APIs can severely impact the apps' code, for example because of deprecated or removed APIs. In reaction to those changes, mobile app developers are expected to adapt their code and avoid compatibility issues. To support developers, approaches have been proposed to automatically identify API compatibility issues in Android apps. The state-of-the-art approach, named CiD, is a data-driven solution learning how to detect those issues by analyzing the changes in the history of Android APIs (\"API side\" learning). While it can successfully identify compatibility issues, it cannot recommend coding solutions. We devised an alternative data-driven approach, named ACRYL. ACRYL learns from changes implemented in other apps in response to API changes (\"client side\" learning). This allows not only to detect compatibility issues, but also to suggest a fix. When empirically comparing the two tools, we found that there is no clear winner, since the two approaches are highly complementary, in that they identify almost disjointed sets of API compatibility issues. Our results point to the future possibility of combining the two approaches, trying to learn detection/fixing rules on both the API and the client side.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"78 1","pages":"288-298"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83136920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RmvDroid: Towards A Reliable Android Malware Dataset with App Metadata","authors":"Haoyu Wang, Junjun Si, Hao Li, Yao Guo","doi":"10.1109/MSR.2019.00067","DOIUrl":"https://doi.org/10.1109/MSR.2019.00067","url":null,"abstract":"A large number of research studies have been focused on detecting Android malware in recent years. As a result, a reliable and large-scale malware dataset is essential to build effective malware classifiers and evaluate the performance of different detection techniques. Although several Android malware benchmarks have been widely used in our research community, these benchmarks face several major limitations. First, most of the existing datasets are outdated and cannot reflect current malware evolution trends. Second, most of them only rely on VirusTotal to label the ground truth of malware, while some anti-virus engines on VirusTotal may not always report reliable results. Third, all of them only contain the apps themselves (apks), while other important app information (e.g., app description, user rating, and app installs) is missing, which greatly limits the usage scenarios of these datasets. In this paper, we have created a reliable Android malware dataset based on Google Play's app maintenance results over several years. We first created four snapshots of Google Play in 2014, 2015, 2017 and 2018 respectively. Then we use VirusTotal to label apps with possible sensitive behaviors, and monitor these apps on Google Play to see whether Google has removed them or not. Based on this approach, we have created a malware dataset containing 9,133 samples that belong to 56 malware families with high confidence. We believe this dataset will boost a series of research studies including Android malware detection and classification, mining apps for anomalies, and app store mining, etc.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"6 1","pages":"404-408"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81181524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can Issues Reported at Stack Overflow Questions be Reproduced? An Exploratory Study","authors":"Saikat Mondal, M. M. Rahman, C. Roy","doi":"10.1109/MSR.2019.00074","DOIUrl":"https://doi.org/10.1109/MSR.2019.00074","url":null,"abstract":"Software developers often look for solutions to their code level problems at Stack Overflow. Hence, they frequently submit their questions with sample code segments and issue descriptions. Unfortunately, it is not always possible to reproduce their reported issues from such code segments. This phenomenon might prevent their questions from getting prompt and appropriate solutions. In this paper, we report an exploratory study on the reproducibility of the issues discussed in 400 questions of Stack Overflow. In particular, we parse, compile, execute and even carefully examine the code segments from these questions, spent a total of 200 man hours, and then attempt to reproduce their programming issues. The outcomes of our study are two-fold. First, we find that 68% of the code segments require minor and major modifications in order to reproduce the issues reported by the developers. On the contrary, 22% code segments completely fail to reproduce the issues. We also carefully investigate why these issues could not be reproduced and then provide evidence-based guidelines for writing effective code examples for Stack Overflow questions. Second, we investigate the correlation between issue reproducibility status (of questions) and corresponding answer meta-data such as the presence of an accepted answer. According to our analysis, a question with reproducible issues has at least three times higher chance of receiving an accepted answer than the question with irreproducible issues.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"6 1","pages":"479-489"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82777640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What do Developers Know About Machine Learning: A Study of ML Discussions on StackOverflow","authors":"A. A. Bangash, Hareem Sahar, S. Chowdhury, A. W. Wong, Abram Hindle, Karim Ali","doi":"10.1109/MSR.2019.00052","DOIUrl":"https://doi.org/10.1109/MSR.2019.00052","url":null,"abstract":"Machine learning, a branch of Artificial Intelligence, is now popular in software engineering community and is successfully used for problems like bug prediction, and software development effort estimation. Developers' understanding of machine learning, however, is not clear, and we require investigation to understand what educators should focus on, and how different online programming discussion communities can be more helpful. We conduct a study on Stack Overflow (SO) machine learning related posts using the SOTorrent dataset. We found that some machine learning topics are significantly more discussed than others, and others need more attention. We also found that topic generation with Latent Dirichlet Allocation (LDA) can suggest more appropriate tags that can make a machine learning post more visible and thus can help in receiving immediate feedback from sites like SO.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"24 1","pages":"260-264"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82548832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Often and What StackOverflow Posts Do Developers Reference in Their GitHub Projects?","authors":"Saraj Singh Manes, Olga Baysal","doi":"10.1109/MSR.2019.00047","DOIUrl":"https://doi.org/10.1109/MSR.2019.00047","url":null,"abstract":"Stack Overflow (SO) is a popular Q&A forum for software developers, providing a large number of copyable code snippets. While GitHub is an independent code collaboration platform, developers often reuse SO code in their GitHub projects. In this paper, we investigate how often GitHub developers re-use code snippets from the SO forum, as well as what concepts they are more likely to reference in their code. To accomplish our goal, we mine SOTorrent dataset that provides connectivity between code snippets on the SO posts with software projects hosted on GitHub. We then study the characteristics of GitHub projects that reference SO posts and what popular SO discussions can be found in GitHub projects. Our results demonstrate that on average developers make 45 references to SO posts in their projects, with the highest number of references being made within the JavaScript code. We also found that 79% of the SO posts with code snippets that are referenced in GitHub code do change over time (at least ones) raising code maintainability and reliability concerns.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"39 1","pages":"235-239"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78726654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Can Duplicate Questions on Stack Overflow Benefit the Software Development Community?","authors":"Durham Abric, Oliver E. Clark, M. Caminiti, Keheliya Gallaba, Shane McIntosh","doi":"10.1109/MSR.2019.00046","DOIUrl":"https://doi.org/10.1109/MSR.2019.00046","url":null,"abstract":"Duplicate questions on Stack Overflow are questions that are flagged as being conceptually equivalent to a previously posted question. Stack Overflow suggests that duplicate questions should not be discussed by users, but rather that attention should be redirected to their previously posted counterparts. Roughly 53% of closed Stack Overflow posts are closed due to duplication. Despite their supposed overlapping content, user activity suggests duplicates may generate additional or superior answers. Approximately 9% of duplicates receive more views than their original counterparts despite being closed. In this paper, we analyze duplicate questions from two perspectives. First, we analyze the experience of those who post duplicates using activity and reputation-based heuristics. Second, we compare the content of duplicates both in terms of their questions and answers to determine the degree of similarity between each duplicate pair. Through analysis of the MSR challenge dataset, we find that although duplicate questions are more likely to be created by inexperienced users, they often receive dissimilar answers to their original counterparts. Indeed, supplementary textual analysis using Natural Language Processing (NLP) techniques suggests duplicate questions provide additional information about the underlying concepts being discussed. We recommend that the Stack Overflow's duplication policy be revised to account for the benefits that leaving duplicate questions open may have for the developer community.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"1 1","pages":"230-234"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79115236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical History of Permission Requests and Mistakes in Open Source Android Apps","authors":"Gian Luca Scoccia, Anthony S Peruma, Virginia Pujols, Ben Christians, Daniel E. Krutz","doi":"10.1109/MSR.2019.00090","DOIUrl":"https://doi.org/10.1109/MSR.2019.00090","url":null,"abstract":"Android applications (apps) rely upon proper permission usage to ensure that the user's privacy and security are adequately protected. Unfortunately, developers frequently misuse app permissions in a variety of ways ranging from using too many permissions to not correctly adhering to Android's defined permission guidelines. The implications of these permissionissues (possible permission problems) can range from harming the user's perception of the app to significantly impacting their privacy and security. An imperative component to creating more secure apps that better protect a user's privacy is an improved understanding of how and when these issues are being introduced and repaired. While there are existing permissions-analysis tools and Android datasets, there are no available datasets that contain a large-scale empirical history of permission changes and mistakes. This limitation inhibits both developers and researchers from empirically studying and constructing a holistic understanding of permission-related issues. To address this shortfall with existing resources, we created a dataset of permission-based changes and permission-issues in open source Android apps. Our unique dataset contains information from 2,002 apps with commits from 10,601 unique committers, totaling 789,577 commits. We accomplished this by mining app repositories from F-Droid, extracting their version and commit histories, and analyzing this information using two permission analysis tools. Our work creates the foundation for future research in permission decisions and mistakes. Complete project details and data is available on our project website: https://mobilepermissions.github.io","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"363 1","pages":"597-601"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75413307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SeSaMe: A Data Set of Semantically Similar Java Methods","authors":"Marius Kamp, Patrick Kreutzer, M. Philippsen","doi":"10.1109/MSR.2019.00079","DOIUrl":"https://doi.org/10.1109/MSR.2019.00079","url":null,"abstract":"In the past, techniques for detecting similarly behaving code fragments were often only evaluated with small, artificial oracles or with code originating from programming competitions. Such code fragments differ largely from production codes. To enable more realistic evaluations, this paper presents SeSaMe, a data set of method pairs that are classified according to their semantic similarity. We applied text similarity measures on JavaDoc comments mined from 11 open source repositories and manually classified a selection of 857 pairs.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"110 1","pages":"529-533"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72849201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Data Set of Program Invariants and Error Paths","authors":"Dirk Beyer","doi":"10.1109/MSR.2019.00026","DOIUrl":"https://doi.org/10.1109/MSR.2019.00026","url":null,"abstract":"The analysis of correctness proofs and counterexamples of program source code is an important way to gain insights into methods that could make it easier in the future to find invariants to prove a program correct or to find bugs. The availability of high-quality data is often a limiting factor for researchers who want to study real program invariants and real bugs. The described data set provides a large collection of concrete verification results, which can be used in research projects as data source or for evaluation purposes. Each result is made available as verification witness, which represents either program invariants that were used to prove the program correct (correctness witness) or an error path to replay the actual bug (violation witness). The verification results are taken from actual verification runs on 10522 verification problems, using the 31 verification tools that participated in the 8th edition of the International Competition on Software Verification (SV-COMP). The collection contains a total of 125720 verification witnesses together with various meta data and a map to relate a witness to the C program that it originates from. Data set is available at: https://doi.org/10.5281/zenodo.2559175","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"12 1","pages":"111-115"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84982022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}