Latest Publications: 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)

A Panel Data Set of Cryptocurrency Development Activity on GitHub
R. V. Tonder, Asher Trockman, Claire Le Goues
DOI: 10.1109/MSR.2019.00037 | Pages: 186-190 | Published: 2019-05-26
Abstract: Cryptocurrencies are a significant development in recent years, featuring in global news, the financial sector, and academic research. They also hold a significant presence in open source development, comprising some of the most popular repositories on GitHub. Their openly developed software artifacts thus present a unique and exclusive avenue to quantitatively observe human activity, effort, and software growth for cryptocurrencies. Our data set marks the first concentrated effort toward high-fidelity panel data of cryptocurrency development for a wide range of metrics. The data set is foremost a quantitative measure of developer activity for budding open source cryptocurrency development. We collect metrics like daily commits, contributors, lines of code changed, stars, forks, and subscribers. We also include financial data for each cryptocurrency: the daily price and market capitalization. The data set includes data for 236 cryptocurrencies for 380 days (roughly January 2018 to January 2019). We discuss particularly interesting research opportunities for this combination of data, and release new tooling to enable continuing data collection for future research opportunities as the development and application of cryptocurrencies mature.
Citations: 5
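The pairing of development metrics with financial data invites joint analyses. As a minimal sketch (the column names below are hypothetical; the released data set's actual schema may differ), one could correlate a coin's daily commit activity with its daily price:

```python
import pandas as pd

# Hypothetical slice of the panel: one cryptocurrency, five daily observations.
# Column names are illustrative, not taken from the actual data set.
panel = pd.DataFrame({
    "date": pd.date_range("2018-01-01", periods=5, freq="D"),
    "commits": [12, 7, 15, 9, 20],
    "price_usd": [14000.0, 13500.0, 14800.0, 13900.0, 15200.0],
})

# Pearson correlation between daily development activity and price.
corr = panel["commits"].corr(panel["price_usd"])
print(round(corr, 3))
```

With real panel data the same computation would run per cryptocurrency, likely with lagged variables, since same-day correlation says nothing about direction of influence.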
Data-Driven Solutions to Detect API Compatibility Issues in Android: An Empirical Study
Simone Scalabrino, G. Bavota, M. Linares-Vásquez, Michele Lanza, R. Oliveto
DOI: 10.1109/MSR.2019.00055 | Pages: 288-298 | Published: 2019-05-26
Abstract: Android apps are inextricably linked to the official Android APIs. Such a strong form of dependency implies that changes introduced in new versions of the Android APIs can severely impact the apps' code, for example because of deprecated or removed APIs. In reaction to those changes, mobile app developers are expected to adapt their code and avoid compatibility issues. To support developers, approaches have been proposed to automatically identify API compatibility issues in Android apps. The state-of-the-art approach, named CiD, is a data-driven solution that learns how to detect those issues by analyzing the changes in the history of Android APIs ("API side" learning). While it can successfully identify compatibility issues, it cannot recommend coding solutions. We devised an alternative data-driven approach, named ACRYL. ACRYL learns from changes implemented in other apps in response to API changes ("client side" learning). This allows it not only to detect compatibility issues, but also to suggest a fix. When empirically comparing the two tools, we found that there is no clear winner, since the two approaches are highly complementary, in that they identify almost disjoint sets of API compatibility issues. Our results point to the future possibility of combining the two approaches, trying to learn detection/fixing rules on both the API and the client side.
Citations: 35
RmvDroid: Towards a Reliable Android Malware Dataset with App Metadata
Haoyu Wang, Junjun Si, Hao Li, Yao Guo
DOI: 10.1109/MSR.2019.00067 | Pages: 404-408 | Published: 2019-05-26
Abstract: A large number of research studies have focused on detecting Android malware in recent years. As a result, a reliable and large-scale malware dataset is essential to build effective malware classifiers and evaluate the performance of different detection techniques. Although several Android malware benchmarks have been widely used in our research community, these benchmarks face several major limitations. First, most of the existing datasets are outdated and cannot reflect current malware evolution trends. Second, most of them rely solely on VirusTotal to label the ground truth of malware, while some anti-virus engines on VirusTotal may not always report reliable results. Third, all of them contain only the apps themselves (APKs), while other important app information (e.g., app description, user rating, and app installs) is missing, which greatly limits the usage scenarios of these datasets. In this paper, we have created a reliable Android malware dataset based on Google Play's app maintenance results over several years. We first created four snapshots of Google Play in 2014, 2015, 2017, and 2018, respectively. Then we use VirusTotal to label apps with possible sensitive behaviors, and monitor these apps on Google Play to see whether Google has removed them or not. Based on this approach, we have created a malware dataset containing 9,133 samples that belong to 56 malware families with high confidence. We believe this dataset will boost a series of research studies including Android malware detection and classification, mining apps for anomalies, and app store mining.
Citations: 59
Can Issues Reported at Stack Overflow Questions be Reproduced? An Exploratory Study
Saikat Mondal, M. M. Rahman, C. Roy
DOI: 10.1109/MSR.2019.00074 | Pages: 479-489 | Published: 2019-05-26
Abstract: Software developers often look for solutions to their code-level problems at Stack Overflow. Hence, they frequently submit their questions with sample code segments and issue descriptions. Unfortunately, it is not always possible to reproduce their reported issues from such code segments. This phenomenon might prevent their questions from getting prompt and appropriate solutions. In this paper, we report an exploratory study on the reproducibility of the issues discussed in 400 questions of Stack Overflow. In particular, we parse, compile, execute, and carefully examine the code segments from these questions, spending a total of 200 man-hours, and then attempt to reproduce their programming issues. The outcomes of our study are two-fold. First, we find that 68% of the code segments require minor or major modifications in order to reproduce the issues reported by the developers, while 22% of the code segments completely fail to reproduce the issues. We also carefully investigate why these issues could not be reproduced and then provide evidence-based guidelines for writing effective code examples for Stack Overflow questions. Second, we investigate the correlation between the issue reproducibility status of questions and corresponding answer metadata such as the presence of an accepted answer. According to our analysis, a question with reproducible issues has at least three times the chance of receiving an accepted answer compared to a question with irreproducible issues.
Citations: 19
What do Developers Know About Machine Learning: A Study of ML Discussions on Stack Overflow
A. A. Bangash, Hareem Sahar, S. Chowdhury, A. W. Wong, Abram Hindle, Karim Ali
DOI: 10.1109/MSR.2019.00052 | Pages: 260-264 | Published: 2019-05-26
Abstract: Machine learning, a branch of Artificial Intelligence, is now popular in the software engineering community and is successfully used for problems like bug prediction and software development effort estimation. Developers' understanding of machine learning, however, is not clear, and investigation is required to understand what educators should focus on and how different online programming discussion communities can be more helpful. We conduct a study on Stack Overflow (SO) machine learning related posts using the SOTorrent dataset. We found that some machine learning topics are significantly more discussed than others, while others need more attention. We also found that topic generation with Latent Dirichlet Allocation (LDA) can suggest more appropriate tags that can make a machine learning post more visible, and thus can help in receiving immediate feedback from sites like SO.
Citations: 36
How Often and What Stack Overflow Posts Do Developers Reference in Their GitHub Projects?
Saraj Singh Manes, Olga Baysal
DOI: 10.1109/MSR.2019.00047 | Pages: 235-239 | Published: 2019-05-26
Abstract: Stack Overflow (SO) is a popular Q&A forum for software developers, providing a large number of copyable code snippets. While GitHub is an independent code collaboration platform, developers often reuse SO code in their GitHub projects. In this paper, we investigate how often GitHub developers reuse code snippets from the SO forum, as well as what concepts they are more likely to reference in their code. To accomplish our goal, we mine the SOTorrent dataset, which connects code snippets in SO posts with software projects hosted on GitHub. We then study the characteristics of GitHub projects that reference SO posts and what popular SO discussions can be found in GitHub projects. Our results demonstrate that on average developers make 45 references to SO posts in their projects, with the highest number of references being made within JavaScript code. We also found that 79% of the SO posts with code snippets that are referenced in GitHub code do change over time (at least once), raising code maintainability and reliability concerns.
Citations: 11
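The first mining step here, finding SO references in project code, amounts to scanning source files for Stack Overflow URLs. A minimal sketch (the source lines and URL patterns below are invented for illustration; SOTorrent's own extraction is more thorough):

```python
import re

# Hypothetical source-file content; real mining would walk a cloned repository.
code = '''
// adapted from https://stackoverflow.com/questions/1732348/regex-match-open-tags
int x = parse(input);
# see https://stackoverflow.com/a/11227902 for why this branch is predictable
'''

# Match both question URLs and short answer URLs pointing at Stack Overflow.
so_ref = re.compile(r"https?://stackoverflow\.com/(?:questions/\d+|a/\d+)")
refs = so_ref.findall(code)
print(len(refs))
```

Aggregating such counts per repository, and grouping by file extension, yields the kind of per-language reference statistics the paper reports.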
Can Duplicate Questions on Stack Overflow Benefit the Software Development Community?
Durham Abric, Oliver E. Clark, M. Caminiti, Keheliya Gallaba, Shane McIntosh
DOI: 10.1109/MSR.2019.00046 | Pages: 230-234 | Published: 2019-05-26
Abstract: Duplicate questions on Stack Overflow are questions that are flagged as being conceptually equivalent to a previously posted question. Stack Overflow suggests that duplicate questions should not be discussed by users, but rather that attention should be redirected to their previously posted counterparts. Roughly 53% of closed Stack Overflow posts are closed due to duplication. Despite their supposed overlapping content, user activity suggests duplicates may generate additional or superior answers. Approximately 9% of duplicates receive more views than their original counterparts despite being closed. In this paper, we analyze duplicate questions from two perspectives. First, we analyze the experience of those who post duplicates using activity and reputation-based heuristics. Second, we compare the content of duplicates, both in terms of their questions and answers, to determine the degree of similarity between each duplicate pair. Through analysis of the MSR challenge dataset, we find that although duplicate questions are more likely to be created by inexperienced users, they often receive dissimilar answers to their original counterparts. Indeed, supplementary textual analysis using Natural Language Processing (NLP) techniques suggests duplicate questions provide additional information about the underlying concepts being discussed. We recommend that Stack Overflow's duplication policy be revised to account for the benefits that leaving duplicate questions open may have for the developer community.
Citations: 11
An Empirical History of Permission Requests and Mistakes in Open Source Android Apps
Gian Luca Scoccia, Anthony S Peruma, Virginia Pujols, Ben Christians, Daniel E. Krutz
DOI: 10.1109/MSR.2019.00090 | Pages: 597-601 | Published: 2019-05-26
Abstract: Android applications (apps) rely upon proper permission usage to ensure that the user's privacy and security are adequately protected. Unfortunately, developers frequently misuse app permissions in a variety of ways, ranging from using too many permissions to not correctly adhering to Android's defined permission guidelines. The implications of these permission-issues (possible permission problems) can range from harming the user's perception of the app to significantly impacting their privacy and security. An imperative component to creating more secure apps that better protect a user's privacy is an improved understanding of how and when these issues are being introduced and repaired. While there are existing permissions-analysis tools and Android datasets, there are no available datasets that contain a large-scale empirical history of permission changes and mistakes. This limitation inhibits both developers and researchers from empirically studying and constructing a holistic understanding of permission-related issues. To address this shortfall with existing resources, we created a dataset of permission-based changes and permission-issues in open source Android apps. Our unique dataset contains information from 2,002 apps with commits from 10,601 unique committers, totaling 789,577 commits. We accomplished this by mining app repositories from F-Droid, extracting their version and commit histories, and analyzing this information using two permission analysis tools. Our work creates the foundation for future research in permission decisions and mistakes. Complete project details and data are available on our project website: https://mobilepermissions.github.io
Citations: 7
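Tracking permission changes across commits starts from reading each revision's declared permissions out of AndroidManifest.xml. A minimal sketch using only the standard library (the manifest fragment is invented; the paper's pipeline uses dedicated permission-analysis tools on full F-Droid histories):

```python
import xml.etree.ElementTree as ET

# Minimal AndroidManifest.xml fragment; a real miner would read this file at
# each commit in an app's history and diff the resulting permission sets.
manifest = '''<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.CAMERA"/>
</manifest>'''

# Attributes with the android: prefix are namespaced in ElementTree.
ANDROID_NAME = "{http://schemas.android.com/apk/res/android}name"
root = ET.fromstring(manifest)
perms = [e.attrib[ANDROID_NAME] for e in root.findall("uses-permission")]
print(perms)
```

Diffing the `perms` set between consecutive commits gives the added/removed permission events that a history dataset like this one records.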
SeSaMe: A Data Set of Semantically Similar Java Methods
Marius Kamp, Patrick Kreutzer, M. Philippsen
DOI: 10.1109/MSR.2019.00079 | Pages: 529-533 | Published: 2019-05-26
Abstract: In the past, techniques for detecting similarly behaving code fragments were often only evaluated with small, artificial oracles or with code originating from programming competitions. Such code fragments differ largely from production code. To enable more realistic evaluations, this paper presents SeSaMe, a data set of method pairs that are classified according to their semantic similarity. We applied text similarity measures on JavaDoc comments mined from 11 open source repositories and manually classified a selection of 857 pairs.
Citations: 5
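The candidate-selection step, text similarity between JavaDoc comments, can be sketched with TF-IDF and cosine similarity (the two comments are invented examples, and the paper's actual similarity measures may differ):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical JavaDoc summaries for a candidate method pair.
doc_a = "Returns the largest element of the given list."
doc_b = "Finds and returns the maximum value in a list."

# TF-IDF vectors over the pair, then cosine similarity between them.
tfidf = TfidfVectorizer().fit_transform([doc_a, doc_b])
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
print(round(score, 3))
```

Pairs scoring above some threshold would become candidates for the manual semantic-similarity classification the paper describes.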
A Data Set of Program Invariants and Error Paths
Dirk Beyer
DOI: 10.1109/MSR.2019.00026 | Pages: 111-115 | Published: 2019-05-26
Abstract: The analysis of correctness proofs and counterexamples of program source code is an important way to gain insights into methods that could make it easier in the future to find invariants to prove a program correct or to find bugs. The availability of high-quality data is often a limiting factor for researchers who want to study real program invariants and real bugs. The described data set provides a large collection of concrete verification results, which can be used in research projects as a data source or for evaluation purposes. Each result is made available as a verification witness, which represents either program invariants that were used to prove the program correct (correctness witness) or an error path to replay the actual bug (violation witness). The verification results are taken from actual verification runs on 10,522 verification problems, using the 31 verification tools that participated in the 8th edition of the International Competition on Software Verification (SV-COMP). The collection contains a total of 125,720 verification witnesses together with various metadata and a map relating each witness to the C program that it originates from. The data set is available at: https://doi.org/10.5281/zenodo.2559175
Citations: 1