2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)最新文献

筛选
英文 中文
Towards Mining Answer Edits to Extract Evolution Patterns in Stack Overflow 基于答案编辑挖掘的堆栈溢出演化模式研究
Themistoklis G. Diamantopoulos, Maria-Ioanna Sifaki, A. Symeonidis
{"title":"Towards Mining Answer Edits to Extract Evolution Patterns in Stack Overflow","authors":"Themistoklis G. Diamantopoulos, Maria-Ioanna Sifaki, A. Symeonidis","doi":"10.1109/MSR.2019.00043","DOIUrl":"https://doi.org/10.1109/MSR.2019.00043","url":null,"abstract":"The current state of practice dictates that in order to solve a problem encountered when building software, developers ask for help in online platforms, such as Stack Overflow. In this context of collaboration, answers to question posts often undergo several edits to provide the best solution to the problem stated. In this work, we explore the potential of mining Stack Overflow answer edits to extract common patterns when answering a post. In particular, we design a similarity scheme that takes into account the text and code of answer edits and cluster edits according to their semantics. Upon applying our methodology, we provide frequent edit patterns and indicate how they could be used to answer future research questions. Assessing our approach indicates that it can be effective for identifying commonly applied edits, thus illustrating the transformation path from the initial answer to the optimal solution.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"15 1","pages":"215-219"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78427012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
Boa Meets Python: A Boa Dataset of Data Science Software in Python Language 蟒蛇遇见Python: Python语言数据科学软件的蟒蛇数据集
Sumon Biswas, Md Johirul Islam, Yijia Huang, Hridesh Rajan
{"title":"Boa Meets Python: A Boa Dataset of Data Science Software in Python Language","authors":"Sumon Biswas, Md Johirul Islam, Yijia Huang, Hridesh Rajan","doi":"10.1109/MSR.2019.00086","DOIUrl":"https://doi.org/10.1109/MSR.2019.00086","url":null,"abstract":"The popularity of Python programming language has surged in recent years due to its increasing usage in Data Science. The availability of Python repositories in Github presents an opportunity for mining software repository research, e.g., suggesting the best practices in developing Data Science applications, identifying bug-patterns, recommending code enhancements, etc. To enable this research, we have created a new dataset that includes 1,558 mature Github projects that develop Python software for Data Science tasks. By analyzing the metadata and code, we have included the projects in our dataset which use a diverse set of machine learning libraries and managed by a variety of users and organizations. The dataset is made publicly available through Boa infrastructure both as a collection of raw projects as well as in a processed form that could be used for performing large scale analysis using Boa language. We also present two initial applications to demonstrate the potential of the dataset that could be leveraged by the community.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"35 1","pages":"577-581"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77777817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 21
Lessons Learned from Using a Deep Tree-Based Model for Software Defect Prediction in Practice 应用深度树模型进行软件缺陷预测的经验教训
K. Dam, Trang Pham, S. W. Ng, T. Tran, J. Grundy, A. Ghose, Taeksu Kim, Chul-Joo Kim
{"title":"Lessons Learned from Using a Deep Tree-Based Model for Software Defect Prediction in Practice","authors":"K. Dam, Trang Pham, S. W. Ng, T. Tran, J. Grundy, A. Ghose, Taeksu Kim, Chul-Joo Kim","doi":"10.1109/MSR.2019.00017","DOIUrl":"https://doi.org/10.1109/MSR.2019.00017","url":null,"abstract":"Defects are common in software systems and cause many problems for software users. Different methods have been developed to make early prediction about the most likely defective modules in large codebases. Most focus on designing features (e.g. complexity metrics) that correlate with potentially defective code. Those approaches however do not sufficiently capture the syntax and multiple levels of semantics of source code, a potentially important capability for building accurate prediction models. In this paper, we report on our experience of deploying a new deep learning tree-based defect prediction model in practice. This model is built upon the tree-structured Long Short Term Memory network which directly matches with the Abstract Syntax Tree representation of source code. We discuss a number of lessons learned from developing the model and evaluating it on two datasets, one from open source projects contributed by our industry partner Samsung and the other from the public PROMISE repository.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"4 1","pages":"46-57"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73289219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 61
Exploring Word Embedding Techniques to Improve Sentiment Analysis of Software Engineering Texts 探索词嵌入技术改进软件工程文本情感分析
Eeshita Biswas, K. Vijay-Shanker, L. Pollock
{"title":"Exploring Word Embedding Techniques to Improve Sentiment Analysis of Software Engineering Texts","authors":"Eeshita Biswas, K. Vijay-Shanker, L. Pollock","doi":"10.1109/MSR.2019.00020","DOIUrl":"https://doi.org/10.1109/MSR.2019.00020","url":null,"abstract":"Sentiment analysis (SA) of text-based software artifacts is increasingly used to extract information for various tasks including providing code suggestions, improving development team productivity, giving recommendations of software packages and libraries, and recommending comments on defects in source code, code quality, possibilities for improvement of applications. Studies of state-of-the-art sentiment analysis tools applied to software-related texts have shown varying results based on the techniques and training approaches. In this paper, we investigate the impact of two potential opportunities to improve the training for sentiment analysis of SE artifacts in the context of the use of neural networks customized using the Stack Overflow data developed by Lin et al. We customize the process of sentiment analysis to the software domain, using software domain-specific word embeddings learned from Stack Overflow (SO) posts, and study the impact of software domain-specific word embeddings on the performance of the sentiment analysis tool, as compared to generic word embeddings learned from Google News. We find that the word embeddings learned from the Google News data performs mostly similar and in some cases better than the word embeddings learned from SO posts. We also study the impact of two machine learning techniques, oversampling and undersampling of data, on the training of a sentiment classifier for handling small SE datasets with a skewed distribution. We find that oversampling alone, as well as the combination of oversampling and undersampling together, helps in improving the performance of a sentiment classifier.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"79 1","pages":"68-78"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90842152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 23
The Software Heritage Graph Dataset: Public Software Development Under One Roof 软件遗产图数据集:同一屋檐下的公共软件开发
Antoine Pietri, D. Spinellis, Stefano Zacchiroli
{"title":"The Software Heritage Graph Dataset: Public Software Development Under One Roof","authors":"Antoine Pietri, D. Spinellis, Stefano Zacchiroli","doi":"10.1109/MSR.2019.00030","DOIUrl":"https://doi.org/10.1109/MSR.2019.00030","url":null,"abstract":"Software Heritage is the largest existing public archive of software source code and accompanying development history: it currently spans more than five billion unique source code files and one billion unique commits, coming from more than 80 million software projects. This paper introduces the Software Heritage graph dataset: a fully-deduplicated Merkle DAG representation of the Software Heritage archive. The dataset links together file content identifiers, source code directories, Version Control System (VCS) commits tracking evolution over time, up to the full states of VCS repositories as observed by Software Heritage during periodic crawls. The dataset's contents come from major development forges (including GitHub and GitLab), FOSS distributions (e.g., Debian), and language-specific package managers (e.g., PyPI). Crawling information is also included, providing timestamps about when and where all archived source code artifacts have been observed in the wild. The Software Heritage graph dataset is available in multiple formats, including downloadable CSV dumps and Apache Parquet files for local use, as well as a public instance on Amazon Athena interactive query service for ready-to-use powerful analytical processing. Source code file contents are cross-referenced at the graph leaves, and can be retrieved through individual requests using the Software Heritage archive API.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"70 Suppl4 1","pages":"138-142"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75778636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 43
RapidRelease - A Dataset of Projects and Issues on Github with Rapid Releases RapidRelease - Github上的项目和问题数据集,具有快速发布功能
Saket Joshi, S. Chimalakonda
{"title":"RapidRelease - A Dataset of Projects and Issues on Github with Rapid Releases","authors":"Saket Joshi, S. Chimalakonda","doi":"10.1109/MSR.2019.00088","DOIUrl":"https://doi.org/10.1109/MSR.2019.00088","url":null,"abstract":"In the recent years, there has been a surge in the adoption of agile development model and continuous integration (CI) in software development. Recent trends have reduced average release cycle lengths to as low as 1-2 weeks, leading to an extensive number of studies in release engineering. Open-source development (OSD) has also witnessed a rapid increase in release rates, however, no large dataset of open-source projects exists which features high release rates. In this paper, we introduce the RapidRelease dataset, a data showcase of high release frequency open-source projects. The dataset hosts 994 projects from Github, with over 2 million issue reports. To the best of our knowledge, this is the first dataset that can facilitate researchers to empirically study release engineering and agile software development in open-source projects with rapid releases.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"47 1","pages":"587-591"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84348516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 16
Impacts of Daylight Saving Time on Software Development 日光节约时间对软件开发的影响
J. Hayashi, Yoshiki Higo, S. Matsumoto, S. Kusumoto
{"title":"Impacts of Daylight Saving Time on Software Development","authors":"J. Hayashi, Yoshiki Higo, S. Matsumoto, S. Kusumoto","doi":"10.1109/MSR.2019.00076","DOIUrl":"https://doi.org/10.1109/MSR.2019.00076","url":null,"abstract":"Daylight saving time (DST) is observed in many countries and regions. DST is not considered on some software systems at the beginning of their developments, for example, software systems developed in regions where DST is not observed. However, such systems may have to consider DST at the requests of their users. Before now, there has been no study about the impacts of DST on software development. In this paper, we study the impacts of DST on software development by mining the repositories on GitHub. We analyze the date when the code related to DST is changed, and we analyze the regions where the developers applied the changes live. Furthermore, we classify the changes into some patterns.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"4 1","pages":"502-506"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90499237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Test Coverage in Python Programs Python程序中的测试覆盖率
Hongyu Zhai, Casey Casalnuovo, Premkumar T. Devanbu
{"title":"Test Coverage in Python Programs","authors":"Hongyu Zhai, Casey Casalnuovo, Premkumar T. Devanbu","doi":"10.1109/MSR.2019.00027","DOIUrl":"https://doi.org/10.1109/MSR.2019.00027","url":null,"abstract":"We study code coverage in several popular Python projects: flask, matplotlib, pandas, scikit-learn, and scrapy. Coverage data on these projects is gathered and hosted on the Codecov website, from where this data can be mined. Using this data, and a syntactic parse of the code, we examine the effect of control flow structure, statement type (e.g., if, for) and code age on test coverage. We find that coverage depends on control flow structure, with more deeply nested statements being significantly less likely to be covered. This is a clear effect, which holds up in every project, even when controlling for the age of the line (as determined by git blame). We find that the age of a line per se has a small (but statistically significant) positive effect on coverage. Finally, we find that the kind of statement (try, if, except, raise, etc) has varying effects on coverage, with exception-handling statements being covered much less often. These results suggest that developers in Python projects have difficulty writing test sets that cover deeply-nested and error-handling statements, and might need assistance covering such code.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"51 1","pages":"116-120"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89160984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
Characterizing Duplicate Code Snippets between Stack Overflow and Tutorials 描述堆栈溢出和教程之间的重复代码片段
Manziba Akanda Nishi, Agnieszka Ciborowska, Kostadin Damevski
{"title":"Characterizing Duplicate Code Snippets between Stack Overflow and Tutorials","authors":"Manziba Akanda Nishi, Agnieszka Ciborowska, Kostadin Damevski","doi":"10.1109/MSR.2019.00048","DOIUrl":"https://doi.org/10.1109/MSR.2019.00048","url":null,"abstract":"Developers are usually unaware of the quality and lineage of information available on popular Web resources, leading to potential maintenance problems and license violations when reusing code snippets from these resources. In this paper, we study the duplication of code snippets between two popular sources of software development information: the Stack Overflow Q a significant number (31%) of answers that contained a duplicate code block were chosen as the accepted answer. Qualitative analysis reveals that developers commonly use Stack Overflow to ask clarifying questions about code they reused from tutorials, and copy code snippets from tutorials to provide answers to questions.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"43 1","pages":"240-244"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77331966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools Slack问答聊天作为软件工程工具挖掘源的探索性研究
Preetha Chatterjee, Kostadin Damevski, L. Pollock, Vinay Augustine, Nicholas A. Kraft
{"title":"Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools","authors":"Preetha Chatterjee, Kostadin Damevski, L. Pollock, Vinay Augustine, Nicholas A. Kraft","doi":"10.1109/MSR.2019.00075","DOIUrl":"https://doi.org/10.1109/MSR.2019.00075","url":null,"abstract":"Modern software development communities are increasingly social. Popular chat platforms such as Slack host public chat communities that focus on specific development topics such as Python or Ruby-on-Rails. Conversations in these public chats often follow a Q&A format, with someone seeking information and others providing answers in chat form. In this paper, we describe an exploratory study into the potential use-fulness and challenges of mining developer Q&A conversations for supporting software maintenance and evolution tools. We designed the study to investigate the availability of information that has been successfully mined from other developer communications, particularly Stack Overflow. We also analyze characteristics of chat conversations that might inhibit accurate automated analysis. Our results indicate the prevalence of useful information, including API mentions and code snippets with descriptions, and several hurdles that need to be overcome to automate mining that information.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"12 1","pages":"490-501"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81467568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 50
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信