2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR): Latest Publications

Towards Mining Answer Edits to Extract Evolution Patterns in Stack Overflow

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00043

Themistoklis G. Diamantopoulos, Maria-Ioanna Sifaki, A. Symeonidis

Citations: 6

Boa Meets Python: A Boa Dataset of Data Science Software in Python Language

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00086

Sumon Biswas, Md Johirul Islam, Yijia Huang, Hridesh Rajan

Citations: 21

Lessons Learned from Using a Deep Tree-Based Model for Software Defect Prediction in Practice

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00017

K. Dam, Trang Pham, S. W. Ng, T. Tran, J. Grundy, A. Ghose, Taeksu Kim, Chul-Joo Kim

Citations: 61

Exploring Word Embedding Techniques to Improve Sentiment Analysis of Software Engineering Texts

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00020

Eeshita Biswas, K. Vijay-Shanker, L. Pollock

Abstract: Sentiment analysis (SA) of text-based software artifacts is increasingly used to extract information for various tasks including providing code suggestions, improving development team productivity, giving recommendations of software packages and libraries, and recommending comments on defects in source code, code quality, and possibilities for improvement of applications. Studies of state-of-the-art sentiment analysis tools applied to software-related texts have shown varying results based on the techniques and training approaches. In this paper, we investigate the impact of two potential opportunities to improve the training for sentiment analysis of SE artifacts in the context of the use of neural networks customized using the Stack Overflow data developed by Lin et al. We customize the process of sentiment analysis to the software domain, using software domain-specific word embeddings learned from Stack Overflow (SO) posts, and study the impact of software domain-specific word embeddings on the performance of the sentiment analysis tool, as compared to generic word embeddings learned from Google News. We find that the word embeddings learned from the Google News data perform mostly similarly to, and in some cases better than, the word embeddings learned from SO posts. We also study the impact of two machine learning techniques, oversampling and undersampling of data, on the training of a sentiment classifier for handling small SE datasets with a skewed distribution. We find that oversampling alone, as well as the combination of oversampling and undersampling together, helps in improving the performance of a sentiment classifier.

Pages: 68-78
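
The abstract above mentions oversampling and undersampling for small, skewed datasets. As a rough illustration only, the sketch below shows plain random resampling to a fixed class size. It is not the procedure used in the paper, which this listing does not describe; the function name `rebalance` and its parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict


def rebalance(samples, labels, target, seed=0):
    """Resample every class to exactly `target` items (hypothetical helper).

    Classes smaller than `target` keep all originals and gain random
    duplicates (oversampling); larger classes are cut down to a random
    subset drawn without replacement (undersampling).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    out_samples, out_labels = [], []
    for label, items in by_label.items():
        if len(items) < target:
            # Oversample: originals plus duplicates drawn with replacement.
            picked = items + rng.choices(items, k=target - len(items))
        else:
            # Undersample: random subset of the majority class.
            picked = rng.sample(items, target)
        out_samples.extend(picked)
        out_labels.extend([label] * target)
    return out_samples, out_labels


# Toy skewed dataset: one negative, five neutral, one positive text.
texts = ["a", "b", "c", "d", "e", "f", "g"]
tags = ["neg", "neu", "neu", "neu", "neu", "neu", "pos"]
new_texts, new_tags = rebalance(texts, tags, target=3)
print(Counter(new_tags))
```

Resampling of this kind would normally be applied to the training split only, so that duplicated items do not leak into the evaluation data.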

Citations: 23

The Software Heritage Graph Dataset: Public Software Development Under One Roof

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00030

Antoine Pietri, D. Spinellis, Stefano Zacchiroli

Abstract: Software Heritage is the largest existing public archive of software source code and accompanying development history: it currently spans more than five billion unique source code files and one billion unique commits, coming from more than 80 million software projects. This paper introduces the Software Heritage graph dataset: a fully-deduplicated Merkle DAG representation of the Software Heritage archive. The dataset links together file content identifiers, source code directories, Version Control System (VCS) commits tracking evolution over time, up to the full states of VCS repositories as observed by Software Heritage during periodic crawls. The dataset's contents come from major development forges (including GitHub and GitLab), FOSS distributions (e.g., Debian), and language-specific package managers (e.g., PyPI). Crawling information is also included, providing timestamps about when and where all archived source code artifacts have been observed in the wild. The Software Heritage graph dataset is available in multiple formats, including downloadable CSV dumps and Apache Parquet files for local use, as well as a public instance on Amazon Athena interactive query service for ready-to-use powerful analytical processing. Source code file contents are cross-referenced at the graph leaves, and can be retrieved through individual requests using the Software Heritage archive API.

Pages: 138-142
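
The "fully-deduplicated Merkle DAG" in the abstract above can be pictured with a toy content-addressed store, in which every node is identified by a hash of its own content and of its children's identifiers. This is a minimal sketch under stated assumptions: the class `MerkleStore`, its serialization, and the use of SHA-256 are all hypothetical and do not reproduce Software Heritage's actual identifier scheme or data model.

```python
import hashlib


class MerkleStore:
    """Toy content-addressed store: a node's ID is a hash of its content."""

    def __init__(self):
        self.nodes = {}  # node ID -> serialized node bytes

    def _put(self, kind, payload):
        data = kind.encode() + b"\0" + payload
        node_id = hashlib.sha256(data).hexdigest()
        # Identical nodes hash to the same ID, so they are stored once.
        self.nodes.setdefault(node_id, data)
        return node_id

    def add_file(self, content):
        return self._put("file", content)

    def add_directory(self, entries):
        # `entries` maps a name to the ID of a file or sub-directory.
        lines = [f"{name}\t{child}" for name, child in sorted(entries.items())]
        return self._put("dir", "\n".join(lines).encode())

    def add_commit(self, root_dir, parents, message):
        payload = "\n".join([root_dir, ",".join(parents), message])
        return self._put("commit", payload.encode())


store = MerkleStore()
dir_a = store.add_directory({"README": store.add_file(b"hello\n")})
dir_b = store.add_directory({"README": store.add_file(b"hello\n")})
first = store.add_commit(dir_a, [], "initial")
second = store.add_commit(dir_b, [first], "no content change")

print(dir_a == dir_b)    # same tree content, same identifier
print(len(store.nodes))  # one file, one directory, two commits
```

Because a directory's identifier depends on its children's identifiers, two projects containing the same file or tree share those nodes, which is the property that makes archive-wide deduplication possible.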

Citations: 43

RapidRelease - A Dataset of Projects and Issues on Github with Rapid Releases

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00088

Saket Joshi, S. Chimalakonda

Citations: 16

Impacts of Daylight Saving Time on Software Development

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00076

J. Hayashi, Yoshiki Higo, S. Matsumoto, S. Kusumoto

Citations: 3

Test Coverage in Python Programs

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00027

Hongyu Zhai, Casey Casalnuovo, Premkumar T. Devanbu

Citations: 13

Characterizing Duplicate Code Snippets between Stack Overflow and Tutorials

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00048

Manziba Akanda Nishi, Agnieszka Ciborowska, Kostadin Damevski

Citations: 7

Exploratory Study of Slack Q&A Chats as a Mining Source for Software Engineering Tools

2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) Pub Date : 2019-05-26 DOI: 10.1109/MSR.2019.00075

Preetha Chatterjee, Kostadin Damevski, L. Pollock, Vinay Augustine, Nicholas A. Kraft

Citations: 50