2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR): Latest Papers, Page 7

A Study on the Use of IDE Features for Debugging

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-03 DOI: 10.1145/3196398.3196468

Afsoon Afzal, Claire Le Goues

Citations: 8

Mining the Mind, Minding the Mine: Grand Challenges in Comprehension and Mining

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-01 DOI: 10.1145/3196398.3196477

Amy J. Ko

Citations: 3

Revisiting "Programmers' Build Errors" in the Visual Studio Context

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-01 DOI: 10.1145/3196398.3196469

Noam Rabbani, Michael S. Harvey, Sadnan Saquif, Keheliya Gallaba, Shane McIntosh

Citations: 5

Bayesian Hierarchical Modelling for Tailoring Metric Thresholds

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-04-06 DOI: 10.1145/3196398.3196443

Neil A. Ernst

Citations: 11

Evaluating How Developers Use General-Purpose Web-Search for Code Retrieval

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-03-22 DOI: 10.1145/3196398.3196425

Md Masudur Rahman, J. Barson, Sydney Paul, Joshua Kayan, F. Lois, S. Quezada, Chris Parnin, Kathryn T. Stolee, Baishakhi Ray

Abstract: Search is an integral part of a software development process. Developers often use search engines to look for information during development, including reusable code snippets, API understanding, and reference examples. Developers tend to prefer general-purpose search engines like Google, which are often not optimized for code related documents and use search strategies and ranking techniques that are more optimized for generic, non-code related information. In this paper, we explore whether a general purpose search engine like Google is an optimal choice for code-related searches. In particular, we investigate whether the performance of searching with Google varies for code vs. non-code related searches. To analyze this, we collect search logs from 310 developers that contains nearly 150,000 search queries from Google and the associated result clicks. To differentiate between code-related searches and non-code related searches, we build a model which identifies code intent of queries. Leveraging this model, we build an automatic classifier that detects a code and non-code related query. We confirm the effectiveness of the classifier on manually annotated queries where the classifier achieves a precision of 87%, a recall of 86%, and an F1-score of 87%. We apply this classifier to automatically annotate all the queries in the dataset. Analyzing this dataset, we observe that code related searching often requires more effort (e.g., time, result clicks, and query modifications) than general non-code search, which indicates code search performance with a general search engine is less effective.

Pages: 465-475

Citations: 49

Natural Language or Not (NLoN) - A Package for Software Engineering Text Analysis Pipeline

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-03-20 DOI: 10.1145/3196398.3196444

M. Mäntylä, Fabio Calefato, Maëlick Claes

Citations: 16

Public Git Archive: A Big Code Dataset for All

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-03-20 DOI: 10.1145/3196398.3196464

Vadim Markovtsev, Waren Long

Abstract: The number of open source software projects has been growing exponentially. The major online software repository host, GitHub, has accumulated tens of millions of publicly available Git version-controlled repositories. Although the research potential enabled by the available open source code is clearly substantial, no significant large-scale open source code datasets exist. In this paper, we present the Public Git Archive – dataset of 182,014 top-bookmarked Git repositories from GitHub. We describe the novel data retrieval pipeline to reproduce it. We also elaborate on the strategy for performing dataset updates and legal issues. The Public Git Archive occupies 3.0 TB on disk and is an order of magnitude larger than the current source code datasets. The dataset is made available through HTTP and provides the source code of the projects, the related metadata, and development history. The data retrieval pipeline employs an optimized worker queue model and an optimized archive format to efficiently store forked Git repositories, reducing the amount of data to download and persist. Public Git Archive aims to open a myriad of new opportunities for "Big Code" research.

Pages: 34-37

Citations: 38

SOTorrent: Reconstructing and Analyzing the Evolution of Stack Overflow Posts

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-03-20 DOI: 10.1145/3196398.3196430

Sebastian Baltes, Lorik Dumani, Christoph Treude, S. Diehl

Abstract: Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of code snippets and free-form text on a wide variety of topics. Like other software artifacts, questions and answers on SO evolve over time, for example when bugs in code snippets are fixed, code is updated to work with a more recent library version, or text surrounding a code snippet is edited for clarity. To be able to analyze how content on SO evolves, we built SOTorrent, an open dataset based on the official SO data dump. SOTorrent provides access to the version history of SO content at the level of whole posts and individual text or code blocks. It connects SO posts to other platforms by aggregating URLs from text blocks and by collecting references from GitHub files to SO posts. In this paper, we describe how we built SOTorrent, and in particular how we evaluated 134 different string similarity metrics regarding their applicability for reconstructing the version history of text and code blocks. Based on a first analysis using the dataset, we present insights into the evolution of SO posts, e.g., that post edits are usually small, happen soon after the initial creation of the post, and that code is rarely changed without also updating the surrounding text. Further, our analysis revealed a close relationship between post edits and comments. Our vision is that researchers will use SOTorrent to investigate and understand the evolution of SO posts and their relation to other platforms such as GitHub.

Pages: 319-330

Citations: 104

A Benchmark Study on Sentiment Analysis for Software Engineering Research

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-03-17 DOI: 10.1145/3196398.3196403

Nicole Novielli, Daniela Girardi, F. Lanubile

Citations: 96

A Gold Standard for Emotion Annotation in Stack Overflow

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-03-06 DOI: 10.1145/3196398.3196453

Nicole Novielli, Fabio Calefato, F. Lanubile

Citations: 53