2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)最新文献_第6页

50K-C: A Dataset of Compilable, and Compiled, Java Projects 50K-C:可编译和已编译Java项目的数据集

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-28 DOI: 10.1145/3196398.3196450

Pedro Martins, Rohan Achar, C. Lopes

引用次数: 29

Towards Extracting Web API Specifications from Documentation 从文档中提取Web API规范

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-28 DOI: 10.1145/3196398.3196411

Jinqiu Yang, Erik Wittern, Annie T. T. Ying, Julian T Dolby, Lin Tan

{"title":"Towards Extracting Web API Specifications from Documentation","authors":"Jinqiu Yang, Erik Wittern, Annie T. T. Ying, Julian T Dolby, Lin Tan","doi":"10.1145/3196398.3196411","DOIUrl":"https://doi.org/10.1145/3196398.3196411","url":null,"abstract":"Web API specifications are machine-readable descriptions of APIs. These specifications, in combination with related tooling, simplify and support the consumption of APIs. However, despite the increased distribution of web APIs, specifications are rare and their creation and maintenance heavily rely on manual efforts by third parties. In this paper, we propose an automatic approach and an associated tool called D2Spec for extracting significant parts of such specifications from web API documentation pages. Given a seed online documentation page of an API, D2Spec first crawls all documentation pages on the API, and then uses a set of machine-learning techniques to extract the base URL, path templates, and HTTP methods – collectively describing the endpoints of the API. We evaluate whether D2Spec can accurately extract endpoints from documentation on 116 web APIs. The results show that D2Spec achieves a precision of 87.1% in identifying base URLs, a precision of 80.3% and a recall of 80.9% in generating path templates, and a precision of 83.8% and a recall of 77.2% in extracting HTTP methods. In addition, in an evaluation on 64 APIs with pre-existing API specifications, D2Spec revealed many inconsistencies between web API documentation and their corresponding publicly available specifications. API consumers would benefit from D2Spec pointing them to, and allowing them thus to fix, such inconsistencies.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"20 7-8 1","pages":"454-464"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78173332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 15

Enriched Event Streams: A General Dataset for Empirical Studies on In-IDE Activities of Software Developers 丰富的事件流:一个用于软件开发人员在ide内活动实证研究的通用数据集

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-28 DOI: 10.1145/3196398.3196400

Sebastian Proksch

引用次数: 26

A Search System for Mathematical Expressions on Software Binaries 软件二进制文件中数学表达式的搜索系统

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-28 DOI: 10.1145/3196398.3196413

Ridhi Jain, Sai Prathik, Venkatesh Vinayakarao, Rahul Purandare

{"title":"A Search System for Mathematical Expressions on Software Binaries","authors":"Ridhi Jain, Sai Prathik, Venkatesh Vinayakarao, Rahul Purandare","doi":"10.1145/3196398.3196413","DOIUrl":"https://doi.org/10.1145/3196398.3196413","url":null,"abstract":"Developers often ask for libraries that implement specific mathematical expressions. A fundamental bottleneck in building information retrieval (IR) systems to answer such mathematical queries is the inability to detect a given expression in software binaries. While we have a few math IR solutions such as EgoMath2 and Tangent-3 that work over text documents, none exist to search over software binaries. Our vision is to build a search system for binaries to answer queries containing mathematical expressions. A wide variety of compilers and differences in the way they optimize the code, pose difficult challenges to solve this problem. In this work, we discuss our preliminary results in detecting mathematical expressions in software binaries. We use a knowledge base assisted approach to solve this problem. We are able to search mathematical expressions with a precision of 80% and a recall of 53%. This work opens up interesting research opportunities in areas such as software security and performance, to help analysts in identifying and analyzing binaries for implementations of mathematical expressions.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"64 1","pages":"487-491"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75204872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

A Deep Learning Approach to Identifying Source Code in Images and Video 识别图像和视频中源代码的深度学习方法

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-28 DOI: 10.1145/3196398.3196402

J. Ott, Abigail Atchison, Paul Harnack, Adrienne Bergh, Erik J. Linstead

{"title":"A Deep Learning Approach to Identifying Source Code in Images and Video","authors":"J. Ott, Abigail Atchison, Paul Harnack, Adrienne Bergh, Erik J. Linstead","doi":"10.1145/3196398.3196402","DOIUrl":"https://doi.org/10.1145/3196398.3196402","url":null,"abstract":"While substantial progress has been made in mining code on an Internet scale, efforts to date have been overwhelmingly focused on data sets where source code is represented natively as text. Large volumes of source code available online and embedded in technical videos have remained largely unexplored, due in part to the complexity of extraction when code is represented with images. Existing approaches to code extraction and indexing in this environment rely heavily on computationally intense optical character recognition. To improve the ease and efficiency of identifying this embedded code, as well as identifying similar code examples, we develop a deep learning solution based on convolutional neural networks and autoencoders. Focusing on Java for proof of concept, our technique is able to identify the presence of typeset and handwritten source code in thousands of video images with 85.6%-98.6% accuracy based on syntactic and contextual features learned through deep architectures. When combined with traditional approaches, this provides a more scalable basis for video indexing that can be incorporated into existing software search and mining tools.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"266 1","pages":"376-386"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77159680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 46

What did Really Change with the New Release of the App? 新版本的应用到底改变了什么?

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-28 DOI: 10.1145/3196398.3196449

Paolo Calciati, Konstantin Kuznetsov, Xue-Wei Bai, Alessandra Gorla

{"title":"What did Really Change with the New Release of the App?","authors":"Paolo Calciati, Konstantin Kuznetsov, Xue-Wei Bai, Alessandra Gorla","doi":"10.1145/3196398.3196449","DOIUrl":"https://doi.org/10.1145/3196398.3196449","url":null,"abstract":"The mobile app market is evolving at a very fast pace. In order to stay in the market and fulfill user's growing demands, developers have to continuously update their apps either to fix issues or to add new features. Users and market managers may have a hard time understanding what really changed in a new release though, and therefore may not make an informative guess of whether updating the app is recommendable, or whether it may pose new security and privacy threats for the user. We propose a ready-to-use framework to analyze the evolution of Android apps. Our framework extracts and visualizes various information -such as how an app uses sensitive data, which third-party libraries it relies on, which URLs it connects to, etc.- and combines it to create a comprehensive report on how the app evolved. Besides, we present the results of an empirical study on 235 applications with at least 50 releases using our framework. Our analysis reveals that Android apps tend to have more leaks of sensitive data over time, and that the majority of API calls relative to dangerous permissions are added to the code in releases posterior to the one where the corresponding permission was requested.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"10 1","pages":"142-152"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90114642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 28

Do Software Engineers Use Autocompletion Features Differently than Other Developers? 软件工程师使用自动补全功能的方式与其他开发人员不同吗?

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-28 DOI: 10.1145/3196398.3196471

Rahul Amlekar, Andrés Felipe Rincón Gamboa, Keheliya Gallaba, Shane McIntosh

{"title":"Do Software Engineers Use Autocompletion Features Differently than Other Developers?","authors":"Rahul Amlekar, Andrés Felipe Rincón Gamboa, Keheliya Gallaba, Shane McIntosh","doi":"10.1145/3196398.3196471","DOIUrl":"https://doi.org/10.1145/3196398.3196471","url":null,"abstract":"Autocomplete is a common workspace feature that is used to recommend code snippets as developers type in their IDEs. Users of autocomplete features no longer need to remember programming syntax and the names and details of the API methods that are needed to accomplish tasks. Moreover, autocompletion of code snippets may have an accelerating effect, lowering the number of keystrokes that are needed to type the code. However, like any tool, implicit tendencies of users may emerge. Knowledge of how developers in different roles use autocompletion features may help to guide future autocompletion development, research, and training material. In this paper, we set out to better understand how usage of autocompletion varies among software engineers and other developers (i.e., academic researchers, industry researchers, hobby programmers, and students). Analysis of autocompletion events in the Mining Software Repositories (MSR) challenge dataset reveals that: (1) rates of autocompletion usage among software engineers and other developers are not significantly different; and (2) although several non-negligible effect sizes of autocompletion targets (e.g., local variables, method names) are detected between the two groups, the rates at which these targets appear do not vary to a significant degree. These inconclusive results are likely due to the small sample size (n = 35); however, they do provide an interesting insight for future studies to build upon.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"36 1","pages":"86-89"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91008489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

"Automatically Assessing Code Understandability" Reanalyzed: Combined Metrics Matter “自动评估代码可理解性”重新分析:组合度量很重要

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-28 DOI: 10.1145/3196398.3196441

Asher Trockman, Keenen Cates, Mark Mozina, T. Nguyen, Christian Kästner, Bogdan Vasilescu

{"title":"\"Automatically Assessing Code Understandability\" Reanalyzed: Combined Metrics Matter","authors":"Asher Trockman, Keenen Cates, Mark Mozina, T. Nguyen, Christian Kästner, Bogdan Vasilescu","doi":"10.1145/3196398.3196441","DOIUrl":"https://doi.org/10.1145/3196398.3196441","url":null,"abstract":"Previous research shows that developers spend most of their time understanding code. Despite the importance of code understandability for maintenance-related activities, an objective measure of it remains an elusive goal. Recently, Scalabrino et al. reported on an experiment with 46 Java developers designed to evaluate metrics for code understandability. The authors collected and analyzed data on more than a hundred features describing the code snippets, the developers' experience, and the developers' performance on a quiz designed to assess understanding. They concluded that none of the metrics considered can individually capture understandability. Expecting that understandability is better captured by a combination of multiple features, we present a reanalysis of the data from the Scalabrino et al. study, in which we use different statistical modeling techniques. Our models suggest that some computed features of code, such as those arising from syntactic structure and documentation, have a small but significant correlation with understandability. Further, we construct a binary classifier of understandability based on various interpretable code features, which has a small amount of discriminating power. Our encouraging results, based on a small data set, suggest that a useful metric of understandability could feasibly be created, but more data is needed.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"1 1","pages":"314-318"},"PeriodicalIF":0.0,"publicationDate":"2018-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83520538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 28

Imprecisions Diagnostic in Source Code Deltas 源代码增量中的不精确诊断

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-28 DOI: 10.1145/3196398.3196404

Guillermo De la Torre, R. Robbes, Alexandre Bergel

引用次数: 10

Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow 学习从堆栈溢出中挖掘对齐代码和自然语言对

2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR) Pub Date : 2018-05-23 DOI: 10.1145/3196398.3196408

Pengcheng Yin, Bowen Deng, Edgar Chen, Bogdan Vasilescu, Graham Neubig

{"title":"Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow","authors":"Pengcheng Yin, Bowen Deng, Edgar Chen, Bogdan Vasilescu, Graham Neubig","doi":"10.1145/3196398.3196408","DOIUrl":"https://doi.org/10.1145/3196398.3196408","url":null,"abstract":"For tasks like code synthesis from natural language, code retrieval, and code summarization, data-driven models have shown great promise. However, creating these models require parallel data between natural language (NL) and code with fine-grained alignments. StackOverflow (SO) is a promising source to create such a data set: the questions are diverse and most of them have corresponding answers with high-quality code snippets. However, existing heuristic methods (e.g. pairing the title of a post with the code in the accepted answer) are limited both in their coverage and the correctness of the NL-code pairs obtained. In this paper, we propose a novel method to mine high-quality aligned data from SO using two sets of features: hand-crafted features considering the structure of the extracted snippets, and correspondence features obtained by training a probabilistic model to capture the correlation between NL and code using neural networks. These features are fed into a classifier that determines the quality of mined NL-code pairs. Experiments using Python and Java as test beds show that the proposed method greatly expands coverage and accuracy over existing mining methods, even when using only a small number of labeled examples. Further, we find that reasonable results are achieved even when training the classifier on one language and testing on another, showing promise for scaling NL-code mining to a wide variety of programming languages beyond those for which we are able to annotate data.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"19 1","pages":"476-486"},"PeriodicalIF":0.0,"publicationDate":"2018-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74050259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 188