2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE): Latest Papers, Page 10

SourcererCC: Scaling Code Clone Detection to Big-Code

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE) Pub Date : 2015-12-20 DOI: 10.1145/2884781.2884877

Hitesh Sajnani, V. Saini, Jeffrey Svajlenko, C. Roy, C. Lopes

Abstract: Despite a decade of active research, there has been a marked lack in clone detection techniques that scale to large repositories for detecting near-miss clones. In this paper, we present a token-based clone detector, SourcererCC, that can detect both exact and near-miss clones from large inter-project repositories using a standard workstation. It exploits an optimized inverted-index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, as well as the number of required token-comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available and state-of-the-art tools. To measure recall, we use two recent benchmarks: (1) a big benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (25K projects, 250MLOC) using a standard workstation.

Pages: 1157-1168
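
The filtering idea in the abstract above can be illustrated with a small sketch. The Python below is not the authors' implementation. It shows generic prefix filtering over a globally ordered token list combined with an inverted index, which is one common way a token-ordering heuristic can keep both the index and the number of block comparisons small. The names `overlap` and `find_clones`, the threshold `theta`, and the choice of "shared tokens relative to the larger block" as the similarity measure are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def overlap(a: Counter, b: Counter) -> int:
    """Size of the multiset intersection of two token bags."""
    return sum((a & b).values())


def find_clones(blocks: dict[str, list[str]], theta: float = 0.75) -> set[tuple[str, str]]:
    """Return id pairs sharing at least ceil(theta * size of larger block) tokens."""
    bags = {bid: Counter(toks) for bid, toks in blocks.items()}

    # Hypothetical global ordering: rarest tokens first (ties broken by token
    # text), so the short prefix of each block holds its most selective tokens.
    freq = Counter(tok for toks in blocks.values() for tok in toks)

    index = defaultdict(set)  # token -> ids of blocks whose prefix contains it
    clones = set()
    for bid, toks in blocks.items():
        n = len(toks)
        # Two blocks that meet the threshold must share a token in these prefixes.
        prefix_len = n - math.ceil(theta * n) + 1
        prefix = sorted(toks, key=lambda t: (freq[t], t))[:prefix_len]

        # Query the index: only blocks sharing a prefix token are candidates.
        candidates = set()
        for tok in prefix:
            candidates |= index[tok]

        # Verify each candidate with the full token-bag comparison.
        for other in candidates:
            need = math.ceil(theta * max(n, len(blocks[other])))
            if overlap(bags[bid], bags[other]) >= need:
                clones.add(tuple(sorted((bid, other))))

        # Index only the prefix, not the whole block.
        for tok in prefix:
            index[tok].add(bid)
    return clones
```

In this sketch only the prefix of each block is indexed, and a full comparison is run only for blocks that share a prefix token, so unrelated blocks are never compared at all.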

Citations: 456

SWIM: Synthesizing What I Mean - Code Search and Idiomatic Snippet Synthesis

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE) Pub Date : 2015-11-26 DOI: 10.1145/2884781.2884808

Mukund Raghothaman, Yi Wei, Y. Hamadi

Abstract: Modern programming frameworks come with large libraries, with diverse applications such as for matching regular expressions, parsing XML files and sending email. Programmers often use search engines such as Google and Bing to learn about existing APIs. In this paper, we describe SWIM, a tool which suggests code snippets given API-related natural language queries such as "generate md5 hash code". We translate user queries into the APIs of interest using clickthrough data from the Bing search engine. Then, based on patterns learned from open-source code repositories, we synthesize idiomatic code describing the use of these APIs. We introduce structured call sequences to capture API-usage patterns. Structured call sequences are a generalized form of method call sequences, with if-branches and while-loops to represent conditional and repeated API usage patterns, and are simple to extract and amenable to synthesis. We evaluated SWIM with 30 common C# API-related queries received by Bing. For 70% of the queries, the first suggested snippet was a relevant solution, and a relevant solution was present in the top 10 results for all benchmarked queries. The online portion of the workflow is also very responsive, at an average of 1.5 seconds per snippet.

Pages: 357-367

Citations: 152

PAC Learning-Based Verification and Model Synthesis

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE) Pub Date : 2015-11-03 DOI: 10.1145/2884781.2884860

Yu-Fang Chen, Chiao-En Hsieh, Ondřej Lengál, Tsung-Ju Lii, M. Tsai, Bow-Yaw Wang, Farn Wang

Citations: 27

Program Synthesis Using Natural Language

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE) Pub Date : 2015-09-01 DOI: 10.1145/2884781.2884786

Aditya Desai, Sumit Gulwani, V. Hingorani, Nidhi Jain, Amey Karkare, Mark Marron, R. Sailesh, Subhajit Roy

Citations: 117

Behavioral Log Analysis with Statistical Guarantees

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE) Pub Date : 2015-08-30 DOI: 10.1145/2786805.2803198

Nimrod Busany, S. Maoz

Citations: 20

Efficient Large-Scale Trace Checking Using MapReduce

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE) Pub Date : 2015-08-26 DOI: 10.1145/2884781.2884832

M. Bersani, D. Bianculli, C. Ghezzi, S. Krstic, P. S. Pietro

Abstract: The problem of checking a logged event trace against a temporal logic specification arises in many practical cases. Unfortunately, known algorithms for an expressive logic like MTL (Metric Temporal Logic) do not scale with respect to two crucial dimensions: the length of the trace and the size of the time interval of the formula to be checked. The former issue can be addressed by distributed and parallel trace checking algorithms that can take advantage of modern cloud computing and programming frameworks like MapReduce. Still, the latter issue remains open with current state-of-the-art approaches. In this paper we address this memory scalability issue by proposing a new semantics for MTL, called lazy semantics. This semantics can evaluate temporal formulae and boolean combinations of temporal-only formulae at any arbitrary time instant. We prove that lazy semantics is more expressive than point-based semantics and that it can be used as a basis for a correct parametric decomposition of any MTL formula into an equivalent one with smaller, bounded time intervals. We use lazy semantics to extend our previous distributed trace checking algorithm for MTL. The evaluation shows that the proposed algorithm can check formulae with large intervals, on large traces, in a memory-efficient way.

Pages: 888-898
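
As background for the abstract above, here is a minimal sketch of what checking a bounded "eventually" operator against a timestamped trace involves. It uses the ordinary point-based reading (non-strict, so the current position counts as a witness), not the lazy semantics the paper proposes, and it is a single-machine loop rather than a MapReduce job. The trace layout (a list of `(timestamp, set_of_events)` pairs with non-decreasing timestamps) and the function name `eventually` are assumptions made for illustration.

```python
def eventually(trace, prop, lo, hi):
    """Evaluate F_[lo,hi] prop at every position of a timed trace.

    trace: list of (timestamp, set_of_events), timestamps non-decreasing.
    Position i satisfies the formula if some position j >= i has
    lo <= t_j - t_i <= hi and prop among its events.
    """
    result = []
    for i, (t_i, _) in enumerate(trace):
        satisfied = False
        for t_j, events in trace[i:]:
            if t_j - t_i > hi:
                break  # later positions are even further away in time
            if t_j - t_i >= lo and prop in events:
                satisfied = True
                break
        result.append(satisfied)
    return result


# Hypothetical request/acknowledge log.
log = [(0, {"req"}), (3, {"ack"}), (10, {"req"}), (20, {"ack"})]
print(eventually(log, "ack", 0, 5))
```

Each position has to look ahead as far as `hi` time units, which gives some intuition for why the size of the time interval is one of the two scaling dimensions the abstract names.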

Citations: 18

Learning API Usages from Bytecode: A Statistical Approach

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE) Pub Date : 2015-07-27 DOI: 10.1145/2884781.2884873

Tam The Nguyen, H. Pham, P. Vu, T. Nguyen

Citations: 68

On the "Naturalness" of Buggy Code

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE) Pub Date : 2015-06-03 DOI: 10.1145/2884781.2884848

Baishakhi Ray, V. Hellendoorn, Saheel Godhane, Zhaopeng Tu, Alberto Bacchelli, Premkumar T. Devanbu

Abstract: Real software, the kind working programmers produce by the kLOC to solve real-world problems, tends to be “natural”, like speech or natural language; it tends to be highly repetitive and predictable. Researchers have captured this naturalness of software through statistical models and used them to good effect in suggestion engines, porting tools, coding standards checkers, and idiom miners. This suggests that code that appears improbable, or surprising, to a good statistical language model is “unnatural” in some sense, and thus possibly suspicious. In this paper, we investigate this hypothesis. We consider a large corpus of bug fix commits (ca. 7,139), from 10 different Java projects, and focus on its language statistics, evaluating the naturalness of buggy code and the corresponding fixes. We find that code with bugs tends to be more entropic (i.e. unnatural), becoming less so as bugs are fixed. Ordering files for inspection by their average entropy yields cost-effectiveness scores comparable to popular defect prediction methods. At a finer granularity, focusing on highly entropic lines is similar in cost-effectiveness to some well-known static bug finders (PMD, FindBugs) and ordering warnings from these bug finders using an entropy measure improves the cost-effectiveness of inspecting code implicated in warnings. This suggests that entropy may be a valid, simple way to complement the effectiveness of PMD or FindBugs, and that search-based bug-fixing methods may benefit from using entropy both for fault-localization and searching for fixes.

Pages: 428-439
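
The notion of line-level entropy in the abstract above can be made concrete with a toy example. The sketch below is an assumption-laden illustration, not the paper's model: it trains a bigram model with add-one smoothing on whitespace-tokenized lines and scores a line by its average negative log2 probability per token. The function names `train_bigram` and `line_entropy` and the `<s>` start marker are hypothetical.

```python
import math
from collections import Counter


def train_bigram(lines):
    """Count bigram contexts and bigrams over tokenized lines."""
    context, bigrams, vocab = Counter(), Counter(), set()
    for toks in lines:
        seq = ["<s>"] + toks            # "<s>" marks the start of a line
        context.update(seq[:-1])        # tokens that have a successor
        bigrams.update(zip(seq, seq[1:]))
        vocab.update(toks)
    # One extra vocabulary slot reserves probability mass for unseen tokens.
    return context, bigrams, len(vocab) + 1


def line_entropy(toks, model):
    """Average bits per token of a line under the smoothed bigram model."""
    context, bigrams, vocab_size = model
    seq = ["<s>"] + toks
    bits = 0.0
    for prev, cur in zip(seq, seq[1:]):
        # Add-one smoothing keeps unseen bigrams at a small nonzero probability.
        p = (bigrams[(prev, cur)] + 1) / (context[prev] + vocab_size)
        bits -= math.log2(p)
    return bits / max(len(toks), 1)


corpus = [["i", "=", "i", "+", "1"]] * 5
model = train_bigram(corpus)
familiar = line_entropy(["i", "=", "i", "+", "1"], model)
unusual = line_entropy(["1", "+", "=", "i", "i"], model)
print(familiar < unusual)  # the scrambled line is more surprising to the model
```

Sorting lines by this score in descending order gives the kind of inspection ordering the abstract describes, under the simplifying assumptions stated above.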

Citations: 204

Work Practices and Challenges in Pull-Based Development: The Contributor's Perspective

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE) Pub Date : 2015-05-16 DOI: 10.1145/2884781.2884826

Georgios Gousios, M. Storey, Alberto Bacchelli

Abstract: The pull-based development model is an emerging way of contributing to distributed software projects that is gaining enormous popularity within the open source software (OSS) world. Previous work has examined this model by focusing on projects and their owners; we complement it by examining the work practices of project contributors and the challenges they face. We conducted a survey with 645 top contributors to active OSS projects using the pull-based model on GitHub, the prevalent social coding site. We also analyzed traces extracted from corresponding GitHub repositories. Our research shows that: contributors have a strong interest in maintaining awareness of project status to get inspiration and avoid duplicating work, but they do not actively propagate information; communication within pull requests is reportedly limited to low-level concerns and contributors often use communication channels external to pull requests; challenges are mostly social in nature, with most reporting poor responsiveness from integrators; and the increased transparency of this setting is a confirmed motivation to contribute. Based on these findings, we present recommendations for practitioners to streamline the contribution process and discuss potential future research directions.

Pages: 285-296

Citations: 308