{"title":"Reanalysis of Empirical Data on Java Local Variables with Narrow and Broad Scope","authors":"D. Feitelson","doi":"10.1109/ICPC58990.2023.00037","DOIUrl":"https://doi.org/10.1109/ICPC58990.2023.00037","url":null,"abstract":"It is generally accepted that variables with a narrow syntactic scope can have short names, whereas variables with a broad scope require more informative longer names. We study how names are given in practice, using a dataset of nearly 640 thousand variable names from Java methods, recently introduced by Aman et al. We extend their original analysis by using a finer division of scopes into ranges. We find that indeed variables with broader scope tend to be slightly longer and to include more words. There is also a progression of changes in name structures, with fewer single-letter names and more compound names as the scope increases. But the biggest differences occur at the low-scope end, not the high-scope end. In addition, we present more evidence that words of 6 letters or more are often abbreviated, but this is not affected by scope. Finally, we also analyze the distribution of popularity of names and of words in names, and show that single letter names are much more varied and common than usually thought, even when the variables have a broad scope.","PeriodicalId":376593,"journal":{"name":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134318932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SYN: Ultra-Scale Software Evolution Comprehension","authors":"Gianlorenzo Occhipinti, Csaba Nagy, Roberto Minelli, Michele Lanza","doi":"10.1109/ICPC58990.2023.00020","DOIUrl":"https://doi.org/10.1109/ICPC58990.2023.00020","url":null,"abstract":"The comprehension of very large-scale software system evolution remains a challenging problem due to the sheer amount of time-based (i.e., a sequence of changes) data and its intrinsically complex nature (i.e., heterogeneous changes across the entire system source code). It is a necessary step for program comprehension, as systems are not simply created out of thin air in a bang, but are the sum of many changes over long periods of time, by various actors and due to various circumstances. We present SYN, a web-based tool that uses versatile visualization and data processing techniques to create scalable depictions of ultra-scale software system evolution. SYN has been successfully applied on several systems versioned on GitHub, including the nearly 20-year history of the Linux operating system, which totals more than one million commits on more than 100k evolving files. Webpage of the tool and demo video: https://syn.si.usi.ch","PeriodicalId":376593,"journal":{"name":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134085197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conversation Disentanglement As-a-Service","authors":"E. Riggio, Marco Raglianti, Michele Lanza","doi":"10.1109/ICPC58990.2023.00018","DOIUrl":"https://doi.org/10.1109/ICPC58990.2023.00018","url":null,"abstract":"Modern instant messaging applications (e.g., Gitter, Slack, Discord) provide users with real-time communication means. Developers use them for collaborative development, to ask for code reviews, and to have software-related discussions. In short, a (potential) treasure trove for program comprehension. However, as with any high-throughput \"chat application\", messages interleave, leading to concurrent conversations. Associating messages to conversations is called conversation disentanglement, a useful and necessary pre-processing step to analyze datasets of instant messages. Although various conversation disentanglement algorithms have been proposed, it is cumbersome to set up proper execution environments and hard to ensure input data format consistency, calling for better practices and tool support. We present CODI, a RESTful API micro-service and web interface for conversation disentanglement. It provides an easy way to disentangle conversation transcripts with pre-trained models or to train new ones on custom datasets, features, and hyper-parameters. CODI achieves state-of-the-art performances on transcripts of IRC, Slack, and Discord conversations. We show how CODI can provide a significant improvement to reusability (and replicability) of research results, while reducing the efforts and potential mistakes due to configuration, setup, and execution. CODI’s source code: https://github.com/USIREVEAL/CODI","PeriodicalId":376593,"journal":{"name":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126757671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding initial API comprehension","authors":"Ava Heinonen, Fabian Fagerholm","doi":"10.1109/ICPC58990.2023.00016","DOIUrl":"https://doi.org/10.1109/ICPC58990.2023.00016","url":null,"abstract":"Programmers encounter new Application Programming Interfaces (APIs) regularly as a part of their work. Difficulties in API comprehension affect programmers’ performance and the quality of the software they produce. To effectively support API comprehension, it is important to understand how programmers comprehend new APIs in real-life work contexts. In this study, we explore programmers’ initial API comprehension efforts. We analyze what information programmers need about an API before they are ready to start working with it and the actions and information sources they use to acquire this information. Furthermore, we identify different contextual factors that affect this process. We used the critical incident method to interview programmers about their API comprehension processes in work contexts. Our results show that before our participants were ready to start using an API for a task, they sought information about the API from various sources to assess its validity and evaluate it with respect to the requirements of the task. They used their background knowledge to steer their information-seeking efforts and to recognize key pieces of information that strengthened or weakened their confidence in the suitability of the API for the task at hand. As initial API comprehension and the resulting initial API mental models seem to guide further stages of programmers’ API comprehension efforts, they heavily influence the direction of the rest of the comprehension process. Therefore, it should be considered in the design of means to support API comprehension, such as API documentation.","PeriodicalId":376593,"journal":{"name":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115003133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigating the Generalizability of Deep Learning-based Clone Detectors","authors":"Eunjong Choi, Norihiro Fuke, Yuji Fujiwara, Norihiro Yoshida, Katsuro Inoue","doi":"10.1109/ICPC58990.2023.00032","DOIUrl":"https://doi.org/10.1109/ICPC58990.2023.00032","url":null,"abstract":"The generalizability of Deep Learning (DL) models is a significant challenge, as poor generalizability indicates that the model has overfitted to the training data and is not able to generalize to new data. Despite numerous DL-based clone detectors emerging in recent years, their generalizability has not been thoroughly assessed. This study investigates the generalizability of three DL-based clone detectors (CCLearner, ASTNN, and CodeBERT) by comparing their detection accuracy on different training and testing clone benchmarks. The results show that all three clone detectors do not generalize well to new data and there is a strong relationship between clone types and generalizability for CCLearner and ASTNN.","PeriodicalId":376593,"journal":{"name":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122525363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FVA: Assessing Function-Level Vulnerability by Integrating Flow-Sensitive Structure and Code Statement Semantic","authors":"Chao Ni, Liyu Shen, Wen Wang, Xiang Chen, Xin Yin, Lexiao Zhang","doi":"10.1109/ICPC58990.2023.00048","DOIUrl":"https://doi.org/10.1109/ICPC58990.2023.00048","url":null,"abstract":"Previous studies have been conducted on software vulnerability (SV) assessment at the code-based level, especially the function level. However, a key limitation of these studies is that they do not consider the structure information (e.g., control dependency and data dependency) of a vulnerable function, which is crucial for understanding SVs and assigning priority for fixing. In this study, we propose a flow-sensitive, multitask, and function-level vulnerability assessment method named FVA, which considers both global structure information and local semantic information. More specifically, FVA considers two types of flow information extracted from the control dependence graph and the data dependence graph. Meanwhile, FVA also considers the deep semantic information of the statement as well as its various types of contexts (i.e., surrounding context and program slicing context). We evaluate the effectiveness of FVA on the large-scale dataset (4,467 functions) by comparing it with four state-of-the-art baselines in terms of five performance measures. The experimental results indicate that FVA outperforms these baselines by a significant margin. More precisely, on average, FVA obtains 0.795 of F1-score and 0.727 of MCC, which improves baselines by 5%-14% and 8%-20%, respectively.","PeriodicalId":376593,"journal":{"name":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125109730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UnityLint: A Bad Smell Detector for Unity","authors":"Matteo Bosco, Pasquale Cavoto, Augusto Ungolo, B. Muse, Foutse Khomh, Vittoria Nardone, M. D. Penta","doi":"10.1109/ICPC58990.2023.00033","DOIUrl":"https://doi.org/10.1109/ICPC58990.2023.00033","url":null,"abstract":"The video game industry is particularly rewarding as it represents a large portion of the software development market. However, working in this domain may be challenging for developers, not only because of the need for heterogeneous skills (from software design to computer graphics), but also for the limited body of knowledge in terms of good and bad design and development principles, and the lack of tool support to assist them. This tool demo proposes UnityLint, a tool able to detect 18 types of bad smells in Unity video games. UnityLint builds upon a previously-defined and validated catalog of bad smells for video games. The tool, developed in C# and available both as open-source and binary releases, is composed of (i) analyzers that extract facts from video game source code and metadata, and (ii) smell detectors that leverage detection rules to identify smells on top of the extracted facts. Tool: https://github.com/mdipenta/UnityCodeSmellAnalyzer Teaser Video: https://youtu.be/HooegxZ8H6g","PeriodicalId":376593,"journal":{"name":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129333354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PyVerDetector: A Chrome Extension Detecting the Python Version of Stack Overflow Code Snippets","authors":"Shiyu Yang, Tetsuya Kanda, Davide Pizzolotto, D. Germán, Yoshiki Higo","doi":"10.1109/ICPC58990.2023.00013","DOIUrl":"https://doi.org/10.1109/ICPC58990.2023.00013","url":null,"abstract":"Over the years, Stack Overflow (SO) has accumulated numerous code snippets, with developers going to SO for problem solutions and code references. However, in the case of the Python programming language, Python 3 is not necessarily backward compatible with Python 2. The major implication of this versioning problem is that code written in Python 2 may not be interpreted by Python 3 without modifications. This issue may affect the usability of Python code snippets on SO. We investigate how many Python code snippets on SO suffer from version compatibility issues, and find that about 10% of the snippets exhibit this problem. Moreover, of the code snippets that are interpretable only by Python 2 or Python 3, less than 17% are tagged with the Python version. In this paper, we present a Chrome extension called PyVerDetector. This extension allows the user to select a given version of Python and verifies whether the code snippets on a given SO question are compatible with the user’s selected Python version, providing error messages if not. The tool parses snippets and can determine versioning errors due to differences in syntax and also provides the user with a list of Python versions capable of interpreting each code snippet.","PeriodicalId":376593,"journal":{"name":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132848108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interpretation-based Code Summarization","authors":"Mingyang Geng, Shangwen Wang, Dezun Dong, Hao Wang, Shaomeng Cao, Kechi Zhang, Zhi Jin","doi":"10.1109/ICPC58990.2023.00026","DOIUrl":"https://doi.org/10.1109/ICPC58990.2023.00026","url":null,"abstract":"Code comment, i.e., the natural language text to describe the semantic of a code snippet, is an important way for developers to comprehend the code. Recently, a number of approaches have been proposed to automatically generate the comment given a code snippet, aiming at facilitating the comprehension activities of developers. Despite that state-of-the-art approaches have already utilized advanced machine learning techniques such as the Transformer model, they often ignore critical information of the source code, leading to the inaccuracy of the generated summarization. In this paper, to boost the effectiveness of code summarization, we propose a two-stage paradigm, where in the first stage, we train an off-the-shelf model and then identify its focuses when generating the initial summarization, through a model interpretation approach, and in the second stage, we reinforce the model to generate more qualified summarization based on the source code and its focuses. Our intuition is that in such a manner the model could learn to identify what critical information in the code has been captured and what has been missed in its initial summarization, and thus revise its initial summarization accordingly, just like how a human student learns to write high-quality summarization for a natural language text. Extensive experiments on two large-scale datasets show that our approach can boost the effectiveness of five state-of-the-art code summarization approaches significantly. Specifically, for the well-known code summarizer, DeepCom, utilizing our two-stage paradigm can increase its BLEU-4 values by around 30% and 25% on the two datasets, respectively.","PeriodicalId":376593,"journal":{"name":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124848745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Extensive Study of the Structure Features in Transformer-based Code Semantic Summarization","authors":"Kang Yang, Xinjun Mao, Shangwen Wang, Yihao Qin, Tanghaoran Zhang, Yao Lu, Kamal Al-Sabahi","doi":"10.1109/ICPC58990.2023.00024","DOIUrl":"https://doi.org/10.1109/ICPC58990.2023.00024","url":null,"abstract":"Transformers are now widely utilized in code intelligence tasks. To better fit highly structured source code, various structure information is passed into Transformer, such as positional encoding and abstract syntax tree (AST) based structures. However, it is still not clear how these structural features affect code intelligence tasks, such as code summarization. Addressing this problem is of vital importance for designing Transformer-based code models. Existing works are keen to introduce various structural information into Transformers while lacking persuasive analysis to reveal their contributions and interaction effects. In this paper, we conduct an empirical study of frequently-used code structure features for code representation, including two types of position encoding features and AST-based structure features. We propose a couple of probing tasks to detect how these structure features perform in Transformer and conduct comprehensive ablation studies to investigate how these structural features affect code semantic summarization tasks. To further validate the effectiveness of code structure features in code summarization tasks, we assess Transformer models equipped with these code structure features on a structural dependent summarization dataset. Our experimental results reveal several findings that may inspire future study: (1) there is a conflict between the influence of the absolute positional embeddings and relative positional embeddings in Transformer; (2) AST-based code structure features and relative position encoding features show a strong correlation and much contribution overlap for code semantic summarization tasks indeed exists between them; (3) Transformer models still have space for further improvement in explicitly understanding code structure information.","PeriodicalId":376593,"journal":{"name":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128982510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}