{"title":"Turnover of Companies in OpenStack: Prevalence and Rationale","authors":"Yuxia Zhang, Hui Liu, Xin Tan, Minghui Zhou, Zhi Jin, Jiaxin Zhu","doi":"10.1145/3510849","DOIUrl":"https://doi.org/10.1145/3510849","url":null,"abstract":"To achieve commercial goals, companies have made substantial contributions to large open-source software (OSS) ecosystems such as OpenStack and have become the main contributors. However, they often withdraw their employees for a variety of reasons, which may affect the sustainability of OSS projects. While the turnover of individual contributors has been extensively investigated, there is a lack of knowledge about the nature of companies’ withdrawal. To this end, we conduct a mixed-methods empirical study on OpenStack to reveal how common company withdrawals were, to what degree withdrawn companies made contributions, and what the rationale behind withdrawals was. By analyzing the commit data of 18 versions of OpenStack, we find that the number of companies that have left is increasing and even surpasses the number of companies that have joined in later versions. Approximately 12% of the companies in each version have exited by the next version. Compared to the sustaining companies that joined in the same version, the withdrawn companies tend to have a weaker contribution intensity but contribute to a similar scope of repositories in OpenStack. Through conducting a developer survey, we find four aspects of reasons for companies’ withdrawal from OpenStack: company, community, developer, and project. The most common reasons lie in the company aspect, i.e., the company either achieved its goals or failed to do so. By fitting the survival analysis model, we find that commercial goals are associated with the probability of the company’s withdrawal, and that a company’s contribution intensity and scale are positively correlated with its retention. Maintaining good retention is important but challenging for OSS ecosystems, and our results may shed light on potential approaches to improve company retention and reduce the negative impact of company withdrawal.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"14 1","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2022-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81720724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Albert, Pablo Gordillo, Alejandro Hernández-Cerezo, A. Rubio, M. A. Schett
{"title":"Super-optimization of Smart Contracts","authors":"E. Albert, Pablo Gordillo, Alejandro Hernández-Cerezo, A. Rubio, M. A. Schett","doi":"10.1145/3506800","DOIUrl":"https://doi.org/10.1145/3506800","url":null,"abstract":"Smart contracts are programs deployed on a blockchain. They are executed for a monetary fee paid in gas—a clear optimization target for smart contract compilers. Because smart contracts are a young, fast-moving field without (manually) fine-tuned compilers, they highly benefit from automated and adaptable approaches, especially as smart contracts are effectively immutable, and as such need a high level of assurance. This makes them an ideal domain for applying formal methods. Super-optimization is a technique to find the best translation of a block of instructions by trying all possible sequences of instructions that produce the same result. We present a framework for super-optimizing smart contracts based on Max-SMT with two main ingredients: (1) a stack functional specification extracted from the basic blocks of a smart contract, which is simplified using rules capturing the semantics of arithmetic, bit-wise, and relational operations, and (2) the synthesis of optimized blocks, which finds—by means of an efficient SMT encoding—basic blocks with minimal gas cost whose stack functional specification is equal (modulo commutativity) to the extracted one. We implemented our framework in the tool syrup 2.0. Through large-scale experiments on real-world smart contracts, we analyze performance improvements for different SMT encodings, as well as tradeoffs between quality of optimizations and required optimization time.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"6 1","pages":"1 - 29"},"PeriodicalIF":0.0,"publicationDate":"2022-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84069976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Verification of Programs Sensitive to Heap Layout","authors":"Henrich Lauko, Lukás Korencik, Petr Ročkai","doi":"10.1145/3508363","DOIUrl":"https://doi.org/10.1145/3508363","url":null,"abstract":"Most C and C++ programs use dynamically allocated memory (often known as a heap) to store and organize their data. In practice, it can be useful to compare addresses of different heap objects, for instance, to store them in a binary search tree or a sorted array. However, comparisons of pointers to distinct objects are inherently ambiguous: The address order of two objects can be reversed in different executions of the same program, due to the nature of the allocation algorithm and other external factors. This poses a significant challenge to program verification, since a sound verifier must consider all possible behaviors of a program, including an arbitrary reordering of the heap. A naive verification of all possibilities, of course, leads to a combinatorial explosion of the state space: For this reason, we propose an under-approximating abstract domain that can be soundly refined to consider all relevant heap orderings. We have implemented the proposed abstract domain and evaluated it against several existing software verification tools on a collection of pointer-manipulating programs. In many cases, existing tools only consider a single fixed heap order, which is a source of unsoundness. We demonstrate that using our abstract domain, this unsoundness can be repaired at only a very modest performance cost. Additionally, we show that, even though many verifiers ignore it, ambiguous behavior is present in a considerable fraction of programs from software verification competition (sv-comp).","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"4 1","pages":"1 - 27"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84138716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hao Yu, Xing Hu, Ge Li, Ying Li, Qianxiang Wang, Tao Xie
{"title":"Assessing and Improving an Evaluation Dataset for Detecting Semantic Code Clones via Deep Learning","authors":"Hao Yu, Xing Hu, Ge Li, Ying Li, Qianxiang Wang, Tao Xie","doi":"10.1145/3502852","DOIUrl":"https://doi.org/10.1145/3502852","url":null,"abstract":"In recent years, applying deep learning to detect semantic code clones has received substantial attention from the research community. Accordingly, various evaluation benchmark datasets, with the most popular one as BigCloneBench, are constructed and selected as benchmarks to assess and compare different deep learning models for detecting semantic clones. However, there is no study to investigate whether an evaluation benchmark dataset such as BigCloneBench is properly used to evaluate models for detecting semantic code clones. In this article, we present an experimental study to show that BigCloneBench typically includes semantic clone pairs that use the same identifier names, which however are not used in non-semantic-clone pairs. Subsequently, we propose an undesirable-by-design Linear-Model that considers only which identifiers appear in a code fragment; this model can achieve high effectiveness for detecting semantic clones when evaluated on BigCloneBench, even comparable to state-of-the-art deep learning models recently proposed for detecting semantic clones. To alleviate these issues, we abstract a subset of the identifier names (including type, variable, and method names) in BigCloneBench to result in AbsBigCloneBench and use AbsBigCloneBench to better assess the effectiveness of deep learning models on the task of detecting semantic clones.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"14 1","pages":"1 - 25"},"PeriodicalIF":0.0,"publicationDate":"2022-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91063434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Guaranteeing Timed Opacity using Parametric Timed Model Checking","authors":"É. André, D. Lime, Dylan Marinho, Junbo Sun","doi":"10.1145/3502851","DOIUrl":"https://doi.org/10.1145/3502851","url":null,"abstract":"Information leakage can have dramatic consequences on systems security. Among harmful information leaks, the timing information leakage occurs whenever an attacker successfully deduces confidential internal information. In this work, we consider that the attacker has access (only) to the system execution time. We address the following timed opacity problem: given a timed system, a private location and a final location, synthesize the execution times from the initial location to the final location for which one cannot deduce whether the system went through the private location. We also consider the full timed opacity problem, asking whether the system is opaque for all execution times. We show that these problems are decidable for timed automata (TAs) but become undecidable when one adds parameters, yielding parametric timed automata (PTAs). We identify a subclass with some decidability results. We then devise an algorithm for synthesizing PTAs parameter valuations guaranteeing that the resulting TA is opaque. We finally show that our method can also apply to program analysis.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"11 1","pages":"1 - 36"},"PeriodicalIF":0.0,"publicationDate":"2022-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81828412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Beyer, Matthias Dangl, Daniel Dietsch, Matthias Heizmann, D. Beyer, Matthias Dangl, Daniel Dietsch, Matthias Heizmann, T. Lemberger
{"title":"Verification Witnesses","authors":"D. Beyer, Matthias Dangl, Daniel Dietsch, Matthias Heizmann, D. Beyer, Matthias Dangl, Daniel Dietsch, Matthias Heizmann, T. Lemberger","doi":"10.1145/3477579","DOIUrl":"https://doi.org/10.1145/3477579","url":null,"abstract":"Over the last years, witness-based validation of verification results has become an established practice in software verification: An independent validator re-establishes verification results of a software verifier using verification witnesses, which are stored in a standardized exchange format. In addition to validation, such exchangable information about proofs and alarms found by a verifier can be shared across verification tools, and users can apply independent third-party tools to visualize and explore witnesses to help them comprehend the causes of bugs or the reasons why a given program is correct. To achieve the goal of making verification results more accessible to engineers, it is necessary to consider witnesses as first-class exchangeable objects, stored independently from the source code and checked independently from the verifier that produced them, respecting the important principle of separation of concerns. We present the conceptual principles of verification witnesses, give a description of how to use them, provide a technical specification of the exchange format for witnesses, and perform an extensive experimental study on the application of witness-based result validation, using the validators CPAchecker, UAutomizer, CPA-witness2test, and FShell-witness2test.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"252 1","pages":"1 - 69"},"PeriodicalIF":0.0,"publicationDate":"2022-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77683661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xing Hu, Qiuyuan Chen, Haoye Wang, Xin Xia, D. Lo, Thomas Zimmermann
{"title":"Correlating Automated and Human Evaluation of Code Documentation Generation Quality","authors":"Xing Hu, Qiuyuan Chen, Haoye Wang, Xin Xia, D. Lo, Thomas Zimmermann","doi":"10.1145/3502853","DOIUrl":"https://doi.org/10.1145/3502853","url":null,"abstract":"Automatic code documentation generation has been a crucial task in the field of software engineering. It not only relieves developers from writing code documentation but also helps them to understand programs better. Specifically, deep-learning-based techniques that leverage large-scale source code corpora have been widely used in code documentation generation. These works tend to use automatic metrics (such as BLEU, METEOR, ROUGE, CIDEr, and SPICE) to evaluate different models. These metrics compare generated documentation to reference texts by measuring the overlapping words. Unfortunately, there is no evidence demonstrating the correlation between these metrics and human judgment. We conduct experiments on two popular code documentation generation tasks, code comment generation and commit message generation, to investigate the presence or absence of correlations between these metrics and human judgments. For each task, we replicate three state-of-the-art approaches and the generated documentation is evaluated automatically in terms of BLEU, METEOR, ROUGE-L, CIDEr, and SPICE. We also ask 24 participants to rate the generated documentation considering three aspects (i.e., language, content, and effectiveness). Each participant is given Java methods or commit diffs along with the target documentation to be rated. The results show that the ranking of generated documentation from automatic metrics is different from that evaluated by human annotators. Thus, these automatic metrics are not reliable enough to replace human evaluation for code documentation generation tasks. In addition, METEOR shows the strongest correlation (with moderate Pearson correlation r about 0.7) to human evaluation metrics. However, it is still much lower than the correlation observed between different annotators (with a high Pearson correlation r about 0.8) and correlations that are reported in the literature for other tasks (e.g., Neural Machine Translation [39]). Our study points to the need to develop specialized automated evaluation metrics that can correlate more closely to human evaluation metrics for code generation tasks.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"119 1","pages":"1 - 28"},"PeriodicalIF":0.0,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91195337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-Aware Code Change Embedding for Better Patch Correctness Assessment","authors":"Bo Lin, Shangwen Wang, Ming Wen, Xiaoguang Mao","doi":"10.1145/3505247","DOIUrl":"https://doi.org/10.1145/3505247","url":null,"abstract":"Despite the capability in successfully fixing more and more real-world bugs, existing Automated Program Repair (APR) techniques are still challenged by the long-standing overfitting problem (i.e., a generated patch that passes all tests is actually incorrect). Plenty of approaches have been proposed for automated patch correctness assessment (APCA). Nonetheless, dynamic ones (i.e., those that needed to execute tests) are time-consuming while static ones (i.e., those built on top of static code features) are less precise. Therefore, embedding techniques have been proposed recently, which assess patch correctness via embedding token sequences extracted from the changed code of a generated patch. However, existing techniques rarely considered the context information and program structures of a generated patch, which are crucial for patch correctness assessment as revealed by existing studies. In this study, we explore the idea of context-aware code change embedding considering program structures for patch correctness assessment. Specifically, given a patch, we not only focus on the changed code but also take the correlated unchanged part into consideration, through which the context information can be extracted and leveraged. We then utilize the AST path technique for representation where the structure information from AST node can be captured. Finally, based on several pre-defined heuristics, we build a deep learning based classifier to predict the correctness of the patch. We implemented this idea as Cache and performed extensive experiments to assess its effectiveness. Our results demonstrate that Cache can (1) perform better than previous representation learning based techniques (e.g., Cache relatively outperforms existing techniques by ( approx ) 6%, ( approx ) 3%, and ( approx ) 16%, respectively under three diverse experiment settings), and (2) achieve overall higher performance than existing APCA techniques while even being more precise than certain dynamic ones including PATCH-SIM (92.9% vs. 83.0%). Further results reveal that the context information and program structures leveraged by Cache contributed significantly to its outstanding performance.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"1 1","pages":"1 - 29"},"PeriodicalIF":0.0,"publicationDate":"2022-05-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90438363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huangping Qiang, Yuejun Guo, Maxime Cordy, Xiaofei Xie, Qiang Hu, Yuejun Guo, Maxime Cordy, Xiaofei Xie, L. Ma, Mike Papadakis
{"title":"An Empirical Study on Data Distribution-Aware Test Selection for Deep Learning Enhancement","authors":"Huangping Qiang, Yuejun Guo, Maxime Cordy, Xiaofei Xie, Qiang Hu, Yuejun Guo, Maxime Cordy, Xiaofei Xie, L. Ma, Mike Papadakis","doi":"10.1145/3511598","DOIUrl":"https://doi.org/10.1145/3511598","url":null,"abstract":"Similar to traditional software that is constantly under evolution, deep neural networks need to evolve upon the rapid growth of test data for continuous enhancement (e.g., adapting to distribution shift in a new environment for deployment). However, it is labor intensive to manually label all of the collected test data. Test selection solves this problem by strategically choosing a small set to label. Via retraining with the selected set, deep neural networks will achieve competitive accuracy. Unfortunately, existing selection metrics involve three main limitations: (1) using different retraining processes, (2) ignoring data distribution shifts, and (3) being insufficiently evaluated. To fill this gap, we first conduct a systemically empirical study to reveal the impact of the retraining process and data distribution on model enhancement. Then based on our findings, we propose DAT, a novel distribution-aware test selection metric. Experimental results reveal that retraining using both the training and selected data outperforms using only the selected data. None of the selection metrics perform the best under various data distributions. By contrast, DAT effectively alleviates the impact of distribution shifts and outperforms the compared metrics by up to five times and 30.09% accuracy improvement for model enhancement on simulated and in-the-wild distribution shift scenarios, respectively.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"37 23","pages":"1 - 30"},"PeriodicalIF":0.0,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91406493","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Just-In-Time Defect Prediction on JavaScript Projects: A Replication Study","authors":"Chao Ni, Xin Xia, D. Lo, Xiaohu Yang, A. Hassan","doi":"10.1145/3508479","DOIUrl":"https://doi.org/10.1145/3508479","url":null,"abstract":"Change-level defect prediction is widely referred to as just-in-time (JIT) defect prediction since it identifies a defect-inducing change at the check-in time, and researchers have proposed many approaches based on the language-independent change-level features. These approaches can be divided into two types: supervised approaches and unsupervised approaches, and their effectiveness has been verified on Java or C++ projects. However, whether the language-independent change-level features can effectively identify the defects of JavaScript projects is still unknown. Additionally, many researches have confirmed that supervised approaches outperform unsupervised approaches on Java or C++ projects when considering inspection effort. However, whether supervised JIT defect prediction approaches can still perform best on JavaScript projects is still unknown. Lastly, prior proposed change-level features are programming language–independent, whether programming language–specific change-level features can further improve the performance of JIT approaches on identifying defect-prone changes is also unknown. To address the aforementioned gap in knowledge, in this article, we collect and label the top-20 most starred JavaScript projects on GitHub. JavaScript is an extremely popular and widely used programming language in the industry. We propose five JavaScript-specific change-level features and conduct a large-scale empirical study (i.e., involving a total of 176,902 changes) and find that (1) supervised JIT defect prediction approaches (i.e., CBS+) still statistically significantly outperform unsupervised approaches on JavaScript projects when considering inspection effort; (2) JavaScript-specific change-level features can further improve the performance of approach built with language-independent features on identifying defect-prone changes; (3) the change-level features in the dimension of size (i.e., LT), diffusion (i.e., NF), and JavaScript-specific (i.e., SO and TC) are the most important features for indicating the defect-proneness of a change on JavaScript projects; and (4) project-related features (i.e., Stars, Branches, Def Ratio, Changes, Files, Defective, and Forks) have a high association with the probability of a change to be a defect-prone one on JavaScript projects.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"83 1","pages":"1 - 38"},"PeriodicalIF":0.0,"publicationDate":"2022-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82294145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}