软件产业与工程 (Software Industry and Engineering) | Pub Date: 2022-11-07 | DOI: 10.1145/3540250.3549115
CORMS: a GitHub and Gerrit based hybrid code reviewer recommendation approach for modern code review
Prahar Pandya, Saurabh Tiwari
Abstract: Modern Code Review (MCR) techniques are widely adopted in both open-source software platforms and organizations to ensure the quality of their software products. However, selecting reviewers for code review becomes cumbersome as development teams grow, and recommending inappropriate reviewers increases the time and effort needed to complete a review effectively. In this paper, we extend the baseline reviewer recommendation framework, RevFinder, to handle issues with newly created files, retired reviewers, the external validity of results, and the accuracy of the state-of-the-art RevFinder. Our proposed hybrid approach, CORMS, combines similarity analysis, which computes similarities among file paths, projects/sub-projects, and author information, with prediction models that recommend reviewers based on the subject of the change. We conducted a detailed analysis on 20 widely used Gerrit and GitHub projects to compare our results with RevFinder. Our results reveal that, on average, CORMS achieves top-1, top-3, top-5, and top-10 accuracies and a Mean Reciprocal Rank (MRR) of 45.1%, 67.5%, 74.6%, 79.9%, and 0.58 across the 20 projects, improving on RevFinder by 44.9%, 34.4%, 20.8%, 12.3%, and 18.4%, respectively.
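
To make the path-based similarity idea concrete, here is a minimal sketch of how a RevFinder/CORMS-style recommender might score candidate reviewers by comparing the file paths of a new change against each candidate's past reviews. The scoring, function names, and toy review history are illustrative assumptions, not the paper's exact formulas or data.

```python
from collections import defaultdict

def path_similarity(path_a: str, path_b: str) -> float:
    """Share of leading path components two file paths have in common
    (an illustrative proxy for the path-based similarity used by
    RevFinder-style recommenders; not the paper's exact metric)."""
    a, b = path_a.split("/"), path_b.split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))

def recommend_reviewers(changed_files, review_history, top_k=5):
    """review_history: list of (reviewer, [previously reviewed file paths])."""
    scores = defaultdict(float)
    for reviewer, reviewed_paths in review_history:
        for new_path in changed_files:
            for old_path in reviewed_paths:
                scores[reviewer] += path_similarity(new_path, old_path)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

history = [
    ("alice", ["src/net/http/client.py", "src/net/http/server.py"]),
    ("bob", ["docs/readme.md", "src/util/strings.py"]),
]
print(recommend_reviewers(["src/net/http/retry.py"], history))  # ['alice', 'bob']
```
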
软件产业与工程 | Pub Date: 2022-11-07 | DOI: 10.1145/3540250.3549146
TraceCRL: contrastive representation learning for microservice trace analysis
Chenxi Zhang, Xin Peng, Tong Zhou, Chaofeng Sha, Zhenghui Yan, Yiru Chen, Hong Yang
Abstract: Due to the large volume and high complexity of trace data, microservice trace analysis tasks such as anomaly detection, fault diagnosis, and tail-based sampling widely adopt machine learning technology. These trace analysis approaches usually use a preprocessing step to map structured features of traces to vector representations in an ad-hoc way, and may therefore lose important information such as topological dependencies between service operations. In this paper, we propose TraceCRL, a trace representation learning approach based on contrastive learning and graph neural networks that can incorporate graph-structured information into downstream trace analysis tasks. Given a trace, TraceCRL constructs an operation invocation graph in which nodes represent service operations and edges represent operation invocations, together with predefined features for invocation status and related metrics. Based on the operation invocation graphs of traces, TraceCRL uses a contrastive learning method to train a graph neural network-based model for trace representation. In particular, TraceCRL employs six trace data augmentation strategies to alleviate the problems of class collision and representation uniformity in contrastive learning. Our experimental studies show that TraceCRL can significantly improve the performance of trace anomaly detection and offline trace sampling; they also confirm the effectiveness of the trace augmentation strategies and the efficiency of TraceCRL.
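
As an illustration of the operation invocation graph that TraceCRL builds before representation learning, the following sketch constructs such a graph from a flattened list of trace spans using networkx. The span tuple layout and the attached node features are assumptions made for the example; the contrastive GNN training itself is not shown.

```python
import networkx as nx

# Hypothetical flattened trace: (span_id, parent_span_id, "service.operation", status, duration_ms)
spans = [
    ("s1", None, "gateway.route", "OK", 42.0),
    ("s2", "s1", "orders.create", "OK", 30.5),
    ("s3", "s2", "payments.charge", "ERROR", 12.1),
]

def build_operation_invocation_graph(spans):
    """Nodes are service operations; a directed edge means the parent
    operation invoked the child one. Status and latency are attached as
    simple node features (an illustrative stand-in for the predefined
    invocation features described in the abstract)."""
    g = nx.DiGraph()
    by_id = {sid: op for sid, _, op, _, _ in spans}
    for sid, parent, op, status, dur in spans:
        g.add_node(op)
        g.nodes[op]["calls"] = g.nodes[op].get("calls", 0) + 1
        g.nodes[op]["error"] = g.nodes[op].get("error", False) or status == "ERROR"
        g.nodes[op]["duration_ms"] = dur
        if parent is not None:
            g.add_edge(by_id[parent], op)
    return g

g = build_operation_invocation_graph(spans)
print(list(g.edges()))  # [('gateway.route', 'orders.create'), ('orders.create', 'payments.charge')]
```
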
软件产业与工程 | Pub Date: 2022-11-07 | DOI: 10.1145/3540250.3549156
Automated unearthing of dangerous issue reports
Shengyi Pan, Jiayuan Zhou, F. R. Côgo, Xin Xia, Lingfeng Bao, Xing Hu, Shanping Li, Ahmed E. Hassan
Abstract: The coordinated vulnerability disclosure (CVD) process is commonly adopted for open source software (OSS) vulnerability management; it suggests privately reporting discovered vulnerabilities and keeping relevant information secret until the official disclosure. In practice, however, for various reasons (e.g., lacking security domain expertise or a sense of security management), many vulnerabilities are first reported via public issue reports (IRs) before their official disclosure. Such IRs are dangerous IRs, since attackers can take advantage of the leaked vulnerability information to launch zero-day attacks. It is crucial to identify such dangerous IRs at an early stage, so that OSS users can start the vulnerability remediation process earlier and OSS maintainers can manage the dangerous IRs in a timely manner. In this paper, we propose and evaluate a deep learning based approach, MemVul, to automatically identify dangerous IRs at the time they are reported. MemVul augments neural networks with a memory component that stores external vulnerability knowledge from the Common Weakness Enumeration (CWE). We rely on publicly accessible CVE-referred IRs (CIRs) to operationalize the concept of dangerous IRs, mining 3,937 CIRs distributed across 1,390 OSS projects hosted on GitHub. Evaluated under a practical scenario of high data imbalance, MemVul achieves the best trade-off between precision and recall among all baselines; in particular, its F1-score (0.49) improves on the best performing baseline by 44%. For IRs that are predicted as CIRs but not reported to CVE, we conduct a user study to investigate their usefulness to OSS stakeholders. We observe that 82% (41 out of 50) of these IRs are security-related, and 28 of them are suggested by security experts to be publicly disclosed, indicating that MemVul is capable of identifying undisclosed dangerous IRs.
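
The memory component's role, retrieving related CWE knowledge for an incoming issue report, can be approximated by a toy similarity lookup. The sketch below uses TF-IDF retrieval purely for illustration; MemVul itself uses a neural memory network, and the CWE snippets and issue text here are invented examples.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative external "memory" of CWE knowledge (abridged, invented snippets).
cwe_memory = {
    "CWE-79": "Improper neutralization of input during web page generation (cross-site scripting).",
    "CWE-89": "Improper neutralization of special elements used in an SQL command.",
    "CWE-787": "Out-of-bounds write past the end of an allocated buffer.",
}

def nearest_cwe(issue_text: str):
    """Return the CWE entry most similar to the issue report text."""
    keys = list(cwe_memory)
    docs = list(cwe_memory.values())
    vec = TfidfVectorizer().fit(docs + [issue_text])
    sims = cosine_similarity(vec.transform([issue_text]), vec.transform(docs))[0]
    best = sims.argmax()
    return keys[best], float(sims[best])

print(nearest_cwe("Crafted payload in the comment field is rendered as script on the admin page"))
```
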
软件产业与工程 | Pub Date: 2022-11-07 | DOI: 10.1145/3540250.3549159
Detecting Simulink compiler bugs via controllable zombie blocks mutation
Shikai Guo, He Jiang, Zhihao Xu, Xiaochen Li, Zhilei Ren, Zhide Zhou, Rong Chen
Abstract: As a popular Cyber-Physical System (CPS) development tool chain, MathWorks Simulink is widely used to prototype CPS models in safety-critical applications, e.g., aerospace and healthcare. It is crucial to ensure the correctness and reliability of the Simulink compiler (i.e., the compiler module of Simulink) in practice, since all CPS models depend on compilation. However, Simulink compiler testing is challenging due to millions of lines of source code and the lack of a complete formal language specification. Although several methods have been proposed to automatically test the Simulink compiler, two challenges remain to be tackled: the limited variant space and the insufficient mutation diversity. To address these challenges, we propose COMBAT, a new differential testing method for Simulink compiler testing. COMBAT includes an EMI (Equivalence Modulo Input) mutation component and a diverse variant generation component. The EMI mutation component inserts assertion statements (e.g., If/While blocks) at arbitrary points of the seed CPS model. These statements break each insertion point into true and false branches. COMBAT then feeds all the data passing through the insertion point into the true branch to preserve the equivalence of CPS variants. In this way, the body of the false branch can be viewed as a new variant space, addressing the first challenge. The diverse variant generation component uses Markov chain Monte Carlo optimization to sample the seed CPS model and generate complex mutations of long sequences of blocks in the variant space, addressing the second challenge. Experiments demonstrate that COMBAT significantly outperforms the state-of-the-art approaches in Simulink compiler testing. Within five months, COMBAT has reported 16 valid bugs for Simulink R2021b, of which 11 have been confirmed as new bugs by MathWorks Support.
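
The EMI mutation idea, an always-true guard whose dead false branch becomes fresh variant space, carries over to ordinary source code. The following sketch applies it to lines of Python text rather than Simulink blocks, purely to illustrate the mechanism; the statements and insertion policy are illustrative assumptions, not COMBAT's actual block-level transformation.

```python
import random

def insert_emi_guard(lines, dead_code):
    """Wrap one statement of a seed program in an always-true guard.
    All real data flows through the true branch, so observable behaviour
    is preserved; the unreachable false branch is free space where
    arbitrary generated statements ("zombie blocks") can be placed."""
    point = random.randrange(len(lines))
    guarded = (["if True:  # inserted guard; condition always holds at runtime"]
               + ["    " + lines[point]]
               + ["else:  # unreachable variant space"]
               + ["    " + stmt for stmt in dead_code])
    return lines[:point] + guarded + lines[point + 1:]

seed = ["x = read_input()", "y = x * 2", "emit(y)"]
variant = insert_emi_guard(seed, ["y = -1", "emit(0)"])
print("\n".join(variant))
```
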
软件产业与工程 | Pub Date: 2022-11-07 | DOI: 10.1145/3540250.3558947
Investigating and improving log parsing in practice
Ying Fu, Meng Yan, Jian Xu, Jianguo Li, Zhongxin Liu, Xiaohong Zhang, Dan Yang
Abstract: Logs are widely used for system behavior diagnosis by automatic log mining. Log parsing is an important data preprocessing step that converts semi-structured log messages into structured data as the feature input for log mining. Many studies have been devoted to proposing new log parsers; however, to the best of our knowledge, no previous study comprehensively investigates the effectiveness of log parsers in industrial practice. To do so, we conduct an empirical study on the effectiveness of six state-of-the-art log parsers on 10 microservice applications of Ant Group. Our empirical results highlight two challenges for log parsing in practice: 1) Various separators. A log message may contain several different separators, and the separators vary across event templates and applications; current log parsers perform poorly because they do not account for this. 2) Various lengths due to nested objects. Log messages belonging to the same event template may have different lengths because of nested objects; the log messages of 6 out of 10 microservice applications at Ant Group exhibit such length variation, and 4 out of 6 state-of-the-art log parsers cannot handle it. In this paper, we propose an improved log parser named Drain+, based on the state-of-the-art log parser Drain. Drain+ includes two innovative components to address the above two challenges: a statistics-based separator generation component, which automatically generates separators for log message splitting, and a candidate event template merging component, which merges candidate event templates using a template similarity method. We evaluate the effectiveness of Drain+ on 10 microservice applications of Ant Group and 16 public datasets. The results show that Drain+ outperforms the six state-of-the-art log parsers on both the industrial applications and the public datasets. Finally, we summarize our observations on the road ahead for log parsing to inspire other researchers and practitioners.
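
A rough picture of the "various separators" challenge and the statistics-based separator generation can be given in a few lines: count how often candidate separator characters appear across a sample of log messages and split on the frequent ones. The candidate set and threshold below are illustrative assumptions, not Drain+'s actual procedure.

```python
import re
from collections import Counter

CANDIDATE_SEPARATORS = [" ", "=", ":", ",", "|", ";"]

def generate_separators(messages, min_ratio=0.5):
    """Keep a candidate separator if it occurs in at least min_ratio of messages."""
    counts = Counter()
    for msg in messages:
        for ch in set(msg):
            if ch in CANDIDATE_SEPARATORS:
                counts[ch] += 1
    return [ch for ch in CANDIDATE_SEPARATORS if counts[ch] / len(messages) >= min_ratio]

def tokenize(message, separators):
    """Split a log message on any of the generated separators."""
    pattern = "[" + re.escape("".join(separators)) + "]+"
    return [tok for tok in re.split(pattern, message) if tok]

logs = [
    "level=INFO|msg=connected addr=10.0.0.1:8080",
    "level=WARN|msg=retrying addr=10.0.0.2:8080",
]
seps = generate_separators(logs)
print(seps, tokenize(logs[0], seps))
```
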
软件产业与工程 | Pub Date: 2022-11-07 | DOI: 10.1145/3540250.3549140
NL2Viz: natural language to visualization via constrained syntax-guided synthesis
Zhengkai Wu, Vu Le, A. Tiwari, Sumit Gulwani, Arjun Radhakrishna, Ivan Radicek, Gustavo Soares, Xinyu Wang, Zhenwen Li, Tao Xie
Abstract: Recent developments in NL2CODE (Natural Language to Code) research allow end-users, especially novice programmers, to create a concrete implementation of their ideas, such as a data visualization, by providing natural language (NL) instructions. An NL2CODE system often fails to achieve its goal due to three major challenges: the user's words have contextual semantics, the user may not include all details needed for code generation, and the system's results are imperfect and require further refinement. To address these three challenges for NL to visualization, we propose a new approach and its supporting tool, NL2Viz, with three salient features: (1) leveraging not only the user's NL input but also the data and program context that the NL query refers to, (2) using hard/soft constraints to reflect different confidence levels in the constraints retrieved from the user input and the data/program context, and (3) providing support for result refinement and reuse. We implement NL2Viz in the Jupyter Notebook environment and evaluate it on a real-world visualization benchmark and a public dataset to show its effectiveness. We also conduct a user study involving 6 professional data scientists to demonstrate the usability of NL2Viz, the readability of the generated code, and NL2Viz's effectiveness in helping users generate desired visualizations effectively and efficiently.
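
One way to picture the hard/soft constraint mechanism is as filtering and ranking candidate chart specifications: hard constraints must hold, while soft constraints only add weight. The candidate representation, constraint predicates, and weights in this sketch are invented for illustration and are not NL2Viz's internal encoding.

```python
# Candidate visualizations a synthesizer might enumerate (illustrative).
candidates = [
    {"chart": "bar", "x": "country", "y": "sales"},
    {"chart": "line", "x": "date", "y": "sales"},
    {"chart": "scatter", "x": "sales", "y": "profit"},
]

# Hard constraints must hold (e.g., a column explicitly named in the query);
# soft constraints carry weights reflecting lower confidence (e.g., hints
# inferred from the surrounding data/program context).
hard = [lambda c: c["y"] == "sales"]
soft = [(0.6, lambda c: c["chart"] == "bar"), (0.4, lambda c: c["x"] == "country")]

def rank(cands):
    feasible = [c for c in cands if all(h(c) for h in hard)]
    return sorted(feasible, key=lambda c: sum(w for w, s in soft if s(c)), reverse=True)

print(rank(candidates)[0])  # {'chart': 'bar', 'x': 'country', 'y': 'sales'}
```
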
软件产业与工程 | Pub Date: 2022-11-07 | DOI: 10.1145/3540250.3549087
How to better utilize code graphs in semantic code search?
Yucen Shi, Ying Yin, Zhengkui Wang, David Lo, Tao Zhang, Xin Xia, Yuhai Zhao, Bowen Xu
Abstract: Semantic code search greatly facilitates software reuse by enabling users to find code snippets that closely match user-specified natural language queries. Due to the rich expressive power of code graphs (e.g., the control-flow graph and the program dependency graph), both mainstream lines of research (i.e., multi-modal models and pre-trained models) have attempted to incorporate code graphs for code modelling. However, they still have limitations: first, there is still much room for improvement in search effectiveness; second, they have not fully considered the unique features of code graphs. In this paper, we propose a Graph-to-Sequence Converter, named G2SC. By converting code graphs into lossless sequences, G2SC addresses the problem of small-graph learning with sequence feature learning and captures both the edge and node attribute information of code graphs, so the effectiveness of code search can be greatly improved. Specifically, G2SC first converts the code graph into a unique corresponding node sequence by a specific graph traversal strategy, then obtains a statement sequence by replacing each node with its corresponding statement. A set of carefully designed graph traversal strategies guarantees that the process is one-to-one and reversible. G2SC captures rich semantic relationships (i.e., control flow, data flow, node/relationship properties) and provides a learning-model-friendly data transformation. It can be flexibly integrated with existing models to better utilize code graphs. As a proof-of-concept application, we present two G2SC-enabled models: GSMM (a G2SC-enabled multi-modal model) and GSCodeBERT (a G2SC-enabled CodeBERT model). Extensive experimental results on two large-scale real-world datasets demonstrate that GSMM and GSCodeBERT improve the state-of-the-art models MMAN and GraphCodeBERT by 92% and 22% on R@1, and by 63% and 11.5% on MRR, respectively.
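
The graph-to-sequence conversion can be illustrated on a four-node control-flow graph: traverse the graph to obtain a node sequence, then replace each node with its statement. The depth-first walk below is only a toy stand-in; unlike G2SC's carefully designed traversal strategies, it is neither guaranteed one-to-one nor reversible.

```python
# Toy control-flow graph: adjacency list plus the statement behind each node.
graph = {
    "n1": ["n2", "n3"],   # if-condition with two successors
    "n2": ["n4"],
    "n3": ["n4"],
    "n4": [],
}
statements = {
    "n1": "if (x > 0)",
    "n2": "y = x;",
    "n3": "y = -x;",
    "n4": "return y;",
}

def graph_to_sequence(graph, root):
    """Deterministic depth-first walk producing a node sequence."""
    order, seen, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(reversed(graph[node]))
    return order

print([statements[n] for n in graph_to_sequence(graph, "n1")])
# ['if (x > 0)', 'y = x;', 'return y;', 'y = -x;']
```
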
软件产业与工程 | Pub Date: 2022-11-07 | DOI: 10.1145/3540250.3558939
Unite: an adapter for transforming analysis tools to web services via OSLC
O. Vašíček, Jan Fiedor, Tomas Kratochvila, B. Krena, A. Smrčka, Tomáš Vojnar
Abstract: This paper describes Unite, a new tool intended as an adapter for transforming non-interactive command-line analysis tools into OSLC-compliant web services. Unite aims to make such tools easier to adopt and more convenient to use by allowing them to be accessible, both locally and remotely, in a unified way and to be easily integrated into various development environments. Open Services for Lifecycle Collaboration (OSLC) is an open standard for tool integration and was chosen for this task due to its robustness, extensibility, support of data from various domains, and growing popularity. The work is motivated by allowing existing analysis tools to be more widely used, with a strong emphasis on widening their industrial usage. We have implemented Unite, used it with multiple existing static as well as dynamic analysis and verification tools, and then successfully deployed it internationally in industry to automate verification tasks for development teams at Honeywell. We discuss Honeywell's experience with using Unite and with OSLC in general. Moreover, we also provide the Unite Client (UniC) for Eclipse, which allows users to easily run various analysis tools directly from the Eclipse IDE.
软件产业与工程 | Pub Date: 2022-11-07 | DOI: 10.1145/3540250.3549110
Peahen: fast and precise static deadlock detection via context reduction
Yuandao Cai, Chengfeng Ye, Qingkai Shi, Charles Zhang
Abstract: Deadlocks still severely inflict reliability and security issues upon modern software systems. Worse still, in prior static deadlock detectors, good precision does not go hand-in-hand with high scalability: their approaches are either context-insensitive, engendering many false positives, or suffer from calling-context explosion when striving for context sensitivity, compromising efficiency. In this paper, we advocate Peahen, geared towards precise yet scalable static deadlock detection. At its crux, Peahen decomposes the computational effort for achieving high precision into two cooperative analysis stages: (i) context-insensitive lock-graph construction, which selectively encodes the essential lock-acquisition information on each edge, and (ii) three precise yet lazy refinements, which incorporate such edge information into progressively refining the deadlock cycles in the lock graph, only for a few interesting calling contexts. Our extensive experiments yield promising results: Peahen dramatically outperforms the state-of-the-art tools on accuracy without losing scalability; it can efficiently check million-line systems at a low false positive rate; and it has uncovered many confirmed deadlocks in dozens of mature open-source systems.
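
The lock-graph idea at the core of static deadlock detection can be sketched briefly: record which lock is acquired while which other lock is held, and look for cycles. The acquisition records below are hypothetical, and the plain cycle search omits the edge annotations and lazy context-sensitive refinements that distinguish Peahen.

```python
import networkx as nx

# Hypothetical lock-acquisition orders extracted from two call paths.
acquisitions = [
    ("thread1", ["lock_a", "lock_b"]),
    ("thread2", ["lock_b", "lock_a"]),
]

# Nodes are locks; an edge held -> acquired records that some thread
# acquires `acquired` while still holding `held`.
g = nx.DiGraph()
for _, held_order in acquisitions:
    for held, acquired in zip(held_order, held_order[1:]):
        g.add_edge(held, acquired)

# Any cycle in the lock graph is a potential deadlock.
print(list(nx.simple_cycles(g)))  # e.g. [['lock_a', 'lock_b']]
```
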
软件产业与工程 | Pub Date: 2022-11-07 | DOI: 10.1145/3540250.3549148
What motivates software practitioners to contribute to inner source?
Zhiyuan Wan, Xin Xia, Yun Zhang, David Lo, Daibing Zhou, Qiuyuan Chen, A. Hassan
Abstract: Software development organizations have adopted open source development practices to support or augment their software development processes, a phenomenon referred to as inner source. Given the rapid adoption of inner source, we wonder what motivates software practitioners to contribute to inner source projects. We followed a mixed-methods approach: a qualitative phase of interviews with 20 interviewees, followed by a quantitative phase of an exploratory survey with 124 respondents from 13 countries across four continents. Our study uncovers practitioners' motivation to contribute to inner source projects, as well as how that motivation differs from what motivates practitioners to participate in open source projects. We also investigate how software practitioners' motivation impacts their contribution level and continuance intention in inner source projects. Based on our findings, we outline directions for future research and provide recommendations for organizations and software practitioners.