软件产业与工程Pub Date : 2023-01-16DOI: 10.1109/ICSE-SEET58685.2023.00017
Marvin Wyrich, Stefan Wagner
{"title":"Teaching Computer Science Students to Communicate Scientific Findings More Effectively","authors":"Marvin Wyrich, Stefan Wagner","doi":"10.1109/ICSE-SEET58685.2023.00017","DOIUrl":"https://doi.org/10.1109/ICSE-SEET58685.2023.00017","url":null,"abstract":"Science communication forms the bridge between computer science researchers and their target audience. Researchers who can effectively draw attention to their research findings and communicate them comprehensibly not only help their target audience to actually learn something, but also benefit themselves from the increased visibility of their work and person. However, the necessary skills for good science communication must also be taught, and this has so far been neglected in the field of software engineering education. We therefore designed and implemented a science communication seminar for bachelor students of computer science curricula. Students take the position of a researcher who, shortly after publication, is faced with having to draw attention to the paper and effectively communicate the contents of the paper to one or more target audiences. Based on this scenario, each student develops a communication strategy for an already published software engineering research paper and tests the resulting ideas with the other seminar participants. We explain our design decisions for the seminar, and combine our experiences with responses to a participant survey into lessons learned. With this experience report, we intend to motivate and enable other lecturers to offer a similar seminar at their university. Collectively, university lecturers can prepare the next generation of computer science researchers to not only be experts in their field, but also to communicate research findings more effectively.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"93 1","pages":"107-114"},"PeriodicalIF":0.0,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85509459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
软件产业与工程Pub Date : 2023-01-12DOI: 10.1109/ICSE-SEET58685.2023.00008
Chun Yong Chong, Eunsuk Kang, Mary Shaw
{"title":"Open Design Case Study - A Crowdsourcing Effort to Curate Software Design Case Studies","authors":"Chun Yong Chong, Eunsuk Kang, Mary Shaw","doi":"10.1109/ICSE-SEET58685.2023.00008","DOIUrl":"https://doi.org/10.1109/ICSE-SEET58685.2023.00008","url":null,"abstract":"Case study-based learning has been successfully integrated into various courses, including software engineering education. In the context of software design courses, the use of case studies often entails sharing of real successful or failed software development. Using examples of real-world case studies allows educators to reinforce the applicability and usefulness of fundamental design concepts, relate the importance of evaluating design trade-offs with respect to stakeholders’ requirements, and highlight the importance of upfront design where students that lack industrial experience tend to overlook. However, the use of real-world case studies is not straightforward because 1.) there is a lack of open source repositories for real software design case studies and 2.) even if case studies are available, they are often reported without a standardized format, which may hinder the alignment between the case and the desired learning outcomes. To address the lack of software design case studies for educational purposes, we propose the idea of Open Design Case Study, a repository to crowdsource, curate, and recruit other educators to contribute case studies for teaching software design courses. The platform will also allow educators and students to share, brainstorm, and discuss design solutions based on case studies shared publicly on the repository.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"26 1","pages":"23-28"},"PeriodicalIF":0.0,"publicationDate":"2023-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86983888","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
软件产业与工程Pub Date : 2022-11-07DOI: 10.1145/3540250.3569443
M. Chechik
{"title":"On safety, assurance, and reliability: a software engineering perspective (keynote)","authors":"M. Chechik","doi":"10.1145/3540250.3569443","DOIUrl":"https://doi.org/10.1145/3540250.3569443","url":null,"abstract":"From financial services platforms to social networks to vehicle control, software has come to mediate many activities of daily life. Governing bodies and standards organizations have responded to this trend by creating regulations and standards to address issues such as safety, security and privacy. In this environment, the compliance of software development to standards and regulations has emerged as a key requirement. Compliance claims and arguments are often captured in assurance cases, with linked evidence of compliance. Evidence can come from test cases, verification proofs, human judgement, or a combination of these. That is, we try to build (safety-critical) systems carefully according to well justified methods and articulate these justifications in an assurance case that is ultimately judged by a human. Building safety arguments for traditional software systems is difficult — they are lengthy and expensive to maintain, especially as software undergoes change. Safety is also notoriously noncompositional — each subsystem might be safe but together they may create unsafe behaviors. It is also easy to miss cases, which in the simplest case would mean developing an argument for when a condition is true but missing arguing for a false condition. Furthermore, many ML-based systems are becoming safety-critical. For example, recent Tesla self-driving cars misclassified emergency vehicles and caused multiple crashes. ML-based systems typically do not have precisely specified and machine-verifiable requirements. While some safety requirements can be stated clearly: “the system should detect all pedestrians at a crossing”, these requirements are for the entire system, making them too high-level for safety analysis of individual components. Thus, systems with ML components (MLCs) add a significant layer of complexity for safety assurance. I argue that safety assurance should be an integral part of building safe and reliable software systems, but this process needs support from advanced software engineering and software analysis. In this talk, I outline a few approaches for development of principled, tool-supported methodologies for creating and managing assurance arguments. I then describe some of the recent work on specifying and verifying reliability requirements for machine-learned components in safety-critical domains.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"56 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75766103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
软件产业与工程Pub Date : 2022-11-07DOI: 10.1145/3540250.3569449
Joanna C. S. Santos, Julian T Dolby
{"title":"Program analysis using WALA (tutorial)","authors":"Joanna C. S. Santos, Julian T Dolby","doi":"10.1145/3540250.3569449","DOIUrl":"https://doi.org/10.1145/3540250.3569449","url":null,"abstract":"Static analysis is widely used in research and practice for multiple purposes such as fault localization, vulnerability detection, code clone identification, code refactoring, optimization, etc. Since implementing static analyzers is a non-trivial task, engineers often rely on existing frameworks to implement their techniques. The IBM T.J. Watson Libraries for Analysis (WALA) is one of such frameworks that allows the analysis of multiple environments, such as Java bytecode (and related languages), JavaScript, Android, Python, etc. In this tutorial, we walk through the process of using WALA for program analysis. First, the tutorial will cover all the required background knowledge that is necessary to understand the technical implementation details of the explained algorithms and techniques. Subsequently, we provide a technical overview of the WALA framework and its support for analysis of multiple programming languages and frameworks code. Then, we will do several live demonstration of using WALA to implement client analyses. We will focus on two common uses of analysis: a form of security analysis, taint analysis, and on using analysis graphs for machine learning of code.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75311216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
软件产业与工程Pub Date : 2022-11-07DOI: 10.1145/3540250.3560885
Taotao Gu, Xiang Li, Shuaibing Lu, Jianwen Tian, Yuanping Nie, Xiaohui Kuang, Zhechao Lin, Chenyifan Liu, Jie Liang, Yu Jiang
{"title":"Group-based corpus scheduling for parallel fuzzing","authors":"Taotao Gu, Xiang Li, Shuaibing Lu, Jianwen Tian, Yuanping Nie, Xiaohui Kuang, Zhechao Lin, Chenyifan Liu, Jie Liang, Yu Jiang","doi":"10.1145/3540250.3560885","DOIUrl":"https://doi.org/10.1145/3540250.3560885","url":null,"abstract":"Parallel fuzzing relies on hardware resources to guarantee test throughput and efficiency. In industrial practice, it is well known that parallel fuzzing faces the challenge of task division, but most works neglect the important process of corpus allocation. In this paper, we proposed a group-based corpus scheduling strategy to address these two issues, which has been accepted by the LLVM community. And we implement a parallel fuzzer based on this strategy called glibFuzzer. glibFuzzer first groups the global corpus into different subsets and then assigns different energy scores and different scores to them. The energy scores were mainly determined by the seed size and the length of coverage information, and the difference score can describe the degree of difference in the code covered by different subsets of seeds. In each round of key local corpus construction, the master node selects high-quality seeds by combining the two scores to improve test efficiency and avoid task conflict. To prove the effectiveness of the strategy, we conducted an extensive evaluation on the real-world programs and FuzzBench. After 4×24 CPU-hours, glibFuzzer covered 22.02% more branches and executed 19.42 times more test cases than libFuzzer in 18 real-world programs. glibFuzzer showed an average branch coverage increase of 73.02%, 55.02%, 55.86% over AFL, PAFL, UniFuzz, respectively. More importantly, glibFuzzer found over 100 unique vulnerabilities.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"43 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78614360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
软件产业与工程Pub Date : 2022-11-07DOI: 10.1145/3540250.3560881
Diego Montes, Pongpatapee Peerapatanapokin, Jeff Schultz, Chengjun Guo, Wenxin Jiang, James C. Davis
{"title":"Discrepancies among pre-trained deep neural networks: a new threat to model zoo reliability","authors":"Diego Montes, Pongpatapee Peerapatanapokin, Jeff Schultz, Chengjun Guo, Wenxin Jiang, James C. Davis","doi":"10.1145/3540250.3560881","DOIUrl":"https://doi.org/10.1145/3540250.3560881","url":null,"abstract":"Training deep neural networks (DNNs) takes significant time and resources. A practice for expedited deployment is to use pre-trained deep neural networks (PTNNs), often from model zoos--collections of PTNNs; yet, the reliability of model zoos remains unexamined. In the absence of an industry standard for the implementation and performance of PTNNs, engineers cannot confidently incorporate them into production systems. As a first step, discovering potential discrepancies between PTNNs across model zoos would reveal a threat to model zoo reliability. Prior works indicated existing variances in deep learning systems in terms of accuracy. However, broader measures of reliability for PTNNs from model zoos are unexplored. This work measures notable discrepancies between accuracy, latency, and architecture of 36 PTNNs across four model zoos. Among the top 10 discrepancies, we find differences of 1.23%-2.62% in accuracy and 9%-131% in latency. We also find mismatches in architecture for well-known DNN architectures (e.g., ResNet and AlexNet). Our findings call for future works on empirical validation, automated tools for measurement, and best practices for implementation.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"108 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79385673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
软件产业与工程Pub Date : 2022-11-07DOI: 10.1145/3540250.3549153
Yao Wan, Shijie Zhang, Hongyu Zhang, Yulei Sui, Guandong Xu, Dezhong Yao, Hai Jin, Lichao Sun
{"title":"You see what I want you to see: poisoning vulnerabilities in neural code search","authors":"Yao Wan, Shijie Zhang, Hongyu Zhang, Yulei Sui, Guandong Xu, Dezhong Yao, Hai Jin, Lichao Sun","doi":"10.1145/3540250.3549153","DOIUrl":"https://doi.org/10.1145/3540250.3549153","url":null,"abstract":"Searching and reusing code snippets from open-source software repositories based on natural-language queries can greatly improve programming productivity.Recently, deep-learning-based approaches have become increasingly popular for code search. Despite substantial progress in training accurate models of code search, the robustness of these models has received little attention so far. In this paper, we aim to study and understand the security and robustness of code search models by answering the following question: Can we inject backdoors into deep-learning-based code search models? If so, can we detect poisoned data and remove these backdoors? This work studies and develops a series of backdoor attacks on the deep-learning-based models for code search, through data poisoning. We first show that existing models are vulnerable to data-poisoning-based backdoor attacks. We then introduce a simple yet effective attack on neural code search models by poisoning their corresponding training dataset. Moreover, we demonstrate that attacks can also influence the ranking of the code search results by adding a few specially-crafted source code files to the training corpus. We show that this type of backdoor attack is effective for several representative deep-learning-based code search systems, and can successfully manipulate the ranking list of searching results. Taking the bidirectional RNN-based code search system as an example, the normalized ranking of the target candidate can be significantly raised from top 50% to top 4.43%, given a query containing an attacker targeted word, e.g., file. To defend a model against such attack, we empirically examine an existing popular defense strategy and evaluate its performance. Our results show the explored defense strategy is not yet effective in our proposed backdoor attack for code search systems.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"82 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83997287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
软件产业与工程Pub Date : 2022-11-07DOI: 10.1145/3540250.3549112
Poedjadevie Ramkisoen, John Businge, B. V. Bladel, Alexandre Decan, S. Demeyer, Coen De Roover, Foutse Khomh
{"title":"PaReco: patched clones and missed patches among the divergent variants of a software family","authors":"Poedjadevie Ramkisoen, John Businge, B. V. Bladel, Alexandre Decan, S. Demeyer, Coen De Roover, Foutse Khomh","doi":"10.1145/3540250.3549112","DOIUrl":"https://doi.org/10.1145/3540250.3549112","url":null,"abstract":"Re-using whole repositories as a starting point for new projects is often done by maintaining a variant fork parallel to the original. However, the common artifacts between both are not always kept up to date. As a result, patches are not optimally integrated across the two repositories, which may lead to sub-optimal maintenance between the variant and the original project. A bug existing in both repositories can be patched in one but not the other (we see this as a missed opportunity) or it can be manually patched in both probably by different developers (we see this as effort duplication). In this paper we present a tool (named PaReCo) which relies on clone detection to mine cases of missed opportunity and effort duplication from a pool of patches. We analyzed 364 (source to target) variant pairs with 8,323 patches resulting in a curated dataset containing 1,116 cases of effort duplication and 1,008 cases of missed opportunities. We achieve a precision of 91%, recall of 80%, accuracy of 88%, and F1-score of 85%. Furthermore, we investigated the time interval between patches and found out that, on average, missed patches in the target variants have been introduced in the source variants 52 weeks earlier. Consequently, PaReCo can be used to manage variability in “time” by automatically identifying interesting patches in later project releases to be backported to supported earlier releases.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"86 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72917306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
软件产业与工程Pub Date : 2022-11-07DOI: 10.1145/3540250.3569444
Sumit Gulwani
{"title":"AI-assisted programming: applications, user experiences, and neuro-symbolic techniques (keynote)","authors":"Sumit Gulwani","doi":"10.1145/3540250.3569444","DOIUrl":"https://doi.org/10.1145/3540250.3569444","url":null,"abstract":"AI can enhance programming experiences for a diverse set of programmers: from professional developers and data scientists (proficient programmers) who need help in software engineering and data wrangling, all the way to spreadsheet users (low-code programmers) who need help in authoring formulas, and students (novice programmers) who seek hints when stuck with their programming homework. To communicate their need to AI, users can express their intent explicitly—as input-output examples or natural-language specification—or implicitly—where they encounter a bug (and expect AI to suggest a fix), or simply allow AI to observe their last few lines of code or edits (to have it suggest the next steps). The task of synthesizing an intended program snippet from the user’s intent is both a search and a ranking problem. Search is required to discover candidate programs that correspond to the (often ambiguous) intent, and ranking is required to pick the best program from multiple plausible alternatives. This creates a fertile playground for combining symbolic-reasoning techniques, which model the semantics of programming operators, and machine-learning techniques, which can model human preferences in programming. Recent advances in large language models like Codex offer further promise to advance such neuro-symbolic techniques. Finally, a few critical requirements in AI-assisted programming are usability, precision, and trust; and they create opportunities for innovative user experiences and interactivity paradigms. In this talk, I will explain these concepts using some existing successes, including the Flash Fill feature in Excel, Data Connectors in PowerQuery, and IntelliCode/CoPilot in Visual Studio. I will also describe several new opportunities in AI-assisted programming, which can drive the next set of foundational neuro-symbolic advances.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82434377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
软件产业与工程Pub Date : 2022-11-07DOI: 10.1145/3540250.3549160
Mingwei Liu, Xin Peng, Andrian Marcus, Christoph Treude, Jiazhan Xie, Huanjun Xu, Yanjun Yang
{"title":"How to formulate specific how-to questions in software development?","authors":"Mingwei Liu, Xin Peng, Andrian Marcus, Christoph Treude, Jiazhan Xie, Huanjun Xu, Yanjun Yang","doi":"10.1145/3540250.3549160","DOIUrl":"https://doi.org/10.1145/3540250.3549160","url":null,"abstract":"Developers often ask how-to questions using search engines, technical Q&A communities, and interactive Q&A systems to seek help for specific programming tasks. However, they often do not formulate the questions in a specific way, making it hard for the systems to return the best answers. We propose an approach (TaskKG4Q) that interactively helps developers formulate a programming related how-to question. TaskKG4Q is using a programming task knowledge graph (task KG in short) mined from Stack Overflow questions, which provides a hierarchical conceptual structure for tasks in terms of [actions], [objects], and [constraints]. An empirical evaluation of the intrinsic quality of the task KG revealed that 75.0% of the annotated questions in the task KG are correct. The comparison between TaskKG4Q and two baselines revealed that TaskKG4Q can help developers formulate more specific how-to questions. More so, an empirical study with novice programmers revealed that they write more effective questions for finding answers to their programming tasks on Stack Overflow.","PeriodicalId":68155,"journal":{"name":"软件产业与工程","volume":"256 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76356601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}