{"title":"AWaRE2-MM: A Meta-Model for Goal-Driven, Contract-Mediated, Team-Centric Autonomous Middleware Frameworks for Antifragility","authors":"Anton V. Uzunov, Matthew Brennan, Mohan Baruwal Chhetri, Quoc Bao Vo, R. Kowalczyk, John Wondoh","doi":"10.1109/APSEC53868.2021.00066","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00066","url":null,"abstract":"In this paper, we introduce a new meta-model that captures core concepts for constructing software architectures for general-purpose, autonomous middleware frameworks that realize internalized and externalized self-adaptivity at both a system- and meta-level in order to achieve antifragility. The proposed meta-model builds on, specializes, and complements existing multi-agent meta-models in line with a previously published reference model for antifragile systems in the cyber domain.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122469773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probabilistic testing of asynchronously communicating systems","authors":"Puneet Bhateja","doi":"10.1109/APSEC53868.2021.00058","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00058","url":null,"abstract":"The input-output labelled transition system (IOLTS) is a state-based model that is widely used to describe the functional behaviour of a reactive system. However, when the same system is observed asynchronously through a pair of unbounded FIFO queues (or channels), its apparent behaviour differs from its actual behaviour. This is because an execution trace of the system could appear distorted in a multitude of ways. The apparent behaviour is called the asynchronous behaviour of the system. It is well known that the asynchronous behaviour can also be described by an infinite-state IOLTS. This description, however, is appropriate only as long as the channels are assumed to be reliable. The moment we introduce unreliability assumptions, the asynchronous behaviour becomes probabilistic in nature. The plain IOLTS model is simply not expressive enough to capture this probabilistic behaviour. To this end, in this paper we show how the asynchronous behaviour of a reactive system can be captured by Segala's probabilistic automata (SPA). 
We further show how the SPA expressing the asynchronous behaviour can serve as a reference model for probabilistic testing of asynchronously communicating systems.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133506420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PGraph: A Graph-based Structure for Interactive Event Exploration on Social Media","authors":"Yang Yu, Minglai Shao, Hongyan Xu, Ying Sun, Wenjun Wang, Bofei Ma","doi":"10.1109/APSEC53868.2021.00015","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00015","url":null,"abstract":"Event detection is a common research topic in visualization. Existing methods always follow an exploration mode, where machine learning algorithms identify events, which are then analyzed via a visualization system. The detection process does not integrate the expert's experience. In this paper, we propose a novel framework that organizes the original dataset as an integrated graph that allows for Interactive Event Detection (IED) on the graph. Specifically, we formulate the Interactive Event Detection problem as subgraph detection on the graph under the expert's interactions. Further, we define a flexible structure called PGraph to model the dataset and then propose an efficient algorithm that returns a subgraph as an event. Our proposed method supports performing various IED tasks under the expert's interactions. We evaluate the utility of our approach by applying it in two scenarios. One uses a social media dataset to study hot events; the other uses an urban burglary dataset to detect consecutive burglary cases. Case studies show that our algorithm can detect more global events by taking the expert's experience into account. 
In quantitative performance experiments, our method outperforms traditional machine detection approaches, especially on the social media dataset; its accuracy is at least 10% higher than that of the baselines.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134052669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Impact of ML use cases on Industrial Data Pipelines","authors":"M. A. Raj, Jan Bosch, H. H. Olsson, Anders Jansson","doi":"10.1109/APSEC53868.2021.00053","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00053","url":null,"abstract":"The impact of the Artificial Intelligence revolution is undoubtedly substantial in our society, life, firms, and employment. With data being a critical element, organizations are working towards obtaining high-quality data to train their AI models. Although data, data management, and data pipelines were part of industrial practice even before the introduction of ML models, the significance of data increased further with the advent of ML models, which force data pipeline developers to go beyond the traditional focus on data quality. The objective of this study is to analyze the impact of ML use cases on data pipelines. We assume that the data pipelines that serve ML models are given more importance than conventional data pipelines. We report on a study that we conducted by observing software teams at three companies as they develop both conventional (non-ML) data pipelines and data pipelines that serve ML-based applications. We study six data pipelines from three companies and categorize them based on their criticality and purpose. Further, we identify the determinants that can be used to compare the development and maintenance of these data pipelines. 
Finally, we map these factors in a two-dimensional space to illustrate their importance on a scale of low, moderate, and high.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116155022","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Dynamic Visualization of Complex Reverse-Engineered Object Collaboration","authors":"Aki Hongo, Naoya Nitta","doi":"10.1109/APSEC53868.2021.00071","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00071","url":null,"abstract":"UML is useful for modeling higher-abstraction-level concepts of software in a forward engineering context, but it is still challenging to reverse engineer the more complex behavior of realistic object-oriented programs (OOPs) based on such visualization techniques. For example, in a sequence diagram, an object appears in quite different ways when it serves as a sender or receiver of some message and as a parameter or return value of another message, and thus compound method invocations such as invocation chains and callbacks cannot be represented directly. In this paper, first, we define a dynamic metric named alternation complexity that indicates the number of alternations of object roles between sender/receiver and parameter/return value within a collaboration. Through experiments with 12 professional programmers, we confirmed that the metric captures a certain aspect of difficulty in comprehending features. Furthermore, we present a dynamic visualization model to directly represent collaborations in which the types of object roles frequently change.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122172427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CHIS: A Novel Hybrid Granularity Identifier Splitting Approach","authors":"Siyuan Liu, Jingxuan Zhang, Jiahui Liang, Junpeng Luo, Yong Xu, Chenxing Sun","doi":"10.1109/APSEC53868.2021.00027","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00027","url":null,"abstract":"Information Retrieval (IR) techniques have been widely utilized by a growing number of software maintenance activities. However, there is a mismatch between source code lexicon (especially identifiers) and vocabulary in software artifacts, leading to the inefficiency of IR techniques. Consequently, it is essential to normalize identifiers, that is, to parse identifiers into several natural language terms. Identifier splitting significantly impacts the effectiveness of identifier normalization. Even though researchers have proposed several approaches to split identifiers, three main drawbacks remain to be resolved: neglect of morphemes, over-splitting, and under-splitting. In this paper, we propose a new Character-level Hybrid-granularity Identifier Splitting approach, CHIS, to resolve the three drawbacks and better split identifiers. CHIS combines the Bidirectional Encoder Representations from Transformers (BERT) and Conditional Random Fields (CRF) to train a deep learning model to split identifiers. In addition, CHIS further employs a pre-processing component and a post-processing component to resolve the morpheme acquisition drawback and the over-splitting as well as the under-splitting drawbacks respectively, thus further improving its performance. Specifically, in the pre-processing component, CHIS obtains and labels the most frequent subwords of the training identifiers as morphemes through the Byte Pair Encoding (BPE) algorithm and the sequence labeling algorithm. In the post-processing component, CHIS iteratively merges and splits the splitting results obtained by the deep learning model to resolve the over-splitting and under-splitting drawbacks. 
We conduct extensive experiments to show the effectiveness of CHIS. Experimental results show that CHIS achieves an accuracy of 0.943 on average and outperforms the state-of-the-art approach by 0.085 on average. In addition, the effectiveness of the pre-processing and post-processing components of CHIS is also validated.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124197305","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
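The CHIS abstract above says that frequent subwords of the training identifiers are obtained with Byte Pair Encoding. The following is a minimal Python sketch of generic BPE merge learning, not the authors' implementation: the function name `learn_bpe_merges`, the lower-cased toy identifiers, and the number of merges are all assumptions made for illustration.

```python
from collections import Counter


def learn_bpe_merges(identifiers, num_merges):
    """Return the subwords created by the first num_merges BPE merge steps."""
    # Each identifier starts as a tuple of single characters.
    corpus = Counter(tuple(ident) for ident in identifiers)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by identifier frequency.
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every identifier is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best[0] + best[1])
        # Rewrite the corpus with the most frequent pair fused into one symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(best[0] + best[1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_corpus[tuple(merged)] += freq
        corpus = new_corpus
    return merges


# Hypothetical toy input: the shared prefix "get" emerges after two merges.
subwords = learn_bpe_merges(["getname", "getid", "getsize"], num_merges=2)
```

In this toy run the second merge produces the subword `get`, the kind of frequent unit that could then be labeled as a morpheme candidate.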
{"title":"Design of Software Architecture for Neural Network Cooperation: Case of Forgery Detection","authors":"Akira Mizutani, Masami Noro, Atsushi Sawada","doi":"10.1109/APSEC53868.2021.00021","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00021","url":null,"abstract":"Recent technological advances in media tampering have been the cause of many harmful forged images. To cope with this, tampering detection methods have become a major research topic in the neural network community. The methods almost always aim at detecting a specific forgery. That is, a general detection method to find any tampering has not been invented so far. This paper concerns a software architecture for organizing multiple neural networks to detect multiple kinds of forgeries. The key issue here is to construct, from the meta-level, a mechanism for an ensemble of front-end neural networks to select a neural network which makes a decision. Under this architecture, we implemented a prototype for detecting forged images resulting from multiple tampering methods, namely copy-move and compression. In order to demonstrate that our architecture works well, we conducted a case study with a total of 120,000 patches which consist of three classes of copy-move, compression and untampered data, 40,000 patches for each. The result shows our proposed method successfully classified 108,954 out of 120,000 patches with 90.82% accuracy. We also discuss the implications of our architecture for avoiding concept drift. Our architecture is designed to be context-oriented and meta-level, with a two-layered structure: meta and base. The neural networks can be categorized as base-level components, whereas the component coordinating the networks resides at the meta-level. The architecture shows that concept drift can be handled at the meta-level. 
Through the discussions on the techniques of transfer learning, online learning, and ensemble learning in terms of the architecture we constructed, it is concluded that we could construct a universal architecture to coordinate machine learning components.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126273639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"S2 LMMD: Cross-Project Software Defect Prediction via Statement Semantic Learning and Maximum Mean Discrepancy","authors":"Wangshu Liu, Yongteng Zhu, Xiang Chen, Qing Gu, Xingya Wang, Shenkai Gu","doi":"10.1109/APSEC53868.2021.00044","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00044","url":null,"abstract":"Different from within-project software defect prediction (WPDP), cross-project software defect prediction (CPDP) does not require sufficient training data and can help developers in the early stages of software development. Recent studies tried to learn semantic features for CPDP by feeding neural networks with abstract syntax tree (AST) token vectors. However, the ASTs directly parsed from software modules usually have complex structures, reflected in more nodes and greater depth, and transfer learning is not regularly adopted to further reduce the data distribution difference between the source project and the target project. To solve these problems, we aim to jointly learn statement-level trees (SLT) and alleviate the data distribution difference with maximum mean discrepancy (MMD) to improve defect prediction performance on CPDP. Specifically, we propose a novel cross-project defect prediction method, S2LMMD, via statement semantic learning and MMD. We first construct the SLT by splitting the original AST at specified nodes. Then we generate more effective semantic features by learning sequence embeddings with a Bi-GRU neural network. Finally, an MMD transfer loss is applied to keep more common characteristics across different project datasets to further improve CPDP performance. To verify the effectiveness of our proposed method, we conducted experiments on ten widely used open-source projects and evaluated the experimental performance using the AUC measure. Our empirical results show that our proposed method S2LMMD can significantly outperform eight state-of-the-art baselines. 
In addition, for semantic learning, SLT has a higher influence on CPDP, while MMD is of great significance in transfer learning.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125891593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
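The S2LMMD abstract above uses maximum mean discrepancy (MMD) to measure the distribution gap between source-project and target-project features. A minimal Python sketch of the standard biased estimate of squared MMD follows. The RBF kernel, the `gamma` value, and the toy feature vectors are assumptions for illustration; the abstract does not state which kernel or estimator the authors use.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two samples of feature vectors."""

    def mean_kernel(xs, ys):
        # Average kernel value over all pairs drawn from xs and ys.
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


# Hypothetical toy features: identical samples give zero, shifted samples do not.
same = mmd_squared([[0.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 1.0]])
shifted = mmd_squared([[0.0, 0.0], [0.0, 1.0]], [[5.0, 5.0], [5.0, 6.0]])
```

Used as a transfer loss, a term of this kind is minimized alongside the classification loss so that learned source and target representations move closer together.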
{"title":"Finding repeated strings in code repositories and its applications to code-clone detection","authors":"Yoriyuki Yamagata, Fabien Hervé, Yuji Fujiwara, Katsuro Inoue","doi":"10.1109/APSEC53868.2021.00057","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00057","url":null,"abstract":"Although researchers have created many advanced code-clone detection techniques, more effort is required to realize wide adoption of these techniques in the industry. One of the reasons behind this is the reliance of these advanced techniques on lexing and parsing programs. Modern programming languages have complex lexical conventions and grammar, which evolve constantly. Therefore, using advanced code-clone detection techniques requires substantial and continuous effort. This paper proposes a lightweight language-independent method to detect code clones by simply finding repeated strings in a code repository, relying on neither lexing nor parsing. The proposed method is based on an efficient technique developed in a bio-informatics context to find repeated strings. We refer to the repeated strings in the source code as weak Type-1 clones. Because the proposed technique normalizes newlines, tabs, and white spaces into a single white space, it can find clones in which newline positions or indentations are changed, as is often the case when copy-pasting occurs. Although the proposed method only finds verbatim copies, it also yields interesting observations regarding repository structures. 
Many developers may prefer the proposed simple approach because it is easier to understand than other advanced techniques that use heuristics, approximation, and machine learning.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129857211","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
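The abstract above describes normalizing newlines, tabs, and spaces into a single space and then finding repeated strings without lexing or parsing. A minimal Python sketch of that idea follows. It uses a naive fixed-length window index as a stand-in, not the efficient bio-informatics technique the paper relies on, and the function names, file names, and `min_len` threshold are hypothetical.

```python
import re
from collections import defaultdict


def normalize_whitespace(text):
    """Collapse each run of newlines, tabs, and spaces into one space."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip()


def repeated_substrings(files, min_len=20):
    """Map each min_len-character window seen more than once to its locations.

    files maps a file name to its raw contents; locations are
    (file name, offset in the normalized text) pairs.
    """
    seen = defaultdict(list)
    for name, raw in files.items():
        text = normalize_whitespace(raw)
        for start in range(len(text) - min_len + 1):
            seen[text[start:start + min_len]].append((name, start))
    return {window: locs for window, locs in seen.items() if len(locs) > 1}


# Hypothetical toy repository: same statement pair, different line breaks and spacing.
clones = repeated_substrings(
    {"a.c": "int x = 0;\n\treturn x;", "b.c": "int  x = 0; return x;"},
    min_len=20,
)
```

Both files normalize to the same 20-character string, so the copy is reported even though the newline and indentation differ. The quadratic memory use of this window index is exactly what a purpose-built repeat-finding algorithm avoids.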
{"title":"Scalable Fault Detection Based on Precise Access Path","authors":"Chi Li, Yuexing Wang, Min Zhou, M. Gu","doi":"10.1109/APSEC53868.2021.00054","DOIUrl":"https://doi.org/10.1109/APSEC53868.2021.00054","url":null,"abstract":"Precise static analysis, which is usually field-sensitive and inter-procedural, is necessary in an industrial environment to ensure reliability and security. However, it faces the problem of insufficient scalability when applied to various industrial environments: (1) Field-sensitive analysis cannot assure termination if field accesses are modeled by unbounded access paths; (2) Inter-procedural analysis may lead to path explosion problems because of the unbounded length of call chains. While using longer access paths or call chains can improve precision, the analysis may have poor performance in terms of efficiency. Specifically, an industry-strength method should be scalable enough to face different applications. This paper presents a scalable fault detection method based on the precise access path. A precise access path models a memory location with accurate operations and offsets from a source. Points-to relations of variables are used to refine it. It can differentiate elements of aggregate structures and is more precise than the ordinary access path. Based on the precise access path, we perform an inter-procedural analysis with the help of an intra-procedural analysis and combined function summary. Furthermore, our method is designed to work backward to detect error-handling bugs. 
Compared with the state-of-the-art tools, our method is more scalable, with higher precision and efficiency on both benchmarks and 11 widely-used applications.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130058615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}