2021 28th Asia-Pacific Software Engineering Conference (APSEC): Latest Papers, Page 6

Empirical Evaluation of Minority Oversampling Techniques in the Context of Android Malware Detection

2021 28th Asia-Pacific Software Engineering Conference (APSEC) Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00042

Lwin Khin Shar, T. Duong, D. Lo

In Android malware classification, the distribution of training data among classes is often imbalanced. This causes the learning algorithm to bias towards the dominant classes, resulting in misclassification of minority classes. One effective way to improve the performance of classifiers is the synthetic generation of minority instances. One pioneering technique in this area is the Synthetic Minority Oversampling Technique (SMOTE), and since its publication in 2002, several variants of SMOTE have been proposed and evaluated on various imbalanced datasets. However, these techniques have not been evaluated in the context of Android malware detection. Studies have shown that the performance of SMOTE and its variants can vary across different application domains. In this paper, we conduct a large-scale empirical evaluation of SMOTE and its variants on six different datasets that reflect six types of features commonly used in Android malware detection. The datasets are extracted from a benchmark of 4,572 benign apps and 2,399 malicious Android apps, used in our previous study. Through extensive experiments, we set a new baseline in the field of Android malware detection, and provide guidance to practitioners on the application of different SMOTE variants to Android malware detection.
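The synthetic generation of minority instances mentioned in the abstract above can be illustrated with a minimal sketch of the basic SMOTE idea: interpolate between a minority sample and one of its nearest minority-class neighbours. This is not the paper's experimental setup. The function name `smote`, the toy 2-D points, and the choice to balance the classes fully are assumptions made for illustration; only the class sizes (4,572 benign, 2,399 malicious) come from the abstract.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (basic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Class sizes reported in the abstract; full balancing is an assumption.
n_benign, n_malicious = 4572, 2399
n_needed = n_benign - n_malicious  # synthetic malicious samples for a 1:1 ratio

# Hypothetical 2-D feature vectors standing in for malicious apps.
toy_minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
toy_synthetic = smote(toy_minority, n_synthetic=10, k=2)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region already occupied by the minority class. SMOTE variants differ mainly in how they choose the base points and neighbours.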

Citations: 0

Applying Multi-Objective Genetic Algorithm for Efficient Selection on Program Generation

2021 28th Asia-Pacific Software Engineering Conference (APSEC) Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00060

Hiroto Watanabe, S. Matsumoto, Yoshiki Higo, S. Kusumoto, Toshiyuki Kurabayashi, Hiroyuki Kirinuki, Haruto Tanno

Automated program generation (APG) is a concept of automatically making a computer program. Toward this goal, transferring automated program repair (APR) to APG can be considered. APR modifies the buggy input source code to pass all test cases. APG regards empty source code as initially failing all test cases, i.e., containing multiple bugs. Search-based APR repeatedly generates program variants and evaluates them. Many traditional APR systems evaluate the fitness of variants based on the number of passing test cases. However, when source code contains multiple bugs, this fitness function lacks the expressive power of variants. In this paper, we propose the application of a multi-objective genetic algorithm to APR in order to improve efficiency. We also propose a new crossover method that combines two variants with complementary test results, taking advantage of the high expressive power of multi-objective genetic algorithms for evaluation. We tested the effectiveness of the proposed method on competitive programming tasks. The obtained results showed significant differences in the number of successful trials and the required generation time.
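The contrast the abstract draws between counting passed tests and multi-objective evaluation can be sketched as follows. The sketch assumes each test case is treated as its own objective and that "complementary" means the pair whose combined passing tests cover the most cases. Both are assumptions made for illustration, since the abstract does not spell out the objectives or the crossover rule, and the variant names and results are hypothetical.

```python
from itertools import combinations


def dominates(a, b):
    """a dominates b if it is at least as good on every test and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(results):
    """Variants whose per-test result vector is not dominated by any other variant."""
    return {
        name: r
        for name, r in results.items()
        if not any(dominates(other, r) for other in results.values())
    }


def most_complementary_pair(results):
    """Pair of variants that together pass the largest number of test cases."""
    def coverage(pair):
        (_, a), (_, b) = pair
        return sum(x or y for x, y in zip(a, b))

    (name_a, _), (name_b, _) = max(combinations(results.items(), 2), key=coverage)
    return name_a, name_b


# Hypothetical per-test results (1 = pass, 0 = fail) for three variants.
results = {
    "v1": (1, 1, 0, 0),
    "v2": (0, 0, 1, 1),
    "v3": (1, 0, 0, 0),
}

pass_counts = {name: sum(r) for name, r in results.items()}
front = pareto_front(results)
parents = most_complementary_pair(results)
```

Here `v1` and `v2` both pass two tests, so a count-based fitness cannot tell them apart, while the per-test vectors show that they pass disjoint sets of tests and are natural crossover partners.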

Citations: 0

PyTraceBugs: A Large Python Code Dataset for Supervised Machine Learning in Software Defect Prediction

2021 28th Asia-Pacific Software Engineering Conference (APSEC) Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00022

E. Akimova, A. Bersenev, Artem A. Deikov, Konstantin S. Kobylkin, A. Konygin, I. Mezentsev, V. Misilov

Contemporary software engineering tools employ deep learning methods to identify bugs and defects in source code. Being data-hungry, supervised deep neural network models require large labeled datasets for their robust and accurate training. In contrast to, say, Java, there is a lack of such datasets for Python. Most of the known datasets containing labeled Python source code are relatively small. Those datasets are suitable for testing built deep learning models, but not for training them. Therefore, larger labeled datasets have to be created based on well-received algorithmic principles for selecting relevant source code from the available public codebases. In this work, a large dataset of labeled Python source code, named PyTraceBugs, is created. It is intended for training, validating, and evaluating large deep learning models to identify a special class of low-level bugs in source code snippets, manifested by throwing error exceptions and reported in standard traceback messages. Here, a code snippet is assumed to be either a function or a method implementation. The dataset contains 5.7 million correct source code snippets and 24 thousand buggy snippets from GitHub public repositories. The most represented bugs are: absence of attribute, empty object, index out of range, and text encoding/decoding errors. The dataset is split into training, validation, and test samples. Confidence in the labeling of the snippets as buggy or correct is about 85% according to our estimates. The labeling of the snippets in the test sample is additionally manually validated to be almost 100% confident. To demonstrate the advantages of our dataset, it is used to train a binary classification model for distinguishing buggy and correct source code. This model employs pretrained BERT-like contextual embeddings. Its performance is as follows: precision on the test set is 96% for the buggy source code and 61% for the correct source code, whereas recall is 34% and 99%, respectively. The model performance is also estimated on the known BugsInPy dataset: here, it reports approximately 14% of buggy snippets.
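To read the reported precision and recall together, a short calculation of the per-class F1 score (the harmonic mean of precision and recall) may help. The F1 values below are derived here from the rounded percentages in the abstract. They are not figures reported by the authors and are only approximate.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as stated in the abstract (rounded percentages).
f1_buggy = f1(precision=0.96, recall=0.34)
f1_correct = f1(precision=0.61, recall=0.99)

print(f"F1 (buggy):   {f1_buggy:.3f}")
print(f"F1 (correct): {f1_correct:.3f}")
```

The low recall on the buggy class pulls its F1 down to roughly one half: the model rarely flags correct code as buggy, but it misses about two thirds of the buggy snippets.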

Citations: 4

Runtime models and evolution graphs for the version management of microservice architectures

2021 28th Asia-Pacific Software Engineering Conference (APSEC) Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00064

Yuwei Wang, D. Conan, S. Chabridon, Kavoos Bojnourdi, Jingxua Ma

Citations: 1

Smart Contract Vulnerability Detection Using Code Representation Fusion

2021 28th Asia-Pacific Software Engineering Conference (APSEC) Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00069

Ben Wang, Hanting Chu, Pengcheng Zhang, Hai Dong

Citations: 3

A Learning-to-Rank Based Approach for Improving Regression Test Case Prioritization

2021 28th Asia-Pacific Software Engineering Conference (APSEC) Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00075

Chu-Ti Lin, Sheng-Hsiang Yuan, Jutarporn Intasara

Many prior studies that attempt to improve regression testing adopt test case prioritization (TCP). TCP generally arranges the execution of regression test cases according to specific rules, with the goal of revealing faults as early as possible. Different TCP algorithms adopt different metrics to evaluate test cases' priority, so they may be effective at revealing faults early in different faulty programs. Adopting a single metric may not generally work well. In this decade, learning-to-rank (LTR) strategies have been adopted to address some software engineering problems. This study uses a pairwise LTR strategy, XGBoost, to combine several existing metrics so as to improve TCP effectiveness. More specifically, we regard the metrics adopted by TCP techniques to evaluate test cases' priority as the features of the training data and adopt XGBoost to learn the weights of the combined metrics. Additionally, in order to avoid overfitting, we use a fuzzy inference system to generate additional features for data augmentation. The experimental results show that our approach is more effective than the existing TCP techniques on the selected subject programs.
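The pairwise learning-to-rank idea described in the abstract, learning weights that combine several prioritization metrics, can be sketched without any external library. The sketch swaps XGBoost for a simple linear perceptron trained on feature differences, so it shows only the pairwise formulation and not the authors' model. The two metric columns, the labels, and all function names are hypothetical.

```python
def make_pairs(features, labels):
    """Pairwise transform: for every (higher-label, lower-label) pair of test
    cases, emit the difference of their metric vectors."""
    pairs = []
    for fi, li in zip(features, labels):
        for fj, lj in zip(features, labels):
            if li > lj:
                pairs.append([a - b for a, b in zip(fi, fj)])
    return pairs


def train_pairwise(pairs, epochs=20, lr=0.1):
    """Perceptron on difference vectors: update whenever a pair is mis-ordered."""
    w = [0.0] * len(pairs[0])
    for _ in range(epochs):
        for d in pairs:
            if sum(wi * di for wi, di in zip(w, d)) <= 0:
                w = [wi + lr * di for wi, di in zip(w, d)]
    return w


def prioritize(features, w):
    """Order test-case indices by descending weighted score."""
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in features]
    return sorted(range(len(features)), key=lambda i: -scores[i])


# Hypothetical data: one row per test case, one column per TCP metric.
features = [[0.9, 0.2], [0.4, 0.8], [0.3, 0.1], [0.1, 0.3]]
labels = [1, 1, 0, 0]  # 1 = test case revealed a fault in past runs

weights = train_pairwise(make_pairs(features, labels))
order = prioritize(features, weights)
```

Training on pairs rather than on individual test cases means the model only has to get the relative order right, which is the quantity prioritization depends on.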

Citations: 0

NeoMycelia: A software reference architecture for big data systems

2021 28th Asia-Pacific Software Engineering Conference (APSEC) Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00052

Pouya Ataei, A. Litchfield

Citations: 6

Interaction Modelling for IoT

2021 28th Asia-Pacific Software Engineering Conference (APSEC) Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00020

Jessica Turner, Judy Bowen, Nikki van Zandwijk

Citations: 2

Degree doesn't Matter: Identifying the Drivers of Interaction in Software Development Ecosystems

2021 28th Asia-Pacific Software Engineering Conference (APSEC) Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00048

I. Bardhan, Subhajit Datta, S. Majumder

Citations: 0

TraceRefiner: An Automated Technique for Refining Coarse-Grained Requirement-to-Class Traces

2021 28th Asia-Pacific Software Engineering Conference (APSEC) Pub Date : 2021-12-01 DOI: 10.1109/APSEC53868.2021.00009

Mouna Hammoudi, Christoph Mayr-Dorn, A. Mashkoor, Alexander Egyed

Requirement-to-code traces reveal the code location(s) where a requirement is implemented. Traceability is essential for code evolution and understanding. However, creating and maintaining requirement-to-code traces is a tedious and costly process. In this paper, we introduce TraceRefiner, a novel technique for automatically refining coarse-grained requirement-to-class traces to fine-grained requirement-to-method traces. The inputs of TraceRefiner are (1) the set of requirement-to-class traces, which are easier to create as there are far fewer traces to capture, and (2) information about the code structure (i.e., method calls). The output of TraceRefiner is the set of requirement-to-method traces (providing additional, fine-grained information to the developer). We demonstrate the quality of TraceRefiner on four case study systems (7-72 KLOC) and evaluate it on over 230,000 requirement-to-method predictions. The evaluation demonstrates TraceRefiner's ability to refine traces even if many requirement-to-class traces are undefined (incomplete input). The obtained results show that the proposed technique is fully automated, tool-supported, and scalable.
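The inputs and output named in the abstract (requirement-to-class traces, method calls, requirement-to-method traces) can be made concrete with a small data-structure sketch. The refinement rule used here is a hypothetical heuristic, not TraceRefiner's actual algorithm, which the abstract does not describe: a method is traced to a requirement if its class is traced to that requirement and at least one caller or callee belongs to a class that is also traced to it. All class, method, and requirement names are invented.

```python
def refine(req_to_classes, method_class, calls):
    """Refine requirement-to-class traces into requirement-to-method traces
    using call relations (hypothetical heuristic for illustration only)."""
    # Undirected call neighbourhood: callers and callees of each method.
    neighbours = {m: set() for m in method_class}
    for caller, callee in calls:
        neighbours[caller].add(callee)
        neighbours[callee].add(caller)

    traces = {}
    for req, classes in req_to_classes.items():
        traces[req] = {
            m
            for m, cls in method_class.items()
            if cls in classes
            and any(method_class[n] in classes for n in neighbours[m])
        }
    return traces


# Input (1): coarse-grained requirement-to-class traces.
req_to_classes = {"R1": {"Cart", "Pricing"}}

# Input (2): code structure, given as method ownership and method calls.
method_class = {
    "Cart.add": "Cart",
    "Cart.log": "Cart",
    "Pricing.total": "Pricing",
    "Logger.write": "Logger",
}
calls = [("Cart.add", "Pricing.total"), ("Cart.log", "Logger.write")]

# Output: fine-grained requirement-to-method traces.
req_to_methods = refine(req_to_classes, method_class, calls)
```

In this toy example `Cart.log` is dropped for `R1` because its only call partner lives in `Logger`, a class not traced to the requirement, while `Cart.add` and `Pricing.total` are kept.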

Citations: 1