Guest Editorial: Machine learning applied to quality and security in software systems

IF 1.3; Zone 4, Computer Science; JCR Q3, Computer Science, Software Engineering

IET Software. Pub Date: 2023-07-25. DOI: 10.1049/sfw2.12141

Honghao Gao, Walayat Hussain, Ramón J. Durán Barroso, Junaid Arshad, Yuyu Yin

Volume 17, Issue 4, pages 345-347. Full text: https://onlinelibrary.wiley.com/doi/10.1049/sfw2.12141

Citations: 0

Abstract

During the development of software systems, problems with quality and security occur even with advance planning. These defects can threaten program development and maintenance. Machine learning can be used to control and minimise them, improving the quality and security of software systems. This special issue focuses on recent advances in architecture, algorithms, optimisation, and models for machine learning applied to quality and security in software systems. After a rigorous review for relevance, originality, technical novelty, and presentation quality, we selected four manuscripts. The accepted papers are summarised below.

In the first paper, “Robust Malware Identification via Deep Temporal Convolutional Network with Symmetric Cross Entropy Learning” by Sun et al., the authors propose a robust malware identification method using the temporal convolutional network (TCN). Word embedding techniques are commonly used to capture the contextual relationship between the input operation code (opcode) and application programming interface (API) function names. Here, considering the numerous unlabelled samples in practical intelligent environments, the authors pre-train the TCN model on an unlabelled set using a word embedding method, namely word2vec. In the experiments, the proposed method is compared with several traditional statistical methods and more recent neural networks on a synthetic malware dataset and a real-world dataset. The comparisons demonstrate the better performance and noise robustness of the proposed method, which yields the best identification accuracy, 98.75%, in real-world scenarios.
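The editorial does not define the loss named in the first paper's title. As a rough illustration only, the sketch below follows the commonly cited formulation of symmetric cross entropy for noisy labels: the usual cross entropy plus a reverse cross-entropy term in which log(0) is replaced by a negative constant. The function name, the weights `alpha` and `beta`, and the constant `log_zero` are assumptions made for this example and are not taken from Sun et al.

```python
import math


def symmetric_cross_entropy(probs, label, alpha=1.0, beta=1.0, log_zero=-4.0):
    """Symmetric cross entropy for one sample: alpha * CE + beta * reverse CE.

    probs:    predicted class probabilities (assumed to sum to 1).
    label:    index of the given, possibly noisy, class label.
    log_zero: stand-in for log(0) in the reverse term (assumed value).
    """
    eps = 1e-12  # guards against log(0) in the forward term
    # Forward cross entropy against a one-hot target: -log p[label].
    ce = -math.log(max(probs[label], eps))
    # Reverse cross entropy: -sum_k p[k] * log q[k], with q one-hot, so
    # log q[label] = 0 and log q[k] = log_zero for every other class.
    rce = -sum(p * log_zero for k, p in enumerate(probs) if k != label)
    return alpha * ce + beta * rce
```

The reverse term is bounded by `-log_zero`, so a confidently mislabelled sample cannot dominate the loss the way it can under plain cross entropy; that is the usual argument for its noise robustness.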

In the second paper, “Just-In-Time Defect Prediction Enhanced by the Joint Method of Line Label Fusion and File Filtering” by Zhang et al., the authors propose JIT-FF, a just-in-time defect prediction model enhanced by the joint method of line label fusion and file filtering. First, to distinguish added and removed lines while preserving the original software change information, the authors represent the code changes as original, added, and removed code according to line labels. Second, to obtain a semantics-enhanced code representation, the authors propose a cross-attention-based line label fusion method that performs complementary feature enhancement. Third, to generate code changes containing fewer defect-irrelevant files, the authors formalise file filtering as a sequential decision problem and propose a reinforcement learning-based file filtering method. Finally, based on the generated code changes, CodeBERT-based commit representation and multi-layer perceptron-based defect prediction are performed to identify defective software changes. The experiments demonstrate that JIT-FF predicts defective software changes more effectively.
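The first step, representing a change as original, added, and removed code according to line labels, can be pictured with a small sketch. This is one possible reading, not the JIT-FF implementation: it assumes the input is the body of a unified-diff hunk, that the label is the leading "+", "-", or space character, and that the "original" view is every line of the change in order with the label dropped. The function name is hypothetical.

```python
def split_by_line_label(hunk_lines):
    """Split unified-diff hunk lines into (original, added, removed) views.

    Assumes each line starts with a one-character label:
    '+' for added, '-' for removed, ' ' for unchanged context.
    """
    original, added, removed = [], [], []
    for line in hunk_lines:
        label, code = line[:1], line[1:]
        # Assumption: the "original" view keeps every line of the change.
        original.append(code)
        if label == "+":
            added.append(code)
        elif label == "-":
            removed.append(code)
    return original, added, removed
```

Keeping three separate views means a downstream model can attend across them, which is presumably where the cross-attention-based fusion described above comes in.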

In the third paper, “Android Malware Detection via Efficient API Call Sequences Extraction and Machine Learning Classifiers” by Wang et al., the authors propose a novel Android malware detection framework, contributing an efficient API call sequence extraction algorithm and an investigation of different types of classifiers. For API call sequence extraction, the authors propose an algorithm that transforms the function call graph from a multigraph into a directed simple graph, which avoids unnecessary repeated path searching. The authors also propose a pruning search, which further reduces the number of paths to be searched. Together, these steps greatly reduce the time complexity. The authors generate a transition matrix as classification features and investigate three types of machine learning classifiers to complete the malware detection task. The experiments are performed on real-world APKs, and the results demonstrate that the proposed method reduces the running time and achieves high detection accuracy.
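Two of the ideas in this summary are easy to illustrate in isolation. The sketch below is not the authors' algorithm: it only shows how parallel edges in a call multigraph can be collapsed into a directed simple graph, and how a row-normalised transition matrix could be built from API call sequences. Treating the transition matrix as first-order transition frequencies between consecutive calls is an assumption, as are all function and parameter names.

```python
def to_simple_digraph(call_edges):
    """Collapse a directed multigraph, given as (caller, callee) pairs,
    into an adjacency map with no parallel edges and no self-loops."""
    graph = {}
    for caller, callee in call_edges:
        graph.setdefault(caller, set())
        graph.setdefault(callee, set())
        if caller != callee:  # a simple graph has no self-loops
            graph[caller].add(callee)  # the set drops repeated edges
    return graph


def transition_matrix(sequences, vocab):
    """Row-normalised counts of consecutive API call pairs.

    sequences: lists of API names, each name present in vocab.
    vocab:     ordered list of API names; fixes the row/column order.
    """
    index = {name: i for i, name in enumerate(vocab)}
    n = len(vocab)
    counts = [[0] * n for _ in range(n)]
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[index[current]][index[following]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no outgoing transitions stay all zeros.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix
```

Flattening the resulting matrix gives a fixed-length feature vector per app, which is what a conventional classifier needs as input.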

In the fourth paper, “Selecting Reliable Blockchain Peers via Hybrid Blockchain Reliability Prediction” by Zheng et al., the authors propose H-BRP, a Hybrid Blockchain Reliability Prediction model, which extracts blockchain reliability factors and then makes a personalised prediction for each user. Connecting to unreliable blockchain peers can waste resources and even cause loss of cryptocurrency through repeated transactions. The proposed model primarily aims to select reliable blockchain peers and to evaluate and predict their reliability. Comprehensive experiments conducted on 100 blockchain requesters and 200 blockchain peers demonstrate the effectiveness of the proposed H-BRP model. Furthermore, the implementation and a dataset of 2,000,000 test cases are released.

The Guest Editors would like to express their deep gratitude to all the authors who submitted their valuable contributions, and to the numerous and highly qualified anonymous reviewers. We think that the selected contributions, which represent the current state of the art in the field, will be of great interest to the community. We would also like to thank the IET Software publication staff members for their continuous support and dedication. We particularly appreciate the unfailing support and encouragement granted to us by Prof. Hana Chockler, the Editor-in-Chief of IET Software.


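Source journal

IET Software (Engineering and Technology, Computer Science: Software Engineering)

CiteScore: 4.20
Self-citation rate: 0.00%
Number of publications:
Review time: 9 months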

About the journal: IET Software publishes papers on all aspects of the software lifecycle, including design, development, implementation and maintenance. The focus of the journal is on the methods used to develop and maintain software, and their practical application. Authors are especially encouraged to submit papers on the following topics, although papers on all aspects of software engineering are welcome:

Software and systems requirements engineering
Formal methods, design methods, practice and experience
Software architecture, aspect and object orientation, reuse and re-engineering
Testing, verification and validation techniques
Software dependability and measurement
Human systems engineering and human-computer interaction
Knowledge engineering; expert and knowledge-based systems, intelligent agents
Information systems engineering
Application of software engineering in industry and commerce
Software engineering technology transfer
Management of software development
Theoretical aspects of software development
Machine learning
Big data and big code
Cloud computing

Current Special Issues. Call for papers:

Knowledge Discovery for Software Development - https://digital-library.theiet.org/files/IET_SEN_CFP_KDSD.pdf
Big Data Analytics for Sustainable Software Development - https://digital-library.theiet.org/files/IET_SEN_CFP_BDASSD.pdf