{"title":"A systematic literature review on software security testing using metaheuristics","authors":"Fatma Ahsan, Faisal Anwer","doi":"10.1007/s10515-024-00433-0","DOIUrl":"10.1007/s10515-024-00433-0","url":null,"abstract":"<div><p>The security of an application is critical for its success, as breaches cause loss for organizations and individuals. Search-based software security testing (SBSST) is the field that utilizes metaheuristics to generate test cases for software testing against pre-specified security test adequacy criteria. This paper conducts a systematic literature review to compare metaheuristics and fitness functions used in software security testing, exploring their distinctive capabilities and impact on vulnerability detection and code coverage. The aim is to provide insights for fortifying software systems against emerging threats in the rapidly evolving technological landscape. This paper examines how search-based algorithms have been explored in the context of code coverage and software security testing. Moreover, the study highlights different metaheuristics and fitness functions for security testing and code coverage. This paper follows the standard guidelines from Kitchenham to conduct the SLR and obtains 122 primary studies related to SBSST after a multi-stage selection process. The papers came from different sources: journals, conference proceedings, workshops, summits, and researchers’ webpages published between 2001 and 2022. The outcomes demonstrate that the main vulnerabilities tackled using metaheuristics are XSS, SQLI, program crash, and XMLI. The findings suggest several directions for future research, including detecting server-side request forgery and security testing of third-party components. Moreover, new metaheuristics must also be explored to detect security vulnerabilities that are still unexplored or explored significantly less. 
Furthermore, metaheuristics can be combined with machine learning and reinforcement learning techniques for better results. Some metaheuristics can be designed by looking at the complexity of security testing and exploiting more fitness functions related to detecting different vulnerabilities.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141107156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel automated framework for fine-grained sentiment analysis of application reviews using deep neural networks","authors":"Haochen Zou, Yongli Wang","doi":"10.1007/s10515-024-00444-x","DOIUrl":"10.1007/s10515-024-00444-x","url":null,"abstract":"<div><p>The substantial volume of user feedback contained in application reviews significantly contributes to the development of human-centred software requirement engineering. The abundance of unstructured text data necessitates an automated analytical framework for decision-making. Language models can automatically extract fine-grained aspect-based sentiment information from application reviews. Existing approaches are constructed based on the general domain corpus, and are challenging to elucidate the internal technique of the recognition process, along with the factors contributing to the analysis results. To fully utilize software engineering domain-specific knowledge and accurately identify aspect-sentiment pairs from application reviews, we design a dependency-enhanced heterogeneous graph neural networks architecture based on the dual-level attention mechanism. The heterogeneous information network with knowledge resources from the software engineering field is embedded into graph convolutional networks to consider the attribute characteristics of different node types. The relationship between aspect terms and sentiment terms in application reviews is determined by adjusting the dual-level attention mechanism. Semantic dependency enhancement is introduced to comprehensively model contextual relationships and analyze sentence structure, thereby distinguishing important contextual information. To our knowledge, this marks initial efforts to leverage software engineering domain knowledge resources to deep neural networks to address fine-grained sentiment analysis issues. 
The experimental results on multiple public benchmark datasets indicate the effectiveness of the proposed automated framework in aspect-based sentiment analysis tasks for application reviews.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140968914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A systematic review of refactoring opportunities by software antipattern detection","authors":"Somayeh Kalhor, Mohammad Reza Keyvanpour, Afshin Salajegheh","doi":"10.1007/s10515-024-00443-y","DOIUrl":"10.1007/s10515-024-00443-y","url":null,"abstract":"<div><p>The violation of the semantic and structural software principles, such as low connection, high coherence, high understanding, and others, are called anti-patterns, which is one of the concerns of the software development process. They are caused due to bad design or programming that must be detected and removed to improve the application’s source code. Refactoring operators efficiently eliminate antipatterns, but they must first be identified. Therefore, antipattern detection is a critical issue in software engineering, and to do this, various approaches have been proposed. So far, review articles have been published to classify and compare these approaches. However, a comprehensive study using evaluation parameters has not compared different anti-pattern detection methods at all software abstraction levels. In this article, all the methods presented so far are classified, then their advantages and disadvantages are highlighted. Finally, a complete comparison of each category by evaluation metrics is provided. Our proposed classification considers three aspects, levels of abstraction, degree of dependence on developers’ skills, and techniques used. Then, the evaluation metrics reported on this subject are analyzed, and the qualitative values of these metrics for each category are presented. 
This information can help researchers compare and understand existing methods and improve them.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140925320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting high-level activities from low-level program execution logs","authors":"Evgenii V. Stepanov, Alexey A. Mitsyuk","doi":"10.1007/s10515-024-00441-0","DOIUrl":"10.1007/s10515-024-00441-0","url":null,"abstract":"<div><p>Modern runtime environments, standard libraries, and other frameworks provide many ways of diagnostics for software engineers. One form of such diagnostics is logging low-level events which characterize internal processes during program execution like garbage collection, assembly loading, just-in-time compilation, etc. Low-level program execution event logs contain a large number of events and event classes, which makes it impossible to discover meaningful process models straight from the event log, so extraction of high-level activities is a necessary step for further processing of such logs. In this paper, .NET applications execution logs are considered and an approach based on an unsupervised technique is extended with the domain-driven hierarchy built with the knowledge of a structure of logged events. The proposed approach allows treating events on different levels of abstraction, thus extending the number of patterns and activities found with the unsupervised technique. Experiments with real-life .NET programs execution event logs are conducted to demonstrate the proposed approach’s capability.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140925262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing software vulnerability detection using RoBERTa and machine learning","authors":"Cho Xuan Do, Nguyen Trong Luu, Phuong Thi Lan Nguyen","doi":"10.1007/s10515-024-00440-1","DOIUrl":"10.1007/s10515-024-00440-1","url":null,"abstract":"<div><p>Detecting vulnerabilities in source code written in C and C++ is currently essential as attack techniques against systems seek to find, exploit, and attack these vulnerabilities. In this article, to improve the effectiveness of the source code vulnerability detection process, we propose a new approach based on building and representing source code features using natural language processing (NLP) techniques. Our proposal in the article consists of two main stages: (i) building a feature profile of the source code using the RoBERTa model, and (ii) classifying source code based on the feature profile using a supervised machine learning algorithm. Specifically, with our proposal utilizing the pre-trained RoBERTa model, we have successfully built and represented important features of source code as complete vectors, thereby enhancing the effectiveness of prediction and vulnerability detection models. The experimental part of our article compared and evaluated our proposal with other approaches on the FFmpeg+Qemu dataset. The experimental results in the article showed that the approach in this study was superior to other research directions on all measures. 
Therefore, the proposal to use NLP techniques based on the RoBERTa model not only has scientific significance as a new research direction that has not been proposed for application but also has practical significance when all experimental results are highly effective.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140925556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tsoa: a two-stage optimization approach for GCC compilation options to minimize execution time","authors":"Youcong Ni, Xin Du, Yuan Yuan, Ruliang Xiao, Gaolin Chen","doi":"10.1007/s10515-024-00437-w","DOIUrl":"10.1007/s10515-024-00437-w","url":null,"abstract":"<div><p>The open-source compiler GCC offers numerous options to improve execution time. Two categories of approaches, machine learning-based and design space exploration, have emerged for selecting the optimal set of options. However, they continue to face challenge in quickly obtaining high-quality solutions due to the large and discrete optimization space, time-consuming utility evaluation for selected options, and complex interactions among options. To address these challenges, we propose TSOA, a Two-Stage Optimization Approach for GCC compilation options to minimize execution time. In the first stage, we present OPPM, an Option Preselection algorithm based on Pattern Mining. OPPM generates diverse samples to cover a wide range of option interactions. It subsequently mines frequent options from both objective-improved and non-improved samples. The mining results are further validated using CRC codes to precisely preselect options and reduce the optimization space. Transitioning to the second stage, we present OSEA, an Option Selection Evolutionary optimization Algorithm. OSEA is grounded in solution preselection and an option interaction graph. The solution preselection employs a random forest to build a classifier, efficiently identifying promising solutions for the next-generation population and thereby reducing the time spent on utility evaluation. Simultaneously, the option interaction graph is built to capture option interplays and their influence on objectives from evaluated solutions. Then, high-quality solutions are generated based on the option interaction graph. 
We evaluate the performance of TSOA by comparing it with representative machine learning-based and design space exploration approaches across a diverse set of 20 problem instances from two benchmark platforms. Additionally, we validate the effectiveness of OPPM and conduct related ablation experiments. The experimental results show that TSOA outperforms state-of-the-art approaches significantly in both optimization time and solution quality. Moreover, OPPM outperforms other option preselection algorithms, while the effectiveness of random forest-assisted solution preselection, along with new solution generation based on the option interaction graph, has been verified.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140659966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ProRLearn: boosting prompt tuning-based vulnerability detection by reinforcement learning","authors":"Zilong Ren, Xiaolin Ju, Xiang Chen, Hao Shen","doi":"10.1007/s10515-024-00438-9","DOIUrl":"10.1007/s10515-024-00438-9","url":null,"abstract":"<div><p>Software vulnerability detection is a critical step in ensuring system security and data protection. Recent research has demonstrated the effectiveness of deep learning in automated vulnerability detection. However, it is difficult for deep learning models to understand the semantics and domain-specific knowledge of source code. In this study, we introduce a new vulnerability detection framework, ProRLearn, which leverages two main techniques: prompt tuning and reinforcement learning. Since existing fine-tuning of pre-trained language models (PLMs) struggles to leverage domain knowledge fully, we introduce a new automatic prompt-tuning technique. Precisely, prompt tuning mimics the pre-training process of PLMs by rephrasing task input and adding prompts, using the PLM’s output as the prediction output. The introduction of the reinforcement learning reward mechanism aims to guide the behavior of vulnerability detection through a reward and punishment model, enabling it to learn effective strategies for obtaining maximum long-term rewards in specific environments. The introduction of reinforcement learning aims to encourage the model to learn how to maximize rewards or minimize penalties, thus enhancing performance. Experiments on three datasets (FFMPeg+Qemu, Reveal, and Big-Vul) indicate that ProRLearn achieves performance improvement of 3.27–70.96% over state-of-the-art baselines in terms of F1 score. The combination of prompt tuning and reinforcement learning can offer a potential opportunity to improve performance in vulnerability detection. This means that it can effectively improve the performance in responding to constantly changing network environments and new threats. 
This interdisciplinary approach contributes to a better understanding of the interplay between natural language processing and reinforcement learning, opening up new opportunities and challenges for future research and applications.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140625680","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OneLog: towards end-to-end software log anomaly detection","authors":"Shayan Hashemi, Mika Mäntylä","doi":"10.1007/s10515-024-00428-x","DOIUrl":"10.1007/s10515-024-00428-x","url":null,"abstract":"<div><p>With the growth of online services, IoT devices, and DevOps-oriented software development, software log anomaly detection is becoming increasingly important. Prior works mainly follow a traditional four-staged architecture (Preprocessor, Parser, Vectorizer, and Classifier). This paper proposes OneLog, which utilizes a single deep neural network instead of multiple separate components. OneLog harnesses a convolutional neural network (CNN) at the character level to take digits, numbers, and punctuation, which were removed in prior works, into account alongside the main natural language text. We evaluate our approach on six message- and sequence-based data sets: HDFS, Hadoop, BGL, Thunderbird, Spirit, and Liberty. We experiment with OneLog in single-, multi-, and cross-project setups. OneLog offers state-of-the-art performance on our datasets. OneLog can utilize multi-project datasets simultaneously during training, which suggests our model can generalize between datasets. Multi-project training also improves OneLog performance, making it ideal when limited training data is available for an individual project. We also found that cross-project anomaly detection is possible with a single project pair (Liberty and Spirit). Analysis of model internals shows that OneLog has multiple modes of detecting anomalies and that the model learns manually validated parsing rules for the log messages. We conclude that character-based CNNs are a promising approach toward end-to-end learning in log anomaly detection. They offer good performance and generalization over multiple datasets. 
We will make our scripts publicly available upon the acceptance of this paper.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 2","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10515-024-00428-x.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140614302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated quantum software engineering","authors":"Aritra Sarkar","doi":"10.1007/s10515-024-00436-x","DOIUrl":"10.1007/s10515-024-00436-x","url":null,"abstract":"<div><p>As bigger quantum processors with hundreds of qubits become increasingly available, the potential for quantum computing to solve problems intractable for classical computers is becoming more tangible. Designing efficient quantum algorithms and software in tandem is key to achieving quantum advantage. Quantum software engineering is challenging due to the unique counterintuitive nature of quantum logic. Moreover, with larger quantum systems, traditional programming using quantum assembly language and qubit-level reasoning is becoming infeasible. Automated Quantum Software Engineering (AQSE) can help to reduce the barrier to entry, speed up development, reduce errors, and improve the efficiency of quantum software. This article elucidates the motivation to research AQSE (why), a precise description of such a framework (what), and reflections on components that are required for implementing it (how).</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10515-024-00436-x.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140598066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bug reports priority classification models. Replication study","authors":"Andreea Galbin-Nasui, Andreea Vescan","doi":"10.1007/s10515-024-00432-1","DOIUrl":"10.1007/s10515-024-00432-1","url":null,"abstract":"<div><p>Bug tracking systems receive a large number of bugs on a daily basis. The process of maintaining the integrity of the software and producing high-quality software is challenging. The bug-sorting process is usually a manual task that can lead to human errors and be time-consuming. The purpose of this research is twofold: first, to conduct a literature review on the bug report priority classification approaches, and second, to replicate existing approaches with various classifiers to extract new insights about the priority classification approaches. We used a Systematic Literature Review methodology to identify the most relevant existing approaches related to the bug report priority classification problem. Furthermore, we conducted a replication study on three classifiers: Naive Bayes (NB), Support Vector Machines (SVM), and Convolutional Neural Network (CNN). Two sets of experiments are performed: first, our own NLTK implementation based on NB and CNN, and second, based on Weka implementation for NB, SVM, and CNN. The dataset used consists of several Eclipse projects and one project related to database systems. The obtained results are better for the bug priority P3 for the CNN classifier, and overall the quality relation between the three classifiers is preserved as in the original studies. 
The replication study confirmed the findings of the original studies, emphasizing the need to further investigate the relationship between the characteristics of the projects used as training and those used as testing.</p></div>","PeriodicalId":55414,"journal":{"name":"Automated Software Engineering","volume":"31 1","pages":""},"PeriodicalIF":2.0,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140598070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}