Katiane Oliveira Alpes da Silva , Ricardo Massa Ferreira Lima , Vanderson Botelho da Silva
{"title":"Process mining for agile software process assessment and improvement","authors":"Katiane Oliveira Alpes da Silva , Ricardo Massa Ferreira Lima , Vanderson Botelho da Silva","doi":"10.1016/j.infsof.2025.107680","DOIUrl":"10.1016/j.infsof.2025.107680","url":null,"abstract":"<div><h3>Context:</h3><div>Agile software processes, designed for flexibility and continuous improvement, pose challenges in extracting actionable insights from event logs due to their inherent unstructured nature.</div></div><div><h3>Objective:</h3><div>The study evaluates whether existing process mining techniques can effectively uncover reliable and insightful information on software development processes adopting agile methodologies.</div></div><div><h3>Method:</h3><div>The work uses various algorithms to analyze procedural flows and business rules within an event log containing data from 3,418 agile software development projects at a company with over 1,500 employees. By categorizing processes according to project size, our analysis aimed to determine the kind of insights these algorithms could reveal. We specifically focused on algorithms that produced high-quality insights for a deeper examination of aspects like effort rate, frequency of activities, and relationships between activities. Subsequently, technical and managerial staff reviewed the results to assess the quality and relevance of the insights generated. Validation involved a semi-structured interview with managers and technicians to ensure the relevance and applicability of the findings.</div></div><div><h3>Results:</h3><div>The analysis demonstrates the efficacy of declarative business process techniques in extracting actionable insights from agile development teams’ data. Such techniques accurately capture the daily routines and documented processes of the teams. High-performing teams typically followed fewer rules, had less job rotation, involved fewer individuals, and engaged in a more limited range of activities. 
Domain experts and team managers found these insights to be coherent and potentially valuable for enhancing the performance of software development processes.</div></div><div><h3>Conclusions:</h3><div>Declarative modeling is particularly adept at revealing the patterns of flexible software development workflows, presenting initial support for teams, managers, and decision-makers through both descriptive and prescriptive analysis.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107680"},"PeriodicalIF":3.8,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143395235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Axel Martin , Djamel Eddine Khelladi , Théo Matricon , Mathieu Acher
{"title":"Re-evaluating metamorphic testing of chess engines: A replication study","authors":"Axel Martin , Djamel Eddine Khelladi , Théo Matricon , Mathieu Acher","doi":"10.1016/j.infsof.2025.107679","DOIUrl":"10.1016/j.infsof.2025.107679","url":null,"abstract":"<div><h3>Context:</h3><div>This study aims to confirm, replicate and extend the findings of a previous article entitled <em>“Metamorphic Testing of Chess Engines”</em> that reported inconsistencies in the analyses provided by <em>Stockfish</em>, the most widely used chess engine, for transformed chess positions that are fundamentally identical. Initial findings, under conditions strictly identical to those of the original study, corroborate the reported inconsistencies.</div></div><div><h3>Objective:</h3><div>However, the original article considers a specific dataset (including randomly generated chess positions, end-games, or checkmate problems) and a very low analysis depth (10 plies, corresponding to 5 moves). These decisions pose threats that limit not only the generalizability of the results but also their practical usefulness for both chess players and maintainers of Stockfish. Thus, we replicate the original study.</div></div><div><h3>Methods:</h3><div>This time, we consider (1) positions derived from actual chess games, (2) analyses at appropriate and larger depths, and (3) different versions of Stockfish. We conduct novel experiments on thousands of positions, employing significantly deeper searches.</div></div><div><h3>Results:</h3><div>The replication results show that the Stockfish chess engines demonstrate significantly greater consistency in their evaluations. The metamorphic relations are not as effective as in the original article, especially on realistic chess positions. 
We also demonstrate that, for any given position, there exists a depth threshold beyond which further increases in depth do not result in any evaluation differences for the studied metamorphic relations. We perform an in-depth analysis to identify and clarify the implementation reasons behind Stockfish’s inconsistencies when dealing with transformed positions.</div></div><div><h3>Conclusion:</h3><div>A first concrete result is thus that metamorphic testing of chess engines is not yet an effective technique for finding faults of Stockfish. Another result is the lessons learned through this replication effort: metamorphic relations must be verified in the context of the domain’s specificities; without such contextual validation, they may lead to misleading or irrelevant conclusions; changes in parameters and input dataset can drastically alter the effectiveness of a testing method.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107679"},"PeriodicalIF":3.8,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143395197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Md Rakib Hossain Misu , Jiawei Li , Adithya Bhattiprolu , Yang Liu , Eduardo Santana de Almeida , Iftekhar Ahmed
{"title":"Test smell: A parasitic energy consumer in software testing","authors":"Md Rakib Hossain Misu , Jiawei Li , Adithya Bhattiprolu , Yang Liu , Eduardo Santana de Almeida , Iftekhar Ahmed","doi":"10.1016/j.infsof.2025.107671","DOIUrl":"10.1016/j.infsof.2025.107671","url":null,"abstract":"<div><h3>Context:</h3><div>Traditionally, energy efficiency research has focused on reducing energy consumption at the hardware level and, more recently, in the design and coding phases of the software development life cycle. However, software testing’s impact on energy consumption has not received attention from the research community. Specifically, how test code design quality and test smell (e.g., sub-optimal design and bad practices in test code) impact energy consumption has not been investigated yet.</div></div><div><h3>Objective:</h3><div>This study aims to examine open-source software projects to analyze the association between test smell and its effects on energy consumption in software testing.</div></div><div><h3>Methods:</h3><div>We conducted a mixed-method empirical analysis from two perspectives: software (data mining in 12 Apache projects) and developers’ views (a survey of 62 software practitioners).</div></div><div><h3>Results:</h3><div>Our findings show that (1) test smell is associated with energy consumption in software testing; specifically, the smelly part of a test case consumes more energy than the non-smelly part, (2) certain test smells are more energy-hungry than others, (3) refactored test cases tend to consume less energy than their smelly counterparts, and (4) most developers (45% of the survey respondents) lack knowledge about test smells’ impact on energy consumption.</div></div><div><h3>Conclusion:</h3><div>Based on the results, we emphasize raising developers’ awareness regarding the impact of test smells on energy consumption. 
Additionally, we present several observations that can direct future research and developments.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107671"},"PeriodicalIF":3.8,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143301051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Why and how do organizations create user-led open source consortia? A systematic literature review","authors":"Elçin Yenişen Yavuz, Dirk Riehle","doi":"10.1016/j.infsof.2025.107681","DOIUrl":"10.1016/j.infsof.2025.107681","url":null,"abstract":"<div><h3>Context</h3><div>User-led open source (OS) consortia (foundations) consist of organizations from industries beyond the software industry collaborating to create open-source software solutions for their internal processes. Initially pioneered by higher education organizations in the 2000s, this concept has gained traction in recent years across various industries.</div></div><div><h3>Objective</h3><div>This study has two research objectives. The first objective is to provide an overview of the current state of the art in this field by identifying previously studied topics and gathering examples from different industries. The second objective is to understand the structure of user-led OS consortia and the motivations of organizations for participating in such consortia.</div></div><div><h3>Method</h3><div>To gain a comprehensive understanding of this phenomenon, we conducted a systematic literature review, covering the years 2000 to 2023. Furthermore, we performed thematic analysis on 43 selected studies to identify and examine the key characteristics, ecosystems, and the benefits organizations gain from involvement in user-led OS consortia.</div></div><div><h3>Results</h3><div>We identified 43 unique papers on user-led OS consortia and provided details on 14 sample user-led OS consortia projects. We defined 19 characteristics of user-led OS consortia and 16 benefits of organizations’ involvement. Additionally, we outlined the key actors and their roles in user-led OS consortia.</div></div><div><h3>Conclusion</h3><div>We provided an overview of the current state of the art in this field. 
We identified the structure of user-led OS consortia and the organizations’ motivations for participating in such consortia.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107681"},"PeriodicalIF":3.8,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143418924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A more accurate bug localization technique for bugs with multiple buggy code files","authors":"Hui Xu , Zhaodan Wang , Weiqin Zou","doi":"10.1016/j.infsof.2025.107675","DOIUrl":"10.1016/j.infsof.2025.107675","url":null,"abstract":"<div><h3>Context:</h3><div>Bug localization is a key step in bug fixing. Despite considerable progress, existing bug localization techniques still perform unsatisfactorily in situations where the complete fix to a bug involves touching multiple buggy code files. That is, for such bugs, those techniques tend to correctly locate only one, or at least not all, of the buggy code files, leaving other buggy code files undetected.</div></div><div><h3>Objective:</h3><div>This study aims to improve bug localization in cases where resolving a bug requires modifications to multiple buggy code files by proposing HitMore to rank more truly buggy files higher in the recommendation list.</div></div><div><h3>Method:</h3><div>The basic idea of HitMore is to attempt to retrieve a subset of truly buggy code files first, then use these files to retrieve other buggy code files based on code relation analysis. For the first part, we designed three kinds of domain-specific features to build a machine-learning model to identify the truly buggy code file subset. For the second part, we make use of three types of code relations between the code base and the buggy file subset to better retrieve the remaining truly buggy code files.</div></div><div><h3>Results:</h3><div>The experiments on six widely used open-source projects show that our technique is effective in identifying the subset of truly buggy code files, with a weighted prediction F1-Score of 86.1%–92.1%. By leveraging the code relations to the retrieved subset and the code base, HitMore could retrieve all truly buggy code files for 29.31%–69.56% of bugs across six projects. 
For multiple-buggy-code-file bugs, HitMore could completely localize up to 15.38%, 19.36%, and 11.86% more such bugs than three representative IRBL baselines across six projects.</div></div><div><h3>Conclusion:</h3><div>The experimental results demonstrate the potential of HitMore in reducing developers’ burden of locating and further fixing relatively complex bugs such as those with multiple buggy code files in practice.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107675"},"PeriodicalIF":3.8,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143301052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francesco Sovrano, Sandro Vonlanthen, Alberto Bacchelli
{"title":"Beyond the lab: An in-depth analysis of real-world practices in government-to-citizen software user documentation","authors":"Francesco Sovrano, Sandro Vonlanthen, Alberto Bacchelli","doi":"10.1016/j.infsof.2025.107676","DOIUrl":"10.1016/j.infsof.2025.107676","url":null,"abstract":"<div><h3>Context:</h3><div>Governments, including Switzerland through its <em>Digital Switzerland Strategy</em>, are using new technologies to improve public services. However, unclear user guides often lead people to prefer expensive help desk services. Current research on software documentation is limited by small-scale surveys that do not reflect real-world challenges. This paper addresses these gaps by examining the limitations of user guides in a more practical context.</div></div><div><h3>Objective:</h3><div>Building on the identified need for a more comprehensive understanding of user documentation in real-world applications, this study aims to critically analyse user documentation in government-to-citizen (G2C) interactions within Switzerland. We intend to identify both common and critical issues in existing documentation to direct future research towards substantial improvements. By doing so, this research will contribute to the development of more effective user guides, ultimately improving the digital experience for citizens and reducing reliance on costly help desk support.</div></div><div><h3>Methods:</h3><div>Our research methodology involved a thorough analysis of user documentation in German-speaking Swiss cantons. We began with around 5’000 links from official cantonal websites and narrowed it down to nearly 600 user guides relevant to G2C applications. The study progressed in phases: we first assessed the content to identify real-world documentation characteristics, then compared these with common issues from academic research to pinpoint frequent problems. 
Finally, we analysed the data to identify overarching trends in the documentation characteristics and issues.</div></div><div><h3>Results:</h3><div>Our analyses, which linked guide features to documentation issues, uncovered prevalent real-world issue trends, characterized by statistically significant correlations (<em>p</em> &lt; .05) with the socioeconomic status of the cantons, such as their wealth and population size.</div></div><div><h3>Conclusions:</h3><div>Identifying these trends will help researchers and practitioners concentrate on the most common and critical issues encountered in practice. This, in turn, holds the potential to drive the development of more effective technology for documenting software. <strong>Data and Materials:</strong> <span><span>https://doi.org/10.5281/zenodo.10592871</span></span></div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"181 ","pages":"Article 107676"},"PeriodicalIF":3.8,"publicationDate":"2025-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143301050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rongcun Wang , Yiqian Hou , Yuan Tian , Zhanqi Cui , Shujuan Jiang
{"title":"XL-HQL: A HQL query generation method via XLNet and column attention","authors":"Rongcun Wang , Yiqian Hou , Yuan Tian , Zhanqi Cui , Shujuan Jiang","doi":"10.1016/j.infsof.2025.107674","DOIUrl":"10.1016/j.infsof.2025.107674","url":null,"abstract":"<div><h3>Context:</h3><div>Object-relational mapping (ORM) tools, like Hibernate, are widely used to facilitate the development of database applications by bridging the gap between object-oriented programming (OOP) and relational database management systems (DBMS). These ORM tools simplify the process of mapping OOP objects to relational tables, addressing issues of data inconsistency and performance. However, they also introduce the need to write queries in specific languages, such as Hibernate Query Language (HQL), to manage data interactions within the database.</div></div><div><h3>Objective:</h3><div>These query languages can be difficult to write and error-prone due to the complexities of accurately mapping object models to relational schema with intricate relationships and inheritance hierarchies. To mitigate this issue, a recent study introduced the task of automated HQL query generation, i.e., automatically generating HQL from program context (target method’s signature, properties, and optional method comments and call context). However, the existing solution, HQLgen, has shown limited performance, with an accuracy of 34.52%.</div></div><div><h3>Method:</h3><div>In this paper, we propose a novel HQL query generation approach named XL-HQL. XL-HQL aims to address two main challenges in HQL query generation: limited context information and large search space. 
Specifically, XL-HQL contains a pre-trained model-based encoder, rules defined to reduce search space, and a column-attention-enabled decoder, which is shown to be effective in SQL generation approaches.</div></div><div><h3>Result:</h3><div>To evaluate the effectiveness of XL-HQL, we designed and conducted experiments on an existing HQL query generation benchmark, which contains 24,118 HQL queries extracted from 3,481 open-source projects. The experimental results show that our approach achieves 66.93% and 64.47% accuracy on mixed and cross-project datasets, respectively, nearly doubling the performance of the state-of-the-art (SOTA) baseline.</div></div><div><h3>Conclusions:</h3><div>The application of pre-trained models that are suitable for handling long sequences for the HQL query generation task shows great potential. Moreover, the defined rules based on OOP knowledge are effective for reducing search space and improving the performance of the task.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"180 ","pages":"Article 107674"},"PeriodicalIF":3.8,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143304415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Paulo Malcher , Davi Viana , Pablo Oliveira Antonino , Rodrigo Pereira dos Santos
{"title":"Towards an understanding of requirements management in software ecosystems","authors":"Paulo Malcher , Davi Viana , Pablo Oliveira Antonino , Rodrigo Pereira dos Santos","doi":"10.1016/j.infsof.2025.107672","DOIUrl":"10.1016/j.infsof.2025.107672","url":null,"abstract":"<div><h3>Context:</h3><div>Software ecosystems (SECO) have introduced complexity in requirements management due to multiple actors’ collaboration through several organizational boundaries.</div></div><div><h3>Objective:</h3><div>The main contribution of this article is to improve the understanding of requirements management in SECO. We propose a conceptual model whose concepts, definitions, and relationships are grounded in the literature and the modern software industry’s practices.</div></div><div><h3>Methods:</h3><div>We applied Design Science to build the conceptual model and conducted a Delphi study with 22 experts to assess it. We performed two rounds and adjusted our model according to the experts’ judgment.</div></div><div><h3>Results:</h3><div>We reached a conceptual model comprising 43 concepts and their relationships that help to understand requirements management in SECO. Moreover, we provided a glossary with a definition of each concept. 
This conceptual model can help abstract the complexity of the requirements management in SECO.</div></div><div><h3>Conclusions:</h3><div>By organizing concepts and relationships in requirements management in SECO, this conceptual model makes it possible to expand the body of knowledge in the area and serves as a basis for new solutions to support requirements management in SECO.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"180 ","pages":"Article 107672"},"PeriodicalIF":3.8,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143305313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Alleviating class imbalance in Feature Envy prediction: An oversampling technique based on code entity attributes","authors":"Jiamin Guo , Yangyang Zhao , Tao Zheng , Zhifei Chen , Mingyue Jiang , Zuohua Ding","doi":"10.1016/j.infsof.2025.107673","DOIUrl":"10.1016/j.infsof.2025.107673","url":null,"abstract":"<div><h3>Context:</h3><div>Feature Envy is a common code smell that occurs when a method heavily relies on data or functionality from other classes. Detecting Feature Envy is essential for improving software modularity and reducing technical debt. However, real-world datasets often exhibit severe class imbalance, with far fewer Feature Envy instances than non-smelly ones, posing challenges for prediction models. Traditional oversampling techniques attempt to address this issue by relying solely on numerical vectors but often fail to capture the complex relationships between code entities, potentially deviating from the nature of Feature Envy.</div></div><div><h3>Objective:</h3><div>This study introduces STANDER, a novel oversampling technique based on code entity similarity, designed to handle class imbalance in Feature Envy prediction by generating synthetic samples that better reflect the characteristics of Feature Envy.</div></div><div><h3>Method:</h3><div>STANDER creates synthetic samples by leveraging multidimensional code entity similarity, incorporating attributes such as dependency relationships, historical changes and code text. It was evaluated on five datasets using five classifiers: Naive Bayes, Logistic Regression, Support Vector Machine, Random Forest, and Decision Tree. Its performance was compared to baseline over-sampling techniques based on precision, recall, F1-score, and Matthews Correlation Coefficient.</div></div><div><h3>Results:</h3><div>STANDER enhances dataset diversity while maintaining clear boundaries between minority and majority classes, as reflected by higher Nearest Neighbor Diversity and Silhouette Score values. 
Models balanced with STANDER exhibited significant improvements in predictive performance, particularly in recall, F1-score, and Matthews Correlation Coefficient. Compared to the other oversampling techniques, STANDER demonstrated advantages in handling imbalanced datasets, especially in the Logistic Regression and Decision Tree classifiers. Statistical results confirm significant performance improvements across most models, highlighting its effectiveness and applicability.</div></div><div><h3>Conclusion:</h3><div>STANDER is an effective solution to alleviate class imbalance problem in Feature Envy detection by generating more representative synthetic samples that improve prediction performance.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"180 ","pages":"Article 107673"},"PeriodicalIF":3.8,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143305100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhao Zhang , Senlin Luo , Yingdan Guan , Limin Pan
{"title":"MPCA: Constructing the APTs provenance graphs through multi-perspective confidence and association","authors":"Zhao Zhang , Senlin Luo , Yingdan Guan , Limin Pan","doi":"10.1016/j.infsof.2025.107670","DOIUrl":"10.1016/j.infsof.2025.107670","url":null,"abstract":"<div><div>The forensic analysis of Advanced Persistent Threat (APT) attacks is crucial for maintaining cybersecurity. To address the challenges posed by the high complexity and strong concealment of APT attacks, provenance graphs based on inter-entity dependencies are used for forensic investigation. However, under long-term persistent attacks, entities with semantically consistent behavior patterns become excessively redundant, leading to an explosion of inter-entity dependencies and a decrease in forensic efficiency. In addition, the implicit relationships within and between events are not fully represented, and alarm information spreads to neighboring benign events, making it difficult to accurately reconstruct the attack scenario. In this paper, we propose an APT attack attribution method, MPCA, that combines multi-perspective confidence and association. Firstly, by merging parallel branches with semantically consistent behavior patterns in the process connected subgraph, redundant entities and their dependencies are reduced. Secondly, event confidence is estimated to exclude benign events, and the association between events and alarms is analyzed to highlight attack events. Experimental results demonstrate that MPCA achieves state-of-the-art performance. 
MPCA can improve the efficiency of constructing attack scenario graphs, reduce false-positive and false-negative rates, and demonstrate greater adaptability in attack attribution tasks.</div></div>","PeriodicalId":54983,"journal":{"name":"Information and Software Technology","volume":"180 ","pages":"Article 107670"},"PeriodicalIF":3.8,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143305099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}