Information and Software Technology: Latest Articles

Data migration for column family database evolution
IF 3.8, Zone 2, Computer Science
Information and Software Technology. Pub Date: 2025-07-08. DOI: 10.1016/j.infsof.2025.107834
Pablo Suárez-Otero, Michael J. Mior, María José Suárez-Cabal, Javier Tuya
Abstract
Context: Database evolution involves processes such as the evolution of the schema, the adaptation of the application to the new schema, and the migration of data to the new or modified structures of the schema. Data migration is particularly crucial in databases where data repetition is common, such as NoSQL column family DBMSs. In these systems, data integrity cannot be enforced from the database side and must instead be maintained from the application side. Database evolution is also affected by data repetition and the absence of database-side integrity enforcement, as any evolution of the schema requires data migrations to maintain data integrity.
Objectives: Ensure data integrity in NoSQL column family DBMSs during database evolution by providing specific instructions for executing the necessary data migrations.
Methods: We propose MoDEvo, a model-driven engineering approach that provides a data migration model to ensure data integrity for database evolution in column family DBMSs. This model is then transformed into an executable script that implements the migration procedures.
Results: We evaluate MoDEvo by executing data migrations in case studies obtained from open-source projects where the schema evolved. In this evaluation we use Apache Cassandra, the most popular column family DBMS. Through this evaluation, we verify that the scripts generated from the data migration model effectively maintain data integrity within the database.
Conclusion: MoDEvo aids database evolution in column family DBMSs by avoiding the creation of inconsistencies, and it can also detect impossible migrations, thereby preventing errors. There is still room for improvement, such as extending the supported databases to other paradigms where data repetition is common and addressing the evolution of client applications alongside schema evolution.
Volume 187, Article 107834.
Citations: 0
Research artifacts in secondary studies: A systematic mapping in software engineering
IF 3.8, Zone 2, Computer Science
Information and Software Technology. Pub Date: 2025-07-07. DOI: 10.1016/j.infsof.2025.107830
Aleksi Huotala, Miikka Kuutila, Mika Mäntylä
Abstract
Context: Systematic reviews (SRs) summarize state-of-the-art evidence in science, including software engineering (SE).
Objective: Our objective is to evaluate how SRs report research artifacts and to provide a comprehensive list of these artifacts.
Method: We examined 537 secondary studies published between 2013 and 2023 to analyze the availability and reporting of research artifacts.
Results: Our findings indicate that only 31.5% of the reviewed studies include research artifacts. Encouragingly, the situation is gradually improving, as our regression analysis shows a significant increase in the availability of research artifacts over time. However, in 2023, just 62.0% of secondary studies provide a research artifact, while an even lower percentage, 30.4%, use a permanent repository with a digital object identifier (DOI) for storage.
Conclusion: To enhance transparency and reproducibility in SE research, we advocate for the mandatory publication of research artifacts in secondary studies.
Volume 187, Article 107830.
Citations: 0
Practitioners’ perceptions on requirements smells
IF 3.8, Zone 2, Computer Science
Information and Software Technology. Pub Date: 2025-07-05. DOI: 10.1016/j.infsof.2025.107823
Emanuele Gentili, Davide Falessi
Abstract
Context: Software specifications are usually written in natural language and may suffer from imprecision, ambiguity, and other quality issues, hereafter referred to as requirement smells. Requirement smells can hinder project development in many ways, such as delays, rework, and low customer satisfaction. From an industrial perspective, we want to focus our time and effort on identifying and preventing the requirement smells of high interest. We also want to identify the metrics to measure the effect of smells on a software project.
Objective: We aim to characterize types of requirement smells in terms of frequency, severity, and effects. To the best of our knowledge, no previous study has analysed how frequency, severity, or effects vary across types of smells.
Methods: We interview ten experienced practitioners from different divisions of MBDA Italy Spa, a large international company in the safety-critical domain. We then survey 58 people from the same company to support our findings and extend the analysis to metrics for measuring the effects of specific types of requirement smells.
Results: Our results show that the smell types perceived as most severe are Ambiguity and Unverifiability, while the most frequent are Ambiguity and Incompleteness. We also provide six findings about requirement smells, such as that the effects of smells are expected to differ across smell types and stages of the project. Our study suggests that measuring the effects of requirement smells may necessitate type-specific metrics.
Conclusion: Our results contribute to a greater understanding of the importance of addressing requirement smells and provide actionable insights for improving requirement quality in industrial settings. Our results pave the way for future empirical investigations, such as mining project repositories, to measure the specific effect type and size of specific requirement smells.
Volume 187, Article 107823.
Citations: 0
RAIDAD: A model-driven framework for automated and agile development of IoT data analysis software
IF 3.8, Zone 2, Computer Science
Information and Software Technology. Pub Date: 2025-07-05. DOI: 10.1016/j.infsof.2025.107818
Mohsen Gholami, Bahman Zamani, Behrouz Shahgholi Ghahfarokhi
Abstract
Context: Nowadays, developing data analysis software for the IoT domain faces challenges such as complexity, repetitive tasks, and developers’ lack of domain knowledge. To address these issues, methodologies like CRISP-DM have been introduced, providing structured guidance for data analysis.
Objectives: Despite the availability of structured methodologies, building data analysis pipelines still involves managing complexity and redundancy. Model-driven approaches have been proposed to tackle these challenges but often fail to comprehensively address all stages of the data analysis workflow and the interdependencies between stages and datasets. This research introduces RAIDAD, a model-driven framework that addresses these gaps by covering all phases of the CRISP-DM methodology.
Methods: RAIDAD includes a domain-specific modeling language for IoT data analysis, a graphical modeling editor, a code generation transformation engine, and a data model assistant for seamless model-data integration. These components are delivered as an Eclipse plugin.
Results: The evaluation of RAIDAD is two-fold. First, a comparative operational evaluation with RapidMiner and ML-Quadrat shows RAIDAD achieves a 9.6% improvement in usability and productivity over RapidMiner and a 23% improvement over ML-Quadrat. Second, RAIDAD is compared to a general-purpose programming language, demonstrating its superiority in reducing effort and production time for IoT data analysis software.
Conclusion: This comprehensive framework ensures an efficient and organized approach to data analysis, addressing key challenges in the IoT domain. Future research will focus on expanding RAIDAD’s support for a wider range of data analysis and machine learning algorithms, enhancing automation capabilities, and incorporating continuous user feedback to ensure the framework evolves in line with emerging needs.
Volume 187, Article 107818.
Citations: 0
An adaptive model for cross-domain code search
IF 3.8, Zone 2, Computer Science
Information and Software Technology. Pub Date: 2025-07-02. DOI: 10.1016/j.infsof.2025.107827
Mengge Fang, Lie Wang, Haize Hu
Abstract
Context: Code search is an important research direction in computer science. As software continues to grow in scale and complexity, developers frequently need to search for and understand existing code in their daily work. Code search research aims to enhance the efficiency and accuracy of code search, covering aspects such as natural language-based code search, code similarity comparison, and code recommendation systems. With better code search technologies, developers can more swiftly locate and comprehend the code they need, thereby boosting the efficiency and quality of software development.
Objective: However, the reliance of deep learning-based code search models on large datasets and the substantial time needed to learn model parameters can impose substantial economic costs. Furthermore, such models have limited adaptability and perform sub-optimally when applied to a new dataset (i.e., cross-domain code search).
Methods: To address these issues, we propose an Adaptive Cross-Domain code search model based on Self-Attention (ACD-SA), which is the first attempt to introduce a self-attention model into cross-domain code search. First, the fastText word embedding tool is employed to obtain the initial vector. Second, self-attention is utilized to characterize the internal structure information of the initial vector and obtain the feature vector and model parameters. Next, a word matching matrix is constructed from the feature vectors to generate the initial grammatical information vector. Subsequently, a long short-term memory (LSTM) network is trained on the initial grammatical information vector to extract grammatical patterns. Finally, cross-domain code search analysis is performed by combining domain-specific word matching matrices and grammar patterns.
Results: To verify the effectiveness of ACD-SA in cross-domain code search, an experimental comparative analysis is conducted on a training dataset and a target dataset. Compared with existing baseline models such as CodeHow, DeepCS, BAVE, and AdaCS, the experimental results demonstrate that ACD-SA yields superior results for Hit@2, Hit@3, Hit@5, Hit@10, and MRR.
Conclusion: By analyzing the defects and shortcomings of existing methods in cross-domain code search, the article proposes the ACD-SA cross-domain code search model. ACD-SA only needs to be trained on large datasets, after which the model is applied to code search on domain-specific datasets. On the one hand, ACD-SA solves the problem that traditional code search needs to spend a lot of time on the collection or crawling of large datasets and the training of model parameters in each search task. On the other hand, ACD-SA makes up for the singularity of the existing code search model for datase…
Volume 186, Article 107827.
Citations: 0
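The code search abstract above reports results as Hit@k and MRR. As a reading aid, here is a minimal sketch of how these two ranking metrics are commonly computed. It is not taken from the paper: the function names, and the convention that a rank of 0 means "no relevant snippet was returned", are assumptions made for illustration.

```python
from typing import Sequence


def hit_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose first relevant result appears in the top k.

    `ranks` holds, per query, the 1-based rank of the first relevant code
    snippet; 0 is an assumed sentinel meaning nothing relevant was returned.
    """
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if 0 < r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries; a miss (rank 0) contributes 0."""
    if not ranks:
        return 0.0
    return sum(1.0 / r for r in ranks if r > 0) / len(ranks)


# Hypothetical ranks for five queries (illustrative data only).
example_ranks = [1, 3, 0, 2, 12]
print(hit_at_k(example_ranks, 2))
print(hit_at_k(example_ranks, 10))
print(mean_reciprocal_rank(example_ranks))
```

Higher values are better for both metrics. Hit@k only asks whether a relevant result made the top k, while MRR also rewards placing it nearer the top.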
Vulnerability detection with Graph Attention Network and Metric Learning
IF 3.8, Zone 2, Computer Science
Information and Software Technology. Pub Date: 2025-07-02. DOI: 10.1016/j.infsof.2025.107826
Chunyong Zhang, Liangwei Yao, Yang Xin
Abstract
Context: Static code vulnerability detection is a critical topic in software security. Researchers are interested in employing deep learning to discover vulnerabilities automatically. However, existing software analysis methods have a high rate of false positives and false negatives.
Objective: High false negative and false positive rates may be caused by insufficient extraction of syntax and semantics, data imbalance, and overlapping feature distributions. To address these problems, we construct a vulnerability detection model, GSM, a loosely coupled method that combines a Graph Attention Network, Sampling, and Metric Learning.
Method: First, we use the code property graph to represent source code and use graph attention networks for graph embedding learning. Second, we adopt a combination of oversampling and undersampling to deal with the imbalanced dataset. Finally, we adopt a Metric Learning method based on the quadruple loss function to separate vulnerable and neutral samples.
Results: Compared to the state-of-the-art method Reveal on the imbalanced dataset chrdeb, Precision, Recall, and F1-Score improve by about 11.5%, 12.4%, and 12.7%, respectively.
Conclusion: Across different datasets, GSM performs better than state-of-the-art vulnerability detection methods on multiple metrics. GSM can resolve the problem of data imbalance and the inability to separate the two types of samples.
Volume 186, Article 107826.
Citations: 0
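The GSM abstract above mentions a quadruple loss for separating vulnerable and neutral samples but does not give its formula. The sketch below shows one commonly used quadruplet loss from the metric learning literature, purely as an illustration. The margins, the squared Euclidean distance, and the roles assigned to the four samples are assumptions, and the paper's actual formulation may differ.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quadruplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative1: Sequence[float],
    negative2: Sequence[float],
    margin1: float = 1.0,  # assumed margin for the anchor-vs-negative term
    margin2: float = 0.5,  # assumed margin for the negative-pair term
) -> float:
    """Generic quadruplet loss over one (anchor, positive, neg1, neg2) tuple.

    Term 1 pulls the positive closer to the anchor than negative1 is.
    Term 2 asks the anchor-positive distance to also be smaller than the
    distance between the two negatives, which tightens same-label clusters.
    """
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative1)
    d_nn = squared_distance(negative1, negative2)
    term1 = max(0.0, d_ap - d_an + margin1)
    term2 = max(0.0, d_ap - d_nn + margin2)
    return term1 + term2


# Well-separated embeddings incur no loss; overlapping ones are penalised.
print(quadruplet_loss([0, 0], [0, 1], [3, 0], [0, 3]))
print(quadruplet_loss([0, 0], [2, 0], [1, 0], [1, 1]))
```

A loss of zero means both margin constraints already hold for that tuple, so only tuples whose embeddings overlap contribute to training.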
Towards accessible website design through artificial intelligence: A systematic literature review
IF 3.8, Zone 2, Computer Science
Information and Software Technology. Pub Date: 2025-06-27. DOI: 10.1016/j.infsof.2025.107821
Guillermo Vera-Amaro, José Rafael Rojano-Cáceres
Abstract
Context: Web accessibility ensures that individuals with disabilities can access, navigate, and interact with online content effectively. Despite the availability of the Web Content Accessibility Guidelines (WCAG), significant barriers persist, largely due to the complexity of their implementation. Artificial intelligence (AI), particularly machine learning models, has emerged as a promising avenue to address these challenges, offering solutions for evaluation, correction, and content generation.
Objective: This study aims to systematically review the intersection of web accessibility and AI by evaluating how AI-based methods enhance compliance with accessibility standards over the period 2019–2025, assessing their efficacy and alignment with WCAG principles.
Methods: A systematic literature review (SLR) was conducted in three phases: planning, execution, and reporting. Research questions were formulated to guide the selection of search terms and strategies. A systematic search process was implemented, complemented by a snowballing technique to ensure comprehensive coverage of relevant studies. The quality of selected studies was rigorously assessed using predefined criteria, and data extraction was carried out following established best practices. The analysis combined narrative synthesis for qualitative insights and bibliometric techniques for quantitative evaluation.
Results: From 277 studies, 31 were identified as relevant, highlighting AI’s primary contributions to generating alternative text for images, automating compliance assessments, providing correction suggestions, and designing alternative interfaces to enhance accessibility. Advances in large language models (LLMs) further demonstrate their potential for facilitating the creation of accessible content.
Conclusions: AI presents significant potential to improve web accessibility by streamlining evaluation processes and supporting the creation of accessible content. However, further research is needed to address limitations such as inconsistent compliance and the lack of focus on non-visual disabilities. These findings underline the importance of leveraging AI to facilitate inclusive web design practices while ensuring adherence to accessibility standards.
Volume 186, Article 107821.
Citations: 0
HeSQLNet: A Heterogeneous graph neural network for SQL-to-Text generation
IF 3.8, Zone 2, Computer Science
Information and Software Technology. Pub Date: 2025-06-27. DOI: 10.1016/j.infsof.2025.107820
Junsan Zhang, Ao Lu, Junxiao Han, Yang Zhu, Yudie Yan, Juncai Guo, Yao Wan
Abstract
Context: Understanding the semantics of SQL queries is crucial for maintaining code and reusing functionalities in database access and management. However, SQL queries often remain challenging to comprehend, even for expert users. In this work, we address this challenge by focusing on SQL-to-Text, a task that translates SQL queries into corresponding natural language questions. Existing approaches predominantly encode SQL queries using their Abstract Syntax Tree (AST) representation and then decode this structure into textual explanations. However, these methods often treat the AST as a homogeneous graph, overlooking the diverse relationships between its nodes, such as parent-child and sibling relationships.
Objective: To address this issue, this paper introduces HeSQLNet: a Heterogeneous Graph Neural Network for SQL-to-Text Generation.
Methods: Specifically, we first propose a Heterogeneous Feature Graph (HFG), which augments the AST with six distinct edge types to better capture the heterogeneous relationships inherent in SQL queries. We further develop a heterogeneous graph neural network with attention, leveraging a two-stage aggregation process to effectively extract and encode these heterogeneous features within the HFG. The enriched HFG representation is then incorporated into an encoder-decoder framework, called HeSQLNet, to generate natural language descriptions of SQL queries. To assess the ability of SQL-to-Text models to handle complex queries and demonstrate compositional generalization, we introduce SpiderComGen, a new compositional generalization dataset derived from the Spider dataset.
Results: We conduct extensive experiments on both the widely used datasets and our proposed dataset. The experimental results reveal that HeSQLNet significantly outperforms existing state-of-the-art approaches in both effectiveness and generalization capability. Additionally, compared to recent large language models, human evaluations and case studies show that HeSQLNet delivers not only accurate results but also more concise outputs.
Conclusion: Our HeSQLNet proves that heterogeneous feature fusion and extraction significantly improve SQL-to-Text generation.
Volume 186, Article 107820.
Citations: 0
DiffBCE: Difference contrastive learning for binary code embeddings
IF 3.8, Zone 2, Computer Science
Information and Software Technology. Pub Date: 2025-06-26. DOI: 10.1016/j.infsof.2025.107822
Yun Zhang, Ge Cheng
Abstract
Context: Binary code embedding plays a crucial role in binary similarity detection and software security analysis. However, conventional methods often suffer from scalability issues and depend heavily on large amounts of labeled data, limiting their practical deployment in real-world scenarios.
Objectives: This research introduces DiffBCE, a novel binary code embedding method based on differential contrastive learning. The primary goal is to overcome the limitations of existing approaches by reducing the reliance on labeled data while enhancing the robustness and semantic sensitivity of binary code representations.
Methods: DiffBCE integrates two complementary data augmentation strategies within a contrastive learning framework: insensitive transformations (implemented via dropout) and sensitive transformations (instruction replacement with a Masked Language Model). In addition, a conditional difference prediction module is introduced to capture subtle semantic changes by identifying differences between original and transformed binary code. The model is jointly trained with a combined loss function balancing contrastive loss and conditional difference prediction loss. Experimental validation is performed on multiple binary datasets across various scenarios, including cross-version analysis, cross-optimization-level evaluation, and code obfuscation difference analysis.
Results: Experimental evaluations demonstrate that DiffBCE significantly outperforms state-of-the-art methods (e.g., Asm2Vec, DeepBinDiff, PalmTree). Across three similarity detection scenarios, the method improves F1 scores by approximately 3.8%, 5.6%, and 11.1%, respectively, underscoring its robustness and effectiveness in handling complex binary code differences.
Conclusions: DiffBCE offers a scalable and efficient solution for binary code embedding by effectively capturing rich semantic features without requiring extensive labeled data. Its superior performance in various testing scenarios suggests promising applications in vulnerability detection, code reuse analysis, reverse engineering, and automated patch generation, paving the way for enhanced software security assessments.
Volume 187, Article 107822.
Citations: 0
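The DiffBCE abstract above describes joint training with a loss that balances a contrastive term and a conditional difference prediction term, without giving the formulas. The sketch below illustrates one plausible shape for such an objective: an InfoNCE-style contrastive loss plus a weighted binary cross-entropy over "was this instruction replaced?" predictions. The choice of these two losses, the temperature, and the `weight` parameter are all assumptions for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def info_nce(anchor, positive, negatives, temperature: float = 0.05) -> float:
    """InfoNCE loss: negative log-softmax score of the positive pair."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / temperature for s in sims]
    peak = max(logits)  # subtract the max for numerical stability
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean BCE over per-instruction 'replaced or not' predictions."""
    eps = 1e-12  # avoids log(0)
    total = sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    )
    return -total / len(probs)


def combined_loss(contrastive: float, difference: float, weight: float = 0.5) -> float:
    """Assumed joint objective: contrastive + weight * difference prediction."""
    return contrastive + weight * difference


# Toy values: one positive pair, one negative, two instruction predictions.
l_con = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], temperature=1.0)
l_diff = binary_cross_entropy([0.9, 0.2], [1, 0])
print(combined_loss(l_con, l_diff))
```

In this sketch, `weight` controls how strongly the difference prediction signal shapes the embedding relative to the contrastive signal, which is the kind of balance the abstract refers to.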
Integrating energy consumption in the development of serverless applications
IF 3.8, Zone 2, Computer Science
Information and Software Technology. Pub Date: 2025-06-26. DOI: 10.1016/j.infsof.2025.107819
Pablo Serrano-Gutierrez, Inmaculada Ayala, Lidia Fuentes
Abstract
Context: The increasing environmental impact of Information and Communication Technologies (ICTs), particularly the energy consumption associated with serverless applications, necessitates the development of methodologies to optimize energy efficiency. This study addresses the need for energy-aware design and runtime adaptation in serverless architectures.
Objective: To develop and validate a methodology that integrates energy monitoring into the development and runtime management of serverless applications, thereby enabling significant reductions in energy consumption while maintaining functionality.
Methods: A new version of FUSPAQ, a framework for the optimization of serverless applications, was developed. This version incorporates tools like Kepler for real-time energy monitoring and employs an energy-aware orchestration mechanism to dynamically select energy-efficient function configurations. Validation was conducted through a facial recognition case study and benchmark experiments, comparing energy consumption across different scenarios with and without the proposed adaptations.
Results: The enhanced FUSPAQ framework successfully integrated energy consumption metrics into the decision-making process for function selection and runtime adaptation. Benchmark tests confirmed the scalability of the solution, with energy-efficient outcomes even in complex applications.
Conclusion: The study highlights the potential of integrating energy-aware practices in serverless applications, presenting a scalable and practical approach to reducing their environmental footprint. By leveraging tools like Kepler and frameworks like FUSPAQ, developers can achieve significant energy savings without compromising application performance. This work contributes to the advancement of Green Software Engineering by emphasizing runtime energy adaptation in Function-as-a-Service (FaaS) architectures.
Volume 186, Article 107819.
Citations: 0