{"title":"Can LLM-generated misinformation be detected: A study on Cyber Threat Intelligence","authors":"He Huang , Nan Sun , Massimiliano Tani , Yu Zhang , Jiaojiao Jiang , Sanjay Jha","doi":"10.1016/j.future.2025.107877","DOIUrl":"10.1016/j.future.2025.107877","url":null,"abstract":"<div><div>Given the increasing number and severity of cyber attacks, there has been a surge in cybersecurity information across various mediums such as posts, news articles, reports, and other resources. Cyber Threat Intelligence (CTI) involves processing data from these cybersecurity sources, enabling professionals and organizations to gain valuable insights. However, with the rapid dissemination of cybersecurity information, the inclusion of fake CTI can lead to severe consequences, including data poisoning attacks. To address this challenge, we have implemented a three-step strategy: generating synthetic CTI, evaluating the quality of the generated CTI, and detecting fake CTI. Unlike other subdomains, such as fake COVID news detection, there is currently no publicly available dataset specifically tailored for fake CTI detection research. To address this gap, we first establish a reliable groundtruth dataset by utilizing domain-specific cybersecurity data to fine-tune a Large Language Model (LLM) for synthetic CTI generation. We then employ crowdsourcing techniques and advanced synthetic data verification methods to evaluate the quality of the generated dataset, introducing a novel evaluation methodology that combines quantitative and qualitative approaches. Our comprehensive evaluation reveals that the generated CTI cannot be distinguished from genuine CTI by human annotators, regardless of their computer science background, demonstrating the effectiveness of our generation approach. We benchmark various misinformation detection techniques against our groundtruth dataset to establish baseline performance metrics for identifying fake CTI. By leveraging existing techniques and adapting them to the context of fake CTI detection, we provide a foundation for future research in this critical field. To facilitate further research, we make our code, dataset, and experimental results publicly available on <span><span>GitHub</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"173 ","pages":"Article 107877"},"PeriodicalIF":6.2,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143941291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"X-DINC: Toward Cross-Layer ApproXimation for the Distributed and In-Network ACceleration of Multi-Kernel Applications","authors":"Zahra Ebrahimi , Maryam Eslami , Xun Xiao , Akash Kumar","doi":"10.1016/j.future.2025.107864","DOIUrl":"10.1016/j.future.2025.107864","url":null,"abstract":"<div><div>With the rapid evolution of programmable network devices and the urge for energy-efficient and sustainable computing, network infrastructures are mutating toward a computing pipeline, providing In-Network Computing (INC) capability. Despite the initial success in offloading single/small kernels to the network devices, deploying multi-kernel applications remains challenging due to limited memory, computing resources, and lack of support for Floating Point (FP) and complex operations. To tackle these challenges, we present a cross-layer approximation and distribution methodology (X-DINC) that exploits the error resilience of applications. X-DINC utilizes a chain of techniques to facilitate kernel deployment and distribution across heterogeneous devices in INC environments. First, we identify approximation and optimization opportunities in data acquisition and computation phases of multi-kernel applications. Second, we simplify complex arithmetic operations to cope with the <em>computation</em> limitations of the programmable network switches. Third, we perform application-level sensitivity analysis to measure the trade-off between performance gain and Quality of Results (QoR) loss when approximating individual kernels via various techniques. Finally, a greedy heuristic swiftly generates Pareto/near-Pareto mixed-precision configurations that maximize the performance gain while maintaining the user-defined QoR. X-DINC is prototyped on a Virtex-7 Field Programmable Gate Array (FPGA) and evaluated using the Blind Source Separation (BSS) application on industrial audio dataset. Results show that X-DINC performs separation up to 35% faster with up to 88% lower Area-Delay Product (ADP) compared to an <em>Accurate-Centralized</em> approach, when distributed across 2 to 7 network nodes, while maintaining audio quality within an acceptable range of 15–20 dB.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"172 ","pages":"Article 107864"},"PeriodicalIF":6.2,"publicationDate":"2025-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143928705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A deep reinforcement learning based algorithm for time and cost optimized scaling of serverless applications","authors":"Anupama Mampage, Shanika Karunasekera, Rajkumar Buyya","doi":"10.1016/j.future.2025.107873","DOIUrl":"10.1016/j.future.2025.107873","url":null,"abstract":"<div><div>Serverless computing has gained a strong traction in the cloud computing community in recent years. Among the many benefits of this novel computing model, the rapid auto-scaling capability of user applications takes prominence. However, the offer of adhoc scaling of user deployments at function level introduces many complications to serverless systems. The added delay and failures in function request executions caused by the time consumed for dynamically creating new resources to suit function workloads, known as the cold-start delay, is one such very prevalent shortcoming. Maintaining idle resource pools to alleviate this issue often results in wasted resources from the cloud provider perspective. Existing solutions to address this limitation mostly focus on predicting and understanding function load levels in order to proactively create required resources. Although these solutions improve function performance, the lack of understanding on the overall system characteristics in making these scaling decisions often leads to the sub-optimal usage of system resources. Further, the multi-tenant nature of serverless systems requires a scalable solution adaptable for multiple co-existing applications, a limitation seen in most current solutions. In this paper, we introduce a novel multi-agent Deep Reinforcement Learning based intelligent solution for both horizontal and vertical scaling of function resources, based on a comprehensive understanding on both function and system requirements. Our solution elevates function performance reducing cold starts, while also offering the flexibility for optimizing resource maintenance cost to the service providers. Experiments conducted considering varying workload scenarios show improvements of up to 23% and 34% in terms of application latency and request failures, or alternatively saving up to 45% in infrastructure cost for the service providers.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"173 ","pages":"Article 107873"},"PeriodicalIF":6.2,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144069927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A multi-agent architecture for context sources integration in smart cities","authors":"Leonardo Vianna do Nascimento , José Palazzo Moreira de Oliveira","doi":"10.1016/j.future.2025.107862","DOIUrl":"10.1016/j.future.2025.107862","url":null,"abstract":"<div><div>Contextual data in smart cities are present in large quantities and distributed sources. Many applications can benefit from these data to provide better services to their users. The scale and dynamic nature of urban environments pose significant challenges in making context sources available to applications. These challenges involve transparent access to context, resilience, decentralization, extensibility, scalability, and redundancy of data. This study introduces a new architecture designed to address these issues. This architecture aims to facilitate the acquisition of context by integrating distributed data sources. The developed architecture not only overcomes the challenges posed by the scale and dynamicity of urban environments but also prepares for more innovative and effective solutions for smart cities. The architecture is distributed, decentralized, and fault-tolerant, providing data fusion mechanisms and dynamic context source composition. Compared to existing works, our architecture contributes to the state-of-the-art addressing all these five challenges in one design. The architecture uses the multi-agent paradigm, which is inherently distributed and facilitates decentralization. A scenario was used to execute several experiments demonstrating that the architecture can obtain context data transparently by any application.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"172 ","pages":"Article 107862"},"PeriodicalIF":6.2,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143924376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient edge-based data integrity auditing in cloud storage","authors":"Hao Yan , Yan Wang , Guoxiu Liu , Juan Zhao","doi":"10.1016/j.future.2025.107899","DOIUrl":"10.1016/j.future.2025.107899","url":null,"abstract":"<div><div>Edge computing increasingly collaborates with cloud computing to support numerous applications that involve large data volumes and frequent data interactions. In cloud-edge collaboration environments, applications especially with high requirements for low data transmission delay often deploy frequently accessed client data replicas on edge servers to improve data access efficiency. Consequently, client data is often distributed across both cloud and edge servers in practice. Therefore, efficiently verifying the integrity of all client data poses a complex and urgent challenge. To address this issue, the paper introduces a novel data integrity auditing scheme capable of efficiently performing asynchronous integrity checks on client data across both edge and cloud servers. In our scheme, clients only generate partial block tags and upload them along with the data to the edge server. Edge server computes complete tags based on the partial tags, caches a small portion of frequently accessed data, and transfers the remaining data to the cloud server. For data verification, edge servers provide partial integrity proofs for cached data, supporting the cloud server to generate complete proofs for all challenged data. Thus, the auditors can verify all client data, regardless of its storage location. In our scheme, edge clients bear only about half of the computational workload of existing schemes. Additionally, the cloud server also offloads a portion of computational and storage tasks to edge servers, significantly improving the overall efficiency of data checking. We theoretically prove the security of our scheme, and experimental results demonstrate its efficiency and feasibility.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"172 ","pages":"Article 107899"},"PeriodicalIF":6.2,"publicationDate":"2025-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143936896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving self-supervised vertical federated learning with contrastive instance-wise similarity and dynamical balance pool","authors":"Shuai Chen , Wenyu Zhang , Xiaoling Huang , Cheng Zhang , Qingjun Mao","doi":"10.1016/j.future.2025.107884","DOIUrl":"10.1016/j.future.2025.107884","url":null,"abstract":"<div><div>Vertical Federated Learning (VFL) enables multiple parties with distinct feature spaces to train a joint VFL model collaboratively without exposing their original private data. In realistic scenarios, the scarcity of aligned and labeled samples among collaborating participants limits the effectiveness of traditional VFL approaches for model training. Current VFL frameworks attempt to leverage abundant unlabeled data using Contrastive Self-Supervised Learning (CSSL). However, the simplistic incorporation of CSSL methods cannot address severe domain shift in VFL. In addition, CSSL methods typically conflict with general regularization approaches designed to alleviate domain shift, thereby significantly limiting the potential of the self-supervised learning framework in VFL. To address these challenges, this study proposes an Improved Self-Supervised Vertical Federated Learning (ISSVFL) framework for VFL in label-scarce scenarios under the semi-honest and no-collusion assumption. ISSVFL merges CSSL with instance-wise similarity to resolve regularization conflicts and captures more significant inter-domain knowledge in the representations from different participants, effectively alleviating domain shift. In addition, a new dynamical balance pool is proposed to fine-tune the pre-trained models for downstream supervised tasks by dynamically balancing inter-domain and intra-domain knowledge. Extensive empirical experiments on image and tabular datasets demonstrate that ISSVFL achieves an average performance improvement of 3.3 % compared with state-of-the-art baselines.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"172 ","pages":"Article 107884"},"PeriodicalIF":6.2,"publicationDate":"2025-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143931576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AS2: Adaptive sorting algorithm selection for heterogeneous workloads and systems","authors":"Sangmyung Lee , Byungyoon Lee , Yongseok Son , Kiwook Sohn , Hwajung Kim , Sunggon Kim","doi":"10.1016/j.future.2025.107860","DOIUrl":"10.1016/j.future.2025.107860","url":null,"abstract":"<div><div>Sorting is becoming increasingly important in modern computing, ranging from small-scale Internet of Things (IoT) devices to supercomputers. To improve sorting performance, various algorithms, including Intro sort, Merge sort, Heap sort, and Insertion sort, are adopted in different systems. However, the performance of sorting algorithms depends on various factors, and our analysis shows that the optimal algorithm varies, with no single algorithm consistently outperforming the others. In this paper, we first analyze data internal factors (data size, distribution, data type) and external factors (threads, different hardware) that impact sorting algorithm performance. We utilize widely adopted sorting algorithms such as STL sort and Merge sort, as well as state-of-the-art sorting algorithms like Ips4o sort and Aips2o sort. In addition to sequential sorting algorithms, we implement Parallel Intro sort and utilize the parallel versions of state-of-the-art sorting algorithms with varying number of threads. From the analysis, we present an adaptive sorting algorithm selection model for heterogeneous workloads and systems, called AS2 (Adaptive Sorting Algorithm Selection). Its goal is to determine the optimal algorithm from the existing sorting algorithms in heterogeneous workloads and systems. AS2 uses various ML models to build performance models for each sorting algorithm using data internal and external factors from various datasets. Then, AS2 chooses the optimal sorting algorithm based on the performance prediction using the model. We evaluate AS2 using a representative dataset that includes various data internal and external factors. The results show that AS2 can accurately predict the performance of various sorting algorithms, with min and max r-squared values of 0.83 and 0.99, respectively. In addition, AS2 successfully selects the optimal algorithm in our evaluation scenario up to 99.68% accuracy by choosing the algorithm with the shortest predicted sorting time, improving performance by up to 1.83<span><math><mo>×</mo></math></span> compared to the state-of-the-art algorithm. We also evaluate the performance of AS2 using the real-world dataset and the results show that AS2 selects the optimal algorithm with 87.50% accuracy.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"172 ","pages":"Article 107860"},"PeriodicalIF":6.2,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143918585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-agent deep reinforcement learning based multi-task partial computation offloading in mobile edge computing","authors":"Han Li , Shunmei Meng , Jin Sun , Zhicheng Cai , Qianmu Li , Xuyun Zhang","doi":"10.1016/j.future.2025.107861","DOIUrl":"10.1016/j.future.2025.107861","url":null,"abstract":"<div><div>Mobile edge computing (MEC) can enhance the computation performance of end-devices by providing computation offloading service at the network edge. However, given that both end-devices and edge servers have finite computation resources, inefficient offloading policies may lead to overload, thereby increasing the computation delays of tasks. In this paper, we investigate a multi-task partial computation offloading problem combined with a queue model. Based on achieving load-balancing across the MEC system, our objective is to minimize the long-standing average task-processing cost of the end-devices while ensuring the delay thresholds of tasks. For this purpose, a distributed offloading algorithm utilizing the multi-agent deep reinforcement learning (MADRL) method is proposed. Specifically, through interacting with the MEC environment and accumulating experience data, the device agents can collaborate to optimize their local offloading decisions over continuous time-slots, which includes adjusting the transmission power and determining the tasks’ offloading ratios under the dynamic wireless channel conditions. Exhaustive experimental results demonstrate that in contrast with the baseline algorithms, the proposed offloading algorithm can not only better balance the computation loads between the end-devices and the MEC server, but also more effectively reduce the task-processing cost of the end-devices, as well as the percentage of timeout tasks.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"172 ","pages":"Article 107861"},"PeriodicalIF":6.2,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143903692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harnessing quality-throughput trade-off in scoring functions for extreme-scale virtual screening campaigns","authors":"Yuedong Zhang, Gianmarco Accordi, Davide Gadioli, Gianluca Palermo","doi":"10.1016/j.future.2025.107863","DOIUrl":"10.1016/j.future.2025.107863","url":null,"abstract":"<div><div>Drug discovery is a long and costly process aimed at finding a molecule that yields a therapeutic effect. Virtual screening is one of the initial in-silico steps that aims at estimating how promising a molecule is. This stage needs to solve two well-known domain problems: molecular docking and scoring. While the accuracy of scoring functions is extensively investigated in comparisons, the execution time of their implementation is usually not considered. In virtual screening campaigns, the definition of a fixed time budget for the entire process and the average time required to process each molecule determines the upper limit of the number of molecules that can be evaluated. By reducing the time needed to evaluate a single molecule, we can screen a larger number of molecules, thereby increasing the possibility of finding a promising solution. For extreme-scale virtual screening campaigns, the computational budget is a critical aspect since even utilizing large-scale facilities would make it impractical to complete the screening within a feasible time unless the computational time for a single molecule is significantly reduced.</div><div>In this paper, we explore optimization and approximation techniques applied to two well-known scoring functions, which we modify to investigate different accuracy-performance trade-offs to support large-scale virtual screening campaigns. Despite the different approaches we considered, experimental results demonstrate that the proposed enhancements achieve better enrichment factors in virtual screening scenarios. Moreover, we port both implementations to CUDA to show that the proposed techniques are GPU-friendly and aligned with modern supercomputing infrastructures.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"172 ","pages":"Article 107863"},"PeriodicalIF":6.2,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143918584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive container auto-scaling for fluctuating workloads in cloud","authors":"Xiaoyue Feng, Sijia Zhang, Tianzhe Jiao, Chaopeng Guo, Jie Song","doi":"10.1016/j.future.2025.107872","DOIUrl":"10.1016/j.future.2025.107872","url":null,"abstract":"<div><div>Database-as-a-Service(DBaaS) provides services for multiple tenants through resource containers, which are allowed to scale over time to fulfill the service-level agreements. Designing container auto-scaling methods for DBaaS can help reduce their expenditure. Reinforcement Learning (RL) shows powerful performance in cloud resource scaling due to its robustness in dynamic environments. However, the RL-based methods fail to maintain high performance for fluctuating workloads since their fixed-action design cannot adapt to numerous variations of the resource demand. This paper proposes an adaptive container auto-scaling method called Asner that includes an improved RL-based algorithm with a dynamic action model to solve the problem of fixed-action design. Asner consists of a resource estimation model (<em>Estimator</em>) and a RL-based scaling algorithm (<em>Scaler</em>). <em>Estimator</em> adopts a graph-based method to estimate the workload resource demand for container scaling. <em>Scaler</em> generates the container scaling strategy by employing an improved RL-based algorithm with a dynamic action model for adapting to the fluctuating workload. Our experiment results show that <em>Estimator</em> achieves about 93% accuracy under the TPC-DS dataset, <em>Scale</em>’s performance is about 30% higher than the state-of-the-art RL, and Asner improves its performance by up to 45% compared to other methods.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"172 ","pages":"Article 107872"},"PeriodicalIF":6.2,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143903691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}