Guozheng Wang , Dongxia Wang , Chengfan Li , Yongmei Lei
{"title":"The Fast Inertial ADMM optimization framework for distributed machine learning","authors":"Guozheng Wang , Dongxia Wang , Chengfan Li , Yongmei Lei","doi":"10.1016/j.future.2024.107575","DOIUrl":"10.1016/j.future.2024.107575","url":null,"abstract":"The ADMM (Alternating Direction Method of Multipliers) optimization framework is known for its property of decomposition and assembly, which effectively bridges distributed computing and optimization algorithms, making it well-suited for distributed machine learning in the context of big data. However, it suffers from slow convergence speed and lacks the ability to coordinate worker computations, resulting in inconsistent speeds in solving subproblems in distributed systems and mutual waiting among workers. In this paper, we propose a novel optimization framework to address these challenges in support vector regression (SVR) and probit regression training through the FIADMM (Fast Inertial ADMM). The key concept of the FIADMM lies in the introduction of inertia acceleration and an adaptive subproblem iteration mechanism based on the ADMM, aimed at accelerating convergence speed and reducing the variance in solving speeds among workers. Further, we prove that FIADMM has a fast linear convergence rate O(1/k). Experimental results on six benchmark datasets demonstrate that the proposed FIADMM significantly enhances convergence speed and computational efficiency compared to multiple baseline algorithms and related efforts.","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107575"},"PeriodicalIF":6.2,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Haijing Luan , Kaixing Yang , Taiyuan Hu , Jifang Hu , Siyao Liu , Ruilin Li , Jiayin He , Rui Yan , Xiaobing Guo , Niansong Qian , Beifang Niu
{"title":"Review of deep learning-based pathological image classification: From task-specific models to foundation models","authors":"Haijing Luan , Kaixing Yang , Taiyuan Hu , Jifang Hu , Siyao Liu , Ruilin Li , Jiayin He , Rui Yan , Xiaobing Guo , Niansong Qian , Beifang Niu","doi":"10.1016/j.future.2024.107578","DOIUrl":"10.1016/j.future.2024.107578","url":null,"abstract":"Pathological diagnosis is considered the gold standard in cancer diagnosis, playing a crucial role in guiding treatment decisions and prognosis assessment for patients. However, achieving accurate diagnosis of pathology images poses several challenges, including the scarcity of pathologists and the inherent subjective variability in their interpretations. Advances in whole-slide imaging technology and deep learning methods provide new opportunities for digital pathology, especially in low-resource settings, by enabling effective pathological image classification. In this article, we begin by introducing the datasets, which include both unimodal and multimodal types, as essential resources for advancing pathological image classification. We then provide a comprehensive overview of deep learning-based pathological image classification models, covering task-specific models such as supervised, unsupervised, weakly supervised, and semi-supervised learning methods, as well as unimodal and multimodal foundation models. Next, we review tumor-related indicators that can be predicted from pathological images, focusing on two main categories: indicators that can be recognized by pathologists, such as tumor classification, grading, and region recognition; and those that cannot be recognized by pathologists, including molecular subtype prediction, tumor origin prediction, biomarker prediction, and survival prediction. Finally, we summarize the key challenges in digital pathology and propose potential future directions.","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107578"},"PeriodicalIF":6.2,"publicationDate":"2024-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning protein language contrastive models with multi-knowledge representation","authors":"Wenjun Xu , Yingchun Xia , Bifan Sun , Zihao Zhao , Lianggui Tang , Xiaobo Zhou , Qingyong Wang , Lichuan Gu","doi":"10.1016/j.future.2024.107580","DOIUrl":"10.1016/j.future.2024.107580","url":null,"abstract":"Protein representation learning plays a crucial role in obtaining a comprehensive understanding of biological regulatory mechanisms and in developing proteins and drugs for therapeutic purposes. However, labeled proteins, such as sequenced and functionally annotated data, are incomplete and scarce. Thus, contrastive learning has emerged as the preferred technique for learning meaningful representations from unlabeled data samples. In addition, natural proteins cannot currently be fully described by extracting protein knowledge from a single domain. Therefore, Pro-CoRL, a protein contrastive models framework based on multi-knowledge representation learning, was proposed in this study. In particular, Pro-CoRL smooths the objective function using convex approximation, thereby improving the stability of training. Extensive experiments on predicting protein–protein interaction types and clustering protein families have confirmed the high accuracy and robustness of Pro-CoRL.","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107580"},"PeriodicalIF":6.2,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142593355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jianlong Xu, Mengqing Jin, Jinze Xiao, Dianming Lin, Yuelong Liu
{"title":"Multi-round decentralized dataset distillation with federated learning for Low Earth Orbit satellite communication","authors":"Jianlong Xu, Mengqing Jin, Jinze Xiao, Dianming Lin, Yuelong Liu","doi":"10.1016/j.future.2024.107570","DOIUrl":"10.1016/j.future.2024.107570","url":null,"abstract":"Satellite communication and Low Earth Orbit (LEO) satellites are important components of the 6G network, widely used for Earth observation tasks due to their low cost and short return period, making them a key technology for 6G network connectivity. Due to limitations in satellite system technology and downlink bandwidth, it is not feasible to download all high-resolution image information to ground stations. Even in existing federated learning (FL) methods, sharing well-trained parts of the model can still become a bottleneck as model size increases. To address these challenges, we propose a new federated learning framework (FL-M3D) for LEO satellite communication that employs multi-round decentralized dataset distillation techniques. It allows satellites to independently extract local datasets and transmit them to ground stations instead of exchanging model parameters. Communication costs depend only on the size of the synthesized dataset and do not increase with larger models. However, the heterogeneity of satellite datasets can lead to sample ambiguity and decreased model convergence speed. Therefore, we propose distilling the datasets to mitigate the negative effects of data heterogeneity. In experiments on real-world image datasets, FL-M3D reduces communication volume in simulated satellite networks by approximately 49.84% and achieves improved model performance.","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107570"},"PeriodicalIF":6.2,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142573080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cloud-based solution for urbanization monitoring using satellite images","authors":"Ion-Dorinel Filip , Cristian Cune , Florin Pop","doi":"10.1016/j.future.2024.107579","DOIUrl":"10.1016/j.future.2024.107579","url":null,"abstract":"Motivated by the large amount of available satellite data and increasing interest in the study of urbanization, this paper presents an approach for better monitoring of urbanization, as more and more people are looking to increase their quality of life by migrating to urban areas. This project is particularly useful for environmental researchers or citizens who are looking to make informed decisions. This project utilizes Sentinel Hub, a multi-spectral satellite imagery cloud service, to access Sentinel-2 data to detect changes in Romania’s urban environment automatically. Sentinel Hub’s spectral bands, which describe the reflectance properties of a surface, are used to compute spectral indices that highlight patterns in satellite images. The paper analyzes two urban indices that successfully map built-up regions and a vegetation index that assesses the degree of vegetation in an urbanized area. It employs different methods to enhance each index and evaluates its performance in a town that has seen rapid urban expansion.","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107579"},"PeriodicalIF":6.2,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CBWO: A Novel Multi-objective Load Balancing Technique for Cloud Computing","authors":"Vahideh Hayyolalam, Öznur Özkasap","doi":"10.1016/j.future.2024.107561","DOIUrl":"10.1016/j.future.2024.107561","url":null,"abstract":"In cloud computing systems, the growing demand for diverse applications has led to challenges in resource allocation and workload distribution, resulting in increased energy consumption and computational costs. To address these challenges, we propose a novel load-balancing method, namely CBWO, that integrates Chaos theory with the Black Widow Optimization algorithm. Our approach is designed to optimize cloud computing environments by improving energy efficiency and resource utilization. We employ CloudSim for simulations, evaluating key performance metrics such as energy consumption, resource utilization, makespan, task completion time, and imbalance degree. The experimental results demonstrate the superiority of our method, achieving average improvements of 67.28% in makespan and 29.03% in energy consumption compared to existing solutions.","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107561"},"PeriodicalIF":6.2,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lifeng Yan , Zekun Yin , Tong Zhang , Fangjin Zhu , Xiaohui Duan , Bertil Schmidt , Weiguo Liu
{"title":"SWQC: Efficient sequencing data quality control on the next-generation sunway platform","authors":"Lifeng Yan , Zekun Yin , Tong Zhang , Fangjin Zhu , Xiaohui Duan , Bertil Schmidt , Weiguo Liu","doi":"10.1016/j.future.2024.107577","DOIUrl":"10.1016/j.future.2024.107577","url":null,"abstract":"Sequencing data quality control can significantly prevent low-quality data from impacting downstream applications in bioinformatics. The enormous growth of biological sequencing data in recent years introduces new challenges to the efficiency of quality control processes and motivates the need for fast implementations on modern compute systems. The powerful next-generation heterogeneous Sunway platform holds significant potential for addressing this challenge. However, there are currently no dedicated quality control applications that can fully utilize its computational power. To bridge this gap, we introduce SWQC, a novel quality control application specifically designed for the Sunway platform. We present an efficient distributed FASTQ I/O framework for Sunway-based workstations and supercomputers to take advantage of fast SSDs and the parallel file system. In order to support both process-level and thread-level (CPE-level) parallelism to leverage the computational power, we refactor and optimize all standard quality control modules for the heterogeneous Sunway architecture. When using a single node, SWQC achieves speedups between 2 and 40 over highly optimized quality control applications executed on a high-end 48-core AMD server. Additionally, when using 16 nodes, SWQC achieves parallel efficiencies of 70% (for reading and writing a single file) and 95% (for reading one file and writing split files) compared to a single node. Overall, SWQC is able to perform quality control operations for a 140 GB FASTQ file within only 70 s using a single Sunway node. It is publicly available at https://github.com/RabbitBio/SWQC.","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107577"},"PeriodicalIF":6.2,"publicationDate":"2024-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fatemeh Khoda Parast , Seyed Alireza Damghani , Brett Kelly , Yang Wang , Kenneth B. Kent
{"title":"Efficient security interface for high-performance Ceph storage systems","authors":"Fatemeh Khoda Parast , Seyed Alireza Damghani , Brett Kelly , Yang Wang , Kenneth B. Kent","doi":"10.1016/j.future.2024.107571","DOIUrl":"10.1016/j.future.2024.107571","url":null,"abstract":"Ceph is a resilient clustered storage solution that supports object, block, and file storage with no single point of failure. Despite these strengths, data confidentiality remains a concern in the system, as authentication and access control are the only data protection security services in Ceph. CephArmor was proposed as a third-party security interface to protect data confidentiality by adding an extra protection layer to data at rest. Despite the added layer, the initial design of the API was not efficient enough to address security and performance simultaneously. In this study, we propose a new architectural design to address the issues associated with the preliminary prototype. Comprehensive performance and security analyses verify the improvement of the proposed method over the initial approach. The benchmark results indicate an average improvement of 37% in IOPS, elapsed time, and bandwidth for the write benchmark compared to the initial model.","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107571"},"PeriodicalIF":6.2,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142554219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization of IoT perceived content caching in F-RANs: Minimum retrieval delay and resource extension with performance sensitivity","authors":"Chia-Cheng Hu","doi":"10.1016/j.future.2024.107572","DOIUrl":"10.1016/j.future.2024.107572","url":null,"abstract":"In Internet of Things (IoT) perceived applications that monitor the states of the environment, a feasible technology is to use fog radio access networks (F-RANs) to alleviate the problems of long response time and cloud server bottlenecks in cloud computing. In response to the above problems, this work investigates the problem of minimizing the retrieval delay of IoT contents in F-RANs under the constraints of system resources. The problem is formulated as an integer linear programming (ILP) model. Then, a polynomial-time method with linear programming (LP) relaxation and rounding is proposed to approximate the optimal solution of the problem. The method is proved to obtain a feasible solution with a bounded approximation ratio in polynomial time. The conducted simulations validate that the obtained feasible solution is very close to the optimal one. On the other hand, when the system resources are not enough to meet the continuous growth of content retrieval and need to be expanded, this work further establishes an association relation between cached contents and system resources. Based on this relation, a second method of expanding system resources with performance sensitivity is proposed to provide the service provider with an effective and economical expansion of system resources. It uses a predefined system parameter to balance the trade-off between the approximation ratio to the optimal solution of the problem and the extended system resources. The solution obtained by the second method is also proved to have a bounded approximation ratio.","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107572"},"PeriodicalIF":6.2,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142572975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Juan Carlos Hernandez-Hernandez , David Larrabeiti , Maria Calderon , Ignacio Soto , Bruno Cimoli , Hui Liu , Idelfonso Tafur Monroy
{"title":"Designing optimal Quantum Key Distribution Networks based on Time-Division Multiplexing of QKD transceivers: qTDM-QKDN","authors":"Juan Carlos Hernandez-Hernandez , David Larrabeiti , Maria Calderon , Ignacio Soto , Bruno Cimoli , Hui Liu , Idelfonso Tafur Monroy","doi":"10.1016/j.future.2024.107557","DOIUrl":"10.1016/j.future.2024.107557","url":null,"abstract":"Time-sharing of Quantum Key Distribution (QKD) transceivers with the help of optical switches and a central Software-Defined Networking (SDN) controller is a promising technique to better amortize the large investments required to build a Quantum Key Distribution Network (QKDN). In this work, we investigate the implications of introducing Time-Division Multiplexing (TDM) in trusted-relay QKDNs at the wide-area network scale in terms of performance and cost-saving. To this end, we developed both a Mixed Integer Linear Programming (qTDM-MILP) model and a Heuristic Algorithm (qTDM-HA) to solve the allocation of QKD transceivers and network resources for a novel switched QKDN operating scheme: qTDM-QKDN. Our heuristic method provides close-to-optimal resource planning for the offline problem that computes the minimum number of QKD transceivers and optical switch ports at each node, as well as the number of quantum channels on each link required to satisfy a target set of end-to-end secret-key-rate demands. Moreover, both the model and the heuristic provide the time fractions that each QKD transceiver needs to peer with each neighbor QKD transceiver. We compared our proposed model and heuristic algorithm for cost minimization against non-time-sharing QKD transceivers (nTDM) as the baseline. The results show that qTDM can achieve substantial cost savings in the range of 10%–40% compared to nTDM. Furthermore, this work sheds light on the selection of the value for the working cycle T and its influence on network performance.","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107557"},"PeriodicalIF":6.2,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}