{"title":"Practical Machine Learning Autotuning for Large-Scale Collective Communication","authors":"Michael Wilkins;Yanfei Guo;Rajeev Thakur;Peter Dinda;Nikos Hardavellas","doi":"10.1109/TPDS.2026.3661876","DOIUrl":"https://doi.org/10.1109/TPDS.2026.3661876","url":null,"abstract":"Collective communication is a fundamental communication model for parallel computing on distributed memory systems. The performance of a collective operation depends on the underlying algorithm. Machine learning (ML)–based autotuners can optimize algorithm selection to enhance collective performance. Previous approaches are impractical for production use at large scale, however, due to prohibitively long training times that exceed typical job durations. This paper introduces ACCLAiM (Advancing Collective Communication Autotuning using Machine Learning), the first ML-based collective algorithm selection autotuner capable of accelerating production applications on large-scale supercomputers. ACCLAiM incorporates several improvements over prior designs in training point selection, handling of non-power-of-two feature values, model validation, and data collection. Our approach leverages variance-based active learning alongside topology-aware benchmark parallelization to eliminate unnecessary training points and maximize machine utilization, thereby significantly reducing training time. We present ACCLAiM as an open-source prototype and provide a comprehensive experimental evaluation. We demonstrate how each of ACCLAiM’s enhancements contributes to a substantial reduction in training time compared with the previous state-of-the-art approach, cumulatively reducing the training time to 5–10 minutes even at large scale. We demonstrate ACCLAiM’s usefulness on two leadership-class supercomputers and showcase its practical benefits for applications, achieving speedups of up to 4.1x.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"37 5","pages":"1032-1047"},"PeriodicalIF":6.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147557792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
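The variance-based active-learning step the ACCLAiM abstract describes can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the 1-nearest-neighbour "model" and the leave-one-out ensemble are stand-ins for whatever regressors the autotuner actually trains.

```python
import statistics

def fit_nearest(sample):
    """Toy 1-nearest-neighbour regressor over (message_size, runtime)
    pairs; stands in for the regression models a real autotuner trains."""
    def predict(x):
        size, runtime = min(sample, key=lambda p: abs(p[0] - x))
        return runtime
    return predict

def next_benchmark(measured, candidates):
    """Variance-based active learning: train a leave-one-out ensemble,
    then benchmark next the candidate point the ensemble disagrees on
    most, skipping points whose runtime the models already agree on."""
    models = [fit_nearest(measured[:i] + measured[i + 1:])
              for i in range(len(measured))]
    return max(candidates,
               key=lambda x: statistics.pvariance([m(x) for m in models]))
```

With measurements dense near small message sizes, the ensemble disagrees most about the sparsely-covered large-message region, so that point is benchmarked next; this is how unnecessary training points get eliminated.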
{"title":"Scalable and Efficient Reinforcement Learning for Virtual Machine Rescheduling in Cloud Data Centers","authors":"Xianzhong Ding;Yunkai Zhang;Binbin Chen;Donghao Ying;Tieying Zhang;Jianjun Chen;Lei Zhang;Alberto Cerpa;Wan Du","doi":"10.1109/TPDS.2026.3674891","DOIUrl":"https://doi.org/10.1109/TPDS.2026.3674891","url":null,"abstract":"Managing a vast number of virtual machines (VMs) efficiently is a critical challenge in modern large-scale data centers. The continuous creation and termination of VMs lead to resource fragmentation across physical machines (PMs), necessitating periodic VM rescheduling to optimize resource utilization. Despite its significance, VM rescheduling has received limited attention in the literature. A key challenge is that, unlike conventional combinatorial optimization problems, the efficiency of rescheduling algorithms is heavily impacted by inference time, as VM states evolve dynamically during execution. This scalability bottleneck hampers existing methods. To address this, we propose VMR<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>L, a reinforcement learning framework tailored for VM rescheduling. VMR<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>L integrates a two-stage decision-making process to accommodate complex operational constraints, a feature extraction mechanism that captures critical relational information for rescheduling, and a risk-aware evaluation strategy that enables users to balance execution speed and rescheduling accuracy. Extensive experiments using real-world data from a production-scale data center demonstrate that VMR<inline-formula><tex-math>$^{2}$</tex-math></inline-formula>L achieves near-optimal performance while reducing inference time to a matter of seconds.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"37 5","pages":"1186-1204"},"PeriodicalIF":6.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147665392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
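The two-stage decision process the VMR$^{2}$L abstract describes (first choose a VM to migrate, then choose its destination PM) can be illustrated with a greedy stand-in for the learned policies. The data layout (a dict of PM-to-VM lists and a dict of VM sizes) and the heuristics are assumptions for the sketch, not the paper's method.

```python
def reschedule_step(pms, vms, capacity):
    """One rescheduling step, two stages: stage 1 picks the largest VM
    on the most loaded PM; stage 2 places it on the least loaded PM
    that can still hold it.  Greedy stand-in for learned policies."""
    load = {pm: sum(vms[v] for v in hosted) for pm, hosted in pms.items()}
    src = max(load, key=load.get)                  # stage 1: source PM...
    vm = max(pms[src], key=lambda v: vms[v])       # ...and VM to move
    dests = [pm for pm in pms
             if pm != src and load[pm] + vms[vm] <= capacity]
    if not dests:                                  # no feasible move
        return None
    dst = min(dests, key=load.get)                 # stage 2: destination PM
    pms[src].remove(vm)
    pms[dst].append(vm)
    return vm, src, dst
```

Splitting the action space this way (VM choice, then PM choice) is what keeps the per-step decision small enough for fast inference, which matters because VM states drift while the algorithm runs.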
{"title":"Unlocking SSD Parallelism: A High-Performance B$^{\epsilon}$-Tree Framework for OCSSDs","authors":"Chi-Liang Qiu;Yao-Yu Liao;Tseng-Yi Chen;Yuan-Hao Chang","doi":"10.1109/TPDS.2026.3672185","DOIUrl":"https://doi.org/10.1109/TPDS.2026.3672185","url":null,"abstract":"Key-value stores have become pivotal in the management of data for modern large-scale data centers. Unlike the log-structured merge (LSM) tree, the B<inline-formula><tex-math>$^{\epsilon}$</tex-math></inline-formula> tree enhances read performance by mitigating read amplification and by leveraging the temporal locality of keys in the buffer area of its internal nodes. However, its inability to fully harness the high parallelism of solid-state drives (SSDs) limits its effectiveness. This limitation stems from traditional SSDs concealing their parallelism from the host system. The advent of open-channel SSDs (OCSSDs), which expose the physical data storage layout to the host system, provides a unique opportunity to leverage SSDs’ inherent high parallelism in read/write operations. This article introduces a novel high-parallelism B<inline-formula><tex-math>$^{\epsilon}$</tex-math></inline-formula> (HP-B<inline-formula><tex-math>$^{\epsilon}$</tex-math></inline-formula>) indexing scheme designed for OCSSDs. The scheme specifically addresses conflicts between parallel units (PUs) during read operations, substantially improving the read performance of the B<inline-formula><tex-math>$^{\epsilon}$</tex-math></inline-formula> indexing approach. To our knowledge, this is the first study to adapt the B<inline-formula><tex-math>$^{\epsilon}$</tex-math></inline-formula> indexing scheme for the OCSSD architecture, and our experimental results are promising.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"37 5","pages":"1107-1120"},"PeriodicalIF":6.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147557480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
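The buffered-write idea behind the B$^{\epsilon}$-tree's internal nodes, which the HP-B$^{\epsilon}$ scheme builds on, can be sketched as a minimal two-level tree: the root accumulates writes in a buffer and flushes them to leaf children in batches. This toy ignores OCSSD parallel units entirely; it only shows the node-buffering mechanism the abstract refers to.

```python
class BEpsilonNode:
    """Minimal two-level B^epsilon-tree sketch: the root buffers writes
    and flushes them in batches to leaf children, amortising the cost
    of pushing keys down the tree (no SSD awareness whatsoever)."""
    def __init__(self, pivots, buffer_cap=4):
        self.pivots = pivots                  # keys splitting the leaves
        self.buffer = {}                      # pending key -> value writes
        self.buffer_cap = buffer_cap
        self.leaves = [{} for _ in range(len(pivots) + 1)]

    def _child(self, key):
        return sum(1 for p in self.pivots if key >= p)

    def insert(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) > self.buffer_cap:
            for k, v in self.buffer.items():  # flush the whole batch
                self.leaves[self._child(k)][k] = v
            self.buffer.clear()

    def get(self, key):
        if key in self.buffer:                # recent writes hit the buffer
            return self.buffer[key]
        return self.leaves[self._child(key)].get(key)
```

Reads first probe the root buffer, which is where the temporal locality of recently written keys pays off; the PU-conflict avoidance that HP-B$^{\epsilon}$ adds sits below this layer, in how leaves map to parallel units.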
{"title":"fPIM: A Holistic Design to Optimize PIM Data Flow for High Execution Efficiency","authors":"Nan Wang;Wenjie Liu;Qing Liu;Xubin He","doi":"10.1109/TPDS.2026.3670145","DOIUrl":"https://doi.org/10.1109/TPDS.2026.3670145","url":null,"abstract":"As applications demand more bandwidth, the “memory wall” problem becomes increasingly severe. Therefore, the processing-in-memory (PIM) architecture has attracted significant research interest due to its ability to execute instructions offloaded by the processor. Existing works on PIM architectures are classified into two categories: regional offloading, where all instructions within a programmer-specified code region are offloaded, and selective offloading, where only instructions of interest are offloaded via hardware support. However, PIM architectures suffer from amplified in-PIM traffic overhead, which limits PIM performance and degrades the performance of the entire system. To address this challenge, we propose a PIM architecture, called fast PIM (fPIM), which integrates the PIM cache within each Channel Controller to optimize the data flow within the PIM. This design cooperates with the <italic>Processing Unit Load-balancer</i> and <italic>Behavior-based Offloader</i> to achieve high execution efficiency. To evaluate <italic>fPIM</i>, we perform extensive experiments, and the results show that <italic>fPIM</i> reduces the workload finish time by up to 88.6%, 87.5%, and 79.6% (with an average of 68.7%, 66.2%, and 59.8%), compared to three state-of-the-art PIM designs, PEI, Fafnir, and SpaceA, respectively.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"37 5","pages":"1096-1106"},"PeriodicalIF":6.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147557724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
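The two supporting mechanisms the fPIM abstract names, a behavior-based offloader and a processing-unit load balancer, can be sketched as simple host-side policies. The thresholds and cost model here are illustrative assumptions, not fPIM's actual hardware policy.

```python
def should_offload(miss_rate, bytes_touched, offload_cost_bytes=64):
    """Behavior-based offload heuristic sketch: offload an operation to
    PIM when the DRAM traffic it would otherwise generate (estimated
    from its cache-miss behavior) outweighs a fixed shipping cost."""
    expected_traffic = miss_rate * bytes_touched
    return expected_traffic > offload_cost_bytes

def dispatch(ops, n_pus):
    """Processing-unit load-balancer sketch: each offloaded operation
    (given as an estimated cost) goes to the PU with the least
    outstanding work, keeping in-PIM execution balanced."""
    load = [0] * n_pus
    placement = []
    for cost in ops:
        pu = load.index(min(load))   # least-loaded PU (first on ties)
        placement.append(pu)
        load[pu] += cost
    return placement, load
```

The point of the pairing is that offloading only locality-poor operations shrinks in-PIM traffic, while balanced dispatch keeps the channel controllers from serializing behind one hot PU.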
{"title":"Pro-Prophet: A Systematic Load Balancing Method for Efficient Parallel Training of Large-Scale MoE Models","authors":"Wei Wang;Zhiquan Lai;Dongsheng Li;Shengwei Li;Weijie Liu;Keshi Ge;Ao Shen;Huayou Su","doi":"10.1109/TPDS.2026.3668887","DOIUrl":"https://doi.org/10.1109/TPDS.2026.3668887","url":null,"abstract":"The size of deep learning models has been increasing to enhance model quality. The linear increase in training computation budgets with model size means that training an extremely large-scale model is exceedingly time-consuming. Recently, the Mixture of Experts (MoE) has drawn significant attention as it can scale models to extra-large sizes with a near-stable computation budget. However, inefficient distributed training of large-scale MoE models hinders their broader application. Specifically, a considerable dynamic load imbalance occurs among devices during training, significantly reducing throughput. Several load-balancing works have been proposed to address this challenge. System-level solutions draw more attention for their hardware affinity and non-disruption of model convergence compared to algorithm-level ones. However, they suffer from high communication costs and poor communication-computation overlap. To address these challenges, we propose a systematic load-balancing method, Pro-Prophet, which consists of a planner and a scheduler for efficient parallel training of large-scale MoE models. To adapt to the dynamic load imbalance, we profile training statistics and utilize them to design Pro-Prophet. For lower communication volume, the Pro-Prophet planner determines a series of lightweight load-balancing strategies and efficiently searches for a communication-efficient one for training based on the statistics. For sufficient overlapping of communication and computation, the Pro-Prophet scheduler schedules the data-dependent operations based on the statistics and operation characteristics, further improving the training throughput. We conduct extensive experiments across various clusters and MoE models. The results indicate that Pro-Prophet achieves up to 2.66x speedup on MoE-GPT models compared to two popular MoE frameworks, namely Deepspeed-MoE and FasterMoE. Furthermore, Pro-Prophet demonstrates a load-balancing improvement of up to 11.01x and speedups on modern MoE models of up to 1.22x compared to a representative load-balancing work, FasterMoE.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"37 5","pages":"1121-1135"},"PeriodicalIF":6.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147557727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
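The planner side of the Pro-Prophet abstract, detecting device load imbalance from profiled statistics and searching a set of lightweight strategies for a communication-efficient one, can be sketched as follows. The imbalance metric and the strategy cost functions are illustrative assumptions, not the paper's actual formulation.

```python
def imbalance(loads):
    """Load-imbalance ratio from profiled per-device expert loads:
    max device load over mean device load (1.0 = perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))

def plan(loads, strategies):
    """Planner sketch: each candidate strategy maps the profiled loads
    to an estimated communication volume; pick the cheapest one, in the
    spirit of the strategy search described above."""
    return min(strategies, key=lambda name: strategies[name](loads))
```

A usage sketch: with loads `[10, 10, 40, 20]` the imbalance ratio is 2.0, and the planner simply compares the (assumed) volume estimates each strategy yields before committing to one for the next training iterations.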
{"title":"cuFalcon: An Adaptive Parallel GPU Implementation for High-Performance Falcon Acceleration","authors":"Wenqian Li;Hanyu Wei;Shiyu Shen;Hao Yang;Wangchen Dai;Yunlei Zhao","doi":"10.1109/TPDS.2026.3675891","DOIUrl":"https://doi.org/10.1109/TPDS.2026.3675891","url":null,"abstract":"The rapid advancement of quantum computing has ushered in a new era of post-quantum cryptography, urgently demanding quantum-resistant digital signatures to secure modern communications and transactions. Among NIST-standardized candidates, Falcon stands out because it is a compact lattice-based signature scheme suitable for size-sensitive applications. In this paper, we present cuFalcon, a high-throughput GPU implementation of Falcon that addresses its computational bottlenecks through adaptive parallel strategies. At the operational level, we optimize Falcon’s key components for GPU architectures through memory-efficient FFT, adaptive parallel <bold>ffSampling</b>, and a compact computation mode. For signature-level optimization, to improve scalability across different GPU architectures, we implement three versions of cuFalcon: the raw key version, the expanded key version, and the balanced version. Additionally, we design batch processing, streaming mechanisms, and memory pooling to handle multiple signature tasks efficiently. Ultimately, performance evaluations show significant improvements, with the raw key version achieving 172 k signatures per second and the expanded key version reaching 201 k. Compared to the raw key version, the balanced version achieves a 7% improvement in throughput, while compared to the expanded key version, it reduces memory usage by 70%. Furthermore, our raw key version implementation outperforms the reference implementation by 36.74× and achieves a 2.71× speedup over the state-of-the-art GPU implementation.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"37 5","pages":"1153-1167"},"PeriodicalIF":6.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147606217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cloud Uptime Archive: Open-Access Availability Data of Web, Cloud, and Gaming Services","authors":"Sacheendra Talluri;Dante Niewenhuis;Xiaoyu Chu;Jakob Kyselica;Mehmet Cetin;Alexander Balgavy;Alexandru Iosup","doi":"10.1109/TPDS.2026.3673519","DOIUrl":"https://doi.org/10.1109/TPDS.2026.3673519","url":null,"abstract":"Cloud services are critical to society. However, their reliability is poorly understood. To address this problem, we propose a standard repository for cloud uptime data. We populate this repository with the data we collect containing failure reports from users and operators of cloud services, web services, and online games. The multiple vantage points help reduce bias from individual users and operators. We compare our new data to existing failure data from the Failure Trace Archive and the Google cluster trace. We analyze the MTBF and MTTR, time patterns, failure severity, user-reported symptoms, and operator-reported symptoms of failures in the data we collect. We observe that high-level user-facing services fail less often than low-level infrastructure services, likely because they employ fault-tolerance techniques. We use simulation-based experiments to demonstrate the impact of different failure traces on the performance of checkpointing and retry mechanisms.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"37 5","pages":"1136-1152"},"PeriodicalIF":6.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11432997","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147557478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
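The MTBF/MTTR analysis the Cloud Uptime Archive abstract mentions reduces to simple arithmetic over failure intervals. A minimal sketch, assuming each archive record can be reduced to a (failure_time, repair_time) pair; the real archive has far richer fields (severity, symptoms, vantage point).

```python
def mtbf_mttr(events):
    """Compute MTBF and MTTR from (fail_time, repair_time) intervals.
    MTBF here is the mean gap between consecutive failure onsets;
    MTTR is the mean of the repair durations."""
    fails = sorted(f for f, _ in events)
    gaps = [b - a for a, b in zip(fails, fails[1:])]
    mtbf = sum(gaps) / len(gaps)
    mttr = sum(r - f for f, r in events) / len(events)
    return mtbf, mttr
```

Note the convention choice: MTBF is sometimes defined onset-to-onset (as here) and sometimes repair-to-next-failure; any comparison across traces, such as against the Failure Trace Archive, has to fix one definition first.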
{"title":"Efficient Query Evaluation for Highly-Frequent Earth Observation via Satellite Maneuver in Space Edge Computing","authors":"Guangyuan Xu;Zichuan Xu;Hao Wang;Haocheng Zhou;Peichen Liu;Guiqiang Zhang;Qiufen Xia","doi":"10.1109/TPDS.2026.3676395","DOIUrl":"https://doi.org/10.1109/TPDS.2026.3676395","url":null,"abstract":"Big data analytics for Earth observation has been playing an increasingly important role in supporting environmental monitoring, disaster early warning, and sustainable development through timely analysis of massive multi-source data collected by satellites. With the growing need for such timely Big Data analysis, Space Edge Computing (SEC) networks have been proposed to provide in-orbit analytic services for users worldwide, by integrating computing capability through Low-Earth-Orbit (LEO) satellites. However, the monitoring frequency of LEO satellites over specific target areas remains limited due to orbital constraints, making it difficult to meet the high-frequency data acquisition demands of Big Data analytics. Satellite inclination maneuver is a promising method to cover a wide range of target areas and enhance monitoring frequency by adjusting the orbital inclination of satellites. Although such maneuvering enables satellites to timely process datasets, reducing energy wastage caused by inter-satellite data transmission, it consumes propulsion fuel, which is limited and difficult to replenish in a timely manner. Therefore, balancing the energy consumed for data processing and the fuel consumed for maneuvering is essential for efficient and sustainable Big Data analytics in SEC networks. In this paper, we aim to optimize the Big Data query evaluation problem with satellite maneuver in an SEC network, focusing on minimizing the weighted sum of energy consumed for processing and fuel consumed for maneuvering. Specifically, we consider that each Earth observation service needs to guarantee a certain level of monitoring frequency, which may not always be satisfied by the original orbital coverage of LEO satellites. In such cases, some satellites will be selected to perform inclination maneuvers for additional monitoring and processing to guarantee the quality of Earth observation services. To this end, we first propose an approximation algorithm with a provable approximation ratio for the offline query evaluation problem, which leverages a customized auxiliary graph to jointly minimize energy and fuel consumption. We then devise an online learning algorithm, referred to as the customized Lipschitz bandit learning algorithm, with a bounded regret for the online Big Data query evaluation problem in an SEC network. We finally evaluate the performance of the proposed algorithms in a real SEC network topology. Experimental results show that the proposed algorithms achieve 11% lower energy consumption and 10.6% lower fuel consumption than their comparison counterparts.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"37 5","pages":"1168-1185"},"PeriodicalIF":6.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147606215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
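The weighted-sum objective in the abstract above (processing energy plus maneuvering fuel) can be illustrated for a single observation request. The fuel-per-degree constant, the candidate tuple layout, and the equal weights are assumptions for the sketch; the paper's auxiliary-graph construction and bandit learning are not reproduced here.

```python
def pick_satellite(candidates, w_energy=0.5, w_fuel=0.5):
    """Select the satellite minimising the weighted sum of processing
    energy and maneuvering fuel for one request.  Each candidate is
    (name, processing_energy, inclination_change_deg); fuel is modelled
    as proportional to the inclination change (assumed constant)."""
    fuel_per_deg = 2.0  # assumed fuel cost per degree of inclination change

    def cost(c):
        name, energy, delta_inclination = c
        return w_energy * energy + w_fuel * fuel_per_deg * delta_inclination

    return min(candidates, key=cost)[0]
```

The weights expose the trade-off the abstract describes: raising `w_fuel` makes the selector prefer satellites already near the target orbit even if their processing energy is higher, conserving the hard-to-replenish propellant.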
{"title":"Flexible Synchronization Control for Accurate and Efficient Federated Learning","authors":"Zuo Gan;Chen Chen;Jiayi Zhang;Yifei Zhu;Jieru Zhao;Quan Chen;Minyi Guo","doi":"10.1109/TPDS.2026.3670216","DOIUrl":"https://doi.org/10.1109/TPDS.2026.3670216","url":null,"abstract":"Federated Learning (FL) is a distributed paradigm that supports collaborative model training while preserving data privacy, where clients periodically synchronize their local gradients after multiple local iterations. Due to non-uniform data distribution and poor network conditions, FL processes often suffer degraded training accuracy and efficiency. In this work, we analyze the microscopic parameter variation behaviors in FL, and find that an effective method to improve FL accuracy is to switch to more frequent synchronization at proper moments. In particular, such frequency-tuning moments—which can be detected from gradient characteristics—are <italic>heterogeneous</i> across different parameters. Motivated by such observations, we propose Parameter-Adaptive Synchronization (PAS), a FL scheme that adaptively tunes the synchronization period for each scalar parameter. The benefits of PAS are two-fold: By switching to more frequent synchronization when necessary, we can improve the FL training accuracy; by synchronizing different parameters independently, we can enable communication-computation overlapping and enhance the network utilization. We have theoretically demonstrated the convergence validity of PAS, and have further extended it with adaptive sparsification capability to jointly reduce the overall communication volume. We implemented PAS atop PyTorch, and extensive experiments show that it can substantially improve FL performance in both accuracy and communication efficiency.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"37 5","pages":"1079-1095"},"PeriodicalIF":6.0,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147557475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
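The per-parameter period tuning the PAS abstract describes can be sketched as a small controller: watch a parameter's recent gradients, and when a frequency-tuning moment is detected, synchronize that parameter more often. The sign-flip-rate detector below is an assumed stand-in, not PAS's actual criterion for detecting such moments.

```python
def adapt_period(period, grad_history, threshold=0.5, min_period=1):
    """Per-parameter synchronization-period controller sketch: if the
    recent gradients of this scalar parameter change sign often (taken
    here as the signal of a frequency-tuning moment), halve the period
    to synchronize more frequently; otherwise keep the current period."""
    flips = sum(1 for a, b in zip(grad_history, grad_history[1:])
                if a * b < 0)
    flip_rate = flips / max(len(grad_history) - 1, 1)
    if flip_rate > threshold:
        return max(period // 2, min_period)
    return period
```

Because each scalar parameter carries its own period, parameters that reach their sync points at different iterations can be transmitted at different times, which is what enables the communication-computation overlap mentioned above.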
{"title":"Reducing Cross-Pod Communication Overhead for MoE Model Training With Hybrid Parallelism in Multi-Tenant Clusters","authors":"Huihuang Qin;Shuangwu Chen;Zian Wang;Tao Zhang;Ziyang Zou;Xiaobin Tan;Shiyin Zhu;Jian Yang","doi":"10.1109/TPDS.2026.3668417","DOIUrl":"https://doi.org/10.1109/TPDS.2026.3668417","url":null,"abstract":"The massive parameter scale of sparsely-activated Mixture-of-Experts (MoE) models necessitates distributed training with hybrid parallelism. Placing such training tasks, i.e., mapping the logical partitions of an MoE model to available physical NPUs, is challenging. Due to the bandwidth and latency discrepancies between intra- and inter-Pod links, cross-Pod communication usually becomes a bottleneck. The high dispersion of NPUs in multi-tenant clusters exacerbates this issue further. However, few studies have considered the cross-Pod model placement problem. To address this challenge, we propose a novel model placement scheme tailored for MoE model training with hybrid parallelism in multi-tenant clusters. By quantifying the cross-Pod communication overhead incurred during MoE model training, the model placement is formulated as a 0-1 integer quadratic problem, which is NP-hard. Motivated by the traffic differences between parallelism strategies, we decompose this problem into two subproblems. To solve the subproblems, we propose a lightweight two-stage algorithm based on a Best-Fit strategy and neighborhood search. Experiments under different models and network topologies show that our model placement scheme can reduce cross-Pod traffic by 35.9% and cut communication time by 18.7% compared to state-of-the-art methods.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"37 5","pages":"1062-1078"},"PeriodicalIF":6.0,"publicationDate":"2026-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147440670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
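The Best-Fit stage of the two-stage algorithm described above can be sketched as a bin-packing pass: place each model partition, largest NPU demand first, into the Pod whose remaining free NPUs fit it most tightly, so that traffic-heavy partitions end up co-located. The partition and Pod names and the demand numbers are hypothetical; the neighborhood-search refinement stage is omitted.

```python
def best_fit_place(partitions, pods):
    """Best-fit placement sketch: partitions maps partition name -> NPU
    demand; pods maps pod name -> free NPUs.  Each partition goes to
    the pod with the smallest leftover capacity that still fits it."""
    free = dict(pods)                        # remaining NPUs per pod
    placement = {}
    for part, demand in sorted(partitions.items(),
                               key=lambda kv: -kv[1]):  # largest first
        slack = {p: f - demand for p, f in free.items() if f >= demand}
        if not slack:
            raise ValueError(f"no pod fits partition {part}")
        pod = min(slack, key=slack.get)      # tightest fit
        placement[part] = pod
        free[pod] -= demand
    return placement
```

Packing tightly keeps each parallelism group inside as few Pods as possible, which is exactly the lever that reduces the cross-Pod traffic the abstract targets; the neighborhood search would then perturb this initial placement looking for further reductions.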