{"title":"AsyncGBP${}^{+}$: Bridging SSL/TLS and Heterogeneous Computing Power With GPU-Based Providers","authors":"Yi Bian;Fangyu Zheng;Yuewu Wang;Lingguang Lei;Yuan Ma;Tian Zhou;Jiankuo Dong;Guang Fan;Jiwu Jing","doi":"10.1109/TC.2024.3477987","DOIUrl":"https://doi.org/10.1109/TC.2024.3477987","url":null,"abstract":"The rapid evolution of GPUs has made them a promising solution for accelerating the widely used SSL/TLS protocol, which faces performance bottlenecks due to its underlying heavy cryptographic computations. Nevertheless, substantial structural adjustments from the parallel mode of GPUs to the serial mode of the SSL/TLS stack are required, potentially constraining the practical deployment of GPUs. In this paper, we propose AsyncGBP<inline-formula><tex-math>${}^{+}$</tex-math></inline-formula>, a three-level framework that facilitates the seamless conversion of cryptographic requests from synchronous to asynchronous mode. We conduct an in-depth analysis of the OpenSSL provider and of the cryptographic primitive features relevant to GPU implementations, aiming to fully exploit the potential of GPUs. Notably, AsyncGBP<inline-formula><tex-math>${}^{+}$</tex-math></inline-formula> supports three working settings (offline/online/hybrid), finely tailored to various public-key cryptographic primitives, including traditional ones such as X25519, Ed25519, and ECDSA, as well as the quantum-safe CRYSTALS-Kyber. A comprehensive evaluation demonstrates that AsyncGBP<inline-formula><tex-math>${}^{+}$</tex-math></inline-formula> achieves an improvement of up to 137.8<inline-formula><tex-math>$\times$</tex-math></inline-formula> over the default OpenSSL provider (for X25519, Ed25519, and ECDSA) and 113.30<inline-formula><tex-math>$\times$</tex-math></inline-formula> over the OpenSSL-compatible <monospace>liboqs</monospace> (for CRYSTALS-Kyber) in a single-process setting. Furthermore, AsyncGBP<inline-formula><tex-math>${}^{+}$</tex-math></inline-formula> surpasses the fastest current commercial off-the-shelf OpenSSL-compatible TLS accelerator with a 5.3<inline-formula><tex-math>$\times$</tex-math></inline-formula> to 7.0<inline-formula><tex-math>$\times$</tex-math></inline-formula> performance improvement.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 2","pages":"356-370"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
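The synchronous-to-asynchronous conversion that AsyncGBP+ performs inside the SSL/TLS stack can be illustrated, in spirit only, by a minimal request-batching sketch (generic Python asyncio; every name here is invented and none of this is the AsyncGBP+ code): callers issue what looks like a blocking call, while a worker drains a queue and serves requests in batches, much as a GPU-based provider amortizes kernel launches across many handshakes.

```python
import asyncio

async def batch_worker(queue, batch_size, process_batch):
    """Drain queued (request, future) pairs and serve them in batches."""
    while True:
        batch = [await queue.get()]                 # block for the first request
        while len(batch) < batch_size and not queue.empty():
            batch.append(queue.get_nowait())        # opportunistically fill the batch
        requests = [req for req, _ in batch]
        for (_, fut), result in zip(batch, process_batch(requests)):
            fut.set_result(result)                  # wake each waiting caller

async def submit(queue, request):
    """Synchronous-looking call: enqueue the request and await its result."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((request, fut))
    return await fut

async def main():
    queue = asyncio.Queue()
    # Stand-in for a batched GPU kernel: square each request.
    worker = asyncio.create_task(
        batch_worker(queue, 8, lambda reqs: [x * x for x in reqs]))
    results = await asyncio.gather(*(submit(queue, i) for i in range(32)))
    worker.cancel()
    return results
```

Running `asyncio.run(main())` returns the 32 results in submission order; the point of the sketch is that each caller still sees a simple request/response interface while the batching happens behind it.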
{"title":"Balancing Privacy and Accuracy Using Significant Gradient Protection in Federated Learning","authors":"Benteng Zhang;Yingchi Mao;Xiaoming He;Huawei Huang;Jie Wu","doi":"10.1109/TC.2024.3477971","DOIUrl":"https://doi.org/10.1109/TC.2024.3477971","url":null,"abstract":"Previous state-of-the-art studies have demonstrated that adversaries can access sensitive user data through membership inference attacks (MIAs) in Federated Learning (FL). Introducing differential privacy (DP) into the FL framework is an effective way to enhance the privacy of FL. Nevertheless, in differentially private federated learning (DP-FL), local gradients become excessively sparse in certain training rounds. Especially when training with low privacy budgets, there is a risk of introducing excessive noise into clients’ gradients. This issue can lead to a significant degradation in the accuracy of the global model. Thus, balancing user privacy and global model accuracy becomes a challenge in DP-FL. To this end, we propose an approach, termed differentially private federated aggregation based on significant gradient protection (DP-FedASGP). DP-FedASGP mitigates excessive noise by protecting significant gradients and accelerates the convergence of the global model by computing dynamic aggregation weights for gradients. Experimental results show that DP-FedASGP achieves privacy protection comparable to DP-FedAvg and cpSGD (communication-private SGD based on gradient quantization) while outperforming DP-FedSNLC (sparse noise based on clipping losses and privacy budget costs) and FedSMP (sparsified model perturbation). Furthermore, the average global test accuracy of DP-FedASGP across four datasets and three models is about <inline-formula><tex-math>$2.62$</tex-math></inline-formula>%, <inline-formula><tex-math>$4.71$</tex-math></inline-formula>%, <inline-formula><tex-math>$0.45$</tex-math></inline-formula>%, and <inline-formula><tex-math>$0.19$</tex-math></inline-formula>% higher than that of the above methods, respectively. These improvements indicate that DP-FedASGP is a promising approach for balancing the privacy and accuracy of DP-FL.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 1","pages":"278-292"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
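For context on why low privacy budgets degrade accuracy, the standard per-client sanitization step in DP-FL is DP-SGD-style clipping plus Gaussian noise; the sketch below is that generic mechanism, not the DP-FedASGP algorithm, and the clip norm and noise multiplier are arbitrary.

```python
import numpy as np

def dp_sanitize(grad, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Clip a gradient to L2 norm `clip_norm`, then add Gaussian noise
    calibrated to that sensitivity (the standard Gaussian mechanism)."""
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(grad)
    clipped = grad / max(1.0, norm / clip_norm)   # now ||clipped|| <= clip_norm
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return clipped + noise

# A gradient of norm 5 is scaled down to norm 1 before noising.
sanitized = dp_sanitize(np.array([3.0, 4.0]))
```

A smaller privacy budget forces a larger `noise_multiplier`, so the noise can swamp small gradient entries entirely; that tension is what motivates protecting only the significant gradients.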
{"title":"Collaborative Neural Architecture Search for Personalized Federated Learning","authors":"Yi Liu;Song Guo;Jie Zhang;Zicong Hong;Yufeng Zhan;Qihua Zhou","doi":"10.1109/TC.2024.3477945","DOIUrl":"https://doi.org/10.1109/TC.2024.3477945","url":null,"abstract":"Personalized federated learning (pFL) is a promising approach to training customized models for multiple clients over heterogeneous data distributions. However, existing works on pFL often rely on the optimization of model parameters and ignore the demand for personalization of the neural network architecture, which can greatly affect model performance in practice. Therefore, generating personalized models with different neural architectures for different clients is a key issue in implementing pFL in a heterogeneous environment. Motivated by Neural Architecture Search (NAS), a model architecture searching methodology, this paper aims to automate model design in a collaborative manner while achieving good training performance for each client. Specifically, we reconstruct the centralized searching of NAS into a distributed scheme called Personalized Architecture Search (PAS), in which differentiable architecture fine-tuning is achieved via gradient-descent optimization, enabling each client to obtain the most appropriate model. Furthermore, to aggregate knowledge from heterogeneous neural architectures, a knowledge distillation-based training framework is proposed to achieve a good trade-off between generalization and personalization in federated learning. Extensive experiments demonstrate that our architecture-level personalization method achieves higher accuracy under non-IID settings without increasing model complexity relative to state-of-the-art benchmarks.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 1","pages":"250-262"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Heterogeneous and Adaptive Architecture for Decision-Tree-Based ACL Engine on FPGA","authors":"Yao Xin;Chengjun Jia;Wenjun Li;Ori Rottenstreich;Yang Xu;Gaogang Xie;Zhihong Tian;Jun Li","doi":"10.1109/TC.2024.3477955","DOIUrl":"https://doi.org/10.1109/TC.2024.3477955","url":null,"abstract":"Access Control Lists (ACLs) are crucial for ensuring the security and integrity of modern cloud and carrier networks by regulating access to sensitive information and resources. However, previous software and hardware implementations no longer meet the requirements of modern datacenters. The emergence of FPGA-based SmartNICs presents an opportunity to offload ACL functions from the host CPU, leading to improved network performance in datacenter applications. However, previous FPGA-based ACL designs lacked the flexibility to support different rulesets without hardware reconfiguration while maintaining high performance. In this paper, we propose HACL, a heterogeneous and adaptive architecture for a decision-tree-based ACL engine on FPGA. By employing techniques such as tree decomposition and recirculated pipeline scheduling, HACL can accommodate various rulesets without reconfiguring the underlying architecture. To facilitate the efficient mapping of different decision trees to memory and optimize the throughput of a ruleset, we also introduce a heterogeneous framework with a compiler on the CPU platform for HACL. We implement HACL on a typical SmartNIC and evaluate its performance. The results demonstrate that HACL achieves a throughput exceeding 260 Mpps when processing 100K-scale ACL rulesets, with low hardware resource utilization. By integrating more engines, HACL can achieve even higher throughput and support larger rulesets.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 1","pages":"263-277"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810677","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enabling High Performance and Resource Utilization in Clustered Cache via Hotness Identification, Data Copying, and Instance Merging","authors":"Hongmin Li;Si Wu;Zhipeng Li;Qianli Wang;Yongkun Li;Yinlong Xu","doi":"10.1109/TC.2024.3477994","DOIUrl":"https://doi.org/10.1109/TC.2024.3477994","url":null,"abstract":"In-memory cache systems such as Redis provide low-latency and high-performance data access for modern internet services. However, in large-scale Redis systems, the workloads show strong skewness and varied locality, which degrades system performance and incurs low CPU utilization. Although many approaches address load imbalance, the two-layered architecture of Redis gives its workload skewness special characteristics. Redis first maps data into data groups, a process called <i>Group Mapping</i>. Then the data groups are distributed to instances by <i>Instance Mapping</i>. Under this layered architecture, a small number of hot-spot instances arise holding very few hot data groups, alongside a large number of cold instances. Improving Redis's performance and CPU utilization entails accurately identifying instance and data group hotness and then handling hot data groups and cold instances. We propose HPUCache+ to address the hot-spot problem via hotness identification, hot data copying, and cold instance merging. HPUCache+ accurately and dynamically detects instance and data group hotness based on multiple resources and workload characteristics at low cost. It enables access to multiple data copies by dynamically updating the cached mapping in the Redis client, achieving high user-access performance while remaining compatible with the Redis client and providing highly customizable service-level agreements. It also proposes an asynchronous instance merging strategy based on disk snapshots and temporal caches, which separates massive data movement from the critical user-access path to achieve high-performance instance merging. We implement HPUCache+ in Redis. Experiments show that, compared to the native Redis design, HPUCache+ achieves up to 2.3<inline-formula><tex-math>$\times$</tex-math></inline-formula> and 3.5<inline-formula><tex-math>$\times$</tex-math></inline-formula> throughput gains and 11.3<inline-formula><tex-math>$\times$</tex-math></inline-formula> and 14.3<inline-formula><tex-math>$\times$</tex-math></inline-formula> CPU utilization gains, respectively. It also consumes up to 50% less CPU and 75% less memory than the state-of-the-art approach Anna.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 2","pages":"371-385"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
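The abstract does not specify how HPUCache+ scores hotness; as a generic, single-metric illustration of the idea (the half-life and threshold below are invented), exponentially decayed access counters are one common way to separate hot data groups from cold ones under logical time:

```python
import math
from collections import defaultdict

class HotnessTracker:
    """Per-key access counter with exponential decay over logical ticks."""
    def __init__(self, half_life=100.0):
        self.decay = math.log(2) / half_life   # score halves every `half_life` ticks
        self.score = defaultdict(float)
        self.last = defaultdict(float)

    def access(self, key, now):
        # Decay the stored score to the current tick, then count this access.
        elapsed = now - self.last[key]
        self.score[key] = self.score[key] * math.exp(-self.decay * elapsed) + 1.0
        self.last[key] = now

    def hot_keys(self, now, threshold=10.0):
        # Decay lazily at query time so idle keys cool off without bookkeeping.
        return {k for k, s in self.score.items()
                if s * math.exp(-self.decay * (now - self.last[k])) >= threshold}

t = HotnessTracker()
for tick in range(200):
    t.access("hot-group", tick)        # accessed every tick
    if tick % 50 == 0:
        t.access("cold-group", tick)   # accessed rarely
```

After this trace, only "hot-group" exceeds the threshold: its decayed count settles near the half-life (about 100), while the rarely touched key stays in low single digits.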
{"title":"NPC: A Non-Conflicting Processing-in-Memory Controller in DDR Memory Systems","authors":"Seungyong Lee;Sanghyun Lee;Minseok Seo;Chunmyung Park;Woojae Shin;Hyuk-Jae Lee;Hyun Kim","doi":"10.1109/TC.2024.3477981","DOIUrl":"https://doi.org/10.1109/TC.2024.3477981","url":null,"abstract":"Processing-in-Memory (PIM) has emerged as a promising solution to the memory wall problem. Existing memory interfaces must support new PIM commands to utilize PIM, making the definition of PIM commands according to memory modes a major issue in the development of practical PIM products. For performance and OS transparency, the memory controller is responsible for changing the memory mode, which requires modifying the controller and resolving conflicts with existing functionalities. Additionally, it must operate so as to minimize mode-transition overhead, which can cause significant performance degradation. In this study, we present NPC, a memory controller designed for mode-transition PIM that delivers PIM commands via the DDR interface. NPC issues PIM commands while transparently changing the memory mode, using a dedicated scheduling policy that reduces the number of mode transitions through aggregated issuing. Moreover, existing functions, such as refresh, are optimized for PIM operation. We implement NPC in hardware and develop a PIM emulation system to validate it on FPGA platforms. Experimental results reveal that NPC is compatible with existing interfaces and functionality, and that the proposed scheduling policy improves performance by 2.2<inline-formula><tex-math>$\boldsymbol{\times}$</tex-math></inline-formula> with balanced fairness, achieving up to 97% of the ideal performance. These findings have the potential to aid the application of PIM in real systems and contribute to the commercialization of mode-transition PIM.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 3","pages":"1025-1039"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143388596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Kernel Fusion for Improving the GPU Utilization While Ensuring QoS","authors":"Han Zhao;Junxiao Deng;Weihao Cui;Quan Chen;Youtao Zhang;Deze Zeng;Minyi Guo","doi":"10.1109/TC.2024.3477995","DOIUrl":"https://doi.org/10.1109/TC.2024.3477995","url":null,"abstract":"The prosperity of machine learning applications has driven the rapid development of GPU architecture, which continues to integrate more CUDA Cores, larger L2 caches, and greater memory bandwidth within each SM. Moreover, GPUs integrate Tensor Cores dedicated to matrix multiplication. Although studies have shown that task co-location can effectively improve system throughput, existing works focus only on resource scheduling at the SM level and cannot improve resource utilization within the SM. In this paper, we propose Aker, a static kernel fusion and scheduling approach that improves resource utilization inside the SM while ensuring the QoS (Quality-of-Service) of co-located tasks. Aker consists of a static kernel fuser, a duration predictor for fused kernels, an adaptive fused kernel selector, and an enhanced QoS-aware kernel manager. The kernel fuser enables static and flexible fusion of a kernel pair, which may combine a Tensor Core kernel with a CUDA Core kernel, or a compute-preferring CUDA Core kernel with a memory-preferring one. After the kernel fuser provides multiple fused kernel versions for a kernel pair, the duration predictor precisely predicts the duration of the fused kernels, and the adaptive fused kernel selector locates the optimal fused kernel version. Finally, the kernel manager invokes the fused kernel or the original kernels based on the QoS headroom of latency-critical tasks to improve system throughput. Our experimental results show that Aker improves the throughput of best-effort applications by 50.1% on average compared with state-of-the-art solutions, while ensuring the QoS of latency-critical tasks.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 2","pages":"386-400"},"PeriodicalIF":3.6,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dependability of the K Minimum Values Sketch: Protection and Comparative Analysis","authors":"Jinhua Zhu;Zhen Gao;Pedro Reviriego;Shanshan Liu;Fabrizio Lombardi","doi":"10.1109/TC.2024.3475588","DOIUrl":"https://doi.org/10.1109/TC.2024.3475588","url":null,"abstract":"A basic operation in big data analysis is cardinality estimation; to estimate cardinality at high speed and with low memory requirements, data sketches that provide approximate estimates are usually used. The K Minimum Values (KMV) sketch is one of the most popular options; however, soft errors in the memories used by KMV may substantially degrade its performance. This paper is the first to consider the impact of soft errors on the KMV sketch and to compare it with HyperLogLog (HLL), another widely used sketch for cardinality estimation. Initially, the operation of KMV in the presence of soft errors in memory (i.e., its dependability) is studied through theoretical analysis and error-injection simulation. The evaluation results show that errors during the construction phase of KMV may cause large deviations in the estimates. Subsequently, based on the algorithmic features of the KMV sketch, two protection schemes are proposed. The first scheme uses a single parity check (SPC) to detect errors and reduce their impact on the cardinality estimate; the second scheme is based on the incremental property of the memory list in KMV. The presented evaluation shows that both schemes can dramatically improve the performance of KMV, and that the SPC scheme performs better even though it requires a larger memory footprint and additional overhead in the checking operation. Finally, it is shown that soft errors on the unprotected KMV produce larger worst-case errors than on HLL, but the average impact of errors is lower; moreover, the protected KMV using the proposed schemes is more dependable than HLL with existing protection techniques.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 1","pages":"210-221"},"PeriodicalIF":3.6,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
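For readers unfamiliar with the sketch being protected, a minimal unprotected KMV estimator can be written as follows (the hash function and k are illustrative choices; none of the paper's protection schemes are shown). It keeps the k smallest distinct normalized hash values and estimates cardinality as (k - 1) divided by the k-th smallest hash:

```python
import hashlib
import heapq

def kmv_estimate(stream, k=64):
    """K Minimum Values cardinality estimate over an iterable of items."""
    heap = []          # max-heap via negation: holds the k smallest hashes seen
    members = set()    # hashes currently kept, for cheap duplicate checks
    for item in stream:
        digest = hashlib.md5(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big") / 2**64   # uniform in [0, 1)
        if h in members:
            continue                                    # duplicate item
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:                              # smaller than current k-th
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return float(len(heap))        # fewer than k distinct items: exact count
    return (k - 1) / -heap[0]          # classic KMV estimator
```

A soft error that flips a bit in one stored hash directly perturbs this list, which is why the paper's parity-check and incremental-order checks target exactly this memory.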
{"title":"Performance and Environment-Aware Advanced Driving Assistance Systems","authors":"Sreenitha Kasarapu;Sai Manoj Pudukotai Dinakarrao","doi":"10.1109/TC.2024.3475572","DOIUrl":"https://doi.org/10.1109/TC.2024.3475572","url":null,"abstract":"In autonomous and self-driving vehicles, visual perception of the driving environment plays a key role. Vehicles rely on machine learning (ML) techniques such as deep neural networks (DNNs), which are extensively trained on manually annotated databases, to achieve this goal. However, the availability of training data representing different environmental conditions can be limited. Furthermore, as different driving terrains require different decisions by the driver, it is tedious and impractical to design a database with all possible scenarios. This work proposes a semi-parametric approach that bypasses the manual annotation required to train vehicle perception systems in autonomous and self-driving vehicles. We present a novel “Performance and Environment-aware Advanced Driving Assistance Systems” framework, which employs one-shot learning for efficient data generation using user action and response, in addition to synthetic traffic data generated as Pareto-optimal solutions from one-shot objects using a set of generalization functions. Adapting to the driving environment through such optimization adds robustness and safety to autonomous driving. We evaluate the proposed framework on environment perception challenges encountered in autonomous driving assistance systems. To accelerate learning and adapt in real time to perceived data, a novel deep learning-based Alternating Direction Method of Multipliers (dlADMM) algorithm is introduced to improve the convergence of regular machine learning models. This methodology optimizes the training process and makes applying machine learning models to real-world problems more feasible. We evaluated the proposed technique on the AlexNet and MobileNetV2 networks and achieved more than an 18<inline-formula><tex-math>$\times$</tex-math></inline-formula> speedup. By making the proposed technique behavior-aware, we observed performance of up to 99% when detecting traffic signals.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 1","pages":"131-142"},"PeriodicalIF":3.6,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sketch-Based Adaptive Communication Optimization in Federated Learning","authors":"Pan Zhang;Lei Xu;Lin Mei;Chungen Xu","doi":"10.1109/TC.2024.3475578","DOIUrl":"https://doi.org/10.1109/TC.2024.3475578","url":null,"abstract":"In recent years, cross-device federated learning (FL), particularly in the context of Internet of Things (IoT) applications, has demonstrated remarkable potential. Despite significant efforts, empirical evidence suggests that FL algorithms have yet to gain widespread practical adoption. The primary obstacle stems from the inherent bandwidth overhead associated with gradient exchanges between clients and the server, resulting in substantial delays, especially within communication networks. To address this problem, various solutions have been proposed with the hope of finding a better balance between efficiency and accuracy. Following this goal, we investigate how to design a lightweight FL algorithm that requires less communication while maintaining comparable accuracy. Specifically, we propose a sketch-based FL algorithm that incorporates the incremental singular value decomposition (ISVD) method in a way that has little negative effect on accuracy during training. Moreover, we provide adaptive gradient error accumulation and error compensation mechanisms to mitigate the accumulated gradient errors caused by sketch compression and to improve model accuracy. Our extensive experiments on various datasets demonstrate the efficacy of the proposed approach. In particular, our scheme achieves nearly a 93% reduction in communication cost when training multi-layer perceptron (MLP) models on the MNIST dataset.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 1","pages":"170-184"},"PeriodicalIF":3.6,"publicationDate":"2024-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
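The error accumulation and compensation idea in this abstract is generic to compressed-gradient FL; a compact stand-in (top-k sparsification instead of the paper's ISVD-based sketch; the vectors and k are arbitrary) shows the invariant that error feedback preserves: what was actually transmitted plus the residual always equals the sum of the true gradients, so no information is permanently lost to compression.

```python
import numpy as np

def top_k(vec, k):
    """Keep the k largest-magnitude entries of `vec`, zero the rest."""
    out = np.zeros_like(vec)
    idx = np.argsort(np.abs(vec))[-k:]
    out[idx] = vec[idx]
    return out

def train_with_error_feedback(grads, k):
    """Compress each gradient to k entries, carrying dropped mass forward."""
    residual = np.zeros_like(grads[0])
    sent_total = np.zeros_like(grads[0])
    for g in grads:
        compensated = g + residual        # add back what was dropped earlier
        update = top_k(compensated, k)    # what actually goes on the wire
        residual = compensated - update   # accumulate the compression error
        sent_total += update
    return sent_total, residual

rng = np.random.default_rng(42)
grads = [rng.normal(size=8) for _ in range(50)]
sent, residual = train_with_error_feedback(grads, k=2)
# Invariant: sent + residual equals the sum of the true gradients.
```

Without the residual term, the dropped coordinates would be discarded every round and the bias would compound; with it, the compression error is merely deferred.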