arXiv - CS - Neural and Evolutionary Computing: Latest Publications

A More Accurate Approximation of Activation Function with Few Spikes Neurons
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2024-08-19 DOI: arxiv-2409.00044
Dayena Jeong, Jaewoo Park, Jeonghee Jo, Jongkil Park, Jaewook Kim, Hyun Jae Jang, Suyoun Lee, Seongsik Park
Abstract: Recent deep neural networks (DNNs), such as diffusion models [1], face high computational demands. Spiking neural networks (SNNs) have therefore attracted much attention as energy-efficient neural networks. However, conventional spiking neurons, such as leaky integrate-and-fire neurons, cannot accurately represent complex non-linear activation functions such as Swish [2]. Few-spikes (FS) neurons were proposed to approximate activation functions with spiking neurons [3], but their approximation performance was limited by the lack of training methods that account for these neurons. We therefore propose tendency-based parameter initialization (TBPI), which enhances the approximation of activation functions with FS neurons by exploiting temporal dependencies when initializing the training parameters.
Cited by: 0
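The few-spikes coding that TBPI builds on [3] replaces a continuous activation with at most K weighted spikes. The sketch below illustrates only the FS neuron mechanism, not TBPI itself: the thresholds, resets, and output weights are hand-set to the binary-expansion coding that approximates the identity on [0, 1), whereas the paper learns such parameters to fit functions like Swish.

```python
import numpy as np

def fs_neuron(x, T, h, d):
    """Few-spikes (FS) neuron: over K internal time steps it may emit one
    spike per step; the output is a weighted sum of those spikes."""
    v = x                               # internal potential starts at the input
    out = 0.0
    for Tt, ht, dt in zip(T, h, d):
        s = 1.0 if v >= Tt else 0.0     # spike if the potential crosses the threshold
        v -= ht * s                     # subtract the reset amount on spiking
        out += dt * s                   # accumulate the weighted spike
    return out

K = 8
coeffs = [2.0 ** -(t + 1) for t in range(K)]   # thresholds = resets = weights = 2^-(t+1)
approx = [fs_neuron(x, coeffs, coeffs, coeffs) for x in np.linspace(0.0, 0.99, 100)]
```

With these coefficients the K spikes recover the first K bits of the input's binary expansion, so the approximation error stays below 2^-K; fitting a non-linear target instead amounts to choosing different T, h, and d, which is where initialization quality matters.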
TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2024-08-19 DOI: arxiv-2408.10013
Kun Wu, Jeongmin Brian Park, Xiaofan Zhang, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, Wen-mei Hwu
Abstract: The growth rate of GPU memory capacity has not kept up with that of large language model (LLM) sizes, hindering the model training process. In particular, activations, the intermediate tensors produced during forward propagation and reused in backward propagation, dominate GPU memory use. To address this challenge, we propose TBA to efficiently offload activations to high-capacity NVMe SSDs. This approach reduces GPU memory usage without impacting performance by adaptively overlapping data transfers with computation. TBA is compatible with popular deep learning frameworks such as PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor deduplication, forwarding, and adaptive offloading to further enhance efficiency. We conduct extensive experiments on GPT, BERT, and T5. The results demonstrate that TBA reduces peak activation memory usage by 47%. At the same time, TBA perfectly overlaps the I/O with computation and incurs negligible performance overhead. We introduce the recompute-offload-keep (ROK) curve to compare TBA offloading with two other tensor placement strategies: keeping activations in memory and layerwise full recomputation. We find that TBA achieves better memory savings than layerwise full recomputation while retaining the performance of keeping the activations in memory.
Cited by: 0
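The overlap idea can be sketched in plain Python. This is a toy stand-in, not TBA's implementation: the real system moves GPU tensors to NVMe SSDs inside PyTorch/Megatron/DeepSpeed, while this sketch writes NumPy arrays to a temporary directory from a background thread while the next layer computes.

```python
import os
import tempfile
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def forward_with_offload(x, weights, offload_dir):
    """Toy forward pass: each layer's input activation is offloaded to disk
    in a background thread while the layer itself computes (the overlap
    idea; a real system would also reload activations for backprop)."""
    pool = ThreadPoolExecutor(max_workers=1)
    futures, paths = [], []
    act = x
    for i, W in enumerate(weights):
        path = os.path.join(offload_dir, f"act_{i}.npy")
        futures.append(pool.submit(np.save, path, act))  # offload this layer's input
        paths.append(path)
        act = np.maximum(act @ W, 0.0)                   # compute the layer meanwhile
    for f in futures:
        f.result()                                       # offloads done before backward
    pool.shutdown()
    return act, paths

rng = np.random.default_rng(0)
ws = [rng.standard_normal((16, 16)) * 0.1 for _ in range(3)]
with tempfile.TemporaryDirectory() as d:
    out, saved = forward_with_offload(rng.standard_normal((4, 16)), ws, d)
    reloaded = np.load(saved[0])                         # activations are recoverable
```

The design point is that the transfer and the matrix multiply run concurrently, so offloading costs little wall-clock time as long as I/O bandwidth keeps pace with compute.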
Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2024-08-19 DOI: arxiv-2408.09838
Achref Jaziri, Etienne Künzel, Visvanathan Ramesh
Abstract: A continual learning agent builds on previous experiences to develop increasingly complex behaviors, adapting to non-stationary and dynamic environments while preserving previously acquired knowledge. However, scaling these systems presents significant challenges, particularly in balancing the preservation of previous policies with adaptation to the current environment. This balance, known as the stability-plasticity dilemma, is especially pronounced in complex multi-agent domains such as the train scheduling problem, where environmental and agent behaviors constantly change and the search space is vast. In this work, we address these challenges in the train scheduling problem using curriculum learning. We design a curriculum of adjacent skills that build on each other to improve generalization performance. Introducing a curriculum with distinct tasks introduces non-stationarity, which we address with a new algorithm: Continual Deep Q-Network (DQN) Expansion (CDE). Our approach dynamically generates and adjusts Q-function subspaces to handle environmental changes and task requirements. CDE mitigates catastrophic forgetting through elastic weight consolidation (EWC) while ensuring high plasticity using adaptive rational activation functions. Experimental results demonstrate significant improvements in learning efficiency and adaptability compared to RL baselines and other methods adapted for continual learning, highlighting the potential of our method for managing the stability-plasticity dilemma in the adaptive train scheduling setting.
Cited by: 0
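CDE's forgetting mitigation relies on EWC, whose regularizer anchors parameters that were important for earlier tasks. A minimal sketch of the standard EWC quadratic penalty (the CDE-specific Q-function subspace machinery is not shown):

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam):
    """Elastic Weight Consolidation regularizer: parameters with a large
    Fisher importance for a previous task are pulled back toward their
    old values theta_star, while unimportant ones stay plastic."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

theta_star = np.array([1.0, -2.0, 0.5])   # parameters after the previous task
fisher = np.array([10.0, 0.1, 1.0])       # per-parameter importance estimates
theta = np.array([1.1, 0.0, 0.5])         # current parameters on the new task
penalty = ewc_penalty(theta, theta_star, fisher, lam=1.0)
```

Note how the first parameter (small drift, high importance) and the second (large drift, low importance) contribute comparably: the Fisher weighting is what distinguishes protected from free parameters.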
Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2024-08-19 DOI: arxiv-2408.09764
Xiao Wang, Shiao Wang, Pengpeng Shao, Bo Jiang, Lin Zhu, Yonghong Tian
Abstract: Human Action Recognition (HAR) is a pivotal research domain in both computer vision and artificial intelligence, with RGB cameras dominating as the preferred tool for investigation and innovation in this field. In real-world applications, however, RGB cameras encounter numerous challenges, including difficult lighting conditions, fast motion, and privacy concerns. Consequently, bio-inspired event cameras have garnered increasing attention due to their low energy consumption, high dynamic range, and other advantages. Nevertheless, most existing event-based HAR datasets are low resolution ($346 \times 260$). In this paper, we propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset based on the CeleX-V event camera, termed CeleX-HAR. It encompasses 150 commonly occurring action categories, comprising a total of 124,625 video sequences. Factors such as multiple views, illumination, action speed, and occlusion were considered when recording the data. To build a more comprehensive benchmark, we report results for over 20 mainstream HAR models for future works to compare against. In addition, we propose a novel Mamba vision backbone network for event-stream-based HAR, termed EVMamba, which is equipped with multi-directional scanning over the spatial plane and a novel voxel temporal scanning mechanism. By encoding and mining the spatio-temporal information of event streams, EVMamba achieves favorable results across multiple datasets. Both the dataset and the source code will be released at https://github.com/Event-AHU/CeleX-HAR.
Cited by: 0
Enhancing Population-based Search with Active Inference
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2024-08-18 DOI: arxiv-2408.09548
Nassim Dehouche, Daniel Friedman
Abstract: The Active Inference framework models perception and action as a unified process in which agents use probabilistic models to predict and actively minimize sensory discrepancies. In complement and contrast, traditional population-based metaheuristics rely on reactive environmental interactions without anticipatory adaptation. This paper proposes integrating Active Inference into these metaheuristics to enhance performance through anticipatory environmental adaptation. We demonstrate this approach with Ant Colony Optimization (ACO) on the Travelling Salesman Problem (TSP). Experimental results indicate that Active Inference can yield improved solutions at only a marginal increase in computational cost, with interesting performance patterns related to the number and topology of nodes in the graph. Further work will characterize where and when different types of Active Inference augmentation of population metaheuristics are efficacious.
Cited by: 0
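The baseline being augmented, ACO on the TSP, can be sketched as follows. This is plain Ant Colony Optimization with the usual pheromone/heuristic construction rule; the Active Inference augmentation from the paper is not shown, and all parameter values are illustrative defaults.

```python
import numpy as np

def aco_tsp(dist, n_ants=20, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, seed=0):
    """Baseline ACO for the TSP: ants build tours with probability
    proportional to pheromone^alpha * (1/distance)^beta, then pheromone
    evaporates and each tour deposits an amount proportional to its quality."""
    rng = np.random.default_rng(seed)
    n = len(dist)
    tau = np.ones((n, n))                    # pheromone trails
    eta = 1.0 / (dist + np.eye(n))           # heuristic desirability (eye avoids /0)
    best_tour, best_len = None, np.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            start = int(rng.integers(n))
            tour, unvisited = [start], set(range(n)) - {start}
            while unvisited:
                i = tour[-1]
                cand = np.array(sorted(unvisited))
                w = (tau[i, cand] ** alpha) * (eta[i, cand] ** beta)
                nxt = int(rng.choice(cand, p=w / w.sum()))
                tour.append(nxt)
                unvisited.discard(nxt)
            tours.append(tour)
        tau *= (1.0 - rho)                   # evaporation
        for tour in tours:
            length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):               # deposit proportional to tour quality
                tau[tour[k], tour[(k + 1) % n]] += 1.0 / length
    return best_tour, best_len

# Six cities on a unit circle; the optimal tour is the hexagon of length 6.
pts = np.array([[np.cos(2 * np.pi * k / 6), np.sin(2 * np.pi * k / 6)] for k in range(6)])
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
tour, length = aco_tsp(dist)
```

An anticipatory variant in the paper's spirit would adjust construction or deposit based on predicted rather than only observed outcomes; here the ants are purely reactive.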
On the Improvement of Generalization and Stability of Forward-Only Learning via Neural Polarization
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2024-08-17 DOI: arxiv-2408.09210
Erik B. Terres-Escudero, Javier Del Ser, Pablo Garcia-Bringas
Abstract: Forward-only learning algorithms have recently gained attention as alternatives to gradient backpropagation, replacing the backward step of the latter with an additional contrastive forward pass. Among these approaches, the Forward-Forward Algorithm (FFA) has been shown to achieve competitive levels of performance in terms of generalization and complexity. Networks trained using FFA learn to contrastively maximize a layer-wise defined goodness score when presented with real data (positive samples) and to minimize it when processing synthetic data (negative samples). However, this algorithm still has weaknesses that negatively affect model accuracy and training stability, primarily due to a gradient imbalance between positive and negative samples. To overcome this issue, we propose a novel implementation of the FFA algorithm, denoted Polar-FFA, which extends the original formulation by introducing a neural division (polarization) between positive and negative instances. Neurons in each of these groups aim to maximize their goodness when presented with their respective data type, thereby creating symmetric gradient behavior. To empirically gauge the improved learning capabilities of Polar-FFA, we perform systematic experiments using different activation and goodness functions on image classification datasets. Our results demonstrate that Polar-FFA outperforms FFA in both accuracy and convergence speed. Furthermore, its lower reliance on hyperparameters reduces the need for hyperparameter tuning to guarantee optimal generalization capabilities, thereby allowing a broader range of neural network configurations.
Cited by: 0
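The goodness score that FFA (and hence Polar-FFA) trains against is, in the original Forward-Forward formulation, the sum of squared activations of a layer, pushed above a threshold for positive samples and below it for negative ones. A minimal sketch of that base objective (the polarization extension itself is not shown):

```python
import numpy as np

def goodness(h):
    """Layer-wise goodness score: sum of squared activations per sample."""
    return np.sum(h ** 2, axis=-1)

def ffa_loss(h_pos, h_neg, theta=2.0):
    """Contrastive FF objective: the probability that a sample is 'positive'
    is sigmoid(goodness - theta); maximize it for real data, minimize it
    for synthetic (negative) data."""
    p_pos = 1.0 / (1.0 + np.exp(-(goodness(h_pos) - theta)))
    p_neg = 1.0 / (1.0 + np.exp(-(goodness(h_neg) - theta)))
    return -np.mean(np.log(p_pos)) - np.mean(np.log(1.0 - p_neg))

h_pos = np.ones((4, 8))    # high-norm activations: goodness 8 per sample
h_neg = np.zeros((4, 8))   # low-norm activations: goodness 0 per sample
loss_separated = ffa_loss(h_pos, h_neg)
loss_swapped = ffa_loss(h_neg, h_pos)
```

The gradient imbalance the paper targets is visible here: the sigmoid saturates differently on the two sides of theta, so positive and negative samples do not contribute symmetric gradients, which is what the polarized neuron groups are meant to fix.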
A theoretical framework for reservoir computing on networks of organic electrochemical transistors
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2024-08-17 DOI: arxiv-2408.09223
Nicholas W. Landry, Beckett R. Hyde, Jake C. Perez, Sean E. Shaheen, Juan G. Restrepo
Abstract: Efficient and accurate prediction of physical systems is important even when the rules of those systems cannot be easily learned. Reservoir computing, a type of recurrent neural network with fixed nonlinear units, is one such prediction method and is valued for its ease of training. Organic electrochemical transistors (OECTs) are physical devices with nonlinear transient properties that can serve as the nonlinear units of a reservoir computer. We present a theoretical framework for simulating reservoir computers that use OECTs as the nonlinear units, as a test bed for designing physical reservoir computers. We present a proof of concept demonstrating that such an implementation can accurately predict the Lorenz attractor with performance comparable to standard reservoir computer implementations. We explore the effect of operating parameters and find that prediction performance depends strongly on the pinch-off voltage of the OECTs.
Cited by: 0
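A standard reservoir computer of the kind the OECT network is compared against can be sketched as an echo state network. Here tanh units stand in for the OECT transient nonlinearity (an assumption of this sketch), and the only trained component is the linear readout, which is what makes reservoir computing cheap to train.

```python
import numpy as np

def esn_fit_predict(u, n_res=200, rho=0.9, ridge=1e-6, seed=0):
    """Minimal echo state network: a fixed random recurrent reservoir of
    tanh units driven by the input, with a ridge-regression readout
    trained for one-step-ahead prediction of the input series."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # set spectral radius < 1
    W_in = rng.standard_normal(n_res)
    X = np.zeros((len(u) - 1, n_res))
    x = np.zeros(n_res)
    for t in range(len(u) - 1):
        x = np.tanh(W @ x + W_in * u[t])              # fixed reservoir dynamics
        X[t] = x
    y = u[1:]                                         # one-step-ahead targets
    # Only the readout weights are trained (ridge regression in closed form).
    w_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
    return X @ w_out, y

u = np.sin(0.3 * np.arange(400))                      # simple driving signal
pred, target = esn_fit_predict(u)
```

Swapping the tanh update for a simulated OECT response (a slower, device-specific transient) is essentially the substitution the paper's framework formalizes.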
Toward End-to-End Bearing Fault Diagnosis for Industrial Scenarios with Spiking Neural Networks
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2024-08-17 DOI: arxiv-2408.11067
Yongqi Ding, Lin Zuo, Mengmeng Jing, Kunshan Yang, Biao Chen, Yunqian Yu
Abstract: Spiking neural networks (SNNs) transmit information via low-power binary spikes and have received widespread attention in areas such as computer vision and reinforcement learning. However, there have been very few explorations of SNNs in more practical industrial scenarios. In this paper, we focus on applying SNNs to bearing fault diagnosis to facilitate the integration of high-performance AI algorithms with real-world industry. In particular, we identify two key limitations of existing SNN fault diagnosis methods: inadequate encoding capacity that necessitates cumbersome data preprocessing, and non-spike-oriented architectures that constrain the performance of SNNs. To alleviate these problems, we propose a Multi-scale Residual Attention SNN (MRA-SNN) that simultaneously improves the efficiency, performance, and robustness of SNN methods. Incorporating a lightweight attention mechanism, we design a multi-scale attention encoding module that extracts multi-scale fault features from vibration signals and encodes them as spatio-temporal spikes, eliminating the need for complicated preprocessing. A spike residual attention block then extracts high-dimensional fault features and enhances the expressiveness of sparse spikes with the attention mechanism for end-to-end diagnosis. In addition, the performance and robustness of MRA-SNN are further enhanced by introducing the lightweight attention mechanism within the spiking neurons to simulate the biological dendritic filtering effect. Extensive experiments on the MFPT and JNU benchmark datasets demonstrate that MRA-SNN significantly outperforms existing methods in accuracy, energy consumption, and noise robustness, and is more feasible for deployment in real-world industrial scenarios.
Cited by: 0
TACOS: Task Agnostic Continual Learning in Spiking Neural Networks
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2024-08-16 DOI: arxiv-2409.00021
Nicholas Soures, Peter Helfer, Anurag Daram, Tej Pandit, Dhireesha Kudithipudi
Abstract: Catastrophic interference, the loss of previously learned information when learning new information, remains a major challenge in machine learning. Since living organisms do not seem to suffer from this problem, researchers have taken inspiration from biology to improve memory retention in artificial intelligence systems. However, previous attempts to use bio-inspired mechanisms have typically resulted in systems that rely on task boundary information during training and/or explicit task identification during inference, information that is not available in real-world scenarios. Here, we show that neuro-inspired mechanisms such as synaptic consolidation and metaplasticity can mitigate catastrophic interference in a spiking neural network using only synapse-local information, with no need for task awareness, and with a fixed memory size that does not need to grow when training on new tasks. Our model, TACOS, combines neuromodulation with complex synaptic dynamics to enable new learning while protecting previous information. We evaluate TACOS on sequential image recognition tasks and demonstrate its effectiveness in reducing catastrophic interference. Our results show that TACOS outperforms existing regularization techniques in domain-incremental learning scenarios. We also report an ablation study to elucidate the contribution of each neuro-inspired mechanism separately.
Cited by: 0
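A synapse-local consolidation rule in the spirit the abstract describes can be sketched as follows. These dynamics are hypothetical and chosen for illustration only; TACOS's actual neuromodulated synapse model is more complex, but the key property is the same: each synapse uses only its own state, with no task labels.

```python
import numpy as np

def consolidated_update(w, grad, omega, lr=0.1):
    """Synapse-local metaplasticity sketch: each synapse keeps an
    importance variable omega, and synapses with high omega become less
    plastic. Both updates use only information local to the synapse."""
    w_new = w - lr * grad / (1.0 + omega)   # consolidated synapses move less
    omega_new = omega + np.abs(grad)        # importance grows with accumulated use
    return w_new, omega_new

w = np.zeros(3)
omega = np.array([0.0, 4.0, 9.0])           # the last synapse is strongly consolidated
grad = np.ones(3)                           # identical pressure on all three synapses
w1, omega1 = consolidated_update(w, grad, omega)
```

Under the same gradient, the fresh synapse moves ten times further than the consolidated one, which is how old information is protected without any task boundary signal.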
$EvoAl^{2048}$
arXiv - CS - Neural and Evolutionary Computing Pub Date : 2024-08-15 DOI: arxiv-2408.16780
Bernhard J. Berger (University of Rostock, Software Engineering Chair, Rostock, Germany; Hamburg University of Technology, Institute of Embedded Systems, Germany), Christina Plump (DFKI - Cyber-Physical Systems, Bremen, Germany), Rolf Drechsler (University of Bremen, Departments of Mathematics and Computer Science; DFKI - Cyber-Physical Systems, Bremen, Germany)
Abstract: As AI solutions enter safety-critical products, the explainability and interpretability of the solutions generated by AI products become increasingly important. In the long term, such explanations are key to gaining users' acceptance of AI-based systems' decisions. We report on applying model-driven optimisation to search for an interpretable and explainable policy that solves the game 2048. This paper describes a solution to the GECCO'24 Interpretable Control Competition using the open-source software EvoAl. We aimed to develop an approach for creating interpretable policies that are easy to adapt to new ideas.
Cited by: 0