A More Accurate Approximation of Activation Function with Few Spikes Neurons
Dayena Jeong, Jaewoo Park, Jeonghee Jo, Jongkil Park, Jaewook Kim, Hyun Jae Jang, Suyoun Lee, Seongsik Park
arXiv:2409.00044 (2024-08-19)

Recent deep neural networks (DNNs), such as diffusion models [1], face high computational demands. Spiking neural networks (SNNs) have therefore attracted much attention as energy-efficient neural networks. However, conventional spiking neurons, such as leaky integrate-and-fire neurons, cannot accurately represent complex non-linear activation functions such as Swish [2]. Few-spikes (FS) neurons were proposed to approximate activation functions with spiking neurons [3], but their approximation performance was limited by the lack of training methods tailored to these neurons. We therefore propose tendency-based parameter initialization (TBPI), which enhances the approximation of activation functions with FS neurons by exploiting temporal dependencies when initializing the training parameters.
TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading
Kun Wu, Jeongmin Brian Park, Xiaofan Zhang, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, Wen-mei Hwu
arXiv:2408.10013 (2024-08-19)

GPU memory capacity has not kept pace with the size of large language models (LLMs), hindering model training. In particular, activations -- the intermediate tensors produced during forward propagation and reused in backward propagation -- dominate GPU memory use. To address this challenge, we propose TBA, which efficiently offloads activations to high-capacity NVMe SSDs. TBA reduces GPU memory usage without hurting performance by adaptively overlapping data transfers with computation. It is compatible with popular deep learning frameworks such as PyTorch, Megatron, and DeepSpeed, and it employs techniques such as tensor deduplication, forwarding, and adaptive offloading to further improve efficiency. In extensive experiments on GPT, BERT, and T5, TBA reduces peak activation memory usage by 47% while fully overlapping I/O with computation and incurring negligible performance overhead. We introduce the recompute-offload-keep (ROK) curve to compare TBA offloading with two other tensor placement strategies: keeping activations in memory and layerwise full recomputation. TBA achieves better memory savings than layerwise full recomputation while matching the performance of keeping activations in memory.
{"title":"Mitigating the Stability-Plasticity Dilemma in Adaptive Train Scheduling with Curriculum-Driven Continual DQN Expansion","authors":"Achref Jaziri, Etienne Künzel, Visvanathan Ramesh","doi":"arxiv-2408.09838","DOIUrl":"https://doi.org/arxiv-2408.09838","url":null,"abstract":"A continual learning agent builds on previous experiences to develop\u0000increasingly complex behaviors by adapting to non-stationary and dynamic\u0000environments while preserving previously acquired knowledge. However, scaling\u0000these systems presents significant challenges, particularly in balancing the\u0000preservation of previous policies with the adaptation of new ones to current\u0000environments. This balance, known as the stability-plasticity dilemma, is\u0000especially pronounced in complex multi-agent domains such as the train\u0000scheduling problem, where environmental and agent behaviors are constantly\u0000changing, and the search space is vast. In this work, we propose addressing\u0000these challenges in the train scheduling problem using curriculum learning. We\u0000design a curriculum with adjacent skills that build on each other to improve\u0000generalization performance. Introducing a curriculum with distinct tasks\u0000introduces non-stationarity, which we address by proposing a new algorithm:\u0000Continual Deep Q-Network (DQN) Expansion (CDE). Our approach dynamically\u0000generates and adjusts Q-function subspaces to handle environmental changes and\u0000task requirements. CDE mitigates catastrophic forgetting through EWC while\u0000ensuring high plasticity using adaptive rational activation functions.\u0000Experimental results demonstrate significant improvements in learning\u0000efficiency and adaptability compared to RL baselines and other adapted methods\u0000for continual learning, highlighting the potential of our method in managing\u0000the stability-plasticity dilemma in the adaptive train scheduling setting.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"113 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms
Xiao Wang, Shiao Wang, Pengpeng Shao, Bo Jiang, Lin Zhu, Yonghong Tian
arXiv:2408.09764 (2024-08-19)

Human Action Recognition (HAR) is a pivotal research domain in both computer vision and artificial intelligence, with RGB cameras dominating as the preferred sensor. In real-world applications, however, RGB cameras face numerous challenges, including difficult lighting conditions, fast motion, and privacy concerns. Bio-inspired event cameras have consequently garnered increasing attention thanks to their low energy consumption, high dynamic range, and other advantages. Nevertheless, most existing event-based HAR datasets are low resolution ($346 \times 260$). In this paper, we propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset based on the CeleX-V event camera, termed CeleX-HAR. It encompasses 150 commonly occurring action categories and a total of 124,625 video sequences, recorded under varying viewpoints, illumination, action speeds, and occlusion. To build a more comprehensive benchmark, we report results for over 20 mainstream HAR models for future work to compare against. In addition, we propose a novel Mamba vision backbone for event-stream HAR, termed EVMamba, which combines multi-directional scanning over the spatial plane with a novel voxel temporal scanning mechanism. By encoding and mining the spatio-temporal information of event streams, EVMamba achieves favorable results across multiple datasets. Both the dataset and source code will be released at https://github.com/Event-AHU/CeleX-HAR
{"title":"Enhancing Population-based Search with Active Inference","authors":"Nassim Dehouche, Daniel Friedman","doi":"arxiv-2408.09548","DOIUrl":"https://doi.org/arxiv-2408.09548","url":null,"abstract":"The Active Inference framework models perception and action as a unified\u0000process, where agents use probabilistic models to predict and actively minimize\u0000sensory discrepancies. In complement and contrast, traditional population-based\u0000metaheuristics rely on reactive environmental interactions without anticipatory\u0000adaptation. This paper proposes the integration of Active Inference into these\u0000metaheuristics to enhance performance through anticipatory environmental\u0000adaptation. We demonstrate this approach specifically with Ant Colony\u0000Optimization (ACO) on the Travelling Salesman Problem (TSP). Experimental\u0000results indicate that Active Inference can yield some improved solutions with\u0000only a marginal increase in computational cost, with interesting patterns of\u0000performance that relate to number and topology of nodes in the graph. Further\u0000work will characterize where and when different types of Active Inference\u0000augmentation of population metaheuristics may be efficacious.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On the Improvement of Generalization and Stability of Forward-Only Learning via Neural Polarization
Erik B. Terres-Escudero, Javier Del Ser, Pablo Garcia-Bringas
arXiv:2408.09210 (2024-08-17)

Forward-only learning algorithms have recently gained attention as alternatives to gradient backpropagation, replacing the backward pass with an additional contrastive forward pass. Among these approaches, the Forward-Forward Algorithm (FFA) achieves competitive performance in terms of generalization and complexity. Networks trained with FFA learn to contrastively maximize a layer-wise goodness score on real data (positive samples) and minimize it on synthetic data (negative samples). However, the algorithm still suffers from weaknesses that harm model accuracy and training stability, primarily a gradient imbalance between positive and negative samples. To overcome this issue, we propose Polar-FFA, a novel formulation that extends FFA by introducing a neural division (polarization) between positive and negative instances: neurons in each group aim to maximize their goodness when presented with their respective data type, yielding symmetric gradient behavior. To empirically gauge the improved learning capabilities of Polar-FFA, we perform systematic experiments with different activation and goodness functions on image classification datasets. Our results show that Polar-FFA outperforms FFA in accuracy and convergence speed. Furthermore, its lower sensitivity to hyperparameters reduces the tuning needed to guarantee optimal generalization, allowing a broader range of neural network configurations.
A theoretical framework for reservoir computing on networks of organic electrochemical transistors
Nicholas W. Landry, Beckett R. Hyde, Jake C. Perez, Sean E. Shaheen, Juan G. Restrepo
arXiv:2408.09223 (2024-08-17)

Efficient and accurate prediction of physical systems is important even when the rules of those systems cannot be easily learned. Reservoir computing, a type of recurrent neural network with fixed nonlinear units, is one such prediction method, valued for its ease of training. Organic electrochemical transistors (OECTs) are physical devices with nonlinear transient properties that can serve as the nonlinear units of a reservoir computer. We present a theoretical framework for simulating reservoir computers that use OECTs as their nonlinear units, as a test bed for designing physical reservoir computers. As a proof of concept, we demonstrate that such an implementation can accurately predict the Lorenz attractor with performance comparable to standard reservoir computer implementations. We explore the effect of operating parameters and find that prediction performance depends strongly on the pinch-off voltage of the OECTs.
{"title":"Toward End-to-End Bearing Fault Diagnosis for Industrial Scenarios with Spiking Neural Networks","authors":"Yongqi Ding, Lin Zuo, Mengmeng Jing, Kunshan Yang, Biao Chen, Yunqian Yu","doi":"arxiv-2408.11067","DOIUrl":"https://doi.org/arxiv-2408.11067","url":null,"abstract":"Spiking neural networks (SNNs) transmit information via low-power binary\u0000spikes and have received widespread attention in areas such as computer vision\u0000and reinforcement learning. However, there have been very few explorations of\u0000SNNs in more practical industrial scenarios. In this paper, we focus on the\u0000application of SNNs in bearing fault diagnosis to facilitate the integration of\u0000high-performance AI algorithms and real-world industries. In particular, we\u0000identify two key limitations of existing SNN fault diagnosis methods:\u0000inadequate encoding capacity that necessitates cumbersome data preprocessing,\u0000and non-spike-oriented architectures that constrain the performance of SNNs. To\u0000alleviate these problems, we propose a Multi-scale Residual Attention SNN\u0000(MRA-SNN) to simultaneously improve the efficiency, performance, and robustness\u0000of SNN methods. By incorporating a lightweight attention mechanism, we have\u0000designed a multi-scale attention encoding module to extract multiscale fault\u0000features from vibration signals and encode them as spatio-temporal spikes,\u0000eliminating the need for complicated preprocessing. Then, the spike residual\u0000attention block extracts high-dimensional fault features and enhances the\u0000expressiveness of sparse spikes with the attention mechanism for end-to-end\u0000diagnosis. In addition, the performance and robustness of MRA-SNN is further\u0000enhanced by introducing the lightweight attention mechanism within the spiking\u0000neurons to simulate the biological dendritic filtering effect. Extensive\u0000experiments on MFPT and JNU benchmark datasets demonstrate that MRA-SNN\u0000significantly outperforms existing methods in terms of accuracy, energy\u0000consumption and noise robustness, and is more feasible for deployment in\u0000real-world industrial scenarios.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"37 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142188276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TACOS: Task Agnostic Continual Learning in Spiking Neural Networks
Nicholas Soures, Peter Helfer, Anurag Daram, Tej Pandit, Dhireesha Kudithipudi
arXiv:2409.00021 (2024-08-16)

Catastrophic interference, the loss of previously learned information when learning new information, remains a major challenge in machine learning. Since living organisms do not seem to suffer from this problem, researchers have taken inspiration from biology to improve memory retention in artificial intelligence systems. However, previous attempts to use bio-inspired mechanisms have typically resulted in systems that rely on task boundary information during training and/or explicit task identification during inference, information that is not available in real-world scenarios. Here, we show that neuro-inspired mechanisms such as synaptic consolidation and metaplasticity can mitigate catastrophic interference in a spiking neural network, using only synapse-local information, with no need for task awareness, and with a fixed memory size that does not need to be increased when training on new tasks. Our model, TACOS, combines neuromodulation with complex synaptic dynamics to enable new learning while protecting previous information. We evaluate TACOS on sequential image recognition tasks and demonstrate its effectiveness in reducing catastrophic interference. Our results show that TACOS outperforms existing regularization techniques in domain-incremental learning scenarios. We also report the results of an ablation study to elucidate the contribution of each neuro-inspired mechanism separately.
$EvoAl^{2048}$
Bernhard J. Berger (University of Rostock, Software Engineering Chair, Rostock, Germany; Hamburg University of Technology, Institute of Embedded Systems, Germany), Christina Plump (DFKI - Cyber-Physical Systems, Bremen, Germany), Rolf Drechsler (University of Bremen, Departments of Mathematics and Computer Science; DFKI - Cyber-Physical Systems, Bremen, Germany)
arXiv:2408.16780 (2024-08-15)

As AI solutions enter safety-critical products, the explainability and interpretability of solutions generated by AI products become increasingly important. In the long term, such explanations are the key to gaining users' acceptance of AI-based systems' decisions. We report on applying model-driven optimisation to search for an interpretable and explainable policy that solves the game 2048. This paper describes a solution to the GECCO'24 Interpretable Control Competition using the open-source software EvoAl. Our aim was to develop an approach for creating interpretable policies that are easy to adapt to new ideas.