{"title":"Discovering Long-Term Effects on Parameter Efficient Fine-tuning","authors":"Gaole Dai, Yiming Tang, Chunkai Fan, Qizhe Zhang, Zhi Zhang, Yulu Gan, Chengqing Zeng, Shanghang Zhang, Tiejun Huang","doi":"arxiv-2409.06706","DOIUrl":null,"url":null,"abstract":"Pre-trained Artificial Neural Networks (ANNs) exhibit robust pattern\nrecognition capabilities and share extensive similarities with the human brain,\nspecifically Biological Neural Networks (BNNs). We are particularly intrigued\nby these models' ability to acquire new knowledge through fine-tuning. In this\nregard, Parameter-efficient Fine-tuning (PEFT) has gained widespread adoption\nas a substitute for full fine-tuning due to its cost reduction in training and\nmitigation of over-fitting risks by limiting the number of trainable parameters\nduring adaptation. Since both ANNs and BNNs propagate information\nlayer-by-layer, a common analogy can be drawn: weights in ANNs represent\nsynapses in BNNs, while features (also known as latent variables or logits) in\nANNs represent neurotransmitters released by neurons in BNNs. Mainstream PEFT\nmethods aim to adjust feature or parameter values using only a limited number\nof trainable parameters (usually less than 1% of the total parameters), yet\nachieve surprisingly good results. Building upon this clue, we delve deeper\ninto exploring the connections between feature adjustment and parameter\nadjustment, resulting in our proposed method Synapses & Neurons (SAN) that\nlearns scaling matrices for features and propagates their effects towards\nposterior weight matrices. Our approach draws strong inspiration from\nwell-known neuroscience phenomena - Long-term Potentiation (LTP) and Long-term\nDepression (LTD), which also reveal the relationship between synapse\ndevelopment and neurotransmitter release levels. We conducted extensive\ncomparisons of PEFT on 26 datasets using attention-based networks as well as\nconvolution-based networks, leading to significant improvements compared to\nother tuning methods (+8.5% over fully-finetune, +7% over Visual Prompt Tuning,\nand +3.2% over LoRA). The codes would be released.","PeriodicalId":501347,"journal":{"name":"arXiv - CS - Neural and Evolutionary Computing","volume":"45 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Neural and Evolutionary Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.06706","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Pre-trained Artificial Neural Networks (ANNs) exhibit robust pattern recognition capabilities and share extensive similarities with the human brain, specifically with Biological Neural Networks (BNNs). We are particularly intrigued by these models' ability to acquire new knowledge through fine-tuning. In this regard, Parameter-Efficient Fine-Tuning (PEFT) has gained widespread adoption as a substitute for full fine-tuning because it reduces training cost and mitigates the risk of over-fitting by limiting the number of trainable parameters during adaptation. Since both ANNs and BNNs propagate information layer by layer, a common analogy can be drawn: weights in ANNs correspond to synapses in BNNs, while features (also known as latent variables or logits) in ANNs correspond to neurotransmitters released by neurons in BNNs. Mainstream PEFT methods adjust feature or parameter values using only a small number of trainable parameters (usually less than 1% of the total), yet achieve surprisingly good results. Building on this observation, we further explore the connection between feature adjustment and parameter adjustment, which leads to our proposed method, Synapses & Neurons (SAN): it learns scaling matrices for features and propagates their effects to the posterior weight matrices. Our approach draws strong inspiration from two well-known neuroscience phenomena, Long-Term Potentiation (LTP) and Long-Term Depression (LTD), which likewise relate synapse development to neurotransmitter release levels. We conducted extensive PEFT comparisons on 26 datasets using both attention-based and convolution-based networks, obtaining significant improvements over other tuning methods (+8.5% over full fine-tuning, +7% over Visual Prompt Tuning, and +3.2% over LoRA). The code will be released.
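To make the core idea concrete, below is a minimal, hypothetical PyTorch sketch (not the authors' released implementation) of learning a feature-scaling factor and propagating its effect into the posterior weight matrix. The class name SANLinear, the two-layer setup, and the dimensions are illustrative assumptions, and the scaling matrix is simplified here to a per-feature scaling vector.

```python
# Minimal sketch (assumed, not the authors' code): learn a per-feature scaling
# vector on frozen layers, then fold ("propagate") it into the next layer's
# weight matrix, since W2 @ (s * h) == (W2 * s) @ h.
import torch
import torch.nn as nn


class SANLinear(nn.Module):
    """Two frozen linear layers with a learnable feature-scaling vector."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)
        # Freeze the pre-trained weights; only the scaling vector is trainable.
        for p in self.parameters():
            p.requires_grad = False
        self.scale = nn.Parameter(torch.ones(d_hidden))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.fc1(x)
        return self.fc2(self.scale * h)

    @torch.no_grad()
    def fold_scale_into_posterior(self) -> None:
        """Propagate the learned scaling into fc2 (W2 <- W2 * s)."""
        self.fc2.weight.mul_(self.scale)  # scale each column of W2
        self.scale.fill_(1.0)             # scaling is now absorbed into W2


if __name__ == "__main__":
    layer = SANLinear(16, 32, 8)
    # Pretend the scaling vector has already been learned (random for the demo).
    layer.scale.data = torch.rand(32) + 0.5
    x = torch.randn(4, 16)
    before = layer(x)
    layer.fold_scale_into_posterior()
    after = layer(x)
    print(torch.allclose(before, after, atol=1e-6))  # True: outputs match
```

The sketch only illustrates the algebraic equivalence between scaling an intermediate feature and rescaling the posterior weight matrix; how SAN actually parameterizes and propagates these scaling effects is detailed in the paper itself.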