A min–max optimization framework for sparse multi-task deep neural network
Jiacheng Guo, Lei Li, Huiming Sun, Minghai Qin, Hongkai Yu, Tianyun Zhang
Neurocomputing, Volume 650, Article 130865. Published 2025-07-07. DOI: 10.1016/j.neucom.2025.130865
https://www.sciencedirect.com/science/article/pii/S0925231225015371
Citations: 0
Abstract
Multi-task learning is a subfield of machine learning in which a shared model is trained to solve multiple tasks simultaneously. Instead of training multiple models, we only need to train a single model with shared parameters to solve different tasks. By sharing parameters, multi-task learning significantly decreases the number of parameters and reduces computational and storage requirements. However, when applying multi-task learning to deep neural networks, model size remains a challenge, particularly for edge platforms. Compressing multi-task models while maintaining performance across all tasks is another significant challenge. To address these issues, we propose a min–max optimization framework for highly compressed multi-task deep neural network models, combined with weight pruning or dynamic sparse training strategies to improve training efficiency by reducing model parameters. Specifically, weight pruning leverages the reweighted ℓ1 pruning method, enabling high pruning rates while preserving performance across all tasks. Dynamic sparse training, on the other hand, initializes sparse masks and updates them dynamically during training while keeping the number of nonzero weights fixed, which encourages sparsity in the weight matrices and reduces memory footprint and computational requirements. Our proposed min–max optimization framework automatically adjusts the learnable weighting factors between tasks, ensuring optimization for the worst-performing task. Experimental results on the NYUv2 and CIFAR-100 datasets demonstrate that the model incurs only minor performance degradation after pruning with the min–max framework. Further analyses indicate that the min–max framework performs reliably and that its difference from prior methods is statistically significant. The proposed dynamic sparse multi-task framework achieves around a 2% overall precision improvement with min–max optimization compared with prior methods at equal sparsity.
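To make the stated objective concrete, the following is a minimal sketch of a min–max multi-task objective of the kind described above, assuming task losses L_t, shared parameters θ under a sparsity budget k, and learnable task weights λ on the probability simplex Δ; the paper's exact formulation, constraints, and reweighting scheme may differ.

\[
\min_{\theta:\ \|\theta\|_0 \le k} \ \max_{\lambda \in \Delta} \ \sum_{t=1}^{T} \lambda_t \, L_t(\theta),
\qquad
\Delta = \Big\{ \lambda \in \mathbb{R}^{T} : \lambda_t \ge 0,\ \textstyle\sum_{t=1}^{T} \lambda_t = 1 \Big\}
\]

In this sketch, the inner maximization over λ concentrates weight on the task with the largest loss, which matches the abstract's goal of optimizing for the worst-performing task; the ℓ0 constraint stands in for the weight budget enforced by pruning or dynamic sparse training.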
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.