CAgMLP: An MLP-like architecture with a Cross-Axis gated token mixer for image classification

IF 3.1, Zone 4 (Computer Science), JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Journal of Visual Communication and Image Representation, Pub Date: 2025-09-25, DOI: 10.1016/j.jvcir.2025.104590

Jielin Jiang, Quan Zhang, Yan Cui, Shun Wei, Yingnan Zhao

Volume 112, Article 104590. Source: https://www.sciencedirect.com/science/article/pii/S1047320325002044

Citations: 0

Abstract

Recent MLP-based models have employed axial projections to orthogonally decompose the entire space into horizontal and vertical directions, effectively balancing long-range dependencies and computational costs. However, such methods operate independently along the two axes, hindering their ability to capture the image’s global spatial structure. In this paper, we propose a novel MLP architecture called Cross-Axis gated MLP (CAgMLP), which consists of two main modules, Cross-Axis Gated Token-Mixing MLP (CGTM) and Convolutional Gated Channel-Mixing MLP (CGCM). CGTM addresses the loss of information from single-dimensional interactions by leveraging a multiplicative gating mechanism that facilitates the cross-fusion of features captured along the two spatial axes, enhancing feature selection and information flow. CGCM improves the dual-branch structure of the multiplicative gating units by projecting the fused low-dimensional input into two high-dimensional feature spaces and introducing non-linear features through element-wise multiplication, further improving the model’s expressive ability. Finally, both modules incorporate local token aggregation to compensate for the lack of local inductive bias in traditional MLP models. Experiments conducted on several datasets demonstrate that CAgMLP achieves superior classification performance compared to other state-of-the-art methods, while exhibiting fewer parameters and lower computational complexity.
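The abstract gives no equations, so the following is only a minimal sketch of the two gating ideas it describes, under stated assumptions: one branch mixes tokens along the height axis, another along the width axis, and the two are fused by element-wise multiplication (the CGTM idea); separately, a token is projected into two higher-dimensional branches that are multiplied and projected back (the CGCM idea). All function names, weight shapes, and the use of plain linear maps are hypothetical. Local token aggregation, normalization, and residual connections are omitted, and this is not the authors' implementation.

```python
# Illustrative sketch only: names and shapes are assumptions, not the paper's code.
# A feature map is a nested list x[h][w][c] (height, width, channels).


def axial_mix(x, weight, axis):
    """Linearly mix tokens along one spatial axis with a square weight matrix."""
    H, W, C = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * C for _ in range(W)] for _ in range(H)]
    for h in range(H):
        for w in range(W):
            for c in range(C):
                if axis == "height":
                    # Combine all rows of the same column.
                    out[h][w][c] = sum(weight[h][k] * x[k][w][c] for k in range(H))
                else:
                    # Combine all columns of the same row.
                    out[h][w][c] = sum(weight[w][k] * x[h][k][c] for k in range(W))
    return out


def cross_axis_gate(x, w_height, w_width):
    """Fuse the height-mixed and width-mixed branches by element-wise product."""
    a = axial_mix(x, w_height, "height")
    b = axial_mix(x, w_width, "width")
    H, W, C = len(x), len(x[0]), len(x[0][0])
    return [[[a[h][w][c] * b[h][w][c] for c in range(C)]
             for w in range(W)] for h in range(H)]


def gated_channel_mix(token, w_a, w_b, w_out):
    """Expand one token into two wider branches, multiply them, project back."""
    a = [sum(wi * ti for wi, ti in zip(row, token)) for row in w_a]
    b = [sum(wi * ti for wi, ti in zip(row, token)) for row in w_b]
    gated = [p * q for p, q in zip(a, b)]  # multiplication supplies the non-linearity
    return [sum(wi * gi for wi, gi in zip(row, gated)) for row in w_out]
```

In this sketch each output position of `cross_axis_gate` is a product of a column summary and a row summary, so the two axes interact multiplicatively rather than being added after independent processing. How CAgMLP actually parameterizes and combines the branches is specified only in the full paper.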

Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology, Computer Science: Software Engineering)

CiteScore: 5.40
Self-citation rate: 11.50%
Articles published: 188
Review time: 9.9 months

About the journal: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.