CAgMLP: An MLP-like architecture with a Cross-Axis gated token mixer for image classification
Jielin Jiang, Quan Zhang, Yan Cui, Shun Wei, Yingnan Zhao
Journal of Visual Communication and Image Representation, Volume 112, Article 104590. Published 2025-09-25.
DOI: 10.1016/j.jvcir.2025.104590
https://www.sciencedirect.com/science/article/pii/S1047320325002044
Abstract
Recent MLP-based models have employed axial projections to orthogonally decompose the entire space into horizontal and vertical directions, effectively balancing long-range dependencies and computational costs. However, such methods operate independently along the two axes, hindering their ability to capture the image’s global spatial structure. In this paper, we propose a novel MLP architecture called Cross-Axis gated MLP (CAgMLP), which consists of two main modules, Cross-Axis Gated Token-Mixing MLP (CGTM) and Convolutional Gated Channel-Mixing MLP (CGCM). CGTM addresses the loss of information from single-dimensional interactions by leveraging a multiplicative gating mechanism that facilitates the cross-fusion of features captured along the two spatial axes, enhancing feature selection and information flow. CGCM improves the dual-branch structure of the multiplicative gating units by projecting the fused low-dimensional input into two high-dimensional feature spaces and introducing non-linear features through element-wise multiplication, further improving the model’s expressive ability. Finally, both modules incorporate local token aggregation to compensate for the lack of local inductive bias in traditional MLP models. Experiments conducted on several datasets demonstrate that CAgMLP achieves superior classification performance compared to other state-of-the-art methods, while exhibiting fewer parameters and lower computational complexity.
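The two gating ideas described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the weight shapes, the output projections, and the omission of biases, normalization, activations, and the local token aggregation step are all assumptions made for illustration. It only shows how fusing the two axial branches by element-wise multiplication differs from summing them, and how a dual-branch channel mixer can expand the input into two higher-dimensional spaces and multiply them.

```python
import numpy as np

rng = np.random.default_rng(0)


def axial_mix(x, w_h, w_w):
    """Mix tokens along each spatial axis separately.

    x: (H, W, C) feature map; w_h: (H, H); w_w: (W, W).
    """
    x_h = np.einsum("gh,hwc->gwc", w_h, x)  # vertical: mixes along the height axis
    x_w = np.einsum("vw,hwc->hvc", w_w, x)  # horizontal: mixes along the width axis
    return x_h, x_w


def additive_fusion(x, w_h, w_w):
    """Baseline: the two axial branches are simply summed."""
    x_h, x_w = axial_mix(x, w_h, w_w)
    return x_h + x_w


def cross_axis_gated_fusion(x, w_h, w_w, w_out):
    """Hypothetical cross-axis gate: one branch multiplicatively gates the other."""
    x_h, x_w = axial_mix(x, w_h, w_w)
    return (x_h * x_w) @ w_out  # channel projection after the gate (assumed)


def gated_channel_mix(x, w_a, w_b, w_out):
    """Hypothetical dual-branch channel mixer.

    Projects C channels into two expanded spaces, multiplies them
    element-wise, then projects back to C channels.
    """
    return ((x @ w_a) * (x @ w_b)) @ w_out


# Illustrative sizes only; none of these values come from the paper.
H, W, C, expand = 4, 5, 8, 3
x = rng.standard_normal((H, W, C))
w_h = rng.standard_normal((H, H))
w_w = rng.standard_normal((W, W))
w_tok_out = rng.standard_normal((C, C))
w_a = rng.standard_normal((C, expand * C))
w_b = rng.standard_normal((C, expand * C))
w_ch_out = rng.standard_normal((expand * C, C))

y_tok = cross_axis_gated_fusion(x, w_h, w_w, w_tok_out)
y_ch = gated_channel_mix(y_tok, w_a, w_b, w_ch_out)
print(y_tok.shape, y_ch.shape)
```

In this simplified form the gated output is second-order in the input (doubling `x` quadruples the gated result), whereas the additive baseline stays linear. That is one way to read the abstract's statement that element-wise multiplication introduces non-linear features.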
About the journal:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.