{"title":"显微曼巴仅用 4M 参数揭示显微图像的秘密","authors":"Shun Zou, Zhuo Zhang, Yi Zou, Guangwei Gao","doi":"arxiv-2409.07896","DOIUrl":null,"url":null,"abstract":"In the field of medical microscopic image classification (MIC), CNN-based and\nTransformer-based models have been extensively studied. However, CNNs struggle\nwith modeling long-range dependencies, limiting their ability to fully utilize\nsemantic information in images. Conversely, Transformers are hampered by the\ncomplexity of quadratic computations. To address these challenges, we propose a\nmodel based on the Mamba architecture: Microscopic-Mamba. Specifically, we\ndesigned the Partially Selected Feed-Forward Network (PSFFN) to replace the\nlast linear layer of the Visual State Space Module (VSSM), enhancing Mamba's\nlocal feature extraction capabilities. Additionally, we introduced the\nModulation Interaction Feature Aggregation (MIFA) module to effectively\nmodulate and dynamically aggregate global and local features. We also\nincorporated a parallel VSSM mechanism to improve inter-channel information\ninteraction while reducing the number of parameters. Extensive experiments have\ndemonstrated that our method achieves state-of-the-art performance on five\npublic datasets. Code is available at\nhttps://github.com/zs1314/Microscopic-Mamba","PeriodicalId":501130,"journal":{"name":"arXiv - CS - Computer Vision and Pattern Recognition","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Microscopic-Mamba: Revealing the Secrets of Microscopic Images with Just 4M Parameters\",\"authors\":\"Shun Zou, Zhuo Zhang, Yi Zou, Guangwei Gao\",\"doi\":\"arxiv-2409.07896\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the field of medical microscopic image classification (MIC), CNN-based and\\nTransformer-based models have been extensively studied. 
However, CNNs struggle\\nwith modeling long-range dependencies, limiting their ability to fully utilize\\nsemantic information in images. Conversely, Transformers are hampered by the\\ncomplexity of quadratic computations. To address these challenges, we propose a\\nmodel based on the Mamba architecture: Microscopic-Mamba. Specifically, we\\ndesigned the Partially Selected Feed-Forward Network (PSFFN) to replace the\\nlast linear layer of the Visual State Space Module (VSSM), enhancing Mamba's\\nlocal feature extraction capabilities. Additionally, we introduced the\\nModulation Interaction Feature Aggregation (MIFA) module to effectively\\nmodulate and dynamically aggregate global and local features. We also\\nincorporated a parallel VSSM mechanism to improve inter-channel information\\ninteraction while reducing the number of parameters. Extensive experiments have\\ndemonstrated that our method achieves state-of-the-art performance on five\\npublic datasets. Code is available at\\nhttps://github.com/zs1314/Microscopic-Mamba\",\"PeriodicalId\":501130,\"journal\":{\"name\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Computer Vision and Pattern Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.07896\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computer Vision and Pattern 
Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07896","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Microscopic-Mamba: Revealing the Secrets of Microscopic Images with Just 4M Parameters
In the field of medical microscopic image classification (MIC), CNN-based and
Transformer-based models have been extensively studied. However, CNNs struggle
with modeling long-range dependencies, limiting their ability to fully utilize
semantic information in images. Conversely, Transformers are hampered by the
quadratic computational complexity of self-attention. To address these challenges, we propose a
model based on the Mamba architecture: Microscopic-Mamba. Specifically, we
designed the Partially Selected Feed-Forward Network (PSFFN) to replace the
last linear layer of the Visual State Space Module (VSSM), enhancing Mamba's
local feature extraction capabilities. Additionally, we introduced the
Modulation Interaction Feature Aggregation (MIFA) module to effectively
modulate and dynamically aggregate global and local features. We also
incorporated a parallel VSSM mechanism to improve inter-channel information
interaction while reducing the number of parameters. Extensive experiments have
demonstrated that our method achieves state-of-the-art performance on five
public datasets. Code is available at
https://github.com/zs1314/Microscopic-Mamba
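
The abstract does not describe MIFA's internals, but its stated goal of modulating and dynamically aggregating global and local features can be illustrated with a minimal NumPy sketch. Everything here is an illustrative assumption: the cross-sigmoid gating scheme, the function name, and the (C, H, W) shapes are not taken from the paper; see the linked repository for the actual design.

```python
import numpy as np

def sigmoid(x):
    # Numerically standard logistic function, used here as a soft gate.
    return 1.0 / (1.0 + np.exp(-x))

def mifa_sketch(global_feat, local_feat):
    """Hypothetical modulate-and-aggregate step for two feature maps.

    Each branch is modulated by a gate derived from the other branch
    (cross-modulation), and the two gated maps are summed. Both inputs
    are assumed to share the shape (C, H, W).
    """
    g_gate = sigmoid(global_feat)          # gate from the global branch
    l_gate = sigmoid(local_feat)           # gate from the local branch
    modulated_global = global_feat * l_gate
    modulated_local = local_feat * g_gate
    return modulated_global + modulated_local

# Toy usage: aggregate two random feature maps of matching shape.
C, H, W = 4, 8, 8
g = np.random.randn(C, H, W)
l = np.random.randn(C, H, W)
fused = mifa_sketch(g, l)
print(fused.shape)
```

The element-wise gating keeps the fusion parameter-free in this sketch; the actual module presumably learns its modulation weights.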