TransBranch: A transformer branch architecture for fine-grained recognition
Cheng Pang, Dingzhou Xie, Yingjie Song, Rushi Lan
Pattern Recognition Letters, vol. 196, pp. 274-280, 2025 (published 2025-06-19). DOI: 10.1016/j.patrec.2025.05.017
In this paper, we present a novel architecture, named TransBranch, for the challenging task of fine-grained visual categorization. Unlike traditional models based on cross-layer feature fusion, the proposed architecture improves classification accuracy by integrating image features as follows: features at different levels are generated in parallel and then assembled by a content-aware cross-level fusion mechanism, through which the multi-level features complement each other and highlight the discriminative cues that separate visually similar subcategories. To this end, we devise an adaptive weighting mechanism that dynamically adjusts the weights of features at different levels according to how difficult the subcategories are to distinguish and to the semantics of the image content. This mechanism picks out discriminative features from cluttered backgrounds and guides the model toward rare categories, improving recognition while alleviating the long-tail distribution problem. Furthermore, we devise a multi-scale patch embedding strategy that preserves the completeness of semantic image content during feature learning. Experimental results show that the proposed model outperforms current transformer-based architectures on benchmark fine-grained visual categorization datasets, especially in distinguishing categories with extremely similar features.
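The abstract does not give the fusion equations. As a minimal sketch of the general idea only, assuming each parallel branch produces one d-dimensional feature per image, content-aware weighting of levels can be written as softmax attention over levels, with the query derived from the image's own features. The function `fuse_levels` and its parameters `w_query` and `temperature` are hypothetical; the difficulty-dependent part of the weighting described above is not modeled, and this is not the authors' implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_levels(level_feats: np.ndarray, w_query: np.ndarray, temperature: float = 1.0):
    """Hypothetical content-aware fusion of L parallel level features (illustrative sketch).

    level_feats: (batch, L, d) array, one d-dim feature per level per image
                 (e.g. a pooled or class-token embedding from each branch).
    w_query:     (d, d) projection (assumed learnable) mapping a content summary to a query.
    Returns (fused, weights) with shapes (batch, d) and (batch, L).
    """
    # Content summary: average over levels, one vector per image.
    context = level_feats.mean(axis=1)                      # (batch, d)
    query = context @ w_query                               # (batch, d)
    # Score each level by its scaled dot product with the content-dependent query.
    d = level_feats.shape[-1]
    scores = np.einsum("bld,bd->bl", level_feats, query) / np.sqrt(d)
    # Per-image weights over levels; each row is non-negative and sums to 1.
    weights = softmax(scores / temperature, axis=1)         # (batch, L)
    # Weighted sum over levels: higher-weighted levels dominate the fused feature.
    fused = np.einsum("bl,bld->bd", weights, level_feats)   # (batch, d)
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((2, 3, 8))   # 2 images, 3 levels, 8-dim features
    w_q = rng.standard_normal((8, 8)) * 0.1
    fused, weights = fuse_levels(feats, w_q)
    print(fused.shape, weights.shape)        # prints (2, 8) (2, 3)
```

Because the weights come from a softmax over levels, the fused feature is always a convex combination of the branch features, and the mix differs from image to image; a `temperature` below 1 sharpens the selection toward a single level.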
About the journal:
Pattern Recognition Letters aims at the rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.