{"title":"lvit.net:结合局部语义和多特征交叉融合的领域泛化人物再识别模型。","authors":"Xintong Hu, Peishun Liu, Xuefang Wang, Peiyao Wu, Ruichun Tang","doi":"10.1186/s42492-025-00190-1","DOIUrl":null,"url":null,"abstract":"<p><p>In the task of domain generalization person re-identification (ReID), pedestrian image features exhibit significant intra-class variability and inter-class similarity. Existing methods rely on a single feature extraction architecture and struggle to capture both global context and local spatial information, resulting in weaker generalization to unseen domains. To address this issue, an innovative domain generalization person ReID method-LViT-Net, which combines local semantics and multi-feature cross fusion, is proposed. LViT-Net adopts a dual-branch encoder with a parallel hierarchical structure to extract both local and global discriminative features. In the local branch, the local multi-scale feature fusion module is designed to fuse local feature units at different scales to ensure that the fine-grained local features at various levels are accurately captured, thereby enhancing the robustness of the features. In the global branch, the dual feature cross fusion module fuses local features and global semantic information, focusing on critical semantic information and enabling the mutual refinement and matching of local and global features. This allows the model to achieve a dynamic balance between detailed and holistic information, forming robust feature representations of pedestrians. Extensive experiments demonstrate the effectiveness of LViT-Net. 
In both single-source and multi-source comparison experiments, the proposed method outperforms existing state-of-the-art methods.</p>","PeriodicalId":29931,"journal":{"name":"Visual Computing for Industry Biomedicine and Art","volume":"8 1","pages":"10"},"PeriodicalIF":3.2000,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12003221/pdf/","citationCount":"0","resultStr":"{\"title\":\"LViT-Net: a domain generalization person re-identification model combining local semantics and multi-feature cross fusion.\",\"authors\":\"Xintong Hu, Peishun Liu, Xuefang Wang, Peiyao Wu, Ruichun Tang\",\"doi\":\"10.1186/s42492-025-00190-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>In the task of domain generalization person re-identification (ReID), pedestrian image features exhibit significant intra-class variability and inter-class similarity. Existing methods rely on a single feature extraction architecture and struggle to capture both global context and local spatial information, resulting in weaker generalization to unseen domains. To address this issue, an innovative domain generalization person ReID method-LViT-Net, which combines local semantics and multi-feature cross fusion, is proposed. LViT-Net adopts a dual-branch encoder with a parallel hierarchical structure to extract both local and global discriminative features. In the local branch, the local multi-scale feature fusion module is designed to fuse local feature units at different scales to ensure that the fine-grained local features at various levels are accurately captured, thereby enhancing the robustness of the features. In the global branch, the dual feature cross fusion module fuses local features and global semantic information, focusing on critical semantic information and enabling the mutual refinement and matching of local and global features. 
This allows the model to achieve a dynamic balance between detailed and holistic information, forming robust feature representations of pedestrians. Extensive experiments demonstrate the effectiveness of LViT-Net. In both single-source and multi-source comparison experiments, the proposed method outperforms existing state-of-the-art methods.</p>\",\"PeriodicalId\":29931,\"journal\":{\"name\":\"Visual Computing for Industry Biomedicine and Art\",\"volume\":\"8 1\",\"pages\":\"10\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2025-04-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12003221/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Visual Computing for Industry Biomedicine and Art\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1186/s42492-025-00190-1\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Visual Computing for Industry Biomedicine and Art","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1186/s42492-025-00190-1","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
LViT-Net: a domain generalization person re-identification model combining local semantics and multi-feature cross fusion.
In the task of domain generalization person re-identification (ReID), pedestrian image features exhibit significant intra-class variability and inter-class similarity. Existing methods rely on a single feature extraction architecture and struggle to capture both global context and local spatial information, resulting in weaker generalization to unseen domains. To address this issue, an innovative domain generalization person ReID method, LViT-Net, which combines local semantics and multi-feature cross fusion, is proposed. LViT-Net adopts a dual-branch encoder with a parallel hierarchical structure to extract both local and global discriminative features. In the local branch, the local multi-scale feature fusion module is designed to fuse local feature units at different scales to ensure that fine-grained local features at various levels are accurately captured, thereby enhancing the robustness of the features. In the global branch, the dual feature cross fusion module fuses local features and global semantic information, focusing on critical semantic information and enabling the mutual refinement and matching of local and global features. This allows the model to achieve a dynamic balance between detailed and holistic information, forming robust feature representations of pedestrians. Extensive experiments demonstrate the effectiveness of LViT-Net. In both single-source and multi-source comparison experiments, the proposed method outperforms existing state-of-the-art methods.
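To make the two fusion ideas in the abstract concrete, the following is a minimal NumPy sketch of (a) multi-scale pooling and concatenation of local feature units, and (b) a gated cross fusion of a local and a global descriptor. All function names, shapes, and the gating formula are illustrative assumptions for exposition; they are not the paper's actual modules or code.

```python
import numpy as np

def local_multiscale_fusion(feat, scales=(1, 2, 4)):
    """Hypothetical local multi-scale fusion: average-pool a (H, W, C)
    local feature map over an s x s grid at each scale and concatenate
    the pooled region vectors into one multi-scale descriptor."""
    h, w, c = feat.shape
    parts = []
    for s in scales:
        for i in range(s):
            for j in range(s):
                region = feat[i * h // s:(i + 1) * h // s,
                              j * w // s:(j + 1) * w // s]
                parts.append(region.mean(axis=(0, 1)))
    # Resulting length: sum(s * s for s in scales) * C
    return np.concatenate(parts)

def dual_feature_cross_fusion(local_vec, global_vec):
    """Hypothetical cross fusion: gate each dimension by the element-wise
    agreement between local and global descriptors, so the two branches
    refine each other rather than being naively averaged."""
    d = min(local_vec.size, global_vec.size)
    l, g = local_vec[:d], global_vec[:d]
    gate = 1.0 / (1.0 + np.exp(-(l * g)))  # sigmoid of element-wise product
    return gate * g + (1.0 - gate) * l

# Toy walk-through on random data (stand-ins for branch outputs).
rng = np.random.default_rng(0)
feat = rng.random((16, 8, 4))                 # toy local feature map
local_desc = local_multiscale_fusion(feat)    # (1 + 4 + 16) * 4 = 84 dims
global_desc = rng.random(local_desc.size)     # toy global descriptor
fused = dual_feature_cross_fusion(local_desc, global_desc)
print(fused.shape)                            # (84,)
```

The gating step is one simple way to realize "mutual refinement": dimensions where the two descriptors agree lean toward the global view, while disagreeing dimensions retain more of the local detail.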