Zhifeng Xie, Rui Qiu, Qile He, Mengtian Li, Xin Tan
{"title":"通过伪回放生成的场景理解领域增量学习范式","authors":"Zhifeng Xie , Rui Qiu , Qile He , Mengtian Li , Xin Tan","doi":"10.1016/j.gmod.2025.101290","DOIUrl":null,"url":null,"abstract":"<div><div>Scene understanding is a computer vision task that involves grasping the pixel-level distribution of objects. Unlike most research focuses on single-scene models, we consider a more versatile proposal: domain-incremental learning for scene understanding. This allows us to adapt well-studied single-scene models into multi-scene models, reducing data requirements and ensuring model flexibility. However, domain-incremental learning that leverages correlations between scene domains has yet to be explored. To address this challenge, we propose a Domain-Incremental Learning Paradigm (D-ILP) for scene understanding, along with a new strategy of Pseudo-Replay Generation (PRG) that does not require manual labeling. Specifically, D-ILP leverages pre-trained single-scene models and incremental images for supervised training to acquire new knowledge from other scenes. As a pre-trained generation model, PRG can controllably generate pseudo-replays resembling source images from incremental images and text prompts. These pseudo-replays are utilized to minimize catastrophic forgetting in the original scene. We perform experiments with three publicly accessible models: Mask2Former, Segformer, and DeepLabv3+. With successfully transforming these single-scene models into multi-scene models, we achieve high-quality parsing results for original and new scenes simultaneously. Meanwhile, the validity and rationality of our method are proved by the analysis of D-ILP.</div></div>","PeriodicalId":55083,"journal":{"name":"Graphical Models","volume":"141 ","pages":"Article 101290"},"PeriodicalIF":2.2000,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Domain-Incremental Learning Paradigm for scene understanding via Pseudo-Replay Generation\",\"authors\":\"Zhifeng Xie , Rui Qiu , Qile He , Mengtian Li , Xin Tan\",\"doi\":\"10.1016/j.gmod.2025.101290\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Scene understanding is a computer vision task that involves grasping the pixel-level distribution of objects. Unlike most research focuses on single-scene models, we consider a more versatile proposal: domain-incremental learning for scene understanding. This allows us to adapt well-studied single-scene models into multi-scene models, reducing data requirements and ensuring model flexibility. However, domain-incremental learning that leverages correlations between scene domains has yet to be explored. To address this challenge, we propose a Domain-Incremental Learning Paradigm (D-ILP) for scene understanding, along with a new strategy of Pseudo-Replay Generation (PRG) that does not require manual labeling. Specifically, D-ILP leverages pre-trained single-scene models and incremental images for supervised training to acquire new knowledge from other scenes. As a pre-trained generation model, PRG can controllably generate pseudo-replays resembling source images from incremental images and text prompts. These pseudo-replays are utilized to minimize catastrophic forgetting in the original scene. We perform experiments with three publicly accessible models: Mask2Former, Segformer, and DeepLabv3+. 
With successfully transforming these single-scene models into multi-scene models, we achieve high-quality parsing results for original and new scenes simultaneously. Meanwhile, the validity and rationality of our method are proved by the analysis of D-ILP.</div></div>\",\"PeriodicalId\":55083,\"journal\":{\"name\":\"Graphical Models\",\"volume\":\"141 \",\"pages\":\"Article 101290\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2025-08-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Graphical Models\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1524070325000372\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Graphical Models","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1524070325000372","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Domain-Incremental Learning Paradigm for scene understanding via Pseudo-Replay Generation
Scene understanding is a computer vision task that involves grasping the pixel-level distribution of objects. Unlike most research, which focuses on single-scene models, we consider a more versatile proposal: domain-incremental learning for scene understanding. This allows us to adapt well-studied single-scene models into multi-scene models, reducing data requirements and ensuring model flexibility. However, domain-incremental learning that leverages correlations between scene domains has yet to be explored. To address this challenge, we propose a Domain-Incremental Learning Paradigm (D-ILP) for scene understanding, along with a new Pseudo-Replay Generation (PRG) strategy that does not require manual labeling. Specifically, D-ILP leverages pre-trained single-scene models and incremental images for supervised training, acquiring new knowledge from other scenes. As a pre-trained generative model, PRG can controllably generate pseudo-replays resembling source images from incremental images and text prompts. These pseudo-replays are used to minimize catastrophic forgetting in the original scene. We perform experiments with three publicly available models: Mask2Former, SegFormer, and DeepLabv3+. By successfully transforming these single-scene models into multi-scene models, we achieve high-quality parsing results for the original and new scenes simultaneously. Meanwhile, the validity and rationality of our method are demonstrated through analysis of D-ILP.
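The abstract describes the mechanics only at a high level; the sketch below illustrates how one pseudo-replay-based incremental training step could look in PyTorch. It is a hedged reconstruction, not the paper's implementation: `generate_pseudo_replay`, `incremental_step`, the pseudo-labeling via the frozen single-scene model, and the mixing weight `alpha` are all illustrative assumptions.

```python
# A minimal sketch of one D-ILP-style training step with pseudo-replays,
# assuming a generic PyTorch segmentation model. Names and the loss mixing
# are illustrative stand-ins, not the paper's API; the actual PRG is a
# pre-trained controllable generative model.
import torch
import torch.nn as nn
import torch.nn.functional as F


def generate_pseudo_replay(incremental_images: torch.Tensor,
                           text_prompt: str) -> torch.Tensor:
    """Hypothetical stand-in for PRG. In the paper, a pre-trained generative
    model turns incremental images plus a text prompt into source-style
    pseudo-replays; here we simply return the inputs unchanged."""
    return incremental_images


def incremental_step(model: nn.Module,
                     frozen_source_model: nn.Module,
                     images: torch.Tensor,
                     labels: torch.Tensor,
                     optimizer: torch.optim.Optimizer,
                     alpha: float = 0.5,
                     prompt: str = "source-scene style") -> float:
    # Supervised loss on the new (incremental) domain.
    loss_new = F.cross_entropy(model(images), labels)

    # Anti-forgetting loss: on pseudo-replays, match the frozen single-scene
    # model's predictions so knowledge of the original scene is retained.
    replay = generate_pseudo_replay(images, prompt)
    with torch.no_grad():
        pseudo_labels = frozen_source_model(replay).argmax(dim=1)
    loss_replay = F.cross_entropy(model(replay), pseudo_labels)

    loss = loss_new + alpha * loss_replay
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    # Toy usage with a 1x1-conv "segmentation head" (purely illustrative).
    model = nn.Conv2d(3, 4, kernel_size=1)          # trainable multi-scene model
    frozen = nn.Conv2d(3, 4, kernel_size=1).eval()  # frozen single-scene model
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(2, 3, 32, 32)                   # incremental images
    y = torch.randint(0, 4, (2, 32, 32))            # incremental labels
    print(incremental_step(model, frozen, x, y, opt))
```

The point mirrored here is that the frozen single-scene model supervises the pseudo-replays, so no manual labels are needed to preserve performance on the original domain.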
Journal introduction:
Graphical Models is recognized internationally as a highly rated, top-tier journal and is focused on the creation, geometric processing, animation, and visualization of graphical models and on their applications in engineering, science, culture, and entertainment. GMOD provides its readers with thoroughly reviewed and carefully selected papers that disseminate exciting innovations, that teach rigorous theoretical foundations, that propose robust and efficient solutions, or that describe ambitious systems or applications in a variety of topics.
We invite papers in five categories: research (contributions of novel theoretical or practical approaches or solutions), survey (opinionated views of the state-of-the-art and challenges in a specific topic), system (the architecture and implementation details of an innovative architecture for a complete system that supports model/animation design, acquisition, analysis, visualization, etc.), application (description of a novel application of known techniques and evaluation of its impact), or lecture (an elegant and inspiring perspective on previously published results that clarifies them and teaches them in a new way).
GMOD offers its authors an accelerated review, feedback from experts in the field, immediate online publication of accepted papers, no restriction on color and length (when justified by the content) in the online version, and a broad promotion of published papers. A prestigious group of editors selected from among the premier international researchers in their fields oversees the review process.