Scene Parsing via Tree Structure Enhancement Lightweight Network
Authors: Wenxin Huang, Wenxuan Liu, Xuemei Jia
Venue: 2022 International Conference on Intelligent Education and Intelligent Research (IEIR)
Published: 2022-12-18
DOI: https://doi.org/10.1109/IEIR56323.2022.10050053
Scene parsing is an active research topic in computer vision, with extensive applications in visual perception, for example in education systems and human-object robots. However, objects in a scene image differ greatly in size because of the diversity of object categories, the observation distance, and other factors, so handling this scale variation remains a challenging problem in scene parsing. To address it, we propose a tree structure in which feature maps from different levels are gradually nested and connected, which strengthens the connections among multiple feature maps and captures more representative information. For real-time performance, we propose a framework named tree structure enhancement lightweight network (TSELight), which introduces depth-wise separable dilated convolution (DSDC) into the tree structure and decomposes the middle nodes of the tree along the channel direction, thereby improving efficiency. Experimental results demonstrate that the TSELight architecture outperforms state-of-the-art methods on the Cityscapes dataset and provides consistent improvements in real-time scene parsing performance.
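The abstract does not specify how DSDC is implemented, so the following is only a minimal pure-Python sketch of a generic depth-wise separable dilated convolution, not the authors' code. It assumes an odd kernel size, zero padding that preserves spatial size, and no bias terms; every function name (`depthwise_dilated_conv`, `pointwise_conv`, `dsdc`, `param_counts`, `effective_kernel`) is hypothetical and chosen for illustration. It shows the two ideas the abstract relies on: dilation widens the receptive field without adding weights, and splitting a convolution into a per-channel spatial step plus a 1x1 channel-mixing step reduces the weight count.

```python
def depthwise_dilated_conv(x, dw, dilation):
    """Per-channel dilated convolution (assumed: odd k, zero padding).

    x:  input as nested lists, shape [C][H][W]
    dw: one k x k kernel per channel, shape [C][k][k]
    """
    k = len(dw[0])
    half = k // 2
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for c in range(channels):
        for i in range(height):
            for j in range(width):
                acc = 0.0
                for u in range(k):
                    for v in range(k):
                        # Taps are spaced `dilation` pixels apart.
                        ii = i + (u - half) * dilation
                        jj = j + (v - half) * dilation
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += dw[c][u][v] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


def pointwise_conv(x, pw):
    """1x1 convolution mixing channels. pw has shape [C_out][C_in]."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [sum(w * x[c][i][j] for c, w in enumerate(row)) for j in range(width)]
            for i in range(height)
        ]
        for row in pw
    ]


def dsdc(x, dw, pw, dilation):
    """Depth-wise dilated step followed by the point-wise step."""
    return pointwise_conv(depthwise_dilated_conv(x, dw, dilation), pw)


def param_counts(k, c_in, c_out):
    """Weights (no bias) of a standard conv vs. a depth-wise separable one."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable


def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    # Two 5x5 channels; channel 0 holds a single impulse at the centre.
    x = [[[0.0] * 5 for _ in range(5)] for _ in range(2)]
    x[0][2][2] = 1.0
    dw = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]  # all-ones 3x3 kernels
    pw = [[1.0, 1.0]]                                       # one output channel
    y = dsdc(x, dw, pw, dilation=2)
    print(y[0][0][0], y[0][1][1])
    print(param_counts(3, 64, 64), effective_kernel(3, 2))
```

With a 3x3 kernel and 64 input and output channels (illustrative values, not taken from the paper), the separable form needs 4,672 weights instead of 36,864, while a dilation of 2 stretches the same nine taps over a 5x5 window. That is the general mechanism by which a lightweight network can cover objects of different scales at lower cost.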