Title: Instance segmentation of faces and mouth-opening degrees based on improved YOLOv8 method
Authors: Yuhe Fan, Lixun Zhang, Canxing Zheng, Xingyuan Wang, Jinghui Zhu, Lan Wang
Journal: Multimedia Systems (impact factor 3.5; JCR Q2, Computer Science, Information Systems)
Publication date: 2024-09-11
DOI: 10.1007/s00530-024-01472-z (https://doi.org/10.1007/s00530-024-01472-z)
Citation count: 0
Abstract
Instance segmentation of faces and mouth-opening degrees is an important technology for food delivery safety in meal-assisting robotics. However, faces vary widely in shape, color, and posture, and the mouth has a small contour area, deforms easily, and is often occluded, which makes real-time, accurate instance segmentation challenging. In this paper, we propose a novel method for instance segmentation of faces and mouth-opening degrees. Specifically, in the backbone network, deformable convolution is introduced to enhance the ability to capture finer-grained spatial information, and the CloFormer module is introduced to improve the ability to capture high-frequency local and low-frequency global information. In the neck network, the classical convolution and C2f modules are replaced by GSConv and VoV-GSCSP aggregation modules, respectively, to reduce model complexity and floating-point operations. Finally, in the localization loss, the CIoU loss is replaced by the WIoU loss to reduce the competitiveness of high-quality anchor boxes and mask the influence of low-quality samples, which in turn improves localization accuracy and generalization ability. The resulting model is abbreviated as DCGW-YOLOv8n-seg. It was compared with the baseline YOLOv8n-seg model and with several state-of-the-art instance segmentation models on datasets. The results show that the DCGW-YOLOv8n-seg model offers high accuracy, speed, robustness, and generalization ability. Ablation experiments verified that each improvement contributes to model performance. Finally, the DCGW-YOLOv8n-seg model was applied in an instance segmentation experiment on a meal-assisting robot, where it segmented faces and mouth-opening degrees more effectively. The proposed method can provide a theoretical basis for food delivery safety in meal-assisting robotics and a reference for computer vision and image instance segmentation.
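To make the localization-loss change more concrete, the following is a minimal sketch of an IoU loss weighted by a center-distance term, in the style of the WIoU v1 formulation. The abstract does not state which WIoU variant the authors use, so the v1 form, the function names `box_iou` and `wiou_v1_loss`, and the `(x1, y1, x2, y2)` box layout are all assumptions made for illustration and are not taken from the paper.

```python
import math


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred, target):
    """Assumed WIoU v1 form: exp(center_dist^2 / (Wg^2 + Hg^2)) * (1 - IoU).

    Wg and Hg are the width and height of the smallest box enclosing both
    inputs. In a training framework this denominator would be detached from
    the gradient graph; plain Python has no gradients, so that step is moot.
    """
    # Squared distance between the two box centers.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    dist2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

    # Size of the smallest enclosing box.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    denom = wg ** 2 + hg ** 2

    weight = math.exp(dist2 / denom) if denom > 0 else 1.0
    return weight * (1.0 - box_iou(pred, target))


if __name__ == "__main__":
    print(wiou_v1_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

In this sketch the weight is 1 when the centers coincide and grows as the predicted center drifts from the target, so the plain IoU loss is amplified for poorly centered boxes. Later WIoU variants add a dynamic focusing term on top of this; that part is omitted here.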
About the journal:
This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.