Yuhe Fan, Lixun Zhang, Canxing Zheng, Xingyuan Wang, Jinghui Zhu, Lan Wang
Instance segmentation of faces and mouth-opening degrees based on improved YOLOv8 method
DOI: 10.1007/s00530-024-01472-z (https://doi.org/10.1007/s00530-024-01472-z)
Publication date: 2024-09-11
Publication type: Journal Article
Abstract
Instance segmentation of faces and mouth-opening degrees is an important technology for food delivery safety in meal-assisting robotics. However, faces vary widely in shape, color, and posture, and the mouth has a small contour area, deforms easily, and is often occluded, which makes real-time, accurate instance segmentation challenging. In this paper, we propose a novel method for instance segmentation of faces and mouth-opening degrees. Specifically, in the backbone network, deformable convolution is introduced to enhance the ability to capture finer-grained spatial information, and the CloFormer module is introduced to improve the ability to capture high-frequency local and low-frequency global information. In the neck network, the classical convolution and C2f modules are replaced by the GSConv and VoV-GSCSP aggregation modules, respectively, to reduce model complexity and floating-point operations. Finally, in the localization loss, the CIoU loss is replaced by the WIoU loss to reduce the competitiveness of high-quality anchor boxes and mask the influence of low-quality samples, which in turn improves localization accuracy and generalization ability. The resulting model is abbreviated as DCGW-YOLOv8n-seg. It was compared on the datasets with the baseline YOLOv8n-seg model and with several state-of-the-art instance segmentation models. The results show that the DCGW-YOLOv8n-seg model offers high accuracy, speed, robustness, and generalization ability. Ablation experiments verified that each improvement contributes to model performance. Finally, the DCGW-YOLOv8n-seg model was applied in an instance segmentation experiment on a meal-assisting robot. The results show that the model achieves better instance segmentation of faces and mouth-opening degrees. The proposed method can provide a theoretical basis for food delivery safety in meal-assisting robotics and a reference for computer vision and image instance segmentation.
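
To make the localization-loss change more concrete, the following is a minimal sketch of a WIoU-style box loss. The abstract does not state which WIoU variant or formulation the authors used, so this assumes the simplest distance-weighted form (often called WIoU v1): the plain IoU loss is scaled by a factor that grows with the distance between box centers, normalized by the size of the smallest enclosing box. The function names `iou` and `wiou_v1_loss` and the `(x1, y1, x2, y2)` box layout are illustrative choices, not taken from the paper.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred, target):
    """Distance-weighted IoU loss (assumed WIoU v1 form, not the paper's code)."""
    l_iou = 1.0 - iou(pred, target)

    # Squared distance between the two box centers.
    pcx, pcy = (pred[0] + pred[2]) / 2.0, (pred[1] + pred[3]) / 2.0
    tcx, tcy = (target[0] + target[2]) / 2.0, (target[1] + target[3]) / 2.0
    dist2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

    # Width and height of the smallest box enclosing both boxes.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    denom = wg ** 2 + hg ** 2
    if denom == 0:
        return l_iou  # degenerate boxes: fall back to the plain IoU loss

    # In an autograd framework this normalizer would be treated as a
    # constant (detached); plain Python has no gradients to detach.
    weight = math.exp(dist2 / denom)
    return weight * l_iou


if __name__ == "__main__":
    print(wiou_v1_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
    print(wiou_v1_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # partially overlapping
```

Because the weight is always at least 1, this form never falls below the plain IoU loss and penalizes predictions whose centers are far from the target more strongly. Later WIoU variants add a dynamic focusing term on top of this, which is the part that relates to down-weighting low-quality samples; that term is omitted here.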