FRISNET: A Fast Real-Time Instance Segmentation Network Fusing Frequency Domain and Multilevel Features

IF 5.9, Zone 2 (Engineering and Technology), JCR Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC)

IEEE Transactions on Instrumentation and Measurement, Pub Date: 2025-06-26, DOI: 10.1109/TIM.2025.3583292

Ying Xie;Jingkai Shang;Ruixiang Deng;Xianlun Tang;Wuqiang Yang

Volume 74, pp. 1-14. Full text: https://ieeexplore.ieee.org/document/11053186/

Citations: 0

Abstract


FRISNET: A Fast Real-Time Instance Segmentation Network Fusing Frequency Domain and Multilevel Features

Obtaining accurate instance locations and segmentation masks is challenging given the intricacy and diversity of practical scenarios. This article presents a fast real-time instance segmentation network (FRISNET) that fuses information from the frequency domain and the spatial domain. Building on You Only Look At CoefficienTs (YOLACT), which is the fastest instance segmentation method, a frequency-domain representation is introduced into a convolutional neural network (CNN). Using the fast Fourier transform (FFT), features of different frequencies extracted in the frequency domain are fused with the feature maps of the spatial domain. The CNN then provides accurate global location information and clear semantic information. To exploit the high-resolution target-location and feature information at the bottom levels, as well as the supervision information available at the same level, a new bottom-up feature fusion branch and same-level skip connections are added to the top-down feature pyramid network (FPN), giving the feature extraction network more diverse feature representations. The proposed instance segmentation model is trained on the open standard datasets PASCAL Semantic Boundaries Dataset (PASCAL SBD) and Microsoft Common Objects in Context (MS COCO). The results show that the proposed method improves instance segmentation accuracy: it achieves a mean average precision (mAP) of 34.5 at 31.77 frames/s (FPS) on MS COCO, which is 1.17% higher than YOLACT++ with the ResNet-50 architecture. The model's speed also shows potential for use in motion planning and tactile sensors for robotic grasping tasks, which could further enhance execution efficiency and operational reliability.
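The abstract does not say how the frequency bands are extracted or how they are fused with the spatial features. As a rough illustration only, the following NumPy sketch assumes a centered circular low-pass mask to separate low and high frequencies, and a weighted sum for fusion. The names `split_frequency_bands` and `fuse_with_spatial`, the `radius_ratio` parameter, and the fusion weights are all hypothetical and are not taken from the paper.

```python
import numpy as np


def split_frequency_bands(feature, radius_ratio=0.25):
    """Split a (C, H, W) feature map into low- and high-frequency parts.

    Assumption: a circular low-pass mask around the zero frequency;
    radius_ratio is a hypothetical knob, not a value from the paper.
    """
    _, h, w = feature.shape
    # 2-D FFT over the spatial axes, with the zero frequency shifted to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(feature, axes=(-2, -1)), axes=(-2, -1))

    # Distance of every frequency bin from the centered zero frequency.
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius_ratio * min(h, w)

    def to_spatial(spec):
        # Undo the shift and transform back; the imaginary part is numerical noise
        # because the mask is symmetric about the zero frequency.
        return np.fft.ifft2(np.fft.ifftshift(spec, axes=(-2, -1)), axes=(-2, -1)).real

    low = to_spatial(spectrum * low_mask)
    high = to_spatial(spectrum * ~low_mask)
    return low, high


def fuse_with_spatial(feature, low, high, w_low=0.5, w_high=0.5):
    """Hypothetical fusion: add weighted frequency components to the spatial map."""
    return feature + w_low * low + w_high * high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((4, 16, 16))  # stand-in for a CNN feature map
    low, high = split_frequency_bands(feat)
    fused = fuse_with_spatial(feat, low, high)
    print(fused.shape)
```

Because the two masks partition the spectrum, `low + high` reconstructs the input up to floating-point error. A trained network would more plausibly learn the fusion (for example with convolutions over concatenated channels) than use fixed weights as this sketch does.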


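Source journal

IEEE Transactions on Instrumentation and Measurement (Engineering and Technology: Engineering, Electrical & Electronic)

CiteScore: 9.00

Self-citation rate: 23.20%

Articles published: 1294

Review time: 3.9 months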

About the journal: Papers are sought that address innovative solutions to the development and use of electrical and electronic instruments and equipment to measure, monitor and/or record physical phenomena for the purpose of advancing measurement science, methods, functionality and applications. The scope of these papers may encompass: (1) theory, methodology, and practice of measurement; (2) design, development and evaluation of instrumentation and measurement systems and components used in generating, acquiring, conditioning and processing signals; (3) analysis, representation, display, and preservation of the information obtained from a set of measurements; and (4) scientific and technical support to establishment and maintenance of technical standards in the field of Instrumentation and Measurement.