MI-DETR: A small object detection model for mixed scenes

Impact Factor 3.7 · CAS Region 2 (Engineering & Technology) · JCR Q1, Computer Science, Hardware & Architecture
Bo Peng, Shidong Xiong, Yangjian Wang, Tingting Zhou, Jinlan Li, Hanguang Xiao, Rong Xiong
DOI: 10.1016/j.displa.2025.103052
Journal: Displays, Volume 88, Article 103052
Published: 2025-04-15
Citations: 0

Abstract

Object detection is a fundamental task in computer vision with many applications. However, the complexity of real-world scenarios and the small size of objects present significant challenges to achieving rapid and accurate detection in mixed environments. To address these challenges, we propose a novel mixed scene-oriented small object detection model (MI-DETR) designed to enhance detection accuracy and efficiency. The backbone network of MI-DETR incorporates the Fast Fourier Transform, Channel Shuffle, and an Orthogonal Attention Mechanism to improve feature extraction while significantly reducing computational cost. Additionally, we introduce a specialized small object feature layer and a Multi-Scale Feature Fusion (MSFF) module to strengthen the model's feature fusion capabilities. Furthermore, we propose a novel loss function, Focaler-WIoU, which prioritizes high-quality anchor boxes to enhance the detector's performance. We validate the effectiveness of our model through experiments on small object detection datasets across three complex scenarios. The results show that the proposed MI-DETR model has 40% fewer parameters and 5% less computational effort than the baseline model. On these three datasets, MI-DETR achieves accuracies of 70.2%, 34.5%, and 34.1%, respectively. It also achieves small object detection accuracies of 19.8%, 11.5%, and 12.6%, respectively. Additionally, its latency decreases by 0.9 ms, 1.0 ms, and 1.1 ms, respectively, outperforming other real-time detection models of similar size.
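The Channel Shuffle operation mentioned in the backbone (popularized by ShuffleNet) interleaves channels across groups so that grouped convolutions can exchange information at negligible cost. A minimal NumPy sketch of the standard operation, not the paper's implementation:

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels across `groups` for a tensor of shape (N, C, H, W).

    Reshape C into (groups, C // groups), swap those two axes, and
    flatten back, so each output group contains one channel from
    every input group.
    """
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # swap group and per-group axes
    return x.reshape(n, c, h, w)
```

For example, with 4 channels and 2 groups, channel order [0, 1, 2, 3] becomes [0, 2, 1, 3]: each half of the output now draws from both input groups.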
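The Focaler-WIoU loss builds on Focaler-IoU's idea of linearly remapping IoU to a chosen quality interval, so training emphasizes anchor boxes within that band. A sketch of the interval remapping only (the thresholds `d` and `u` here are illustrative defaults, and the paper's exact combination with WIoU is not reproduced):

```python
def focaler_iou(iou: float, d: float = 0.0, u: float = 0.95) -> float:
    """Remap an IoU value to [0, 1] over the interval [d, u].

    IoU below d maps to 0, above u maps to 1, and values in between
    are scaled linearly, focusing the loss on a chosen IoU quality band.
    """
    if iou < d:
        return 0.0
    if iou > u:
        return 1.0
    return (iou - d) / (u - d)
```

The corresponding loss term would be `1 - focaler_iou(iou)`; raising `d` de-emphasizes low-quality matches, while lowering `u` saturates the reward for already well-localized boxes.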
Source journal: Displays (Engineering & Technology — Electronic & Electrical Engineering)
CiteScore: 4.60
Self-citation rate: 25.60%
Articles per year: 138
Review time: 92 days
Journal description: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface. Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.