MI-DETR: A small object detection model for mixed scenes

Impact Factor 3.7 · CAS Region 2 (Engineering & Technology) · JCR Q1, Computer Science, Hardware & Architecture
Bo Peng, Shidong Xiong, Yangjian Wang, Tingting Zhou, Jinlan Li, Hanguang Xiao, Rong Xiong
DOI: 10.1016/j.displa.2025.103052
Journal: Displays, Volume 88, Article 103052
Published: 2025-04-15
Citations: 0

Abstract

Object detection is a fundamental task in computer vision with many applications. However, the complexity of real-world scenarios and the small size of objects present significant challenges to achieving rapid and accurate detection in mixed environments. To address these challenges, we propose a novel mixed scene-oriented small object detection model (MI-DETR) designed to enhance detection accuracy and efficiency. The backbone network of MI-DETR incorporates the Fast Fourier Transform, Channel Shuffle, and an Orthogonal Attention Mechanism to improve feature extraction while significantly reducing computational cost. Additionally, we introduce a specialized small object feature layer and a Multi-Scale Feature Fusion (MSFF) module to strengthen the model's feature fusion capabilities. Furthermore, we propose a novel loss function, Focaler-WIoU, which prioritizes high-quality anchor boxes to enhance the detector's performance. We validate the effectiveness of our model through experiments on small object detection datasets across three complex scenarios. The results show that the proposed MI-DETR model has 40% fewer parameters and 5% less computational effort than the baseline model. On these three datasets, MI-DETR achieves accuracies of 70.2%, 34.5%, and 34.1%, respectively. It also achieves small object detection accuracies of 19.8%, 11.5%, and 12.6%, respectively. Additionally, its latency decreases by 0.9 ms, 1.0 ms, and 1.1 ms, respectively, outperforming other real-time detection models of similar size.
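The Channel Shuffle operation mentioned in the backbone (popularized by ShuffleNet) interleaves channels across groups so that grouped convolutions can exchange information at negligible cost. A minimal NumPy sketch of the standard operation, not the paper's implementation:

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels across `groups` for a tensor of shape (N, C, H, W).

    Reshape C into (groups, C // groups), swap those two axes, and
    flatten back, so each output group contains one channel from
    every input group.
    """
    n, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # swap group and per-group axes
    return x.reshape(n, c, h, w)
```

For example, with 4 channels and 2 groups, channel order [0, 1, 2, 3] becomes [0, 2, 1, 3]: each half of the output now draws from both input groups.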
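The Focaler-WIoU loss builds on Focaler-IoU's idea of linearly remapping IoU to a chosen quality interval, so training emphasizes anchor boxes within that band. A sketch of the interval remapping only (the thresholds `d` and `u` here are illustrative defaults, and the paper's exact combination with WIoU is not reproduced):

```python
def focaler_iou(iou: float, d: float = 0.0, u: float = 0.95) -> float:
    """Remap an IoU value to [0, 1] over the interval [d, u].

    IoU below d maps to 0, above u maps to 1, and values in between
    are scaled linearly, focusing the loss on a chosen IoU quality band.
    """
    if iou < d:
        return 0.0
    if iou > u:
        return 1.0
    return (iou - d) / (u - d)
```

The corresponding loss term would be `1 - focaler_iou(iou)`; raising `d` de-emphasizes low-quality matches, while lowering `u` saturates the reward for already well-localized boxes.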
Source journal: Displays (Engineering & Technology — Electronic & Electrical Engineering)
CiteScore: 4.60
Self-citation rate: 25.60%
Articles per year: 138
Review time: 92 days
Journal description: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including the display-human interface. Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.