Bo Peng , Shidong Xiong , Yangjian Wang , Tingting Zhou , Jinlan Li , Hanguang Xiao , Rong Xiong
Displays, vol. 88, Article 103052, published 2025-04-15. DOI: 10.1016/j.displa.2025.103052. https://www.sciencedirect.com/science/article/pii/S0141938225000897
MI-DETR: A small object detection model for mixed scenes
Object detection is a fundamental task in computer vision with many applications. However, the complexity of real-world scenes and the small size of many objects make rapid and accurate detection in mixed environments difficult. To address these challenges, we propose MI-DETR, a small object detection model oriented toward mixed scenes and designed to improve both detection accuracy and efficiency. The backbone of MI-DETR incorporates the Fast Fourier Transform, Channel Shuffle, and an Orthogonal Attention Mechanism to improve feature extraction while significantly reducing computational cost. We also introduce a dedicated small object feature layer and a Multi-Scale Feature Fusion (MSFF) module to strengthen the model's feature fusion. Furthermore, we propose a new loss function, Focaler-WIoU, which prioritizes high-quality anchor boxes to improve the detector's performance. We validate the model through experiments on small object detection datasets covering three complex scenarios. Compared with the baseline model, MI-DETR has 40% fewer parameters and requires 5% less computation. On the three datasets, MI-DETR achieves accuracies of 70.2%, 34.5%, and 34.1%, respectively, and small object detection accuracies of 19.8%, 11.5%, and 12.6%, respectively. Its latency is also reduced by 0.9 ms, 1.0 ms, and 1.1 ms, respectively, and it outperforms other real-time detection models of similar size.
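The abstract names Channel Shuffle and the Focaler-WIoU loss but gives no formulas. The sketch below shows, under stated assumptions, what these two components commonly look like in the literature: a ShuffleNet-style channel shuffle (reshape channels into groups, transpose, flatten), and the published Focaler-IoU linear interval mapping applied on top of a Wise-IoU v1 box loss. The function names, the (x1, y1, x2, y2) box format, and the interval bounds d = 0.0 and u = 0.95 are hypothetical choices for illustration, not MI-DETR's actual implementation.

```python
import math


def channel_shuffle(channels, groups):
    """ShuffleNet-style channel shuffle on a flat list of channel indices.

    Conceptually reshapes C channels to (groups, C // groups), transposes,
    and flattens, so each output group mixes channels from every input group.
    """
    c = len(channels)
    if c % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = c // groups
    # After the transpose, output slot (i, j) reads input group j, position i.
    return [channels[j * per_group + i] for i in range(per_group) for j in range(groups)]


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def focaler_iou(iou, d=0.0, u=0.95):
    """Linear interval mapping of IoU onto [0, 1].

    d and u are hypothetical defaults; the abstract does not state MI-DETR's values.
    """
    if iou < d:
        return 0.0
    if iou > u:
        return 1.0
    return (iou - d) / (u - d)


def focaler_wiou_loss(pred, target, d=0.0, u=0.95):
    """Assumed form: Wise-IoU v1 loss plus the Focaler-IoU correction term."""
    iou = box_iou(pred, target)
    # Squared centre distance, normalised by the squared diagonal of the
    # smallest box enclosing both pred and target (Wise-IoU v1 penalty).
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    r_wiou = math.exp(((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / (wg ** 2 + hg ** 2))
    l_wiou = r_wiou * (1.0 - iou)
    # Focaler combination: L = L_WIoU + IoU - IoU_focaler.
    return l_wiou + iou - focaler_iou(iou, d, u)


if __name__ == "__main__":
    print(channel_shuffle([0, 1, 2, 3, 4, 5], 2))
    print(focaler_wiou_loss((0, 0, 2, 2), (1, 0, 3, 2)))
```

For example, `channel_shuffle([0, 1, 2, 3, 4, 5], 2)` returns `[0, 3, 1, 4, 2, 5]`, and two identical boxes give a loss of 0. Moving the interval [d, u] changes which IoU range the mapped term emphasizes; how MI-DETR sets these bounds is not stated in the abstract.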
About the journal:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface.
Technical papers on practical developments in display technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance the effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.