{"title":"A Hybrid Wheat Head Detection model with Incorporated CNN and Transformer","authors":"Shou Harada, Xian-Hua Han","doi":"10.23919/MVA57639.2023.10216087","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216087","url":null,"abstract":"Wheat head detection is an important research topic for production estimation and growth management. Motivated by the great advantages of the deep convolution neural networks (DCNNs) in many vision tasks, the deep-learning based methods have dominated the wheat head detection field, and manifest remarkable performance improvement compared with the traditional image processing methods. The existing methods usually divert the proposed detection models for the generic object detection to wheat head detection, and are insufficient in taking account of the specific characteristics of the wheat head images such as large variations due to different growth stages, high density and overlaps. This work exploits a novel hybrid wheat detection model by incorporating the CNN and transformer for modeling long-range dependence. Specifically, we firstly employ a backbone ResNet to extract multi-scale features, and leverage an inter-scale feature fusion module to aggregate coarse-to-fine features together for capturing sufficient spatial detail to localize small-size wheat head. Moreover, we propose a novel and efficient transformer block by incorporating the self-attention module in channel direction and the feature feed-forward subnet to explore the interaction among the aggregated multi-scale features. Finally a prediction head produces the centerness and size of wheat heads to obtain a simple anchor-free detection model. Extensive experiments on the Global Wheat Head Detection (GWHD) dataset have demonstrated the superiority of our proposed model over the existing state-of-the-art methods as well as the baseline model.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126063810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quadruped Robot Platform for Selective Pesticide Spraying","authors":"Hansen Hendra, Yubin Liu, Ryoichi Ishikawa, Takeshi Oishi, Yoshihiro Sato","doi":"10.23919/MVA57639.2023.10215812","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215812","url":null,"abstract":"Effective control of disease and pest infection is vital for maximizing crop yields, and pesticide spraying is a commonly used method for achieving this goal. This study proposes a novel approach to selective pesticide spraying using a quadruped robot platform, which we tested in a broccoli field. We developed an algorithm to detect and track worms based on our proposed Histogram of Oriented Gradients and Support Vector Machine (HOG-SVM) techniques, integrated with the recent object detection and tracking methods. Our platform was tested by traversing the furrows between the broccoli crop lines and continuously scanning to detect cabbage worms. Our experiments demonstrate that the proposed HOG-SVM algorithm successfully reduced the false positive rate of real-time worm detection by reducing around 90% for the imitation environments and around 60% for the actual field.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126736203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Transfer for Domain Adaptation in Crowd Counting","authors":"Shekhor Chanda, Yang Wang","doi":"10.23919/MVA57639.2023.10216197","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216197","url":null,"abstract":"We consider the problem of domain adaptation in crowd counting. Given a pre-trained model learned from a source domain, our goal is to adapt this model to a target domain using unlabeled data. The solution to this problem has a lot of potential applications in computer vision research that require a neural network model adapted to a target dataset. In this paper, we illustrate a dynamic domain adaptation technique. Specifically, we apply dynamic transfer for solving domain adaptation problems in crowd counting. The key insight is that adapting the model for the target domain is achieved by adapting the model across the data samples. The experimental results on several benchmark datasets demonstrate the effectiveness of our approaches.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124641108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Pose Prediction by Progressive Generation in Multi-scale Frequency Domain","authors":"Tomohiro Fujita, Yasutomo Kawanishi","doi":"10.23919/MVA57639.2023.10215966","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215966","url":null,"abstract":"We address a problem of 3D human pose prediction from a sequence of human body skeletons. To model the spatio-temporal dynamics, the discrete cosine transform (DCT) and the graph convolutional networks (GCN) are often applied to signals on a human skeleton graph. By DCT, temporal information of a human skeleton sequence can be embedded into the frequency domain. However, in previous studies, the prediction models using DCT implicitly learned each frequency coefficient by gradients calculated from a loss of the predictions and the ground truths of human body skeletons. In this paper, we propose a progressive human pose prediction model in frequency domain so that explicitly predict high-, medium-, and low-frequency motion of a target person. We confirmed that the proposed method improves prediction accuracy through experiments using public datasets on Human3.6M and CMU Mocap datasets.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129680157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Achieving Lightweight Deep Neural Network for Precision Agriculture with Maize Disease Detection","authors":"C. Padeiro, Takahiro Komamizu, I. Ide","doi":"10.23919/MVA57639.2023.10215815","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215815","url":null,"abstract":"Agriculture is the pillar industry of human survival. However, various crop diseases reduce the human food supply and lead to starvation and death in the worst cases. Experts perform visual symptoms observation for crop disease diagnosis. Which process is time-consuming and expensive. Also, the process has significant risk of human error due to subjective perception. Convolutional Neural Networks (CNN) use image processing techniques to show great potential in plant disease detection. However, it requires thousands of channels to learn rich features, resulting in large models requiring powerful computing, power supply, and high bandwidth, making it more expensive and difficult for farmers to acquire. Therefore, deploying these solutions on resource-constrained devices is desirable to make them more accessible. Thus, we propose a lightweight object detection CNN that can run on resource-constrained devices to detect crop diseases. Channel pruning is applied to optimize resource use by removing unimportant channels and filter weights to reduce network parameters, inference time, and the number of FLOPS. Experimental results with object detector, Faster R-CNN with two backbones, ResNet-50, and EfficientNet-B7, show significant improvement in model efficiency, keeping high accuracy.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114542446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MVA 2023 Cover Page","authors":"","doi":"10.23919/mva57639.2023.10216272","DOIUrl":"https://doi.org/10.23919/mva57639.2023.10216272","url":null,"abstract":"","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"264 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116220295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diabetic Retinopathy Grading based on a Sparse Network Fusion of Heterogeneous ConvNeXt Models with Category Attention","authors":"Agustin Castillo-Munguia, Gibran Benitez-Garcia, J. Olivares-Mercado, Hiroki Takahashi","doi":"10.23919/MVA57639.2023.10216129","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216129","url":null,"abstract":"Diabetic retinopathy (DR) is an eye disease caused by high blood sugar levels that may damage vessels in the retina, leading to partial or complete loss of vision in later stages. In recent years, convolutional neural networks (CNN) have been used to help diagnose the DR severity. However, due to the slight differences between each class and the imbalanced nature of the datasets, standard CNNs often struggle to distinguish accurately between different grades of DR. To overcome these challenges, we propose combining a novel CNN model (ConvNeXt) with category-attention blocks incorporated at multiple levels of the architecture. This generates different models that can effectively extract fine-grained features and minimize the impact of dataset imbalance. Finally, we introduce a Sparse Network Fusion technique that learns to combine the outputs of all models to consolidate their individual decisions. Extensive experiments on the challenging DDR dataset show that our proposal achieves a new state-of-the-art performance, improving by about 3% grading accuracy compared with existing methods.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114482554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Object Detection for Embedded Systems Using Tiny Spiking Neural Networks: Filtering Noise Through Visual Attention","authors":"Hugo Bulzomi, Amélie Gruel, Jean Martinet, Takeshi Fujita, Yuta Nakano, R. Bendahan","doi":"10.23919/MVA57639.2023.10215590","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215590","url":null,"abstract":"Object detection is an important task becoming increasingly common in numerous applications for embedded systems. The traditional state-of-the-art deep neural networks (DNNs) tend to be incompatible with the limitations of many of those systems: their large size and high computational cost make them hard to deploy on hardware with limited resources. Spiking Neural Networks (SNNs) have been attracting attention in recent years because of their potential as energy-efficient alternatives when implemented on specialized hardware, and their smooth integration with energy-efficient event cameras. In this paper, we present a lightweight SNN architecture for efficient object detection in embedded systems using event camera data. We show that by applying visual attention mechanisms, we can ignore most of the noise from the input and thus reduce the number of neurons and activations since additional noise-filtering layers are not needed. Our proposed SNN is 24 times smaller than a previous similar method for our input resolution and maintains similar overall detection performances, while being more robust to noise. We finally demonstrate the energy efficiency of our network during runtime with an implementation on SpiNNaker chip, showing the applicability of our approach.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129331898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Static Specular Flow and Highlight with Deep Features for Specular Surface Detection","authors":"Hirotaka Hachiya, Yuto Yoshimura","doi":"10.23919/mva57639.2023.10215694","DOIUrl":"https://doi.org/10.23919/mva57639.2023.10215694","url":null,"abstract":"To apply robot teaching to a factory with many mirror-polished parts, it is necessary to detect the mirror-like surface accurately. Deep models for mirror detection have been studied by designing mirror-specific features, e.g., contextual contrast and similarity. However, the mirror-polished parts, e.g., plastic molds, tend to have complex shapes and ambiguous boundaries, and thus existing mirror-specific deep features could not work well. To detect such complex mirror-like surfaces, we propose combining static specular flow and highlight, frequently appearing in specular surfaces, with deep model-based multi-level feature pyramids and adaptively integrating multiple feature maps, including mirror-specific ones. Through experiments with our original real-world plastic mold dataset, we show the effectiveness of the proposed method.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123790371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shape Preservation in Image Style Transfer for Gaze Estimation","authors":"Daiki Mushiake, Kentaro Otomo, Chihiro Nakatani, N. Ukita","doi":"10.23919/MVA57639.2023.10216216","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216216","url":null,"abstract":"This paper proposes image style transfer with shape preservation for gaze estimation. While several shape preservation constraints are proposed, we present additional shape preservation constraints using (i) dense pixelwise correspondences between the original and its transferred images and (ii) task-driven learning using gaze estimation error for directly improving gaze direction estimation. A variety of experiments with other SOTA methods, publicly-available datasets, and ablation studies validate the effectiveness of our method.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129692382","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}