2023 18th International Conference on Machine Vision and Applications (MVA): Latest Publications

Padding Investigations for CNNs in Scene Parsing Tasks
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10216084
Yu-Hui Huang, M. Proesmans, L. Gool
{"title":"Padding Investigations for CNNs in Scene Parsing Tasks","authors":"Yu-Hui Huang, M. Proesmans, L. Gool","doi":"10.23919/MVA57639.2023.10216084","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216084","url":null,"abstract":"Zero padding is widely used in convolutional neural networks (CNNs) to prevent the size of feature maps diminishing too fast. However, it has been claimed to disturb the statistics at the border [9]. In this work, we compare various padding methods for the scene parsing task and propose an alternative padding method (CApadding) by extending the image to alleviate the border issue. Experiments on Cityspaces [2] and Deep-Globe [3] show that models with the proposed padding method achieves higher mean Intersection-Over-Union (IoU) than the zero padding based models.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123120411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
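The padding comparison the abstract describes can be prototyped with the padding modes built into standard deep learning frameworks. Below is a minimal PyTorch sketch, not the authors' code, that contrasts how different `padding_mode` choices change the border statistics of a convolution output; the CApadding method itself is not reproduced here.

```python
# Minimal sketch (not the paper's code): comparing built-in padding modes in
# PyTorch for a single conv layer, and how each affects border statistics.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 64, 64)  # dummy image batch

for mode in ["zeros", "reflect", "replicate", "circular"]:
    conv = nn.Conv2d(3, 8, kernel_size=3, padding=1, padding_mode=mode)
    with torch.no_grad():
        y = conv(x)
    # Compare border statistics against the interior to see the padding effect.
    border_mean = y[..., 0, :].mean().item()
    interior_mean = y[..., 1:-1, 1:-1].mean().item()
    print(f"{mode:10s} border mean={border_mean:+.4f} interior mean={interior_mean:+.4f}")
```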
Using Unconditional Diffusion Models in Level Generation for Super Mario Bros
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215856
Hyeon Joon Lee, E. Simo-Serra
{"title":"Using Unconditional Diffusion Models in Level Generation for Super Mario Bros","authors":"Hyeon Joon Lee, E. Simo-Serra","doi":"10.23919/MVA57639.2023.10215856","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215856","url":null,"abstract":"This study introduces a novel methodology for generating levels in the iconic video game Super Mario Bros. using a diffusion model based on a UNet architecture. The model is trained on existing levels, represented as a categorical distribution, to accurately capture the game’s fundamental mechanics and design principles. The proposed approach demonstrates notable success in producing high-quality and diverse levels, with a significant proportion being playable by an artificial agent. This research emphasizes the potential of diffusion models as an efficient tool for procedural content generation and highlights their potential impact on the development of new video games and the enhancement of existing games through generated content.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129835064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
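As a rough illustration of the level representation the abstract mentions, the sketch below shows how a tile-based level can be encoded as a categorical grid in the channel dimension, which is the input format a UNet-style denoiser would consume. The tile vocabulary size and level dimensions are invented for illustration and do not come from the paper.

```python
# Minimal sketch (assumptions, not the paper's implementation): one-hot encoding
# a tile-based level into channels, and decoding model output back to tiles.
import torch
import torch.nn.functional as F

NUM_TILES = 10            # hypothetical tile vocabulary size (air, ground, pipe, ...)
H, W = 14, 28             # hypothetical level height and width in tiles

level_ids = torch.randint(0, NUM_TILES, (1, H, W))          # (B, H, W) integer tiles
level_onehot = F.one_hot(level_ids, NUM_TILES).float()      # (B, H, W, C)
level_onehot = level_onehot.permute(0, 3, 1, 2)             # (B, C, H, W) for a UNet

# After reverse diffusion, continuous outputs are mapped back to tiles by argmax.
denoised = level_onehot + 0.1 * torch.randn_like(level_onehot)   # stand-in for model output
decoded_ids = denoised.argmax(dim=1)                              # (B, H, W)
print(decoded_ids.shape)
```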
Enhancing Retail Product Recognition: Fine-Grained Bottle Size Classification
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215699
Katarina Tolja, M. Subašić, Z. Kalafatić, S. Lončarić
{"title":"Enhancing Retail Product Recognition: Fine-Grained Bottle Size Classification","authors":"Katarina Tolja, M. Subašić, Z. Kalafatić, S. Lončarić","doi":"10.23919/MVA57639.2023.10215699","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215699","url":null,"abstract":"In this paper, we propose two innovative approaches to tackle the key challenges in product size classification, with a specific focus on bottles. Our research is particularly interesting as we leverage the bottle cap as a reference object, which allows bottle size classification to overcome challenges in the distance between the capturing device and the retail shelf, viewing angle, and arrangement of bottles on the shelves. We showcase the usage of the reference object in explicit and implicit novel approaches and discuss the benefits and limitations of the proposed methods.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130027986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
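One way to read the "explicit" use of the reference object is to classify size from the ratio between the detected bottle height and the cap height, which is largely invariant to camera distance. The sketch below is our own illustrative guess at this idea; the box format and thresholds are assumptions, not values from the paper.

```python
# Minimal sketch: coarse bottle-size label from the bottle/cap height ratio.
# Thresholds and box format are illustrative assumptions.
def size_from_ratio(bottle_box, cap_box, thresholds=(4.0, 6.0)):
    """Boxes are (x1, y1, x2, y2) in pixels; returns a coarse size label."""
    bottle_h = bottle_box[3] - bottle_box[1]
    cap_h = cap_box[3] - cap_box[1]
    ratio = bottle_h / max(cap_h, 1e-6)
    if ratio < thresholds[0]:
        return "small"
    elif ratio < thresholds[1]:
        return "medium"
    return "large"

print(size_from_ratio((100, 50, 160, 470), (115, 50, 145, 110)))  # ratio = 7.0 -> "large"
```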
Image Impression Estimation by Clustering People with Similar Tastes
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10216055
Banri Kojima, Takahiro Komamizu, Yasutomo Kawanishi, Keisuke Doman, I. Ide
{"title":"Image Impression Estimation by Clustering People with Similar Tastes","authors":"Banri Kojima, Takahiro Komamizu, Yasutomo Kawanishi, Keisuke Doman, I. Ide","doi":"10.23919/MVA57639.2023.10216055","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216055","url":null,"abstract":"This paper proposes a method for estimating impressions from images according to the personal attributes of users so that they can find the desired images based on their tastes. Our previous work, which considered gender and age as personal attributes, showed promising results, but it also showed that users sharing these attributes do not necessarily share similar tastes. Therefore, other attributes should be considered to capture the personal tastes of each user well. However, taking more attributes into account leads to a problem in which insufficient amounts of data are served to classifiers due to the explosion of the number of combinations of attributes. To tackle this problem, we propose an aggregation-based method to condense training data for impression estimation while considering personal attribute information. For evaluation, a dataset of 4,000 carpet images annotated with 24 impression words was prepared. Experimental results showed that the use of combinations of personal attributes improved the accuracy of impression estimation, which indicates the effectiveness of the proposed approach.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122015977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
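A hedged sketch of the aggregation idea: group users with similar tastes (here, naive k-means on their rating vectors, purely as a stand-in) and condense their annotations into one training target per cluster. The data shapes and clustering choice are assumptions for illustration, not the authors' pipeline.

```python
# Minimal sketch: cluster users by their impression ratings and aggregate the
# labels per cluster to condense the training data. Shapes are invented.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
num_users, num_images = 50, 40
ratings = rng.random((num_users, num_images))        # per-user impression scores

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(ratings)

# Aggregate labels inside each cluster: the condensed training target per image.
condensed = np.stack([ratings[clusters == c].mean(axis=0) for c in range(5)])
print(condensed.shape)   # (5, 40): one averaged impression vector per taste cluster
```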
Low-Level Feature Aggregation Networks for Disease Severity Estimation of Coffee Leaves
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215626
Takuhiro Okada, Yuantian Huang, Guoqing Hao, S. Iizuka, K. Fukui
{"title":"Low-Level Feature Aggregation Networks for Disease Severity Estimation of Coffee Leaves","authors":"Takuhiro Okada, Yuantian Huang, Guoqing Hao, S. Iizuka, K. Fukui","doi":"10.23919/MVA57639.2023.10215626","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215626","url":null,"abstract":"This paper presents a deep learning-based approach for the severity classification of coffee leaf diseases. Coffee leaf diseases are one of the significant problems in the coffee industry, where estimating the health status of coffee leaves based on their appearance is crucial in the production process. However, there have been few studies on this task, and cases of misclassification have been reported due to the inability to detect slight color differences when classifying the disease severity. In this work, we propose a low-level feature aggregation technique for neural network-based classifiers to capture the discolored distribution of the entire coffee leaf, which effectively supports discrimination of the severity. This feature aggregation is achieved by incorporating attention mechanisms in the shallow layers of the network that extract low-level features such as color. The attention mechanism in the shallow layers provides the network with information on global dependencies of the color features of the leaves, allowing the network to more easily identify the disease severity. We use an efficient computational technique for the attention modules to reduce memory and computational cost, which enables us to introduce the attention mechanisms in large-sized feature maps in the shallow layers. We conduct in-depth validation experiments on the coffee leaf disease datasets and demonstrate the effectiveness of our proposed model compared to state-of-the-art image classification models in accurately classifying the severity of coffee leaf diseases.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126971515","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
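To make the "attention in shallow layers" idea concrete, the sketch below adds a lightweight global-context (squeeze-and-excitation style) attention to a high-resolution shallow feature map. This is a generic stand-in chosen for cheapness on large feature maps, not the paper's efficient attention module.

```python
# Minimal sketch (our own stand-in, not the paper's module): channel attention
# driven by a globally pooled descriptor, applied to a shallow feature map so
# the network can relate color-like low-level features across the whole leaf.
import torch
import torch.nn as nn

class ShallowGlobalAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # global color/intensity summary
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Reweight each channel by its global descriptor; cheap even on large
        # shallow feature maps because attention is per-channel, not per-pixel.
        return x * self.fc(self.pool(x))

feat = torch.randn(2, 32, 128, 128)                  # shallow-layer feature map
print(ShallowGlobalAttention(32)(feat).shape)
```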
Small Object Detection for Birds with Swin Transformer
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10216093
Da Huo, Marc A. Kastner, Tingwei Liu, Yasutomo Kawanishi, Takatsugu Hirayama, Takahiro Komamizu, I. Ide
{"title":"Small Object Detection for Birds with Swin Transformer","authors":"Da Huo, Marc A. Kastner, Tingwei Liu, Yasutomo Kawanishi, Takatsugu Hirayama, Takahiro Komamizu, I. Ide","doi":"10.23919/MVA57639.2023.10216093","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216093","url":null,"abstract":"Object detection is the task of detecting objects in an image. In this task, the detection of small objects is particularly difficult. Other than the small size, it is also accompanied by difficulties due to blur, occlusion, and so on. Current small object detection methods are tailored to small and dense situations, such as pedestrians in a crowd or far objects in remote sensing scenarios. However, when the target object is small and sparse, there is a lack of objects available for training, making it more difficult to learn effective features. In this paper, we propose a specialized method for detecting a specific category of small objects; birds. Particularly, we improve the features learned by the neck; the sub-network between the backbone and the prediction head, to learn more effective features with a hierarchical design. We employ Swin Transformer to upsample the image features. Moreover, we change the shifted window size for adapting to small objects. Experiments show that the proposed Swin Transformer-based neck combined with CenterNet can lead to good performance by changing the window sizes. We further find that smaller window sizes (default 2) benefit mAPs for small object detection.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114070609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
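The effect of the window size can be seen directly from Swin-style window partitioning: with a window of 2, each attention block only mixes a 2x2 neighbourhood of the feature map. The sketch below is generic partitioning code for illustration, not the paper's neck or an existing library implementation.

```python
# Minimal sketch: partition a feature map into non-overlapping windows, the
# unit over which Swin-style attention operates. Smaller windows -> finer,
# more local attention, which the abstract reports helps small objects.
import torch

def window_partition(x, window_size):
    """x: (B, H, W, C) -> (num_windows*B, window_size*window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)
    return windows

feat = torch.randn(1, 32, 32, 96)          # hypothetical neck feature map (B, H, W, C)
for ws in (2, 4, 8):
    print(ws, window_partition(feat, ws).shape)
```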
Grid Sample Based Temporal Iteration and Compactness-coefficient Distance for High Frame and Ultra-low Delay SLIC Segmentation System
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215797
Yuan Li, Tingting Hu, Ryuji Fuchikami, T. Ikenaga
{"title":"Grid Sample Based Temporal Iteration and Compactness-coefficient Distance for High Frame and Ultra-low Delay SLIC Segmentation System","authors":"Yuan Li, Tingting Hu, Ryuji Fuchikami, T. Ikenaga","doi":"10.23919/MVA57639.2023.10215797","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215797","url":null,"abstract":"High frame rate and ultra-low delay vision systems, which process 1000 FPS videos within 1 ms/frame delay, play an increasingly important role in fields such as robotics and factory automation. Among them, an image segmentation system is necessary as segmentation is a crucial pre-processing step for various applications. Recently many existing researches focus on superpixel segmentation, but few of them attempt to reach high processing speed. To achieve this target, this paper proposes: (A) Grid sample based temporal iteration, which leverages the high frame rate video property to distribute iterations into the temporal domain, ensuring the entire system is within one frame delay. Additionally, grid sample is proposed to add initialization information to temporal iteration for the stability of superpixels. (B) Compactness-coefficient distance is proposed to add information of the entire superpixel instead of only using the information of the center point. The evaluation results demonstrate that the proposed superpixel segmentation system achieves boundary recall and under-segmentation error comparable to the original SLIC superpixel segmentation system. For label consistency, the proposed system is more than 0.02 higher than the original system.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124104848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
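For context, the sketch below shows the baseline SLIC assignment distance that the proposed compactness-coefficient distance extends: colour distance combined with spatial distance weighted by a compactness term m and grid interval S. The paper's whole-superpixel variant is not reproduced here.

```python
# Minimal sketch of the standard SLIC pixel-to-center distance (not the paper's
# proposed distance): D = sqrt(d_c^2 + (d_s / S)^2 * m^2).
import numpy as np

def slic_distance(pixel_lab, pixel_xy, center_lab, center_xy, m=10.0, S=16.0):
    d_c = np.linalg.norm(np.asarray(pixel_lab) - np.asarray(center_lab))   # colour (CIELAB)
    d_s = np.linalg.norm(np.asarray(pixel_xy) - np.asarray(center_xy))     # spatial
    return np.sqrt(d_c**2 + (d_s / S)**2 * m**2)

print(slic_distance((50, 10, 10), (34, 40), (52, 12, 9), (32, 32), m=10.0, S=16.0))
```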
Unsupervised Fall Detection on Edge Devices
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215993
Takuya Nakabayashi, H. Saito
{"title":"Unsupervised Fall Detection on Edge Devices","authors":"Takuya Nakabayashi, H. Saito","doi":"10.23919/MVA57639.2023.10215993","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215993","url":null,"abstract":"Automatic fall detection is a crucial task in healthcare as falls pose a significant risk to the health of elderly individuals. This paper presents a lightweight acceleration-based fall detection method that can be implemented on edge devices. The proposed method uses Autoencoders, a type of unsupervised learning, within the framework of anomaly detection, allowing for network training without requiring extensive labeled fall data. One of the challenges in fall detection is the difficulty in collecting fall data. However, our proposed method can overcome this limitation by training the neural network without fall data, using the anomaly detection framework of Autoencoders. Additionally, this method employs an extremely lightweight Autoencoder that can run independently on an edge device, eliminating the need to transmit data to a server and minimizing privacy concerns. We conducted experiments comparing the performance of our proposed method with that of a baseline method using a unique fall detection dataset. Our results confirm that our method outperforms the baseline method in detecting falls with higher accuracy.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121579551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
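The general recipe described here, train on normal data only and flag high reconstruction error, can be sketched with a tiny autoencoder over flattened accelerometer windows. Window length, layer sizes, and the threshold rule below are illustrative assumptions, not the paper's configuration.

```python
# Minimal sketch: train a lightweight autoencoder on normal activity windows and
# flag a window as a suspected fall when its reconstruction error is high.
import torch
import torch.nn as nn

WINDOW = 50 * 3                      # e.g. 50 samples x 3 axes, flattened

model = nn.Sequential(               # deliberately small for edge devices
    nn.Linear(WINDOW, 32), nn.ReLU(),
    nn.Linear(32, 8), nn.ReLU(),     # bottleneck
    nn.Linear(8, 32), nn.ReLU(),
    nn.Linear(32, WINDOW),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

normal = torch.randn(256, WINDOW)    # stand-in for normal (non-fall) windows
for _ in range(20):                  # short training loop on normal data only
    recon = model(normal)
    loss = nn.functional.mse_loss(recon, normal)
    opt.zero_grad(); loss.backward(); opt.step()

threshold = loss.item() * 3.0        # crude threshold derived from training error
test = torch.randn(1, WINDOW) * 5.0  # an abnormal-looking window
err = nn.functional.mse_loss(model(test), test).item()
print("fall suspected" if err > threshold else "normal")
```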
Weakly-Supervised Deep Image Hashing based on Cross-Modal Transformer
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10216160
Ching-Ching Yang, W. Chu, S. Dubey
{"title":"Weakly-Supervised Deep Image Hashing based on Cross-Modal Transformer","authors":"Ching-Ching Yang, W. Chu, S. Dubey","doi":"10.23919/MVA57639.2023.10216160","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216160","url":null,"abstract":"Weakly-supervised image hashing emerges recently because web images associated with contextual text or tags are abundant. Text information weakly-related to images can be utilized to guide the learning of a deep hashing network. In this paper, we propose Weakly-supervised deep Hashing based on Cross-Modal Transformer (WHCMT). First, cross-scale attention between image patches is discovered to form more effective visual representations. A baseline transformer is also adopted to find self-attention of tags and form tag representations. Second, the cross-modal attention between images and tags is discovered by the proposed cross-modal transformer. Effective hash codes are then generated by embedding layers. WHCMT is tested on semantic image retrieval, and we show new state-of-the-art results can be obtained for the MIRFLICKR-25K dataset and NUS-WIDE dataset.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129882408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
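The hash-code generation step common to deep hashing methods, including the embedding layers mentioned here, can be sketched as a tanh head during training and sign binarisation at retrieval time. The cross-modal transformer that would produce the fused embeddings is not reproduced; the dimensions below are placeholders.

```python
# Minimal sketch: continuous embeddings -> soft codes via tanh (differentiable
# for training) -> binary codes via sign (used for retrieval).
import torch
import torch.nn as nn

HASH_BITS = 64
embed_dim = 256

hash_head = nn.Sequential(nn.Linear(embed_dim, HASH_BITS), nn.Tanh())

fused_embedding = torch.randn(4, embed_dim)      # stand-in for image/tag features
soft_codes = hash_head(fused_embedding)          # in (-1, 1), differentiable
binary_codes = torch.sign(soft_codes)            # {-1, +1} codes used for retrieval

# Hamming-like distance via inner product of binary codes.
dist = 0.5 * (HASH_BITS - binary_codes @ binary_codes.t())
print(binary_codes.shape, dist.shape)
```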
ViTVO: Vision Transformer based Visual Odometry with Attention Supervision
2023 18th International Conference on Machine Vision and Applications (MVA) Pub Date : 2023-07-23 DOI: 10.23919/MVA57639.2023.10215538
Chu-Chi Chiu, Hsuan-Kung Yang, Hao-Wei Chen, Yu-Wen Chen, Chun-Yi Lee
{"title":"ViTVO: Vision Transformer based Visual Odometry with Attention Supervision","authors":"Chu-Chi Chiu, Hsuan-Kung Yang, Hao-Wei Chen, Yu-Wen Chen, Chun-Yi Lee","doi":"10.23919/MVA57639.2023.10215538","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215538","url":null,"abstract":"In this paper, we develop a Vision Transformer based visual odometry (VO), called ViTVO. ViTVO introduces an attention mechanism to perform visual odometry. Due to the nature of VO, Transformer based VO models tend to overconcentrate on few points, which may result in a degradation of accuracy. In addition, noises from dynamic objects usually cause difficulties in performing VO tasks. To overcome these issues, we propose an attention loss during training, which utilizes ground truth masks or self supervision to guide the attention maps to focus more on static regions of an image. In our experiments, we demonstrate the superior performance of ViTVO on the Sintel validation set, and validate the effectiveness of our attention supervision mechanism in performing VO tasks.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127923758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
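One plausible form of the attention supervision described in the abstract is to penalise attention mass that falls on dynamic-object pixels given a mask. The loss below is our assumption of such a formulation, not necessarily the paper's exact loss.

```python
# Minimal sketch: penalise the fraction of (normalised) attention that lands on
# dynamic-object regions indicated by a ground-truth or self-supervised mask.
import torch

def attention_supervision_loss(attn_map, dynamic_mask, eps=1e-6):
    """attn_map: (B, H, W) non-negative attention; dynamic_mask: (B, H, W) in {0, 1},
    1 where the pixel belongs to a dynamic object."""
    attn = attn_map / (attn_map.sum(dim=(1, 2), keepdim=True) + eps)  # normalise per image
    return (attn * dynamic_mask).sum(dim=(1, 2)).mean()               # mass on dynamic regions

attn = torch.rand(2, 24, 24)
mask = (torch.rand(2, 24, 24) > 0.8).float()
print(attention_supervision_loss(attn, mask))
```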