中国图象图形学报: Latest Articles

Residual Neural Networks for Human Action Recognition from RGB-D Videos
中国图象图形学报 Pub Date: 2023-12-01 DOI: 10.18178/joig.11.4.343-352
K. V. Subbareddy, B. P. Pavani, G. Sowmya, N. Ramadevi
{"title":"Residual Neural Networks for Human Action Recognition from RGB-D Videos","authors":"K. V. Subbareddy, B. P. Pavani, G. Sowmya, N. Ramadevi","doi":"10.18178/joig.11.4.343-352","DOIUrl":"https://doi.org/10.18178/joig.11.4.343-352","url":null,"abstract":"Recently, the RGB-D based Human Action Recognition (HAR) has gained significant research attention due to the provision of complimentary information by different data modalities. However, the current models have experienced still unsatisfactory results due to several problems including noises and view point variations between different actions. To sort out these problems, this paper proposes two new action descriptors namely Modified Depth Motion Map (MDMM) and Spherical Redundant Joint Descriptor (SRJD). MDMM eliminates the noises from depth maps and preserves only the action related information. Further SRJD ensures resilience against view point variations and reduces the misclassifications between different actions with similar view properties. Further, to maximize the recognition accuracy, standard deep learning algorithm called as Residual Neural Network (ResNet) is used to train the system through the features extracted from MDMM and SRJD. Simulation experiments prove that the multiple data modalities are better than single data modality. The proposed approach was tested on two public datasets namely NTURGB+D dataset and UTD-MHAD dataset. The testing results declare that the proposed approach is superior to the earlier HAR methods. On an average, the proposed system gained an accuracy of 90.0442% and 92.3850% at Cross-subject and Cross-view validations respectively.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138621059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
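The MDMM descriptor is a variant of the classic depth motion map, which accumulates frame-to-frame depth differences into a single 2-D map. The sketch below shows that underlying idea with a simple threshold standing in for the paper's noise-elimination step; the function name and threshold value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def modified_depth_motion_map(depth_frames, noise_thresh=10.0):
    """Accumulate thresholded absolute frame differences of a depth
    sequence into one 2-D motion map (illustrative DMM sketch)."""
    dmm = np.zeros(depth_frames[0].shape, dtype=np.float64)
    for prev, curr in zip(depth_frames[:-1], depth_frames[1:]):
        diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
        diff[diff < noise_thresh] = 0.0   # drop small differences as sensor noise
        dmm += diff
    if dmm.max() > 0:                     # normalise to [0, 1] for CNN input
        dmm /= dmm.max()
    return dmm

# usage on synthetic depth frames
frames = [np.random.randint(0, 4096, (240, 320), dtype=np.uint16) for _ in range(30)]
print(modified_depth_motion_map(frames).shape)  # (240, 320)
```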
Spatial Pyramid Attention Enhanced Visual Descriptors for Landmark Retrieval
中国图象图形学报 Pub Date: 2023-12-01 DOI: 10.18178/joig.11.4.359-366
Luepol Pipanmekaporn, Suwatchai Kamonsantiroj, Chiabwoot Ratanavilisagul, Sathit Prasomphan
{"title":"Spatial Pyramid Attention Enhanced Visual Descriptors for Landmark Retrieval","authors":"Luepol Pipanmekaporn, Suwatchai Kamonsantiroj, Chiabwoot Ratanavilisagul, Sathit Prasomphan","doi":"10.18178/joig.11.4.359-366","DOIUrl":"https://doi.org/10.18178/joig.11.4.359-366","url":null,"abstract":"Landmark retrieval, which aims to search for landmark images similar to a query photo within a massive image database, has received considerable attention for many years. Despite this, finding landmarks quickly and accurately still presents some unique challenges. To tackle these challenges, we present a deep learning model, called the Spatial-Pyramid Attention network (SPA). This network is an end-to-end convolutional network, incorporating a spatial-pyramid attention layer that encodes the input image, leveraging the spatial pyramid structure to highlight regional features based on their relative spatial distinctiveness. An image descriptor is then generated by aggregating these regional features. According to our experiments on benchmark datasets including Oxford5k, Paris6k, and Landmark-100, our proposed model, SPA, achieves mean Average Precision (mAP) accuracy of 85.3% with the Oxford dataset, 89.6% with the Paris dataset, and 80.4% in the Landmark-100 dataset, outperforming existing state-of-theart deep image retrieval models.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138625359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
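The spatial-pyramid attention layer described above can be approximated as attention-weighted pooling at several pyramid scales, concatenated into one global descriptor. A hedged PyTorch sketch, not the paper's exact architecture; the pyramid levels and the 1x1 scoring convolution are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialPyramidAttention(nn.Module):
    """Weight regional features with a learned attention map, pool them
    at several pyramid scales, and concatenate into one descriptor."""
    def __init__(self, channels, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # per-location attention score

    def forward(self, feat):                      # feat: (B, C, H, W) backbone features
        attn = torch.sigmoid(self.score(feat))    # (B, 1, H, W)
        weighted = feat * attn
        descs = []
        for level in self.levels:
            pooled = F.adaptive_avg_pool2d(weighted, level)  # (B, C, level, level)
            descs.append(pooled.flatten(1))
        desc = torch.cat(descs, dim=1)
        return F.normalize(desc, dim=1)           # L2-normalised image descriptor

feat = torch.randn(2, 512, 16, 16)                # e.g. CNN backbone output
print(SpatialPyramidAttention(512)(feat).shape)   # torch.Size([2, 10752])
```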
Evaluating Performances of Attention-Based Merge Architecture Models for Image Captioning in Indian Languages
中国图象图形学报 Pub Date: 2023-09-01 DOI: 10.18178/joig.11.3.294-301
Rahul Tangsali, Swapnil Chhatre, Soham Naik, Pranav Bhagwat, Geetanjali Kale
{"title":"Evaluating Performances of Attention-Based Merge Architecture Models for Image Captioning in Indian Languages","authors":"Rahul Tangsali, Swapnil Chhatre, Soham Naik, Pranav Bhagwat, Geetanjali Kale","doi":"10.18178/joig.11.3.294-301","DOIUrl":"https://doi.org/10.18178/joig.11.3.294-301","url":null,"abstract":"Image captioning is a growing topic of research in which numerous advancements have been made in the past few years. Deep learning methods have been used extensively for generating textual descriptions of image data. In addition, attention-based image captioning mechanisms have also been proposed, which give state-ofthe- art results in image captioning. However, many applications and analyses of these methodologies have not been made in the case of languages from the Indian subcontinent. This paper presents attention-based merge architecture models to achieve accurate captions of images in four Indian languages- Marathi, Kannada, Malayalam, and Tamil. The widely known Flickr8K dataset was used for this project. Pre-trained Convolutional Neural Network (CNN) models and language decoder attention models were implemented, which serve as the components of the mergearchitecture proposed here. Finally, the accuracy of the generated captions was compared against the gold captions using Bilingual Evaluation Understudy (BLEU) as an evaluation metric. It was observed that the merge architectures consisting of InceptionV3 give the best results for the languages we test on, the scores discussed in the paper. Highest BLEU-1 scores obtained for each language were: 0.4939 for Marathi, 0.4557 for Kannada, 0.5082 for Malayalam, and 0.5201 for Tamil. Our proposed architectures gave much higher scores than other architectures implemented for these languages.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79904335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
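BLEU-1, used for the scores quoted above, measures unigram overlap between a generated caption and the gold captions. A minimal NLTK example (the tokens are placeholders, not dataset captions):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "dog", "runs", "on", "the", "grass"]]  # gold caption(s), tokenised
candidate = ["the", "dog", "runs", "in", "the", "grass"]    # generated caption

# weights=(1, 0, 0, 0) restricts the score to unigrams, i.e. BLEU-1
smooth = SmoothingFunction().method1
bleu1 = sentence_bleu(reference, candidate,
                      weights=(1, 0, 0, 0), smoothing_function=smooth)
print(f"BLEU-1 = {bleu1:.4f}")  # 0.8333 for this toy pair
```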
DeepEar: A Deep Convolutional Network without Deformation for Ear Segmentation
中国图象图形学报 Pub Date: 2023-09-01 DOI: 10.18178/joig.11.3.242-247
Yuhan Chen, Wende Ke, Qingfeng Li, Dongxin Lu, Yani Bai, Zhen Wang
{"title":"DeepEar: A Deep Convolutional Network without Deformation for Ear Segmentation","authors":"Yuhan Chen, Wende Ke, Qingfeng Li, Dongxin Lu, Yani Bai, Zhen Wang","doi":"10.18178/joig.11.3.242-247","DOIUrl":"https://doi.org/10.18178/joig.11.3.242-247","url":null,"abstract":"With the cross-application of robotics in various fields, machine vision has gradually received attention. As an important part in machine vision, image segmentation has been widely applied especially in biomedical image segmentation, and many algorithms in image segmentation have been proposed in recent years. Nowadays, traditional Chinese medicine gradually received attention and ear diagnosis plays an important role in traditional Chinese medicine, the demand for automation in ear diagnosis becomes gradually intense. This paper proposed a deep convolution network for ear segmentation (DeepEar), which combined spatial pyramid block and the encoder-decoder architecture, besides, atrous convolutional layers are applied throughout the network. Noteworthy, the output ear image from DeepEar has the same size as input images. Experiments shows that this paper proposed DeepEar has great capability in ear segmentation and obtained complete ear with less excess region. Segmentation results from the proposed network obtained Accuracy = 0.9915, Precision = 0.9762, Recal l= 9.9723, Harmonic measure = 0.9738 and Specificity = 0.9955, which performed much better than other Convolution Neural Network (CNN)- based methods in quantitative evaluation. Besides, this paper proposed network basically completed ear-armor segmentation, further validated the capability of the proposed network.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82732573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
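The five quantitative measures reported above are standard pixel-wise statistics derived from the confusion matrix of a binary mask (the harmonic measure being the harmonic mean of precision and recall). A sketch of their computation, with random masks standing in for real predictions:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Pixel-wise metrics for a binary segmentation mask.
    `pred` and `gt` are boolean arrays of the same shape."""
    tp = np.sum(pred & gt)
    tn = np.sum(~pred & ~gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)
    harmonic    = 2 * precision * recall / (precision + recall)  # F1 / harmonic measure
    specificity = tn / (tn + fp)
    return accuracy, precision, recall, harmonic, specificity

pred = np.random.rand(256, 256) > 0.5   # stand-in for a network prediction
gt   = np.random.rand(256, 256) > 0.5   # stand-in for the ground-truth mask
print(segmentation_metrics(pred, gt))
```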
Solar Radiation and Weather Analysis of Meteorological Satellite Data by Tensor Decomposition
中国图象图形学报 Pub Date: 2023-09-01 DOI: 10.18178/joig.11.3.271-281
N. Watanabe, A. Ishida, J. Murakami, N. Yamamoto
{"title":"Solar Radiation and Weather Analysis of Meteorological Satellite Data by Tensor Decomposition","authors":"N. Watanabe, A. Ishida, J. Murakami, N. Yamamoto","doi":"10.18178/joig.11.3.271-281","DOIUrl":"https://doi.org/10.18178/joig.11.3.271-281","url":null,"abstract":"In this study, the data obtained from meteorological satellites were analyzed using tensor decomposition. The data used in this paper are meteorological image data observed by the Himawari-8 satellite and solar radiation data generated from Himawari Standard Data. First, we applied Higher-Order Singular Value Decomposition (HOSVD), a type of tensor decomposition, to the original image data and analyzed the features of the data, called the core tensor, obtained from the decomposition. As a result, it was found that the maximum value of the core tensor element is related to the cloud cover in the observed area. We then applied Multidimensional Principal Component Analysis (MPCA), an extension of principal component analysis computed using HOSVD, to the solar radiation data and analyzed the Principal Components (PC) obtained from MPCA. We also found that the PC with the highest contribution rate is related to the solar radiation in the entire observation area. The resulting PC score was compared to actual weather data. From the result, it was confirmed that the temporal transition of the amount of solar radiation in this area can be expressed almost correctly by using the PC score.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79483390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
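HOSVD computes, for each mode, the left singular vectors of the tensor's mode-n unfolding, and obtains the core tensor by projecting the data onto all of the resulting factor matrices. A compact NumPy sketch; the shapes are illustrative, not the Himawari-8 data layout:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd(tensor):
    """Higher-Order SVD: factor matrices are the left singular vectors of
    each unfolding; the core tensor is the data projected onto all of them."""
    factors = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u)
    core = tensor
    for mode, u in enumerate(factors):
        # contract mode `mode` of the core with u^T
        core = np.moveaxis(np.tensordot(u.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

data = np.random.rand(8, 64, 64)   # e.g. a small stack of satellite image patches
core, factors = hosvd(data)
print(core.shape)                  # (8, 64, 64), energy packed into low indices
```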
Web-Based Application for Malaria Parasite Detection Using Thin-Blood Smear Images
中国图象图形学报 Pub Date: 2023-09-01 DOI: 10.18178/joig.11.3.288-293
W. Swastika, B. J. Pradana, R. B. Widodo, Rehmadanta Sitepu, G. G. Putra
{"title":"Web-Based Application for Malaria Parasite Detection Using Thin-Blood Smear Images","authors":"W. Swastika, B. J. Pradana, R. B. Widodo, Rehmadanta Sitepu, G. G. Putra","doi":"10.18178/joig.11.3.288-293","DOIUrl":"https://doi.org/10.18178/joig.11.3.288-293","url":null,"abstract":"Malaria is an infectious disease caused by the Plasmodium parasite. In 2019, there were 229 million cases of malaria with a death toll of 400.900. Malaria cases increased in 2020 to 241 million people with the death toll reaching 627,000. Malaria diagnosis which is carried out by observing the patient’s blood sample requires experts and if it is not done correctly, misdiagnosis can occur. Deep Learning can be used to help diagnose Malaria by classifying thin blood smear images. In this study, transfer learning techniques were used on the Convolutional Neural Network to speed up the model training process and get high accuracy. The architecture used for Transfer Learning is EfficientNetB0. The training model is embedded in a pythonbased web application which is then deployed on the Google App Engine platform. This is done so that it can be used by experts to help diagnose. The training model has a training accuracy of 0.9664, a training loss of 0.0937, a validation accuracy of 0.9734, and a validation loss of 0.0816. Prediction results on test data have an accuracy of 96.8% and an F1- score value of 0.968.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85297579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
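The transfer-learning setup described, a pretrained EfficientNetB0 backbone with a small binary head for parasitized versus uninfected smears, can be sketched in Keras as follows. The head layers, input size, and optimizer are assumptions rather than the paper's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Frozen ImageNet backbone: only the new head is trained at first,
# which is what makes transfer learning fast.
base = tf.keras.applications.EfficientNetB0(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(1, activation="sigmoid"),  # parasitized vs. uninfected
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # with a tf.data pipeline
```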
Generation of High-Resolution Facial Expression Images Using a Super-Resolution Technique and Self-Supervised Guidance
中国图象图形学报 Pub Date: 2023-09-01 DOI: 10.18178/joig.11.3.302-308
Tatsuya Hanano
{"title":"Generation of High-Resolution Facial Expression Images Using a Super-Resolution Technique and Self-Supervised Guidance","authors":"Tatsuya Hanano","doi":"10.18178/joig.11.3.302-308","DOIUrl":"https://doi.org/10.18178/joig.11.3.302-308","url":null,"abstract":"The recent spread of smartphones and social networking services has increased the means of seeing images of human faces. Particularly, in the face image field, the generation of face images using facial expression transformation has already been realized using deep learning–based approaches. However, in the existing deep learning–based models, only low-resolution images can be generated due to limited computational resources. Consequently, the generated images are blurry or aliasing. To address this problem, we proposed a two-step method to enhance the resolution of the generated facial images by combining a super-resolution network following the generative model, which can be considered a serial model, in our previous work. We further proposed a parallel model that trains a generative adversarial network and a superresolution network through multitask learning. In this paper, we propose a new model that integrates self-supervised guidance encoders into the parallel model to further improve the accuracy of the generated results. Using the peak signalto- noise ratio as an evaluation index, image quality was improved by 0.25 dB for the male test data and 0.28 dB for the female test data compared with our previous multitaskbased parallel model.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82634333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
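PSNR, the evaluation index quoted above, is 10*log10(peak^2/MSE) in decibels, so the reported 0.25-0.28 dB gains correspond to a lower mean squared error against the reference images. A minimal computation:

```python
import numpy as np

def psnr(img_a, img_b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-shape images."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# synthetic example: a reference image and a slightly noisy copy
ref = np.random.randint(0, 256, (128, 128, 3))
gen = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255)
print(f"PSNR = {psnr(ref, gen):.2f} dB")
```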
Improvement of Presence in Live Music Videos and Alleviation of Discomfort of Viewers by Zooming Operation
中国图象图形学报 Pub Date: 2023-09-01 DOI: 10.18178/joig.11.3.233-241
Ai Oishi, Eiji Kamioka, Phan Xuan Tan, Manami Kanamaru
{"title":"Improvement of Presence in Live Music Videos and Alleviation of Discomfort of Viewers by Zooming Operation","authors":"Ai Oishi, Eiji Kamioka, Phan Xuan Tan, Manami Kanamaru","doi":"10.18178/joig.11.3.233-241","DOIUrl":"https://doi.org/10.18178/joig.11.3.233-241","url":null,"abstract":"People can enjoy watching live performances without visiting live venues thanks to the development of live music streaming services on the Internet. However, such live music videos, especially those recorded by amateur band members, lack a sense of presence. Therefore, in the previous study, the authors proposed a method to improve the sense of presence in live music videos by performing zooming on the video frames. It achieved enhancing the sense of presence. However, it also increased the discomfort of the viewers. This is because the zooming was performed not on a music performer but in the center of the screen, resulting in an unnatural experience for the viewer. Therefore, in this paper, a new zooming method, which effectively emphasizes the music performer with intense movement, is proposed, introducing the concept of the “Main Spot”. The evaluation results through an experiment verified that the proposed method improved the sense of presence in live music videos and alleviated the discomfort of the viewers.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89375650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
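The core zooming operation can be illustrated as cropping a window around a chosen point, standing in here for the "Main Spot", and rescaling it to the full frame size. A hedged OpenCV sketch; the logic that tracks the performer and selects the point is not reproduced:

```python
import cv2
import numpy as np

def zoom_on_spot(frame, center, zoom=1.5):
    """Crop a window around `center` and scale it back to full frame size."""
    h, w = frame.shape[:2]
    cw, ch = int(w / zoom), int(h / zoom)           # crop window size
    x = int(np.clip(center[0] - cw // 2, 0, w - cw))  # keep window inside the frame
    y = int(np.clip(center[1] - ch // 2, 0, h - ch))
    crop = frame[y:y + ch, x:x + cw]
    return cv2.resize(crop, (w, h), interpolation=cv2.INTER_LINEAR)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)    # stand-in for a video frame
zoomed = zoom_on_spot(frame, center=(640, 360), zoom=1.5)
print(zoomed.shape)  # (720, 1280, 3)
```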
A Method for Enhancing PET Scan Images Using Nonlocal Mean Filter
中国图象图形学报 Pub Date: 2023-09-01 DOI: 10.18178/joig.11.3.282-287
Raghad Hazim Hamid, Nagham Saeed, H. M. Ahmed
{"title":"A Method for Enhancing PET Scan Images Using Nonlocal Mean Filter","authors":"Raghad Hazim Hamid, Nagham Saeed, H. M. Ahmed","doi":"10.18178/joig.11.3.282-287","DOIUrl":"https://doi.org/10.18178/joig.11.3.282-287","url":null,"abstract":"Medical images are an important source of information for both diagnosing and treating diseases. In many cases, the images produced by a Positron Emission Tomography (PET) scan are used to assess the effectiveness of a particular treatment. This paper presents a method for whole-body PET image denoising using a spatially-guided non-local means filter. The proposed method starts with clustering the images into regions. To estimate the noise, a Bayesian with automatic settings of the parameters was used. Then, only patches that belong to regions were collected and processed. The performance was compared to two methods; Gaussian and conventional Non-Local Means (NLM). The Jaszczak phantom and PET/ Computed Tomography (CT) for whole-body were involved in the benchmarking. The obtained results showed that in the Jaszczak phantom, the Signal-to-Noise Ratio (SNR) was significantly improved. Additionally, the proposed method improved the contrast and SNR compared to conventional NLM and Gaussian. Finally, the proposed method, in clinical whole-body PET, can be considered as another way of the post-reconstruction filter.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84005345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
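The baseline non-local means step that the proposed spatially guided filter builds on is available off the shelf. The sketch below shows plain NLM with a rough noise estimate on synthetic data; the paper's clustering and Bayesian parameter estimation are not reproduced:

```python
import numpy as np
from skimage.restoration import denoise_nl_means, estimate_sigma

# Synthetic noisy image standing in for a PET slice.
rng = np.random.default_rng(0)
noisy = rng.random((128, 128)) + rng.normal(0, 0.1, (128, 128))

sigma = float(np.mean(estimate_sigma(noisy)))   # rough noise level estimate
denoised = denoise_nl_means(noisy,
                            h=1.15 * sigma,     # filtering strength tied to noise
                            sigma=sigma,
                            patch_size=5, patch_distance=6, fast_mode=True)
print(denoised.shape)  # (128, 128)
```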
Development of Vehicle Detection and Counting Systems with UAV Cameras: Deep Learning and Darknet Algorithms
中国图象图形学报 Pub Date: 2023-09-01 DOI: 10.18178/joig.11.3.248-262
A. H. Rangkuti, Varyl Hasbi Athala, Farrel Haridhi Indallah
{"title":"Development of Vehicle Detection and Counting Systems with UAV Cameras: Deep Learning and Darknet Algorithms","authors":"A. H. Rangkuti, Varyl Hasbi Athala, Farrel Haridhi Indallah","doi":"10.18178/joig.11.3.248-262","DOIUrl":"https://doi.org/10.18178/joig.11.3.248-262","url":null,"abstract":"This study focuses on identifying and detecting several types of vehicles, with each vehicle’s position depicted by drone technology or an Unmanned Aerial Vehicle (UAV) camera. The vehicle’s position is captured from a height of 350 to 400 meters above the ground. This study aims to identify the class of vehicles that travel on the highway. The experiment employs several convolutional neural network models, including YOLOv4, YOLOv3, YOLOv7, DenseNet201-YOLOv3, and CSResNext50-Panet-SPP, to identify this type of vehicle. Meanwhile, the Darknet algorithm aids the training process by making it easier to identify the type of vehicle depicted in MP4 movies. Several other Convolution Neural Network (CNN) model experiments were conducted in this study, but due to hardware limitations, only these 5 CNN models could produce an optimal accuracy of up to 70%. Following several experiments, the CSResNext50-Panet-SPP model produced the highest accuracy while detecting 100% of video data using UAV technology, including the volume of vehicles detected while crossing the road. Other CNN models produced high accuracy values, such as DenseNet201- YOLOv3 and YOLOv4 models, which can detect up to 98% to 99% of the time. This research can improve its capabilities by detecting other classes that are affordable by UAV technology but require hardware and peripheral technology to support the training process.","PeriodicalId":36336,"journal":{"name":"中国图象图形学报","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89058367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
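Darknet-trained YOLO models of the kind compared above can be run for per-frame vehicle counting through OpenCV's DNN module. A minimal sketch: the weight and config file names are placeholders, a blank array stands in for a UAV frame, and the simple tally is not the paper's full counting pipeline:

```python
import cv2
import numpy as np

# Placeholder files: a Darknet config and its trained weights.
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(608, 608), scale=1 / 255.0, swapRB=True)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)   # stand-in for a UAV video frame
class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)

vehicle_classes = {2, 5, 7}                        # COCO ids: car, bus, truck
count = sum(1 for c in class_ids if int(c) in vehicle_classes)
print(f"vehicles in this frame: {count}")
```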