{"title":"Predicting good features using a hybrid feature for visual geolocation system","authors":"Reem Aljuaidi, Micheal Manzke","doi":"10.1117/12.2645302","DOIUrl":"https://doi.org/10.1117/12.2645302","url":null,"abstract":"We address the problem of accurately geolocating an image on a large city scale. Image geolocation is the process of distinguishing a place in an image through geotagged reference images depicting the same place. This is a challenging task due to the appearance changes in large outdoor environments. In particular, the limitation on using large geotagged images effectively for training. To overcome this limitation, we propose to select and predict good hybrid features, and cast the prediction score as a classification task. To this end, we generate training features and learn the classifier offline. For the image representation phase, we propose a new method called hybrid feature to make image representation robust against geometric and photometric changes and have a high discriminative level as well. By doing this, we achieve competitive results compared with other baseline methods. Also, our results show a significant improvement while using hybrid features compared to using handcrafted models or deep learning methods individually.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122387855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PDNet: an advanced architecture for polyp image segmentation","authors":"Hanqing Liu, Zhipeng Zhao, Ruichun Tang, Peishun Liu, Yixin Chen, Jianjun Zhang, Jing Jia","doi":"10.1117/12.2643392","DOIUrl":"https://doi.org/10.1117/12.2643392","url":null,"abstract":"In order to improve the segmentation accuracy of polyp image segmentation under colonoscopy, we propose PVT Dual-Upsampling Net (PDNet). PDNet adopts the encoder network based on Transformer as the backbone network for downsampling, and designs a dual upsampling module based on cascaded fusion network and simple connection network to recover the loss of high-level image features caused by the downsampling process, and obtains a high-level semantic feature map with the same resolution as the input image. The multi-feature fusion module is used to aggregate the low-level feature map and high-level semantic feature map. We validate the model on three publicly available datasets, and our experimental evaluations show that the suggested architecture produces good segmentation results on datasets.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122713073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deformable voxel grids for shape comparisons","authors":"Raphaël Groscot, L. Cohen","doi":"10.1117/12.2645961","DOIUrl":"https://doi.org/10.1117/12.2645961","url":null,"abstract":"We present Deformable Voxel Grids (DVGs) for 3D shapes comparison and processing. It consists of a voxel grid which is deformed to approximate the silhouette of a shape, via energy-minimization. By interpreting the DVG as a local coordinates system, it provides a better embedding space than a regular voxel grid, since it is adapted to the geometry of the shape. It also allows to deform the shape by moving the control points of the DVG, in a similar manner to the Free Form Deformation, but with easier interpretability of the control points positions. After proposing a computation scheme of the energies compatible with meshes and pointclouds, we demonstrate the use of DVGs in a variety of applications: correspondences via cubification, style transfer, shape retrieval and PCA deformations. The first two require no learning and can be readily run on any shapes in a matter of minutes on modest hardware. As for the last two, they require to first optimize DVGs on a collection of shapes, which amounts to a pre-processing step. Then, determining PCA coordinates is straightforward and brings a few parameters to deform a shape.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"35 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114039352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep learning video analytics solutions for ocean surveillance systems","authors":"Harikrishnan V. S., Shivam Dixit, P. K, Jitesh Kamnani, R. S., R. Venkatesan","doi":"10.1117/12.2643108","DOIUrl":"https://doi.org/10.1117/12.2643108","url":null,"abstract":"Moored data buoys are floating platforms at sea. These buoys serve as in-situ Weather, Ocean and Tsunami observatories. These buoys transmit real-time data through 3G/GSM/GPRS and satellite telemetry. Damage to the buoy systems by humans, boats, ships etc., intentional or otherwise, causes loss of data, and inhibits early warning systems. It also has financial implications due to the loss of the instruments, repair & reinstallation charges, and the time a ship spends to fix the buoy. Challenges arise while analyzing the video footage, which is unstable and shaky due to the continuous movements of floating ocean buoy platforms caused by the state of the sea. This paper explores object detection algorithms for detecting eight different objects commonly found in the camera video footage transmitted by the buoy platforms at sea. The object detection training implementation gave us a best accuracy of 0.867 mAP@0.5 IoU. The object detection will help in solutions like Object Search, detection of floating marine plastic debris, understanding the direction of motion of ships, boats etc. In a broader perspective, it can help in Surveillance, Market Survey and Fish Detection in underwater cameras for fish abundance study.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122045164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvement of attention modules for image captioning using pixel-wise semantic information","authors":"Zhihao Chen, Keisuke Doman, Y. Mekada","doi":"10.1117/12.2644743","DOIUrl":"https://doi.org/10.1117/12.2644743","url":null,"abstract":"Although an attention mechanism is reasonable for generating image captions, how to obtain ideal image regions within the mechanism is a problem in practice due to the difficulty of its calculation between image and text data. In order to improve the attention modules for image captioning, we propose an algorithm for handling a pixel-wise semantic information, which is obtained as the outputs of semantic segmentation. The proposed method puts the pixel-wise semantic information into the attention modules for image captioning together with input text data and image features. We conducted evaluation experiments and confirmed that our method could obtain more reasonable weighted image features and better image captions with a BLEU-4 score of 0.306 than its original attention model with a BLEU-4 score of 0.243.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128504402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mineralization information extraction of metal mining area based on Landsat images","authors":"Rui-Feng Wang, Yuxin Gao, Wenlong Yu, Rubing Huang","doi":"10.1117/12.2643522","DOIUrl":"https://doi.org/10.1117/12.2643522","url":null,"abstract":"Remote sensing technology can quickly identify hydrothermal alteration related to mineralization and provide help for prospecting through efficient and accurate analysis. In this study, the Zhaoping fault zone is taken as the research area, Landsat8 OLI remote sensing images with good imaging quality are used as the data source, and ENVI software is used to perform pruning, radiometric calibration, atmospheric correction, and removal of interference information (including water and vegetation) on the remote sensing images from 2015 and 2020. The mineralized alteration information of this area is extracted by the CROSTA method, and the extracted thematic information is analyzed and verified. In addition, by comparing the extraction maps of mineralization information in 2015 and 2020, it is found that overexploitation of mineral resources still exists. The results are in good agreement with the known ore points in the study area, indicating that the extraction of metal mineralization information through remote sensing images is of great significance to the study and rational utilization of mineral resources.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130537931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated osteoporosis prediction system using artificial intelligence to calculate cortical thickness index from hip X-rays","authors":"Wei Ying Ling, K. W. Choo, Tiehua Du, Waiming Kong, Jerry Delphi Chen Yongqiang, Eu Jin Tan","doi":"10.1117/12.2644476","DOIUrl":"https://doi.org/10.1117/12.2644476","url":null,"abstract":"Early diagnosis and regular monitoring of osteoporosis are key to preventing further deterioration and fractures in osteoporosis patients. Dual-energy X-ray Absorptiometry (DXA), despite being a gold standard for diagnosing osteoporosis, is not routinely ordered due to the limited availability of DXA machines, especially in developing countries. As a result, orthopedists often lack DXA results at the time of examination. This study aims to develop an automated AI system to predict osteoporosis based on a plain X-ray scan of a patient’s femur and demographic data, such as age, height and weight. The system first performs instance segmentation on the X-ray scan to locate the femur, then applies image processing techniques to measure the inner and outer diameters of the femur, and finally computes the cortical thickness index (CTI). The CTI value, together with the patient’s demographic data, is incorporated into a classification model to predict whether the patient is suffering from osteoporosis. We found that the CTI calculated by the AI system is comparable to the manually calculated CTI. The AI system can predict osteoporosis with an accuracy of 85.3% using CTI and patient data.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121291787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rotated target recognition of sonar images via convolutional neural networks with rotated inputs","authors":"Peng Zhang, Jinsong Tang, Heping Zhong, Mingqiang Ning, Yue Fan","doi":"10.1117/12.2644531","DOIUrl":"https://doi.org/10.1117/12.2644531","url":null,"abstract":"Rotated target recognition is a challenge for Convolutional Neural Networks (CNN), and the current solution is to make CNN rotationally invariant through data augmentation. However, data augmentation makes CNN prone to overfitting small-scale sonar image datasets, and increases its number of parameters and training time. This paper proposes to recognize rotated targets of sonar images using a novel CNN with Rotated Inputs (RICNN), which doesn’t need data augmentation. During training, RICNN was trained with sonar images of targets at only one orientation, which avoids learning multiple rotated versions of the same targets and reduces both the number of parameters and the training time of CNN. During testing, RICNN calculated classification scores for each test image and all of its possible rotated versions. The maximum of these classification scores was used to simultaneously estimate the category and orientation of each target. Besides, to improve the generalization of RICNN on imbalanced sonar datasets, this paper also designs an imbalanced data sampler. Experiments on a self-made small, imbalanced sonar image rotated target recognition dataset show that the improved RICNN achieves 4.25% higher classification accuracy than data augmentation, and reduces the number of parameters and training time to 2.25% and 19.2% of those of the data augmentation method. Moreover, RICNN achieves orientation estimation accuracy comparable to that of a CNN orientation regressor trained with data augmentation. Code and dataset are publicly available.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"430 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116184688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Sentinel-2 imagery for detecting oil spills via spatial roughness of mixed normalized difference index","authors":"Yaowamal Raphiphan, Suphongsa Khetkeeree","doi":"10.1117/12.2644658","DOIUrl":"https://doi.org/10.1117/12.2644658","url":null,"abstract":"The free and open access optical sensor data from the Sentinel-2 constellation can be used to support oil spill monitoring operations. It provides several spectral bands from visible to shortwave infrared with medium to high resolution, which makes it suitable for detecting oil spills. However, the spectral signature of an oil spill is often similar to that of the surrounding environment. Moreover, it also depends on many parameters, such as sensing angle, sea depth, wave characteristics, etc. In this paper, we propose a method for detecting oil spills using Sentinel-2 images. It is based on the Mixed Normalized Difference Index (MNDI) derived from the Normalized Difference Vegetation Index (NDVI) and the Reversed version of the Normalized Difference Index (NDI) applied for forest fire monitoring. This index gives highly varying values in the oil spill area, so the oil spill area can be estimated by observing its spatial roughness. Four study areas in Saudi Arabia, Greece, Azerbaijan, and Indonesia were used to evaluate the detection performance against current methods. The visualized results show that our algorithm gives noticeable results and high contrast with low noise, except for the oil spills in Greece.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"53 11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126325073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of fundus diseases based on meta-data and EB-IRV2 network","authors":"Xiangyu Deng, Feifei Ding","doi":"10.1117/12.2644254","DOIUrl":"https://doi.org/10.1117/12.2644254","url":null,"abstract":"Aiming at the problem that there may be one or more diseases and an unbalanced distribution of labels in fundus images, this paper proposes a multi-label classification method for fundus diseases based on the fusion of meta-data and the EB-IRV2 network. First, Efficientnet-B2 and InceptionResNetV2 networks are used to extract feature information from the left and right fundus image data; the features are then fused with the meta-data containing patient information and finally sent to the classifier for multi-label classification of fundus diseases. Adding the patient’s meta-information into the model helps to better capture the lesion information and the location of the lesion in the fundus image, thus improving the accuracy of recognition. The experimental results show that the model in this paper achieves good classification results on the ODIR fundus image database: the accuracy rate is 96.00%, the recall rate is 92.37% and the F1-score is 94.11%, indicating that the proposed model has good robustness in the classification of multi-labeled fundus images.","PeriodicalId":314555,"journal":{"name":"International Conference on Digital Image Processing","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122278795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}