2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA): Latest Publications

Quality Classification and Segmentation of Sugarcane Billets Using Machine Vision
2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA) | Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034561
Abstract: Machine learning is widely used in agriculture to optimize practices such as planting, crop detection, and harvesting. The sugar industry is a major contributor to the global economy, valuable both as a food source and as a sustainable crop with useful byproducts. This paper presents three machine vision algorithms capable of performing quality classification and segmentation of raw sugarcane billets, developing a proof of concept for implementation at our industry partner's mill in NSW. Such a system has the potential to improve quality and reduce the costs associated with an essential yet labor-intensive, inefficient, and unreliable process. Two recent iterations of the popular You Only Look Once (YOLO) algorithm, YOLOR and YOLOX, are trained for classification, with the state-of-the-art Mask R-CNN network used for segmentation. The best-performing classification model, YOLOX, achieves a classification mAP50:95 of 90.1% across 7 classes in real time, with an average inference speed of 19.36 ms per image. Segmentation accuracy of AP50 of 70.8% and AR50:95 of 83.5% was achieved using the Mask R-CNN network.
Citations: 0
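The mAP50:95 and AR50:95 figures above average precision and recall over ten intersection-over-union (IoU) thresholds. As a generic illustration (not the authors' code), the IoU match test underlying those metrics can be sketched as:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# mAP50:95 averages AP over ten IoU thresholds, 0.50 to 0.95 in steps of 0.05.
IOU_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]

def is_true_positive(threshold, prediction, ground_truth):
    """A prediction matches a ground-truth box at a given IoU threshold."""
    return iou(prediction, ground_truth) >= threshold
```

A full evaluator would additionally greedily match predictions to ground truth by score and accumulate a precision-recall curve per class; the sketch shows only the threshold sweep that distinguishes mAP50 from mAP50:95.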
Dynamic point cloud compression using slicing focusing on self-occluded points
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034563
Abstract: Realistic digital representations of 3D objects and surroundings have recently been made possible by advances in computer graphics that allow real-time, realistic interactions between users and the physical world [1], [2]. Emerging technologies enable real-world objects, persons, and scenes to move dynamically and convincingly across users' views using a 3D point cloud [3]–[5]. A point cloud is a set of individual 3D points that are unorganized and carry no explicit relationships in 3D space [1], [6]. Each point has a 3D position but can also contain other attributes (e.g., texture, reflectance, colour, and normal), creating a realistic visual representation model for static and dynamic 3D objects [3], [7]. This is desirable for many applications such as geographic information systems, cultural heritage, immersive telepresence, telehealth, disabled access, 3D telepresence, telecommunication, autonomous driving, gaming and robotics, virtual reality (VR), and augmented reality (AR) [2], [8]. Point clouds are even required in the Metaverse, for example when creating avatars or content and for object-based interaction. The Metaverse is a virtual world that creates a network where anyone can interact through their avatars [9]. It is therefore critical to present the 3D virtual world as close to the real world as possible, with high resolution and minimal noise and blur.
Citations: 2
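The abstract describes a point cloud as an unorganized set of positions with optional attributes, processed here by slicing. A minimal sketch of that data model (illustrative only; the paper's slicing criterion for self-occluded points is more involved than a plain depth window):

```python
from dataclasses import dataclass

@dataclass
class Point:
    """One unorganized 3-D point: a position plus an optional attribute."""
    x: float
    y: float
    z: float
    colour: tuple = (0, 0, 0)  # (r, g, b)

def slice_cloud(points, z_min, z_max):
    """Keep only the points whose depth lies in [z_min, z_max).

    Partitioning a cloud into such slices lets a codec treat occluded
    and visible regions differently.
    """
    return [p for p in points if z_min <= p.z < z_max]
```

Because the points carry no connectivity, any such partition is purely geometric, which is what makes slicing-based compression schemes straightforward to parallelize.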
ComicLib: A New Large-Scale Comic Dataset for Sketch Understanding
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034579
Abstract: The sketch is essential in everyday communication and has received much attention in the computer vision community. In general, researchers use learning-based approaches to study sketch-based algorithms; these methods rely on large-scale data to train complex models to achieve satisfactory performance. Most existing datasets are drawn by unskilled users in a closed environment and are of low complexity, preventing deep learning models from extracting richer information. This paper proposes a new large-scale comic sketch dataset called ComicLib for sketch understanding. We scan 181,354 comic sketch images from a comic library and annotate them through a crowdsourcing annotation platform that we developed. The result is a dataset of millions of comic objects in 17 categories. We conduct comparative experiments on sketch recognition, retrieval, detection, generation, and colorization using a number of deep learning algorithms, providing benchmark performance figures for ComicLib. We hope that ComicLib can contribute to the field of sketch-based research.
Citations: 0
Salient Face Prediction without Bells and Whistles
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034571
Abstract: Salient face prediction in multiple-face videos is a fundamental task in machine vision, with applications in video editing and human-machine interaction. The field has seen significant progress in recent years, backed by large datasets consisting specifically of multi-face videos. As our first contribution, we show the promise of a visual-only baseline that achieves state-of-the-art results for salient face prediction, motivating a reconsideration of sophisticated multimodal, multi-stream architectures. We further show that a simple upstream task like active speaker detection can provide a reasonable baseline and match prior tailored models for detecting salient faces. Moreover, we bring to light inconsistencies in evaluation strategies, highlighting a need for standardization, and propose a ranking-based evaluation for the task. Overall, our work motivates a fundamental course correction before re-initiating the search for novel architectures and frameworks.
Citations: 0
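The abstract proposes ranking-based evaluation without specifying the metric; one common form (a hypothetical sketch, not necessarily the authors' protocol) ranks all faces in a frame by predicted saliency score and reports the rank assigned to the ground-truth salient face:

```python
def salient_rank(predicted_scores, true_index):
    """1-based rank of the ground-truth salient face when faces are
    ordered by descending predicted score (1 = best possible)."""
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    return order.index(true_index) + 1

def mean_rank(per_frame_scores, per_frame_truth):
    """Average rank across frames; lower is better, and the metric stays
    meaningful when the number of faces varies from frame to frame."""
    ranks = [salient_rank(s, t)
             for s, t in zip(per_frame_scores, per_frame_truth)]
    return sum(ranks) / len(ranks)
```

Unlike hard top-1 accuracy, a rank metric distinguishes a near-miss (rank 2) from a gross error (rank 10), which is one reason ranking-based protocols are attractive for this task.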
FootSeg: Automatic Anatomical Segmentation of Foot Bones from Weight-Bearing Cone Beam CT Scans
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034620
Abstract: Weight-bearing cone beam CT (CBCT), which provides high-resolution scanning in the natural weight-bearing position, is an emerging technique in orthopedic research. The high-quality scans from CBCT machines have greatly facilitated the diagnosis and treatment of the human foot [1], such as foot alignment [2] and foot surgery [3], [4]. In these clinical practices, an essential step in analyzing a CBCT foot scan is the anatomical segmentation of the foot bones, which provides an overall understanding of the patient's situation.
Citations: 0
SimplestNet-Drone: An Efficient and Accurate Object Detection Algorithm for Drone Aerial Image Analytics
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034564
Abstract: Objects in images captured by drones are extremely difficult to detect due to varying camera angles, distances, sizes, and environmental conditions, making it challenging to detect objects accurately from a height. Nonetheless, object detection plays a crucial role in computer vision and has brought significant improvements to drone imagery. We apply the YOLOv5 framework with modified feature extraction and focus detection. Because the main difficulties with aerial images are object size and the viewing angle from high altitude, we propose a single-stage object detection model called "SimplestNet-Drone". We add a fourth prediction head to improve detection of the smallest objects and to improve detection speed. The model's prediction accuracy is further improved by an attention mechanism, which detects attention regions in the scene and suppresses unnecessary information. The model was trained and tested on the VisDrone dataset and compared with other object detection models. It shows great improvement, with a mean average precision of 63.72%, and improves on the YOLO architecture.
A real-time implementation of our model can be watched in the following YouTube video: https://youtu.be/De8t4tjtb6w
Citations: 0
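The abstract's attention mechanism "suppresses unnecessary information" without detailing its form. One widely used family it may resemble is channel attention, where each feature channel is re-weighted by a sigmoid of its global average activation; the sketch below is a generic, framework-free illustration of that idea, not the paper's module:

```python
import math

def channel_attention(feature_maps):
    """Gate a (C, H, W) feature volume, given as nested lists: each channel
    is scaled by sigmoid(global average of that channel), so weakly
    activated channels are damped and strongly activated ones kept."""
    gated = []
    for channel in feature_maps:
        n = len(channel) * len(channel[0])
        mean = sum(sum(row) for row in channel) / n
        weight = 1.0 / (1.0 + math.exp(-mean))  # sigmoid gate in (0, 1)
        gated.append([[v * weight for v in row] for row in channel])
    return gated
```

In a real detector this gating sits between convolution stages and the gate itself is learned (e.g., via a small MLP over the pooled vector); the sketch keeps only the pool-gate-rescale structure.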
Co-Graph Convolution for Instance Segmentation
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034643
Abstract: Segmenting diverse instances in varied contexts with a common model is a challenge for instance segmentation. In this paper, we address this problem by capturing rich relationship information and propose the Co-Graph Convolution Network (CGC-Net). Building on Mask R-CNN, we propose a co-graph convolution mask head. Specifically, we decouple the mask head into two mask heads and append a graph convolution layer to each to capture the corresponding relationship information: one focuses on the relationships between appearance features at each position of the instance, while the other attends to the semantic relationships between the channels of the instance's features. In addition, we add a co-relationship module to each graph convolution layer to share similar relationships between instances of the same category in an image. We integrate the outputs of the two mask heads by element-wise multiplication to improve the feature representation for the final instance segmentation prediction. Experiments on the MS COCO and Cityscapes datasets demonstrate our method's competitiveness compared with other state-of-the-art instance segmentation methods. Furthermore, to verify the generalization of CGC-Net, we add it to other instance segmentation networks, and the results show that our method still obtains stable performance gains.
Citations: 0
Prostate Cancer Diagnosis from Structured Clinical Biomarkers with Deep Learning: Anonymous Authors
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034567
Abstract: Prostate cancer (PC) is one of the most aggressive cancers, and early detection is indispensable for treatment. Biopsies are often carried out to determine the Gleason score of PC, which helps to predict its aggressiveness. As biopsies carry considerable risk, especially for elderly patients, machine learning can be used to predict the PC Gleason grade from clinical biomarkers, which are typically structured in a table. In this paper, we propose to use advanced tabular deep neural network architectures, such as TabNet and TabTransformer, to grade PC. We also perform a comparative study of various machine learning approaches for this purpose, including traditional methods, tree-based classifiers, and shallow neural networks. Our experimental results demonstrate the superior performance of the TabNet deep learning method.
Citations: 0
Image-based Detection of Dyslexic Readers from 2-D Scan path using an Enhanced Deep Transfer Learning Paradigm
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034577
Abstract: Dyslexia is a learning syndrome, most prevalent among school children, that causes poor reading and comprehension skills despite normal intelligence. It arises from a wide range of factors, and its exact cause is still unclear, which makes it difficult to develop a generalized detection model; feature engineering that extracts the features contributing most to a classifier's generalization is a significant challenge. Conventional approaches to predicting dyslexia rely on psychological assessments, imaging methods such as magnetic resonance imaging (MRI) and functional MRI, or electroencephalogram (EEG) signals, which are often not preferred for clinical disorders such as dyslexia, especially in children, due to their adverse effects. To overcome these problems, this work adopts an image-based technique that predicts dyslexia from eye gaze points recorded while reading. Eye movement tracking is non-invasive and provides rich indices of brain activity and cognitive processing. The gaze points tracked during reading are represented as 2-D scan path images. The work proposes an enhanced DenseNet deep transfer learning solution for feature engineering and classification: a deep model is pre-trained on the 2-D scan path images and then used to classify dyslexia via transfer learning. The proposed system exploits the key strengths of deep learning and transfer learning and has shown high performance compared to existing state-of-the-art machine learning models, with an accuracy of 96.36%. The results demonstrate that the enhanced deep transfer learning model performs well in identifying significant features and classifying dyslexia from 2-D scan path images.
Citations: 0
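The pipeline's first step turns a sequence of gaze fixations into a 2-D scan path image. A simplified, framework-free stand-in for that rasterization (the paper's rendering, resolution, and any trajectory encoding are assumptions not stated in the abstract):

```python
def scanpath_image(gaze_points, width, height):
    """Rasterize a sequence of (x, y) fixations into a 2-D count grid.

    Repeated fixations on the same cell accumulate, so dwell time is
    crudely encoded as intensity; out-of-bounds samples are dropped.
    """
    image = [[0] * width for _ in range(height)]
    for x, y in gaze_points:
        if 0 <= x < width and 0 <= y < height:
            image[y][x] += 1
    return image
```

Once gaze data is in image form, any pretrained image backbone (here, a DenseNet) can be fine-tuned on it, which is what makes the transfer learning step possible.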
Rethinking Decoupled Training with Bag of Tricks for Long-Tailed Recognition
Pub Date: 2022-11-30 | DOI: 10.1109/DICTA56598.2022.10034607
Abstract: Learning from imbalanced datasets remains a significant challenge for real-world applications. Among existing approaches to long-tail recognition, decoupled training appears to achieve the best performance. Moreover, there are simple and effective tricks that can further improve decoupled learning and help models trained on long-tailed datasets become more robust to class imbalance. However, used inappropriately, these tricks can result in lower-than-expected recognition accuracy, and there is a lack of comprehensive empirical studies providing guidelines on how to combine them. In this paper, we explore existing long-tail visual recognition tricks and perform extensive experiments to analyze the impact of each trick in detail, arriving at an effective combination for decoupled training. Furthermore, we introduce a new loss function called hard mining loss (HML), which better teaches the model to discriminate between head and tail classes. In addition, unlike previous work, we introduce a new end-to-end learning scheme for decoupled training. We conducted our evaluation on the CIFAR10, CIFAR100, and iNaturalist 2018 datasets; the results show that our method outperforms existing methods that address class imbalance for image classification tasks (code will be made available). We believe that our approach will serve as a solid foundation for improving class imbalance problems in many other computer vision tasks.
Citations: 1
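Decoupled training typically learns the representation with instance-balanced sampling and then re-trains the classifier with class-balanced sampling. The second stage's sampling weights can be sketched generically as follows (a standard trick from the long-tail literature, not code from this paper):

```python
from collections import Counter

def class_balanced_weights(labels):
    """Per-sample sampling weights that give every class equal total
    probability mass, as typically used when re-training the classifier
    stage on a long-tailed dataset.

    A sample of class y gets weight 1 / (n_classes * count(y)), so the
    weights sum to 1 and each class contributes 1 / n_classes in total.
    """
    counts = Counter(labels)
    n_classes = len(counts)
    return [1.0 / (n_classes * counts[y]) for y in labels]
```

Feeding these weights to a weighted sampler makes a rare tail class as likely to be drawn per epoch as a head class with thousands of examples, which is the core mechanism the various "tricks" studied in the paper build on.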