{"title":"Digital Holography vs. Display Holography - What are their differences and what do they have in common?","authors":"S. Reichelt, G. Pedrini","doi":"10.1145/3589572.3589583","DOIUrl":"https://doi.org/10.1145/3589572.3589583","url":null,"abstract":"Holography is a two-stage imaging process in which the wave field of the object is recorded in a first step so that it can be reconstructed in a second step. It involves the physics of diffraction and interference to record and reconstruct optical wavefields or 3D objects. The connecting element between the recording and reconstruction stages is the hologram itself in which the holographic code is stored. While in the early days holography was a purely experimental and analog technique, the building blocks of holography were later digitalized step by step. Holograms were first simulated and later reconstructed by computer, the hologram storage medium became discretized optical elements, pixelated sensors, and light modulators. Due to different approaches and use cases, the language of holography has evolved in diverse and sometimes confusing ways. In this paper, we address the differences and similarities between digital holography and display holography. Both techniques are digital, but their meanings in the community are sometimes different. In general and common understanding, the term digital holography (DH) refers to a digital hologram recording of a wave field emanating from a 3D object, followed by a numerical reconstruction of that object. On the contrary, the term computer-generated display holography (CGDH) describes the numerical calculation of the hologram and its physical representation, followed by an experimental reconstruction of the 3D object by optical means. Thus, it is the purpose that distinguishes the two techniques: digital holograms are used to numerically reconstruct and measure previously captured (unknown) objects or object changes, whereas computer-generated display holograms are utilized to visualize (known) 3D objects or scenes in a way that best mimics natural vision. The purpose of this paper is to clarify the terminology of holography, contrasting digital holography and computer-generated display holography. In particular, we will explain how each method works, emphasize their specific characteristics and mention how they are used in different applications. We will also provide some examples of how the two technologies are used.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"105 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131893604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting Self-Imposed Constraints on RGB and LiDAR for Unsupervised Training","authors":"Andreas Hubert, Janis Jung, Konrad Doll","doi":"10.1145/3589572.3589575","DOIUrl":"https://doi.org/10.1145/3589572.3589575","url":null,"abstract":"Hand detection on single images is an intensively researched area, and reasonable solutions are already available today. However, fine-tuning detectors within a specific domain remains a tedious task. Unsupervised training procedures can reduce the effort required to create domain-specific datasets and models. In addition, different modalities of the same physical space, here color and depth data, represent objects differently and thus allow for exploitation. We introduce and evaluate a training pipeline to exploit the modalities in an unsupervised manner. The supervision is omitted by choosing suitable self-imposed constraints for the data source. We compare our training results with ground truth training results and show that with these modalities, the domain can be extended without a single annotation, e.g., for detecting colored gloves.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133304415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structure-Enhanced Translation from PET to CT Modality with Paired GANs","authors":"Tasnim Ahmed, Ahnaf Munir, Sabbir Ahmed, Md. Bakhtiar Hasan, Md. Taslim Reza, M. H. Kabir","doi":"10.1145/3589572.3589593","DOIUrl":"https://doi.org/10.1145/3589572.3589593","url":null,"abstract":"Computed Tomography (CT) images play a crucial role in medical diagnosis and treatment planning. However, acquiring CT images can be difficult in certain scenarios, such as patients inability to undergo radiation exposure or unavailability of CT scanner. An alternative solution can be generating CT images from other imaging modalities. In this work, we propose a medical image translation pipeline for generating high-quality CT images from Positron Emission Tomography (PET) images using a Pix2Pix Generative Adversarial Network (GAN), which are effective in image translation tasks. However, traditional GAN loss functions often fail to capture the structural similarity between generated and target image. To alleviate this issue, we introduce a Multi-Scale Structural Similarity Index Measure (MS-SSIM) loss in addition to the GAN loss to ensure that the generated images preserve the anatomical structures and patterns present in the real CT images. Experiments on the ‘QIN-Breast’ dataset demonstrate that our proposed architecture achieves a Peak Signal-to-Noise Ratio (PSNR) of 17.70 dB and a Structural Similarity Index Measure (SSIM) of 42.51% in the region of interest.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116082420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Face Recognition on RGB and Grayscale Color with Deep Learning in Forensic Science","authors":"Phornvipa Werukanjana, Prush Sa-nga-ngam, Norapattra Permpool","doi":"10.1145/3589572.3589586","DOIUrl":"https://doi.org/10.1145/3589572.3589586","url":null,"abstract":"In forensic science face recognition, we cannot request high-quality face images from sources, but we have face images from CCTV grayscale on the crime scene at night, face images in RGB mode from Web Cameras, etc. This research needs to find a satisfying method of face recognition in forensic science to identify the “Who's face?” at the request of a police investigator. The experiment uses Siamese neural network face recognition of both RGB and GRAY color modes to compare and show the performance of both color modes. The evaluation shows a confusion matrix, F1-score ROC/AUC, and a strong recommend with Likelihood ratio (LR) that supports court in evidence identification recommended by NIST and ENFSI.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"128 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120927377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On-Demand Multiclass Imaging for Sample Scarcity in Industrial Environments","authors":"Joan Orti, F. Moreno-Noguer, V. Puig","doi":"10.1145/3589572.3589573","DOIUrl":"https://doi.org/10.1145/3589572.3589573","url":null,"abstract":"While technology pushes towards controlling more and more complex industrial processes, data related issues are still a non-trivial problem to address. In this sense, class imbalances and scarcity of data occupy a lot of time and resources when designing a solution. In the surface defect detection problem, due to the random nature of the process, both situations are very common as well as a general decompensation between the image size and the defect size. In this work, we address a segmentation and classification problem with very few available images from every class, proposing a two-step process. First, by generating fake images using the guided-crop image augmentation method, we train for every single class a Pix2pix model in order to perform a mask-to-image translation. Once the model is trained, we also designed a automatic mask generator, to mimic the shapes of the dataset and thus create real-like images for every class using the pretrained networks. Eventually, using a context aggregation network, we use these fake images as our training set, changing every certain epochs the amount of images of every class on-demand, depending on the evolution of the individual loss term of every class. As a result, we accomplished stable and robust segmentation and classification metrics, regardless of the amount of data available for training, using the NEU Micro surface defect database.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117045498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recovering Image Information from Speckle Noise by Image Processing","authors":"Jianlin Nie, S. Hanson, M. Takeda, Wen Wang","doi":"10.1145/3589572.3589581","DOIUrl":"https://doi.org/10.1145/3589572.3589581","url":null,"abstract":"As a kind of noise, speckle seriously affects the imaging quality of optical imaging system. However, the speckle image carries a large amount of information related to the physical characteristics of the object surface, which can be used as the basis to identify and judge hidden objects. In this paper, speckle noise removal in optical imaging is studied. The average is derived for the squared moduli of spectra of short-exposure speckle images to recover the amplitude information. At the same time, cross-spectrum function is used to recover the phase information. We use this method to process the images. Then, the simulation experiment analysis is carried out by varying two aspects: the stacking numbers and the different objects. The results show that this method can recover the feature information from the speckle image, thus verifying the feasibility of the method.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121776210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grasp-Oriented Fine-grained Cloth Segmentation without Real Supervision","authors":"Ruijie Ren, Mohit Gurnani Rajesh, Jordi Sanchez-Riera, Fan Zhang, Yurun Tian, Antonio Agudo, Y. Demiris, K. Mikolajczyk, F. Moreno-Noguer","doi":"10.1145/3589572.3589594","DOIUrl":"https://doi.org/10.1145/3589572.3589594","url":null,"abstract":"Automatically detecting graspable regions from a single depth image is a key ingredient in cloth manipulation. The large variability of cloth deformations has motivated most of the current approaches to focus on identifying specific grasping points rather than semantic parts, as the appearance and depth variations of local regions are smaller and easier to model than the larger ones. However, tasks like cloth folding or assisted dressing require recognizing larger segments, such as semantic edges that carry more information than points. We thus first tackle the problem of fine-grained region detection in deformed clothes using only a depth image. We implement an approach for T-shirts, and define up to 6 semantic regions of varying extent, including edges on the neckline, sleeve cuffs, and hem, plus top and bottom grasping points. We introduce a U-Net based network to segment and label these parts. Our second contribution is concerned with the level of supervision required to train the proposed network. While most approaches learn to detect grasping points by combining real and synthetic annotations, in this work we propose a multilayered Domain Adaptation strategy that does not use any real annotations. We thoroughly evaluate our approach on real depth images of a T-shirt annotated with fine-grained labels, and show that training our network only with synthetic labels and our proposed DA approach yields results competitive with real data supervision.","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125169882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","authors":"","doi":"10.1145/3589572","DOIUrl":"https://doi.org/10.1145/3589572","url":null,"abstract":"","PeriodicalId":296325,"journal":{"name":"Proceedings of the 2023 6th International Conference on Machine Vision and Applications","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132793730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}