{"title":"Tumor Detection in Brain MRI using Residual Convolutional Neural Networks","authors":"Mohammad Reza Obeidavi, K. Maghooli","doi":"10.1109/MVIP53647.2022.9738767","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738767","url":null,"abstract":"Brain tumor is one of the complications that has a high mortality rate. Early detection of tumors can help treat this type of cancer. Among tumor detection methods, magnetic resonance imaging (MRI) is a common method. But there is always an attempt to detect the tumor automatically in medical images. Therefore, in this paper, a method for automatic detection of tumor in MRI images with the help of residual neural networks is introduced. By testing the proposed neural network on the BRATS data set, the results show well the efficiency of the proposed method.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133076492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Autoencoder Multi-Exposure HDR Imaging","authors":"A. Omrani, M. Soheili, M. Kelarestaghi","doi":"10.1109/MVIP53647.2022.9738552","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738552","url":null,"abstract":"Recently, in the era of photography, due to capturing images with limited dynamic range by cameras, High Dynamic Range (HDR) imaging has engrossed people’s attention because HDR pictures present more details and better luminance than images with Low Dynamic Range (LDR). Moreover, produced HDR images by a single LDR image cannot reconstruct details appropriately, and therefore, in this research, a deep learning method is proposed to generate an HDR picture by multiple LDR pictures with different exposures. The experiments and results illustrate that the proposed algorithm performs better than the other methods in quantitative and visual comparison.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130455599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Light Face: A Light Face Detector for Edge Devices","authors":"Saeed Khanehgir, Amir Mohammad Ghoreyshi, Alireza Akbari, R. Derakhshan, M. Sabokrou","doi":"10.1109/MVIP53647.2022.9738740","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738740","url":null,"abstract":"Face detection is one of the most important and basic steps in the recognition and verification of human identity. Using models based on convolutional networks such as face detection models is very difficult and challenging due to a large number of parameters, computational complexity, and high power consumption in environments such as edge devices, mobiles with limited memory storage resources, and low computing power. In this paper, a light and fast face detection model is proposed to predict the face boxes with real-time speed and high accuracy. The proposed model is structured based on the YOLO algorithm and CSPDarknet53 tiny backbone. Some tricks such as calculating custom anchor boxes aimed to solve the detection problem of varying face scales and some optimization techniques such as pruning and quantization have also been used to optimize and reduce the number of parameters and improve the speed to make the final model strong and suitable for use in environments with low computational power. One of our best models with a MAP of 67.52% on the WIDER FACE dataset and a volume of 1.7 Mb and a speed of 1.43 FPS on a mobile phone with ordinary hardware has shown significant performance","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125695796","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A face detection method via ensemble of four versions of YOLOs","authors":"Sanaz Khalili, A. Shakiba","doi":"10.1109/MVIP53647.2022.9738779","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738779","url":null,"abstract":"We implemented a real-time ensemble model for face detection by combining the results of YOLO v1 to v4. We used the WIDER FACE benchmark for training YOLOv1 to v4 in the Darknet framework. Then, we ensemble their results by two methods, namely, WBF (Weighted boxes fusion) and NMW (Non-maximum weighted). The experimental analysis showed that the mAP increases in the WBF ensemble of the models for all the easy, medium, and hard images in the datasets by 7.81%, 22.91%, and 12.96%, respectively. These numbers are 6.25%, 20.83%, and 11.11% for the NMW ensemble.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126536499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Facial Expression Recognition using Facial Landmarks and Neural Networks","authors":"M. Haghpanah, Ehsan Saeedizade, M. T. Masouleh, A. Kalhor","doi":"10.1109/MVIP53647.2022.9738754","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738754","url":null,"abstract":"This paper presents a lightweight algorithm for feature extraction, classification of seven different emotions, and facial expression recognition in a real-time manner based on static images of the human face. In this regard, a Multi-Layer Perceptron (MLP) neural network is trained based on the foregoing algorithm. In order to classify human faces, first, some pre-processing is applied to the input image, which can localize and cut out faces from it. In the next step, a facial landmark detection library is used, which can detect the landmarks of each face. Then, the human face is split into upper and lower faces, which enables the extraction of the desired features from each part. In the proposed model, both geometric and texture-based feature types are taken into account. After the feature extraction phase, a normalized vector of features is created. A 3-layer MLP is trained using these feature vectors, leading to 96% accuracy on the test set.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"2018 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128564129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Curriculum Learning for PolSAR Image Classification","authors":"Hamid Mousavi, M. Imani, H. Ghassemian","doi":"10.1109/MVIP53647.2022.9738781","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738781","url":null,"abstract":"Following the great success of curriculum learning in the area of machine learning, a novel deep curriculum learning method proposed in this paper, entitled DCL, particularly for the classification of fully polarimetric synthetic aperture radar (PolSAR) data. This method utilizes the entropy-alpha target decomposition method to estimate the degree of complexity of each PolSAR image patch before applying it to the convolutional neural network (CNN). Also, an accumulative mini-batch pacing function is used to introduce more difficult patches to CNN. Experiments on the widely used data set of AIRSAR Flevoland reveal that the proposed curriculum learning method can not only increase classification accuracy but also lead to faster training convergence.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129533035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Fine-grained Image Classification with Generative Adversarial Networks and Facial Landmark Detection","authors":"Mahdieh Darvish, Mahsa Pouramini, H. Bahador","doi":"10.1109/MVIP53647.2022.9738759","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738759","url":null,"abstract":"Fine-grained classification remains a challenging task because distinguishing categories needs learning complex and local differences. Diversity in the pose, scale, and position of objects in an image makes the problem even more difficult. Although the recent Vision Transformer models achieve high performance, they need an extensive volume of input data. To encounter this problem, we made the best use of GAN-based data augmentation to generate extra dataset instances. Oxford-IIIT Pets was our dataset of choice for this experiment. It consists of 37 breeds of cats and dogs with variations in scale, poses, and lighting, which intensifies the difficulty of the classification task. Furthermore, we enhanced the performance of the recent Generative Adversarial Network (GAN), StyleGAN2-ADA model to generate more realistic images while preventing overfitting to the training set. We did this by training a customized version of MobileNetV2 to predict animal facial landmarks; then, we cropped images accordingly. Lastly, we combined the synthetic images with the original dataset and compared our proposed method with standard GANs augmentation and no augmentation with different subsets of training data. We validated our work by evaluating the accuracy of fine-grained image classification on the recent Vision Transformer (ViT) Model. Code is available at: https://github.com/mahdi-darvish/GAN-augmented-pet-classifler","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132779953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the Properties and Evolution of Neural Network Eigenspaces during Training","authors":"Mats L. Richter, Leila Malihi, Anne-Kathrin Patricia Windler, U. Krumnack","doi":"10.1109/MVIP53647.2022.9738741","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738741","url":null,"abstract":"We investigate properties and the evolution of the emergent inference process inside neural networks using layer saturation [1] and logistic regression probes [2]. We demonstrate that the difficulty of a problem, defined by the number of classes and complexity of the visual domain, as well as the number of parameters in neural network layers affect the predictive performance in an antagonistic manner. We further show that this relationship can be measured using saturation. This opens the possibility of detecting over- and under-parameterization of neural networks. We further show that the observed effects are independent of previously reported pathological patterns like the \"tail pattern\" described in [1]. Finally, we study the emergence of saturation patterns during training, showing that saturation patterns emerge early during training. This allows for early analysis and potentially increased cycle-time during experiments.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121892395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lip reading using external viseme decoding","authors":"J. Peymanfard, M. R. Mohammadi, Hossein Zeinali, N. Mozayani","doi":"10.1109/MVIP53647.2022.9738749","DOIUrl":"https://doi.org/10.1109/MVIP53647.2022.9738749","url":null,"abstract":"Lip-reading is the operation of recognizing speech from lip movements. This is a difficult task because the movements of the lips when pronouncing the words are similar for some of them. Viseme is used to describe lip movements during a conversation. This paper aims to show how to use external text data (for viseme-to-character mapping) by dividing video-to-character into two stages, namely converting video to viseme and then converting viseme to character by using separate models. Our proposed method improves word error rate by an absolute rate of 4% compared to the typical sequence to sequence lipreading model on the BBC-Oxford Lip Reading dataset (LRS2).","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130740669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}