{"title":"Research on Image Liquid Level Measurement Technology Based on Hough Transform","authors":"Yanqing Fu, Yongqing Peng, P. Liu, Weikui Wang","doi":"10.1109/PRML52754.2021.9520745","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520745","url":null,"abstract":"Based on the camera imaging model and the storage tank measurement environment, the corresponding relationship between the real liquid level height in the image and the liquid level contour radius is deduced in this paper. For the collected images, grayscale and morphological processing are performed first, and the Sobel operator is used for edge recognition. According to the situation, the corrosion calculation is used to reduce the subsequent calculation, and then the circle is detected by the optimized Hough transform to obtain the radius of the circle. Finally, the liquid level information is obtained according to the measurement model. Experimental results show that the maximum absolute error of the system is 2.99mm, and the maximum reference error is 0.75%. The system has certain theoretical and practical significance.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"29 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123873304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Driver’s Illegal Driving Behavior Detection with SSD Approach","authors":"Tao Yang, Jin Yang, Jicheng Meng","doi":"10.1109/PRML52754.2021.9520735","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520735","url":null,"abstract":"In this paper, an advanced detection approach of illegal driving behavior is proposed using Single Shot MultiBox Detector (SSD) based on deep learning. The detection of driver’s illegal driving behavior includes cellphone usage, cigarette smoke and no fastening seat belt. Doing this can greatly reduce the occurrence of traffic accidents. In order to validate the detection effect using SSD on small target objects, such as cigarette in complex environment, we use not only three online databases, i.e. HMDB human motion database, WIDER FACE Database, Hollywood-2 Database, but also a real database collected by ourselves. The experimental results show that the SSD approach has a better performance than the Faster Regions with Convolutional Neural Network (Faster R-CNN) for detecting driver’s illegal driving behavior.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127566072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Text Detection in Tibetan Ancient Books: A Benchmark","authors":"Xiangxiang Zhi, Dingguo Gao, Qijun Zhao, Shuiwang Li, Ci Qu","doi":"10.1109/PRML52754.2021.9520727","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520727","url":null,"abstract":"The digitization of Tibetan ancient books is of great significance to the preservation of Tibetan culture. This problem involves two tasks: Tibetan text detection and Tibetan text recognition. The former is undoubtedly crucial to automatic Tibetan text recognition. However, there are few works on Tibetan text detection, and lack of training data has always been a problem, especially for deep learning methods which require massive training data. In this paper, we introduce the TxTAB dataset for evaluating text detection methods in Tibetan ancient books. The dataset is established based upon 202 treasured handwritten ancient Tibetan text images and is densely annotated with a multi-point annotation method without limiting the number of points. This is a challenging dataset with good diversity. It contains blurred images, gray and color images, the text of different sizes, the text of different handwriting styles, etc. An extensive experimental evaluation of 3 state-of-the-art text detection algorithms on TxTAB is presented with detailed analysis, and the results demonstrate that there is still a big room for improvements particularly for detecting Tibetan text in images of low quality.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125741088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transformer Based Multimodal Speech Emotion Recognition with Improved Neural Networks","authors":"Rutherford Agbeshi Patamia, Wu Jin, Kingsley Nketia Acheampong, K. Sarpong, Edwin Kwadwo Tenagyei","doi":"10.1109/PRML52754.2021.9520692","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520692","url":null,"abstract":"With the procession of technology, the human-machine interaction research field is in growing need of robust automatic emotion recognition systems. Building machines that interact with humans by comprehending emotions paves the way for developing systems equipped with human-like intelligence. Previous architecture in this field often considers RNN models. However, these models are unable to learn in-depth contextual features intuitively. This paper proposes a transformer-based model that utilizes speech data instituted by previous works, alongside text and mocap data, to optimize our emotional recognition system’s performance. Our experimental result shows that the proposed model outperforms the previous state-of-the-art. The IEMOCAP dataset supported the entire experiment.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121616848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generation and Transformation Invariant Learning for Tomato Disease Classification","authors":"Getinet Yilma, Kumie Gedamu, Maregu Assefa, Ariyo Oluwasanmi, Zhiguang Qin","doi":"10.1109/PRML52754.2021.9520693","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520693","url":null,"abstract":"Deep learning-based plant disease management became a cost-effective way to improved agro-productivity. Advanced train sample generation and augmentation methods enlarge train sample size and improve feature distribution but generation and augmentation introduced sample feature discrepancy due to the generation learning process and augmentation artificial bias. We proposed a generation and geometric transformation invariant feature learning method using Siamese networks with maximum mean discrepancy loss to minimize the feature distribution discrepancies coming from the generated and augmented samples. Through variational GAN and geometric transformation, we created four dataset settings to train the proposed approach. The abundant evaluation results on the PlantVillage tomato dataset demonstrated the effectiveness of the proposed approach for the ResNet50 Siamese networks in learning generation and transformation invariant features for plant disease classification.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"39 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126747345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepComp: A Deep Comparator for Improving Facial Age-Group Estimation","authors":"Ebenezer Nii Ayi Hammond, Shijie Zhou, Hongrong Cheng, Qihe Liu","doi":"10.1109/PRML52754.2021.9520698","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520698","url":null,"abstract":"We introduce an age-group estimation scheme known as DeepComp. It is a combination of an Early Information-Sharing Feature Aggregation (EISFA) mechanism and a ternary classifier. The EISFA part is a feature extractor that applies a siamese layer to input images and an aggregation module that sums up all the images. The ternary process compares the image representations into three possible outcomes corresponding to younger, similar, or older. From the comparisons, we arrive at a score indicating the similarity between an input and reference images: the higher the score, the closer the similarity. Experimentation shows that our DeepComp scheme achieves an impressive 94.9% accuracy on the Adience benchmark dataset using a minimum number of reference images per age group. Moreover, we demonstrate the generality of our method on the MORPH II dataset, and the result is equally impressive. Altogether, we show that, among other schemes, our method exemplifies facial age-group estimation.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"189 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126794235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cardiac Arrhythmia Recognition Using Transfer Learning with a Pre-trained DenseNet","authors":"Hadaate Ullah, Yuxiang Bu, T. Pan, M. Gao, Sajjatul Islam, Yuan Lin, Dakun Lai","doi":"10.1109/PRML52754.2021.9520710","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520710","url":null,"abstract":"Recent findings demonstrated that deep neural networks carry out features extraction itself to identify the electrocardiography (ECG) pattern or cardiac arrhythmias from the ECG signals directly and provided good results compared to cardiologists in some cases. But, to face the challenge of huge volume of data to train such networks, transfer learning is a prospective mechanism where network is trained on a large dataset and learned experiences are transferred to a small volume target dataset. Therefore, we firstly extracted 78,999 ECG beats from MIT-BIH arrhythmia dataset and transformed into 2D RGB images and used as the inputs of the DenseNet. The DenseNet is initialized with the trained weights on ImageNet and fine-tuned with the extracted beat images. Optimization of the pre-trained DenseNet is performed with the aids of on-the-fly augmentation, weighted random sampler, and Adam optimizer. The performance of the pre-trained model is assessed by hold-out evaluation and stratified 5-fold cross-validation techniques along with early stopping feature. The achieved accuracy of identifying normal and four arrhythmias are of 98.90% and 100% for the hold-out and stratified 5-fold respectively. The effectiveness of the pre-trained model with the stratified 5-fold by transfer learning approach is surpassed compared to the state-of-art-the approaches and models, and also explicit the maximum generalization of imbalanced classes.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127352697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on the Methods of Speech Synthesis Technology","authors":"Jinyao Hu, A. Hamdulla","doi":"10.1109/PRML52754.2021.9520718","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520718","url":null,"abstract":"An important technology to realize human computer interaction is the technology of converting a given text into natural speech, that is speech synthesis. This paper succinctly expounds the development process of speech synthesis, analyzes the shortcomings of traditional speech synthesis technology, and highlights the advantages and disadvantages of various vocoders. Due to the overwhelming contribution of deep learning to the field of speech synthesis, this paper introduces several pioneering research results in the field of speech synthesis, expounds its main ideas, advantages and disadvantages, and inspires new ideas on this basis. Finally, it objectively discusses and analyzes the problems of speech synthesis technology and puts forward the direction that can be further studied.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"261 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114939724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Mixing and Separation Method of Signals + Color Images Based on Two-Dimensional CCA","authors":"C. Kexin, Fan Liya, Yang Jing","doi":"10.1109/PRML52754.2021.9520716","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520716","url":null,"abstract":"Blind Source Separation (BSS) is a traditional and challenging problem in signal processing, in which the mixed signals can be separated according to the independence of source signals. The one-dimensional CCA-based signal and color image mixing and separation method needs to reshape the image into vector data, which destroys the spatial structure of the image and affects the recovery effect of the color image. To this end, a mixing and separation method of signals + color images based on two-dimensional CCA, in this paper, is proposed. This method utilizes the auto-correlation among original color images and signals to recover signals and images with high qualities. Comparative experiments with one-dimensional CCA on the COIL-100 data set show that the proposed method is effective and high-speed.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134405089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Assessment of Facial Paralysis Based on Facial Landmarks","authors":"Yuxi Liu, Zhimin Xu, L. Ding, Jie Jia, Xiaomei Wu","doi":"10.1109/PRML52754.2021.9520746","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520746","url":null,"abstract":"Unilateral peripheral facial paralysis is the most common case of facial paralysis. It affects only one side of the face, which will cause facial asymmetry. Clinically, unilateral peripheral facial paralysis is often classified by clinicians according to evaluation scales, based on patients’ condition of facial symmetry. A prevalent scale is House-Brackmann grading system (HBGS). However, assessment results from scales are often with great subjectivity, and will bring high interobserver and intraobserver variability. Therefore, this manuscript proposed an objective method to provide assessment results by using facial videos and applying machine learning models. This grading method is based on HBGS, but it is automatically implemented with high objectivity. Images with facial expressions will be extracted from the videos to be analyzed by a machine learning model. Facial landmarks will be acquired from the images by using a 68-points model provided by dlib. Then index and coordinate information of the landmarks will be used to calculate the values of features pre-designed to train the model and predict the result of new patients. Due to the difficulty of collecting facial paralysis samples, the data size is limited. Random Forest (RF) and support vector machine (SVM) were compared as the classifiers. This method was applied on a data set of 33 subjects. The highest overall accuracy rate reached 88.9%, confirming the effectiveness of this method.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122489309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}