Honghao Zheng, Hongtao Yu, Yinuo Hao, Yiteng Wu, Shaomei Li
{"title":"Distantly Supervised Named Entity Recognition with Spy-PU Algorithm","authors":"Honghao Zheng, Hongtao Yu, Yinuo Hao, Yiteng Wu, Shaomei Li","doi":"10.1109/PRML52754.2021.9520707","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520707","url":null,"abstract":"Named entity recognition is the basis of natural language processing tasks. In the field of Chinese named entity recognition, tag data sparseness is the core reason that limits the performance of named entity recognition models. To solve the problem, we propose a general approach, which can improve the effect of Chinese named entity recognition with only a few samples. A key feature of the proposed method is that it can automatically label the unlabeled text through the distant supervision hypothesis and use the Spy-PU algorithm to reduce the negative impact of the unlabeled entity problem. Experimental results show that the method has better performance on four public data sets: MSRA, OntoNotes4.0, Resume and Weibo, and can effectively alleviate the impact of label data sparseness.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115828802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Probability Preserving Discriminative Nonnegative Matrix Factorization","authors":"Liuyin Lin, Xin Shu, Jing Song, C. Yu","doi":"10.1109/PRML52754.2021.9520691","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520691","url":null,"abstract":"Non-negative matrix factorization (NMF) has received increasing attention since it is a practical decomposition approach in computer vision and pattern recognition. NMF allows only additive combinations, which leads to a parts-based representation. Further, NMF and its variants often ignore the underlying local structure information. In this paper, we propose a novel objective which provides enough probabilistic semantics of intrinsic local topology via the probability preserving regularizer, together with the joint multiplicative update routine. Additionally, through the class indicator matrix coupled with the loss function, the generative and discriminative components with the property of local probability preservation can be simultaneously acquired, which is rather optimal for the classification. The experimental results of both clustering and classification tasks demonstrate that the performance of the proposed approach is clearly competitive with several other state-of-the-art algorithms.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"199 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133595628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Error Correction Based on Transformer LM in Uyghur Speech Recognition","authors":"Yan Zhang, Mijit Ablimit, A. Hamdulla","doi":"10.1109/PRML52754.2021.9520740","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520740","url":null,"abstract":"For Uyghur, Kazakh and other minority languages or dialects, it is difficult to collect a large-scale labeled corpus. In the case of low resources, reducing the recognition granularity by using phonemes or characters as the recognition unit can yield a good character recognition rate, but the information between words is not fully utilized, which cannot solve the problem of high word error rate in practice. In order to correct the wrong words in the recognition, this paper proposes to use Levenshtein distance and a Transformer language model with words as modeling units as the secondary scoring criteria to correct the end-to-end recognition results. In the Uyghur end-to-end recognition deployed with the Conformer-CTC acoustic model, the WER decreases by 5.7%; in the end-to-end recognition deployed with BLSTM-CTC as the acoustic model, it decreases by 9.1%.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"45 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130558254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Moroccan Dialect “Darija” Automatic Speech Recognition: A Survey","authors":"Maria Labied, A. Belangour","doi":"10.1109/PRML52754.2021.9520690","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520690","url":null,"abstract":"Nowadays, human-machine interaction is growing swiftly, and Automatic Speech Recognition is gaining immense interest to make daily routines much easier. This is illustrated by the various applications of Speech Recognition in our daily lives, such as voice dictation, interactive voice response systems, device control, telephone applications, and others. Besides Automatic Speech Recognition, Natural Language Processing has seen significant improvements in terms of technologies and approaches. To date, great results have been achieved in those fields, especially for international languages such as English, Spanish, French, and Arabic, whereas few results have been reached for dialects such as the Moroccan dialect “Darija”. The growing use of Moroccan Darija on social media, videos, chatting and others opens new research directions for Moroccan Darija speech recognition. The leading goal of this paper is to give a literature review on Moroccan Darija Automatic Speech Recognition by presenting the dialect-specific constraints, the different works conducted in the field of Moroccan Darija speech recognition, and the progress made in recent years.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127482433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kazuyuki Matsumoto, Mopuaa Ryu, Minoru Yoshida, K. Kita
{"title":"Lifestyle Analysis via a Corpus of Disease-Fighting Weblogs","authors":"Kazuyuki Matsumoto, Mopuaa Ryu, Minoru Yoshida, K. Kita","doi":"10.1109/PRML52754.2021.9520697","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520697","url":null,"abstract":"In recent years, the population of pre-diabetics in Japan has been increasing year by year. Type 2 diabetes is a type of lifestyle-related disease that can be prevented to a certain extent by correcting lifestyle habits. However, by the time we realize that there is a problem with our lifestyle, it may be too late. Therefore, early detection of risk factors for lifestyle-related diseases is important. In this study, we collected the blogs of lifestyle-related disease fighters, set up multiple keyword categories that are considered to be related to risk factors, and constructed a corpus of disease fighting blogs with labels for each category. The results of the evaluation experiments show that our proposed method can be applied to a wide range of topics. As a result of evaluation experiments, our proposed method achieves categorization of keywords and sentences with higher accuracy than the simple method.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132202031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Turdi Tohti, Le Chang, A. Hamdulla, Hankiz Yilahun
{"title":"Concept Word Extraction for Bilingual Ontology Construction in Unstructured Text Environment*","authors":"Turdi Tohti, Le Chang, A. Hamdulla, Hankiz Yilahun","doi":"10.1109/PRML52754.2021.9520708","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520708","url":null,"abstract":"Aiming at the unsatisfactory efficiency of concept word extraction from unstructured text for domain ontology construction, this work first uses a combined statistic to judge the correctness of the concept word boundary determined by the word segmentation, and corrects the wrong segmentation position, thereby strengthening the structural integrity of the segmented candidate concept words. On this basis, the improved methods and various resource libraries are used to adjust the weight of concept words, and the main purpose is to strengthen the correlation between the weight and its domain attributes of concept words. We conducted experiments and comparisons on English-Chinese bilingual corpus, and found that the method of strengthening the structural integrity of concept words and the method of dynamically adjusting the weight of concept words proposed in this paper both brought a certain improvement in the efficiency of concept word extraction.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134545568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on Tibetan-Chinese Machine Translation Based on Multi-Strategy Processing","authors":"Saihu Liu, Jie Zhu, Zhensong Li, Zhixiang Luo","doi":"10.1109/PRML52754.2021.9520733","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520733","url":null,"abstract":"This article takes the low-resource nature of Tibetan-Chinese machine translation as the research object, acquires training data through a variety of strategies, and explores the problem of domain adaptability in Tibetan-Chinese materials and the problem of multi-granularity segmentation. We study the Tibetan-Chinese machine translation method based on the Transformer attention mechanism and the Tibetan-Chinese machine translation method with different segmentation granularity applied to both ends of the encoder-decoder, and evaluate multi-granularity segmentation and the fusion of corpora of different fields and different types. Corpus fusion gives the best experimental result, with the highest BLEU score of 44.9 points.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114412175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deeply Fine-Tune a Convolutional Neural Network in Remote Sensing Image Classification: Easter Africa Countries (EAC)","authors":"M. J. Bosco, Wang Guoyin","doi":"10.1109/PRML52754.2021.9520703","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520703","url":null,"abstract":"Remote sensing is resource data accessible and easy to get in different areas without time-consuming. The traditional image recognition task was unlimited to better classification. A convolutional neural network (CNN) was introduced to improve remote sensing image classification accuracy by eliminating the intra-class and class similarity. Training CNN from scratch requires a large annotated dataset that is occasional in the remote sensing area. Transfer learning of CNN weights from another large non-remote sensing dataset can occasionally help overcome typical RS image applications. Transfer learning consists of fine-tuning CNN layers to better the new dataset. In this paper, all of the experiments were done on nine categories for dataset collected in east Africa community countries (EAC) using three state-of-the-art architectures based on the effect of fine-tuning and pre-trained weights of CNN. Results indicate that fine-tuning the entire network is not always a significant way; we compared it with a process of using VGG16-DensNet pre-trained weights and RF as machine learning classified results can be improved up to 97.60. Alternatively, fine-tuning the top blocks can save computational power and produce a more robust classifier.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127867346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial Beauty Study Based on 3D Geometric Features","authors":"Wenming Han, Fangmei Chen, Fuming Sun","doi":"10.1109/PRML52754.2021.9520726","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520726","url":null,"abstract":"Facial beauty is related to different kinds of features, such as geometry, texture and expression. Geometric features are the most investigated ones, because 1) they have clear and interpretable definitions; 2) they do not change with face make-up, illumination and resolution; and 3) they can be used to guide aesthetic plastic surgery. Due to the high cost of 3D scanning, most existing works focus on 2D geometric features extracted from frontal face images. However, the profile information, which also plays an important role in facial beauty judgment, is neglected. In this paper, we reconstruct 3D faces from 2D images using a recent monocular 3D face reconstruction method. Then 22 anatomical landmarks are defined on the 3D face, from which a total of 51 geometric features are extracted. Finally, we design experiments to evaluate the effectiveness of these features. The results show that ratio features are the most influential ones, and lips also affect facial beauty. A comparison between Asian and Caucasian faces shows that there are significant differences between different ethnic groups. For Asian faces, an angle feature related to face width and nose height has the highest ranking. For the Caucasian groups, the top-ranked features are length and ratio features, and the lip region plays an important role.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128296253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Honghao Zheng, Hongtao Yu, Yinuo Hao, Yiteng Wu, Shaomei Li
{"title":"Rumor Detection Based on Improved Transformer","authors":"Honghao Zheng, Hongtao Yu, Yinuo Hao, Yiteng Wu, Shaomei Li","doi":"10.1109/PRML52754.2021.9520704","DOIUrl":"https://doi.org/10.1109/PRML52754.2021.9520704","url":null,"abstract":"In the field of rumor detection, the existing Transformer-based methods ignore the location information and fail to effectively use the potential information of the text. Therefore, we propose a social media rumor detection method based on improved Transformer that improves the standard Transformer through two novel techniques. First, learnable relative positional encoding is used to endow the Transformer with the ability of direction- and distance-awareness. Second, absolute positional encoding is used, through which each word with different absolute positions is mapped to its corresponding representation space. The experimental results show that, compared with the current best benchmark method, the accuracy of this method on the three data sets of Twitter15, Twitter16 and Weibo has increased by 0.9%, 0.6%, and 1.4%, respectively. The improved Transformer is effective and can significantly improve the effect of social media rumor detection.","PeriodicalId":429603,"journal":{"name":"2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125184176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}