2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA): Latest Articles

Intent Classification on Myanmar Social Media Data in Telecommunication Domain Using Convolutional Neural Network and Word2Vec

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2020-11-05 DOI: 10.1109/O-COCOSDA50338.2020.9295031

Thet Naing Tun, K. Soe

Abstract: People use social media widely and spend increasing amounts of time on it. The intentions behind user-generated content range from social good to feedback about a company's service or product. With the help of deep learning models, users' intentions can be classified more accurately. This paper focuses on intent classification of user-generated comments posted on social media in Myanmar text. Word2Vec, trained with the Continuous Bag of Words (CBOW) architecture, converts words into vector representations, which are the input to a Convolutional Neural Network (CNN) that assigns each comment to one of the pre-defined classes. The proposed model was compared against a baseline Recurrent Neural Network (RNN) model with a single recurrent layer. Facebook is the target social media platform. Social media content is domain-independent, which makes it difficult to classify, so the proposed model targets the telecommunication domain. Users' comments from that domain are regarded as feedback and collected as training and testing data. According to the experimental results, the proposed model outperforms the RNN baseline, achieving an average F-score of 0.94.
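
The abstract reports an "average F-Score" without saying how the average is taken. As a minimal sketch, assuming a macro average (the unweighted mean of per-class F1), the metric could be computed as follows. The function name `macro_f_score` and the intent labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_f_score(gold, predicted):
    """Unweighted mean of per-class F1 over every class seen in gold or predicted."""
    labels = sorted(set(gold) | set(predicted))
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            true_pos[g] += 1
        else:
            false_pos[p] += 1  # p was predicted but is wrong
            false_neg[g] += 1  # g was the right answer but was missed
    scores = []
    for label in labels:
        tp, fp, fn = true_pos[label], false_pos[label], false_neg[label]
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical intent labels for illustration only.
gold = ["complaint", "inquiry", "complaint", "praise"]
pred = ["complaint", "inquiry", "inquiry", "praise"]
print(round(macro_f_score(gold, pred), 3))  # 0.778
```

A micro or support-weighted average would give a different number on imbalanced classes, so the 0.94 in the abstract should not be compared against this sketch without checking the paper's definition.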

Citations: 1

VOIS: The First Speech Therapy App Specifically Designed for Myanmar Hearing-Impaired Children

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2020-11-05 DOI: 10.1109/O-COCOSDA50338.2020.9295024

A. Thida, Nway Nway Han, Sheinn Thawtar Oo, Sheng Li, Chenchen Ding

Abstract: Educating hearing-impaired children is challenging because they are unlikely to develop normal speech and language ability. We propose VOIS, a mobile application that is the first speech therapy application for hearing-impaired children in Myanmar. The application uses an offline Burmese speech recognition system based on a Convolutional Neural Network (CNN). It helps hearing-impaired children train on the language prerequisites at their own pace. To help the children understand the basics of the language, the system provides one-syllable and two-syllable Myanmar words collected from real-life educational and communication materials. Experimental results show that the prediction rate of the system is nearly 60%. Experiments also show that hearing-impaired children can learn and use the language freely through simple practice with the application. The expectation is that this application can bring both opportunities and life-quality improvements for children with hearing loss in Myanmar.

Citations: 2

Creation and Analysis of Emotional Speech Database for Multiple Emotions Recognition

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2020-11-05 DOI: 10.1109/O-COCOSDA50338.2020.9295041

Ryota Sato, Ryohei Sasaki, Norisato Suga, T. Furukawa

Abstract: Speech emotion recognition (SER) is one of the latest challenges in human-computer interaction. Conventional SER classification methods output a single emotion label per utterance, because the conventional emotional speech databases used to train SER models carry a single emotion label per utterance. However, human speech often expresses multiple emotions simultaneously with different intensities. To realize more natural SER, the presence of multiple emotions in one utterance should be taken into account. We therefore created an emotional speech database that contains labels for multiple emotions and their intensities. The database was created by extracting, from existing video works, the speech segments in which emotions appear. We also evaluated the database through statistical analysis. As a result, 2,025 samples were obtained, of which 1,525 contained multiple emotions.

Citations: 4

Region Report 2020 Hong Kong

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2020-11-05 DOI: 10.1109/o-cocosda50338.2020.9295034

Tan Lee

Citations: 0

Myanmar News Headline Generation with Sequence-to-Sequence model

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2020-11-05 DOI: 10.1109/O-COCOSDA50338.2020.9295017

Yamin Thu, Win Pa Pa

Abstract: News headline generation has recently become a valuable research topic in NLP. Headline generation here means learning to map articles to headlines using a sequence-to-sequence (Seq2Seq) model. This work applies a headline generator whose encoder and decoder are built from Long Short-Term Memory (LSTM) networks, and implements automatic headline generation for Myanmar news articles. Among the various ways to generate a news headline, this paper uses Seq2Seq with one-hot encoding and reports comparative analysis results. Constructing the model raised challenges such as vocabulary counting and handling unknown terms in the word embedding. To obtain more meaningful results, error analysis was applied to a typical neural headline generation system, and machine-generated headlines were evaluated against the actual headlines using the ROUGE metric. The experiments were conducted on a Myanmar news dataset of 7000 pairs of news articles and their corresponding headlines. According to the evaluation, Seq2Seq with one-hot encoding outperforms Seq2Seq with word embedding (GloVe) and the Recursive Recurrent Neural Network (Recursive RNN).
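
The abstract names ROUGE as the evaluation metric but does not say which variant was used. As a rough illustration only, assuming ROUGE-1 (clipped unigram overlap) and assuming the headlines are already segmented into word tokens, the comparison of a generated headline with the actual one could be sketched as below. The function name `rouge_1` and the English example tokens are hypothetical.

```python
from collections import Counter


def rouge_1(reference_tokens, candidate_tokens):
    """Return (precision, recall, f1) based on clipped unigram overlap."""
    ref = Counter(reference_tokens)
    cand = Counter(candidate_tokens)
    # Multiset intersection keeps, per token, the smaller of the two counts.
    overlap = sum((ref & cand).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Hypothetical, pre-tokenized headlines (not from the paper's dataset).
actual = "government opens new bridge in yangon".split()
generated = "new bridge opens in yangon today".split()
precision, recall, f1 = rouge_1(actual, generated)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

Myanmar script does not separate words with spaces, so a real evaluation would need a word or syllable segmenter before this step; the choice of segmenter changes the scores.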

Citations: 3

Prosodic Information-Assisted DNN-based Mandarin Spontaneous-Speech Recognition

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2020-11-05 DOI: 10.1109/O-COCOSDA50338.2020.9295010

Yu-Chih Deng, Cheng-Hsin Lin, Y. Liao, Yih-Ru Wang, Sin-Horng Chen

Abstract: This paper continues the method proposed in [1] and updates its traditional HMM-based ASR to state-of-the-art DNN-based ASR. Prosodic information is used to assist DNN-based Mandarin spontaneous-speech recognition, in particular to alleviate the serious interference of disfluencies and paralinguistic phenomena during decoding. The approach adopts a sophisticated hierarchical prosodic model (HPM), made of several break-syntax, break-acoustic, syllable prosodic, and prosodic state models, to rescore and improve the TDNN-f+RNNLM-based first-pass decoding output and, at the same time, to generate word, part-of-speech (POS), punctuation mark (PM), tone, break type, and prosodic state tags for further use. Experimental results showed that the HPM-based system not only dramatically reduced the word error rate from the previous best value of 41.8% [1] to 21.2%, but also detected the underlying POS, PMs, and tones well (error rates of 10.9%, 12.6%, and 2.3%, respectively). This confirms that the proposed method is very promising for Mandarin spontaneous-speech recognition.
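
To put the reported word error rates in perspective, the sketch below shows the standard definition of WER (word-level edit distance divided by reference length) and the relative reduction implied by the two figures in the abstract. The function names and the toy token sequences are hypothetical; only the values 41.8 and 21.2 come from the abstract.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def relative_reduction(old_rate, new_rate):
    """Fraction of the old error rate that the new system removes."""
    return (old_rate - new_rate) / old_rate


# Toy example: one substitution and one deletion over four reference words.
print(word_error_rate("a b c d".split(), "a x c".split()))  # 0.5

# Figures from the abstract: 41.8% (previous best) versus 21.2% (HPM-based).
print(round(relative_reduction(41.8, 21.2), 3))  # 0.493
```

By this arithmetic the drop from 41.8% to 21.2% is a relative reduction of roughly 49%, which covers the combined effect of the DNN-based recognizer and the prosodic rescoring rather than the rescoring step alone.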

Citations: 1

Japanese Quotation Marker “tte” in Conversation using Everyday Conversation Corpus*

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2020-11-05 DOI: 10.1109/O-COCOSDA50338.2020.9295029

Yasuyuki Usuda

Abstract: This study investigates what participants achieve by employing the Japanese quotation marker “tte” in everyday conversation. Specifically, it focuses on what the utterance does when the marker appears at its end. In the cases examined, direct quotations ending in “tte”, and similar ones, are employed in telling environments, in which one participant keeps the right to speak in order to tell something about their thoughts or experience while the others only respond as listeners. It is found that these types of quotation are employed in the middle of the telling and that the content of the quotation can be seen as a punchline although it is not one. The teller therefore has to deal with the possible misunderstanding that the quotation is the punchline of the story. It can thus be said that quotation with “tte” enables recipients of the telling to understand that the utterance is not the end of the telling. This contributes to the co-construction of the story and to a mutual understanding of the status of the ongoing telling.

Citations: 0

Vietnam Country Report 2020: Updated activities on resources development for Vietnamese Speech and NLP

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2020-11-05 DOI: 10.1109/o-cocosda50338.2020.9295028

Citations: 0

A pilot study on the perception of Chinese and English prosodic focus by Chinese learners of English: the effect of foreign accent

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2020-11-05 DOI: 10.1109/O-COCOSDA50338.2020.9295003

Danhong Shen, Ping Tang

Abstract: Prosodic focus plays an important role in daily speech communication, typically marking the key information in utterances, such as new information. English speakers have been found to utilize focus to differentiate between new and old information in perception, but it was unclear whether Chinese learners of English can do so when perceiving prosodic focus in Mandarin Chinese (L1) and English (L2). Moreover, earlier studies showed that native speakers adapt to foreign-accented semantic or syntactic infelicity, but it was unclear whether they show similar adaptation to infelicitous prosodic focus. The current (pilot) study therefore explored (1) whether Chinese L2 learners can utilize prosodic focus to differentiate between new and old information when perceiving Chinese and English utterances, and (2) whether they adapt to infelicitous prosodic focus when hearing a foreign accent. Twelve English-major students were recruited as participants. Audio materials included Chinese and English utterances in felicitous, neutral, and infelicitous focus conditions, produced by native speakers (without a foreign accent) and L2 learners (with a foreign accent). The visual-world paradigm was adopted to record reaction time and eye movement. The results showed that Chinese L2 learners were able to utilize prosodic focus to differentiate between new and old information when hearing both Chinese and English utterances, responding fast in the felicitous focus condition. However, when hearing a foreign accent, they did not utilize prosodic focus to perceive new/old information, showing adaptation to infelicitous focus. These (preliminary) results indicate that L2 learners can accurately perceive prosodic focus in English. They also imply that, when hearing utterances produced with a foreign accent, listeners adapt not only to semantic and syntactic infelicity but also to prosodic infelicity.

Citations: 0

Part-Of-Speech Tagger in Malayalam Using Bi-directional LSTM

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) Pub Date : 2020-11-05 DOI: 10.1109/o-cocosda50338.2020.9295018

R. Rajan, Anna J. Joseph, Elizabeth K. Robin, Nishma T. K. Fathima

Abstract: The majority of human activities are carried out through language, whether communicated directly or reported using natural language. As technology makes the methods and platforms on which we communicate ever more accessible, there is a great need to understand the languages we use to communicate. By combining artificial intelligence, computational linguistics, and computer science, natural language processing (NLP) helps machines “read” text by simulating the human ability to understand language. Part-of-speech (POS) tagging is a prerequisite that simplifies many NLP applications, such as question answering, speech recognition, and machine translation. Here, we compare part-of-speech taggers for Malayalam built with a decision tree algorithm and with bi-directional long short-term memory (BLSTM). The experiments presented in this paper use two corpora for performance evaluation, one of 29076 sentences and the other of 500 sentences. The experiments demonstrate the potential of the BLSTM-based tagger over conventional decision tree-based tagging in Malayalam.

Citations: 1