Latest papers: 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)

Global F0 control parameter prediction based on impressions for communicative prosody generation
L. Shao, Y. Greenberg, Y. Sagisaka
{"title":"Global F0 control parameter prediction based on impressions for communicative prosody generation","authors":"L. Shao, Y. Greenberg, Y. Sagisaka","doi":"10.1109/ICSDA.2013.6709871","DOIUrl":"https://doi.org/10.1109/ICSDA.2013.6709871","url":null,"abstract":"Aiming at communicative speech synthesis, prosody control using impressions has been proposed by applying the correlation between impressions of input lexicons and prosody. In this paper, as the first step to compute communicative prosody, we attempt to predict the F0 generation model parameters by estimating the impressions of input sentence from its constituent lexicons. To obtain an impression vector consisting of three dimensional factors (positive-negative, confident-doubtful and allowable-unacceptable) for a given input utterance, we proposed a computational scheme to calculate impression vectors using impression scores of constituent words. Using obtained sentence impression vectors, F0 control parameters are predicted by applying three-layered feed-forward neural networks. To evaluate the effectiveness of the proposed computational framework, we experimentally confirmed that F0 parameters of communicative speech could be generated from the impressions of input lexicons.","PeriodicalId":266295,"journal":{"name":"2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"301 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131637661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Blind source separation: A review and analysis
Madhab Pal, Rajib Roy, Joyanta Basu, M. S. Bepari
{"title":"Blind source separation: A review and analysis","authors":"Madhab Pal, Rajib Roy, Joyanta Basu, M. S. Bepari","doi":"10.1109/ICSDA.2013.6709849","DOIUrl":"https://doi.org/10.1109/ICSDA.2013.6709849","url":null,"abstract":"Blind Source Separation (BSS) refers to a problem where both the sources and the mixing methodology are unknown, only mixture signals are available for further separation process. In several situations it is desirable to recover all individual sources from the mixed signal, or at least to segregate a particular source. In laboratory condition, most of the algorithms works very fine where input signals, no. of source present in the mixture, mixing methodology etc are well known to the separation process. But in real-life scenario the problem is much more complicated and it begins with the input signal, a mixture where most of the parameters are unknown. This paper will try to summarize those approaches taken previously to solve this problem and an experiment of source separation which will mix using Independent Component Analysis (ICA) and then de-mix those source signals using ICA as the basic/prime approach.","PeriodicalId":266295,"journal":{"name":"2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125364860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 45
An evaluation of Mongolian data-driven Text-to-Speech
Altangerel Chagnaa, Purev Jaimai, Kerey Yesyenbyek, C. Hansakunbuntheung
{"title":"An evaluation of Mongolian data-driven Text-to-Speech","authors":"Altangerel Chagnaa, Purev Jaimai, Kerey Yesyenbyek, C. Hansakunbuntheung","doi":"10.1109/ICSDA.2013.6709881","DOIUrl":"https://doi.org/10.1109/ICSDA.2013.6709881","url":null,"abstract":"This paper presents a first attempt to evaluate data-driven speech synthesis of Mongolian trained on 1500-sentence female speech corpus. The speech corpus contains nearly 6 hours of Mongolian female speech that is designed to cover all Mongolian phones. The evaluation is done on two levels. In overall quality evaluation, we generated 25 sentences and asked raters about their quality based on Mean Opinion Score (MOS). The second evaluation uses Phoneme confusion test, which contains all possible phoneme set in Mongolian.","PeriodicalId":266295,"journal":{"name":"2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122616284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Improve Japanese C2L learners' capability to distinguish Chinese tone 2 and tone 3 through perceptual training
Jinsong Zhang, Xiaoyun Wang, Yue Sun, M. Nishida, T. Zou, Seiichi Yamamoto
{"title":"Improve Japanese C2L learners' capability to distinguish Chinese tone 2 and tone 3 through perceptual training","authors":"Jinsong Zhang, Xiaoyun Wang, Yue Sun, M. Nishida, T. Zou, Seiichi Yamamoto","doi":"10.1109/ICSDA.2013.6709850","DOIUrl":"https://doi.org/10.1109/ICSDA.2013.6709850","url":null,"abstract":"In the process of Chinese learning, Tone 2 and Tone 3 are the most problematic pair for Japanese learners. We propose to develop a perceptual training paradigm to help them to gain efficiently the perceptual ability to distinguish the tones. A series of three studies were carried out: the first checked how difficult the Japanese learners produce the tones. The second investigated how differently Japanese and Chinese people perceive the two tones. The third tested a hybrid perceptual training paradigm lasting 6 days: a 2-days-long adaptive training followed by a 4-days-long high-variability training. Results of these studies not only improved our knowledge about tone production and perception patterns with respect to Japanese and Chinese speakers, but also showed the effectiveness of the proposed hybrid perceptual training paradigm which achieved a significant improvement of tone distinguishing ability (a relative error reduction of 77% in 6 days) by 6 Japanese participants.","PeriodicalId":266295,"journal":{"name":"2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122866946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
A syllable-based framework for unit selection synthesis in 13 Indian languages
H. Patil, T. Patel, Nirmesh J. Shah, Hardik B. Sailor, R. Krishnan, G. Kasthuri, T. Nagarajan, S. Christina, Naresh Kumar, Veera Raghavendra, K. Prahallad, S. Prasanna, Nagaraj Adiga, Sanasam Ranbir Singh, Anand Konjengbam, Pranaw Kumar, Bira Chandra Singh, S. Kumar, T. G. Bhadran, T. Sajini, Arup Saha, T. Basu, K. S. Rao, N. Narendra, A. Sao, Rakesh Kumar, P. Talukdar, P. Acharyaa, S. Chandra, Swaran Lata, H. Murthy
{"title":"A syllable-based framework for unit selection synthesis in 13 Indian languages","authors":"H. Patil, T. Patel, Nirmesh J. Shah, Hardik B. Sailor, R. Krishnan, G. Kasthuri, T. Nagarajan, S. Christina, Naresh Kumar, Veera Raghavendra, K. Prahallad, S. Prasanna, Nagaraj Adiga, Sanasam Ranbir Singh, Anand Konjengbam, Pranaw Kumar, Bira Chandra Singh, S. Kumar, T. G. Bhadran, T. Sajini, Arup Saha, T. Basu, K. S. Rao, N. Narendra, A. Sao, Rakesh Kumar, P. Talukdar, P. Acharyaa, S. Chandra, Swaran Lata, H. Murthy","doi":"10.1109/ICSDA.2013.6709851","DOIUrl":"https://doi.org/10.1109/ICSDA.2013.6709851","url":null,"abstract":"In this paper, we discuss a consortium effort on building text to speech (TTS) systems for 13 Indian languages. There are about 1652 Indian languages. A unified framework is therefore attempted required for building TTSes for Indian languages. As Indian languages are syllable-timed, a syllable-based framework is developed. As quality of speech synthesis is of paramount interest, unit-selection synthesizers are built. Building TTS systems for low-resource languages requires that the data be carefully collected an annotated as the database has to be built from the scratch. Various criteria have to addressed while building the database, namely, speaker selection, pronunciation variation, optimal text selection, handling of out of vocabulary words and so on. The various characteristics of the voice that affect speech synthesis quality are first analysed. Next the design of the corpus of each of the Indian languages is tabulated. The collected data is labeled at the syllable level using a semiautomatic labeling tool. Text to speech synthesizers are built for all the 13 languages, namely, Hindi, Tamil, Marathi, Bengali, Malayalam, Telugu, Kannada, Gujarati, Rajasthani, Assamese, Manipuri, Odia and Bodo using the same common framework. The TTS systems are evaluated using degradation Mean Opinion Score (DMOS) and Word Error Rate (WER). An average DMOS score of ≈3.0 and an average WER of about 20 % is observed across all the languages.","PeriodicalId":266295,"journal":{"name":"2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129449707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 58
Development of a standard text and speech corpus for the Punjabi language
S. Dhanjal, S. S. Bhatia
{"title":"Development of a standard text and speech corpus for the Punjabi language","authors":"S. Dhanjal, S. S. Bhatia","doi":"10.1109/ICSDA.2013.6709891","DOIUrl":"https://doi.org/10.1109/ICSDA.2013.6709891","url":null,"abstract":"In this paper, a new text and speech corpus in the Punjabi language has been developed. The Punjabi language is a modern Indo-Aryan language. The Punjabi language has been ranked amongst the top spoken languages of the world. Over the years, this ranking has varied between 10 and 18. Any research work on the Punjabi language, therefore, assumes an international significance. The Punjabi language is the native language of the Punjab state in two countries: East Punjab in India, and West Punjab in Pakistan. There are many dialects of the Punjabi language and two different scripts in both countries. It will be an enormous task to design a new text or speech corpus that can completely describe all dialects in both scripts. This work, therefore, concentrates only on one dialect of the Punjabi language: the Malwai dialect. This paper describes at least 20 unique features of the newly designed corpus.","PeriodicalId":266295,"journal":{"name":"2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130556101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Evaluation and error recovery methods of an IVR based real time speech recognition application
Soma Khan, Joyanta Basu, M. S. Bepari, Rajib Roy
{"title":"Evaluation and error recovery methods of an IVR based real time speech recognition application","authors":"Soma Khan, Joyanta Basu, M. S. Bepari, Rajib Roy","doi":"10.1109/ICSDA.2013.6709847","DOIUrl":"https://doi.org/10.1109/ICSDA.2013.6709847","url":null,"abstract":"Field trial and evaluation of any real world speech recognition application using Interactive Voice Response technology are likely to be a daunting task. It has to face challenges regarding spoken language conventions, pronunciation variations, recognition issues in noisy environment, limitations of human cognition, working memory and differences between users. Present study illustrates the entire evaluation process of such an agricultural information retrieval system mainly targeted towards semi-literate or illiterate farmers. A new set of evaluation metrics as per the designed evaluation strategies, details of field trial processes, feedback analysis and finally system performance results are presented in a well organized way. Additionally to meet users' expectations, distinctive error recovery methods like Signal Analysis and Decision, Confidence Measure and Polling, Complementary Information, Runtime model generation etc. are introduced and incorporated to confirm performance enhancement in final trial. Evaluation methods and metrics used here are domain independent and applicable to similar systems.","PeriodicalId":266295,"journal":{"name":"2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115836126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
PL-ILT: A web tool for creation of pronunciation lexicon in Indian languages
Sankar Mukherjee, S. Mandal
{"title":"PL-ILT: A web tool for creation of pronunciation lexicon in Indian languages","authors":"Sankar Mukherjee, S. Mandal","doi":"10.1109/ICSDA.2013.6709858","DOIUrl":"https://doi.org/10.1109/ICSDA.2013.6709858","url":null,"abstract":"This paper present the efforts involved in designing a mass development tool to create comprehensive machine readable pronunciation lexicon for Indian languages. The lexicon file contains the orthography, corresponding pronunciations, parts-of-speech, morphosyntactic description, idiolectic variation of word pronunciation and meaning of lexical entries in a format based on requirements defined by the W3C Voice Browser Activity Pronunciation Lexicon Specification (PLS) 1.0. The current version of the Pronunciation Lexicon for Indian Languages Toolkit (PL-ILT) PLS contains approximately 2 million lexical entries for Bengali. Although in this paper we only describe language specific issues related to Bengali, PL-ILT has the ability to adapt different languages.","PeriodicalId":266295,"journal":{"name":"2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133380858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Evaluation of prosody in text-to-speech synthesis system of Bangla
T. Basu, Arup Saha
{"title":"Evaluation of prosody in text-to-speech synthesis system of Bangla","authors":"T. Basu, Arup Saha","doi":"10.1109/ICSDA.2013.6709868","DOIUrl":"https://doi.org/10.1109/ICSDA.2013.6709868","url":null,"abstract":"In speech synthesis the role of prosody is very crucial. To make the synthesized speech more natural and soothing to the human ears various prosody and intonation model together with emotional model have been experimented over last few decades. Apart from the segmental quality and voice characteristics, it depends mostly on the quality of the prosody model which is responsible for the naturalness of any TTS system. But as it is very hard to evaluate prosody model in an objective way, a perceptual comparison method is adopted in this work to evaluate prosody model.","PeriodicalId":266295,"journal":{"name":"2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129657720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Multi-speaker, narrowband, continuous Marathi speech database
Tejas Godambe, N. Bondale, K. Samudravijaya, P. Rao
{"title":"Multi-speaker, narrowband, continuous Marathi speech database","authors":"Tejas Godambe, N. Bondale, K. Samudravijaya, P. Rao","doi":"10.1109/ICSDA.2013.6709844","DOIUrl":"https://doi.org/10.1109/ICSDA.2013.6709844","url":null,"abstract":"We describe the development of a continuous speech database in Marathi language. Speech data was collected from about 1500 literate speakers from 34 districts of Maharashtra, with a variety of characteristics such as age group, gender, mother tongue and educational qualification. The subjects called the data acquisition system with personal mobile handsets, and read specially designed sentence sets. The sentence data acquisition process was conducted on field in contrast to a quiet environment. As a result, the acquired speech data captured large amount of nonspeech sounds as well as incompletely spoken words. So, the speech data was transcribed employing additional labels to denote frequently occurring nonspeech sounds, different kinds of incomplete words and invalid words. We characterize the database in terms of the statistics of features such as gender distribution of speakers, phonemic richness, amount of non speech sounds, and average sentence and word lengths for both reference and actual sentences.","PeriodicalId":266295,"journal":{"name":"2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123419621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6