基于fosd的非零起始语音数据集识别研究

D. Tran, R. Ibrahim
{"title":"基于fosd的非零起始语音数据集识别研究","authors":"D. Tran, R. Ibrahim","doi":"10.1109/SCOReD50371.2020.9251018","DOIUrl":null,"url":null,"abstract":"Recent trends in voicebot and chatbot application development have enabled utilization of speech-to-text (STT) and text-to-speech (TTS) generation techniques. In order to develop such TTS or STT engines, text and the corresponding recorded speech in an audio file used for training, validating and testing must be aligned. This is to ensure the developed engines achieve the desired conversion quality. In order to align speech and text, an audio alignment tool should be used. In such tools, often onset detection algorithms are utilized for labeling the audio file’s speech start and end times. This information is then stored together with the file’s transcript. In this work, an open nonzero onset Vietnamese speech dataset is provided. This dataset contains 348 audio files filtered from over 25,000 (approximately 30-hours) Vietnamese speech records released publicly by FPT Corporation, Vietnam in 2018. This amount of labeled data is considered to be more than sufficient for a typical onset detection algorithm researches.","PeriodicalId":142867,"journal":{"name":"2020 IEEE Student Conference on Research and Development (SCOReD)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On the Identification of FOSD-based Non-zero Onset Speech Dataset\",\"authors\":\"D. Tran, R. Ibrahim\",\"doi\":\"10.1109/SCOReD50371.2020.9251018\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recent trends in voicebot and chatbot application development have enabled utilization of speech-to-text (STT) and text-to-speech (TTS) generation techniques. In order to develop such TTS or STT engines, text and the corresponding recorded speech in an audio file used for training, validating and testing must be aligned. This is to ensure the developed engines achieve the desired conversion quality. In order to align speech and text, an audio alignment tool should be used. In such tools, often onset detection algorithms are utilized for labeling the audio file’s speech start and end times. This information is then stored together with the file’s transcript. In this work, an open nonzero onset Vietnamese speech dataset is provided. This dataset contains 348 audio files filtered from over 25,000 (approximately 30-hours) Vietnamese speech records released publicly by FPT Corporation, Vietnam in 2018. This amount of labeled data is considered to be more than sufficient for a typical onset detection algorithm researches.\",\"PeriodicalId\":142867,\"journal\":{\"name\":\"2020 IEEE Student Conference on Research and Development (SCOReD)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE Student Conference on Research and Development (SCOReD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SCOReD50371.2020.9251018\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE Student Conference on Research and Development (SCOReD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SCOReD50371.2020.9251018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

语音机器人和聊天机器人应用程序开发的最新趋势使语音到文本(STT)和文本到语音(TTS)生成技术得以利用。为了开发这样的TTS或STT引擎,用于训练、验证和测试的音频文件中的文本和相应的录制语音必须保持一致。这是为了确保开发的发动机达到预期的转换质量。为了对齐语音和文本,应该使用音频对齐工具。在这些工具中,通常使用起始检测算法来标记音频文件的语音开始和结束时间。然后将此信息与文件的副本一起存储。在这项工作中,提供了一个开放的非零起始越南语语音数据集。该数据集包含348个音频文件,这些音频文件是从越南FPT公司2018年公开发布的2.5万多条(约30小时)越南语语音记录中筛选出来的。对于一种典型的发病检测算法的研究来说,这样的标记数据量是绰绰有余的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
On the Identification of FOSD-based Non-zero Onset Speech Dataset
Recent trends in voicebot and chatbot application development have enabled utilization of speech-to-text (STT) and text-to-speech (TTS) generation techniques. In order to develop such TTS or STT engines, text and the corresponding recorded speech in an audio file used for training, validating and testing must be aligned. This is to ensure the developed engines achieve the desired conversion quality. In order to align speech and text, an audio alignment tool should be used. In such tools, often onset detection algorithms are utilized for labeling the audio file’s speech start and end times. This information is then stored together with the file’s transcript. In this work, an open nonzero onset Vietnamese speech dataset is provided. This dataset contains 348 audio files filtered from over 25,000 (approximately 30-hours) Vietnamese speech records released publicly by FPT Corporation, Vietnam in 2018. This amount of labeled data is considered to be more than sufficient for a typical onset detection algorithm researches.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信