On the Identification of FOSD-based Non-zero Onset Speech Dataset

2020 IEEE Student Conference on Research and Development (SCOReD), Pub Date: 2020-09-27, DOI: 10.1109/SCOReD50371.2020.9251018

D. Tran, R. Ibrahim

Citations: 0

Abstract

Recent trends in voicebot and chatbot application development have enabled the use of speech-to-text (STT) and text-to-speech (TTS) techniques. To develop such TTS or STT engines, the text and the corresponding recorded speech in the audio files used for training, validation and testing must be aligned, so that the resulting engines achieve the desired conversion quality. Speech and text are aligned with an audio alignment tool, and such tools often rely on onset detection algorithms to label the start and end times of the speech in each audio file. These times are then stored together with the file's transcript. In this work, an open non-zero onset Vietnamese speech dataset is provided. The dataset contains 348 audio files filtered from more than 25,000 Vietnamese speech recordings (approximately 30 hours) publicly released by FPT Corporation, Vietnam, in 2018. This amount of labeled data is considered more than sufficient for typical onset detection algorithm research.
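
To make the labeling step concrete, the sketch below shows one simple way an onset detector can produce speech start and end times for a file. It is a plain short-time-energy threshold detector, not the authors' FOSD-based method, which the abstract does not define. The function names (`detect_speech_bounds`, `has_nonzero_onset`), the 25 ms frame and 10 ms hop at 16 kHz, the -40 dB threshold relative to the loudest frame, and the 0.1 s cutoff are all hypothetical choices. It also assumes that "non-zero onset" means speech begins some time after the start of the file.

```python
import math
from typing import List, Optional, Tuple


def frame_energies(samples: List[float], frame_len: int, hop: int) -> List[float]:
    """Mean squared amplitude (power) of each full analysis frame."""
    energies = []
    for start in range(0, max(len(samples) - frame_len + 1, 0), hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_speech_bounds(
    samples: List[float],
    sample_rate: int,
    frame_len: int = 400,         # hypothetical: 25 ms at 16 kHz
    hop: int = 160,               # hypothetical: 10 ms at 16 kHz
    threshold_db: float = -40.0,  # hypothetical: relative to the loudest frame
) -> Optional[Tuple[float, float]]:
    """Return (start_s, end_s) spanning all frames above the energy threshold,
    or None if the signal is too short, silent, or no frame qualifies."""
    energies = frame_energies(samples, frame_len, hop)
    if not energies:
        return None
    peak = max(energies)
    if peak == 0.0:
        return None
    # Power threshold: 10*log10(e / peak) >= threshold_db
    floor = peak * 10 ** (threshold_db / 10)
    active = [i for i, e in enumerate(energies) if e >= floor]
    if not active:
        return None
    start_s = active[0] * hop / sample_rate
    end_s = (active[-1] * hop + frame_len) / sample_rate
    return start_s, end_s


def has_nonzero_onset(bounds: Optional[Tuple[float, float]],
                      min_onset_s: float = 0.1) -> bool:
    """Hypothetical filter: keep files whose speech starts at least min_onset_s in."""
    return bounds is not None and bounds[0] >= min_onset_s


if __name__ == "__main__":
    sr = 16000
    silence = [0.0] * 8000  # 0.5 s of silence
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(16000)]  # 1 s "speech"
    bounds = detect_speech_bounds(silence + tone + silence, sr)
    print(bounds, has_nonzero_onset(bounds))  # → (0.48, 1.515) True
```

On the synthetic signal the detected onset is 0.48 s rather than 0.5 s because the first 25 ms frame that overlaps the tone starts at 0.48 s; the frame and hop sizes therefore bound the timing resolution of the labels. A real detector would also need to cope with background noise, which a single global threshold handles poorly.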
