Saurabhchand Bhati, Chunxi Liu, J. Villalba, J. Trmal, S. Khudanpur, N. Dehak
{"title":"基于声学单元的自下而上无监督词发现","authors":"Saurabhchand Bhati, Chunxi Liu, J. Villalba, J. Trmal, S. Khudanpur, N. Dehak","doi":"10.1109/GlobalSIP45357.2019.8969225","DOIUrl":null,"url":null,"abstract":"Unsupervised term discovery is the task of identifying and grouping reoccurring word-like patterns from the untranscribed audio data. It facilitates unsupervised acoustic model training in zero resource setting where no or minimal transcribed speech is available. In this paper, we investigate two-step bottom-up approaches for unsupervised discovery of word-like units. The first step discovers phone-like acoustic units from data and the second step combines the basic acoustic blocks to identify word-like units. We investigated Embedded Segmental K-means and Nested Hierarchical Pitman-Yor (PYR) model as bottom-up strategies. ESK-Means iteratively selects boundaries from an initial set to arrive at the word boundaries. The final performance critically depends on the quality of the initial boundaries. We used a segmentation method that discovers boundaries much closer to actual boundaries. PYR model has been used for word segmentation from space removed text data, and here we use it for word discovery from unsupervised acoustic units. The term discovery performance is evaluated on the Zero Resource 2017 challenge dataset, which consists of around 70 hours of unlabelled data. Our systems outperformed the baseline systems on all the languages without language-specific parameter tuning. We performed comprehensive experiments of the system parameters on the system performance.","PeriodicalId":221378,"journal":{"name":"2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Bottom-Up Unsupervised Word Discovery via Acoustic Units\",\"authors\":\"Saurabhchand Bhati, Chunxi Liu, J. Villalba, J. Trmal, S. Khudanpur, N. Dehak\",\"doi\":\"10.1109/GlobalSIP45357.2019.8969225\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Unsupervised term discovery is the task of identifying and grouping reoccurring word-like patterns from the untranscribed audio data. It facilitates unsupervised acoustic model training in zero resource setting where no or minimal transcribed speech is available. In this paper, we investigate two-step bottom-up approaches for unsupervised discovery of word-like units. The first step discovers phone-like acoustic units from data and the second step combines the basic acoustic blocks to identify word-like units. We investigated Embedded Segmental K-means and Nested Hierarchical Pitman-Yor (PYR) model as bottom-up strategies. ESK-Means iteratively selects boundaries from an initial set to arrive at the word boundaries. The final performance critically depends on the quality of the initial boundaries. We used a segmentation method that discovers boundaries much closer to actual boundaries. PYR model has been used for word segmentation from space removed text data, and here we use it for word discovery from unsupervised acoustic units. The term discovery performance is evaluated on the Zero Resource 2017 challenge dataset, which consists of around 70 hours of unlabelled data. Our systems outperformed the baseline systems on all the languages without language-specific parameter tuning. 
We performed comprehensive experiments of the system parameters on the system performance.\",\"PeriodicalId\":221378,\"journal\":{\"name\":\"2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP)\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/GlobalSIP45357.2019.8969225\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GlobalSIP45357.2019.8969225","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Bottom-Up Unsupervised Word Discovery via Acoustic Units
Unsupervised term discovery is the task of identifying and grouping recurring word-like patterns in untranscribed audio data. It facilitates unsupervised acoustic model training in zero-resource settings, where little or no transcribed speech is available. In this paper, we investigate two-step bottom-up approaches for unsupervised discovery of word-like units. The first step discovers phone-like acoustic units from the data, and the second step combines these basic acoustic blocks into word-like units. We investigate the Embedded Segmental K-means (ESK-Means) and Nested Hierarchical Pitman-Yor (PYR) models as bottom-up strategies. ESK-Means iteratively selects boundaries from an initial set to arrive at the word boundaries, so its final performance depends critically on the quality of the initial boundaries; we use a segmentation method that produces initial boundaries much closer to the actual ones. The PYR model has previously been used for word segmentation of text with spaces removed, and here we apply it to word discovery from unsupervised acoustic units. Term discovery performance is evaluated on the Zero Resource 2017 challenge dataset, which consists of around 70 hours of unlabelled data. Our systems outperform the baseline systems on all languages without language-specific parameter tuning. We also perform comprehensive experiments on the effect of system parameters on performance.
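As a rough illustration of the second, word-discovery step, the hypothetical Python sketch below counts recurring acoustic-unit n-grams across utterances and keeps the frequent ones as word-like candidates. The function name, thresholds, and toy data are assumptions made for illustration only; this is not the ESK-Means or Pitman-Yor procedure used in the paper.

```python
# Illustrative sketch (not the paper's method): given sequences of discovered
# acoustic-unit labels (pseudo-phones), collect recurring n-gram patterns as
# candidate word-like units.
from collections import Counter

def candidate_word_units(unit_sequences, min_len=2, max_len=5, min_count=2):
    """Count recurring acoustic-unit n-grams across utterances and keep frequent ones."""
    counts = Counter()
    for seq in unit_sequences:
        for n in range(min_len, max_len + 1):
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
    # Patterns that recur often enough are treated as word-like candidates.
    return {pattern: c for pattern, c in counts.items() if c >= min_count}

# Toy example: three utterances encoded as discovered acoustic-unit IDs.
utterances = [
    [3, 7, 1, 4, 4, 9, 3, 7, 1],
    [5, 3, 7, 1, 2, 2],
    [3, 7, 1, 4, 4, 9],
]
print(candidate_word_units(utterances))
```

In this toy run, the unit sequence (3, 7, 1) recurs across utterances and would surface as a word-like candidate; the actual systems instead optimize segment boundaries jointly (ESK-Means) or model the unit sequences probabilistically (PYR).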