Bayesian Noisy Word Clustering through Sampling Prototypical Words

2018 Joint IEEE 8th International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob). Pub Date: 2018-09-01. DOI: 10.1109/DEVLRN.2018.8760503

T. Taniguchi, Yuta Fukusako, Toshiaki Takano


Citations: 1

Abstract

This paper describes a new algorithm for sampling prototypical words from a set of noisy words and proposes a noisy word clustering method. In a lexical acquisition task, phoneme sequences recognized by a developmental robot using a phoneme recognizer contain many errors. A letter or phoneme sequence containing errors is called a noisy word. Building a mixture model for noisy words, and a clustering method on top of it, requires a procedure for sampling a prototypical word, i.e., a “mean” string, for a cluster of noisy words. Despite the long history of methods for treating noisy words, e.g., stochastic deformation models, the edit distance, and their variants, an efficient sampling procedure for prototypical words has not been developed. In this paper, a mixture of stochastic deformation models, namely a generative model for noisy words, is proposed, together with efficient blocked Gibbs samplers for the model. To this end, a forward filtering backward sampling procedure is proposed for jointly decoding noisy words and sampling their “mean” string. We applied the proposed clustering method to a set of noisy synthetic words and obtained better results than a baseline method. In particular, a sampling procedure using tied backward sampling demonstrated the best performance in reconstructing original words from noisy words through a clustering process.
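To make the terms in the abstract concrete, the following is a minimal Python sketch of what a "noisy word" and a "mean" string could look like. It is not the authors' model or sampler. The function names (`deform`, `edit_distance`, `medoid`), the error probabilities, and the lowercase alphabet are all assumptions chosen for illustration. The medoid under edit distance is used here only as a simple stand-in for a prototypical word; the paper instead samples the prototype with a forward filtering backward sampling procedure inside a blocked Gibbs sampler.

```python
import random

# Hypothetical symbol set; the paper works with letters or phonemes.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def deform(word, p_sub=0.1, p_del=0.05, p_ins=0.05, rng=random):
    """Corrupt `word` with independent per-symbol insertions, deletions,
    and substitutions (an illustrative deformation, not the paper's model)."""
    out = []
    for ch in word:
        if rng.random() < p_ins:           # insert a random symbol before ch
            out.append(rng.choice(ALPHABET))
        r = rng.random()
        if r < p_del:                      # delete ch
            continue
        if r < p_del + p_sub:              # substitute ch (may pick ch itself)
            out.append(rng.choice(ALPHABET))
        else:                              # keep ch unchanged
            out.append(ch)
    return "".join(out)

def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

def medoid(words):
    """Return the word with the smallest total edit distance to the others.
    A crude stand-in for a 'mean' string, limited to observed words."""
    return min(words, key=lambda w: sum(edit_distance(w, v) for v in words))

if __name__ == "__main__":
    rng = random.Random(0)
    noisy = [deform("banana", rng=rng) for _ in range(20)]
    print(noisy[:5], "->", medoid(noisy))
```

One limitation of the medoid is that it can only return a string that was actually observed, so it fails when every observation is corrupted. Sampling the prototype from a generative model, as the abstract proposes, does not have that restriction.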
