Topic model allocation of conversational dialogue records by Latent Dirichlet Allocation

Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific. Pub Date: 2014-12-01. DOI: 10.1109/APSIPA.2014.7041546

Jui-Feng Yeh, C. Lee, Yi-Shiuan Tan, Liang-Chih Yu

Citations: 10

Abstract

Topic information is important for sustaining a conversation, which makes topic detection and tracking an important research problem. Because topics shift frequently over a long exchange and a single conversation may cover many topics, it is important to detect the different topics within conversational content. This paper detects topic information by combining agglomerative clustering of utterances with a Dynamic Latent Dirichlet Allocation topic model. The proportions of verbs and nouns are used to measure the similarity between utterances, and all utterances in a conversation are grouped by an agglomerative clustering algorithm. Because the topic structure of conversational content is fragile, we use speech act information together with hypernym information obtained from E-HowNet, which makes the word categories more robust. A Latent Dirichlet Allocation topic model normally detects topics at the level of a whole document, so applied directly to a conversation it detects only one topic even though conversations frequently contain several. We therefore train the Latent Dirichlet Allocation models with speech act and hypernym information and use the trained models to detect the different topics in conversational content. To evaluate the proposed method, a support vector machine was built for comparison. The experimental results show that the proposed method outperforms the support-vector-machine-based approach in topic detection and tracking in spoken dialogue.
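The utterance clustering step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact similarity formula, so this example assumes cosine similarity over noun and verb counts and average-linkage merging with a fixed threshold. The `HYPERNYMS` table is a hypothetical stand-in for E-HowNet lookups, and the part-of-speech tags and sample utterances are invented for illustration. It is not the authors' implementation.

```python
from collections import Counter
from math import sqrt

# Hypothetical hypernym table standing in for E-HowNet lookups (assumption).
HYPERNYMS = {"pizza": "food", "noodles": "food", "bus": "vehicle", "train": "vehicle"}


def features(tagged_utterance):
    """Turn (word, pos) pairs into a bag of nouns and verbs, generalized to hypernyms."""
    bag = Counter()
    for word, pos in tagged_utterance:
        if pos in ("N", "V"):  # keep only nouns and verbs
            bag[HYPERNYMS.get(word, word)] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two bags; 0.0 if either bag is empty."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def agglomerate(bags, threshold=0.3):
    """Average-linkage agglomerative clustering over utterance indices.

    Repeatedly merges the most similar pair of clusters and stops once the
    best available similarity falls below `threshold` (an assumed cutoff).
    """
    clusters = [[i] for i in range(len(bags))]

    def link(c1, c2):
        total = sum(cosine(bags[i], bags[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, (x, y) = max(
            (link(clusters[x], clusters[y]), (x, y))
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return clusters


# Invented toy dialogue: two utterances about food, two about transport.
utterances = [
    [("I", "PRON"), ("want", "V"), ("pizza", "N")],
    [("noodles", "N"), ("sound", "V"), ("good", "ADJ")],
    [("take", "V"), ("the", "DET"), ("bus", "N")],
    [("the", "DET"), ("train", "N"), ("leaves", "V"), ("soon", "ADV")],
]
clusters = agglomerate([features(u) for u in utterances])
print(clusters)  # [[0, 1], [2, 3]]
```

In this toy dialogue the hypernym mapping is what links "pizza" with "noodles" and "bus" with "train"; without it the four utterances share no nouns or verbs and would stay in separate clusters. Each resulting cluster could then be treated as a pseudo-document for a topic model, which is one way to read the abstract's point that document-level Latent Dirichlet Allocation would otherwise assign a single topic to the whole conversation.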
