Beyond Adjacency Pairs: Hierarchical Clustering of Long Sequences for Human-Machine Dialogues

Proceedings of the First Workshop on Computational Approaches to Discourse. Pub Date: 2020. DOI: 10.18653/v1/2020.codi-1.2

M. Maitreyee


Citations: 1

Abstract

This work proposes a framework to predict sequences in dialogues, using turn-based syntactic features and dialogue control functions. Syntactic features were extracted using dependency parsing, while dialogue control functions were manually labelled. These features were transformed using tf-idf and word embeddings; feature selection was done using Principal Component Analysis (PCA). We ran experiments on six combinations of features to predict sequences with Hierarchical Agglomerative Clustering. An analysis of the clustering results indicates that using word embeddings and syntactic features significantly improved the results.
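The pipeline described in the abstract (feature transformation followed by Hierarchical Agglomerative Clustering) can be illustrated with a minimal, standard-library-only sketch. This is not the author's implementation: the toy turns, the smoothed idf formula, cosine distance, average linkage, and the function names `tfidf_vectors`, `cosine_distance`, and `agglomerative` are all assumptions made for illustration, and the word-embedding and PCA steps are omitted to keep the example short.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one tf-idf vector (list of floats) per tokenized document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed idf (an assumed variant), so no token gets a zero weight.
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[tok] / len(doc) * idf[tok] for tok in vocab])
    return vectors


def cosine_distance(u, v):
    """1 - cosine similarity; defined as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of item indices."""
    clusters = [[i] for i in range(len(vectors))]
    dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Made-up turns: tokens stand in for syntactic features and function labels.
turns = [
    "hello there greeting".split(),
    "hello again greeting".split(),
    "book flight request".split(),
    "book hotel request".split(),
]
clusters = agglomerative(tfidf_vectors(turns), n_clusters=2)
print(sorted(sorted(c) for c in clusters))  # prints [[0, 1], [2, 3]]
```

In this toy input the two greeting turns share tokens and the two request turns share tokens, so they end up in separate clusters. Real use would replace the tokens with dependency-parse features and dialogue control function labels, and would likely rely on an established library rather than this quadratic-time loop.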
