Unsupervised Contextual Keyword Relevance Learning and Measurement using PLSA

2006 Annual IEEE India Conference | Pub Date: 2006-09-01 | DOI: 10.1109/INDCON.2006.302787

S. Sudarsun, Dalou Kalaivendhan, M. Venkateswarlu

Citations: 3

Abstract

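In this paper, we develop a probabilistic approach based on Probabilistic Latent Semantic Analysis (PLSA) for discovering and analyzing contextual keyword relevance from the distribution of keywords across a training text corpus. We show experimentally that the approach is flexible in classifying keywords into different domains according to their context. We have built a prototype system that projects keyword queries onto a loaded PLSA model and returns the keywords most closely correlated with the query. Each keyword query is vectorized with the PLSA model in the reduced aspect space, and correlation is computed as a dot product. We also discuss the parameters that control PLSA performance: (a) the number of aspects, (b) the number of EM iterations, and (c) the weighting functions applied to the term-document matrix (TDM) before training (pre-weighting). We estimate quality by computing precision-recall scores. Finally, we present experiments applying PLSA to document classification.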
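The pipeline the abstract describes (fit PLSA with EM, vectorize a keyword query in the aspect space, rank keywords by dot product) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy vocabulary and matrix, the fold-in procedure for the query, and the choice of P(z|w) under a uniform aspect prior as the keyword vector are all assumptions made for illustration. The two arguments `n_aspects` and `n_iters` correspond to the first two parameters the abstract lists; pre-weighting would amount to transforming `tdm` before training.

```python
import random

def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they sum to 0)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]

def train_plsa(tdm, n_aspects, n_iters, seed=0):
    """Fit PLSA with EM on a term-document matrix tdm[w][d] (counts or weights).

    Returns p_w_z[z][w] = P(w|z) and p_z_d[d][z] = P(z|d).
    """
    rng = random.Random(seed)
    n_words, n_docs = len(tdm), len(tdm[0])
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_aspects)]
    p_z_d = [normalize([rng.random() for _ in range(n_aspects)]) for _ in range(n_docs)]
    for _ in range(n_iters):
        new_w_z = [[0.0] * n_words for _ in range(n_aspects)]
        new_z_d = [[0.0] * n_aspects for _ in range(n_docs)]
        for w in range(n_words):
            for d in range(n_docs):
                if tdm[w][d] == 0:
                    continue
                # E-step: posterior P(z|w,d) for this (word, document) pair.
                post = normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(n_aspects)])
                # Accumulate expected counts for the M-step.
                for z in range(n_aspects):
                    new_w_z[z][w] += tdm[w][d] * post[z]
                    new_z_d[d][z] += tdm[w][d] * post[z]
        # M-step: renormalize expected counts into probabilities.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d

def fold_in(query_counts, p_w_z, n_iters=20):
    """Estimate P(z|query) by EM while keeping P(w|z) fixed (assumed fold-in step)."""
    n_aspects = len(p_w_z)
    p_z_q = [1.0 / n_aspects] * n_aspects
    for _ in range(n_iters):
        acc = [0.0] * n_aspects
        for w, count in query_counts.items():
            post = normalize([p_w_z[z][w] * p_z_q[z] for z in range(n_aspects)])
            for z in range(n_aspects):
                acc[z] += count * post[z]
        p_z_q = normalize(acc)
    return p_z_q

def keyword_vectors(p_w_z):
    """Aspect-space vector per keyword: P(z|w) under a uniform prior over aspects."""
    n_aspects, n_words = len(p_w_z), len(p_w_z[0])
    return [normalize([p_w_z[z][w] for z in range(n_aspects)]) for w in range(n_words)]

def related_keywords(query_ids, p_w_z, vocab):
    """Rank every keyword by the dot product of its vector with the query vector."""
    counts = {}
    for w in query_ids:
        counts[w] = counts.get(w, 0) + 1
    query_vec = fold_in(counts, p_w_z)
    scores = [sum(a * b for a, b in zip(query_vec, vec)) for vec in keyword_vectors(p_w_z)]
    return sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)

# Hypothetical toy corpus: rows are keywords, columns are documents.
vocab = ["goal", "match", "team", "stock", "market", "price"]
tdm = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
]
p_w_z, p_z_d = train_plsa(tdm, n_aspects=2, n_iters=50)
for word, score in related_keywords([vocab.index("goal")], p_w_z, vocab):
    print(f"{word}: {score:.3f}")
```

Because EM starts from a random initialization and can settle in a local optimum, the ranking may vary with `seed`, `n_aspects`, and `n_iters`, which is one reason the abstract treats these as parameters to study.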
