Detection of Topic from Unstructured Text With Mixed Languages

2018 International Conference on Information Technology (ICIT). Pub Date: 2018-12-01. DOI: 10.1109/ICIT.2018.00040

Suraj Sharma, Sabitra Sankalp Panigrahi, Biswajit Paul, N. Panigrahi


Citations: 1

Abstract

This paper proposes the design of a topic detector that combines LDA and Word2Vec to detect topics in mixed-language text. The experiment is carried out on a mixed text of English and Hindi. The technique tokenizes the mixed Hindi-English text and models the tokens as feature vectors using Word2Vec. These vectors are clustered, and each cluster center is identified as the topic of its cluster of tokens.
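The tokenize, vectorize, and cluster steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: simple co-occurrence count vectors stand in for Word2Vec embeddings, a hand-written k-means stands in for whatever clustering the paper uses, the token nearest each cluster center is taken as the topic label, and the LDA stage is not shown. All function names and the sample text are hypothetical.

```python
import math
import random
import re

# Devanagari letters and signs (Hindi) or ASCII letters (English).
# U+0964-U+096F (danda punctuation and Devanagari digits) are excluded.
TOKEN_RE = re.compile(r"[\u0900-\u0963\u0970-\u097F]+|[A-Za-z]+")


def tokenize(text):
    """Split mixed Hindi-English text into lowercase word tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def cooccurrence_vectors(sentences, window=2):
    """Stand-in for Word2Vec (assumption): unit-length neighbour-count vectors."""
    vocab = sorted({t for s in sentences for t in s})
    index = {t: i for i, t in enumerate(vocab)}
    counts = {t: [0.0] * len(vocab) for t in vocab}
    for s in sentences:
        for i, t in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if j != i:
                    counts[t][index[s[j]]] += 1.0
    vecs = {}
    for t, v in counts.items():
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vecs[t] = [x / norm for x in v]
    return vecs


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vecs, k, iters=20, seed=0):
    """Plain k-means; k must not exceed the number of distinct tokens."""
    tokens = sorted(vecs)
    rng = random.Random(seed)
    centers = [list(vecs[t]) for t in rng.sample(tokens, k)]
    for _ in range(iters):
        assign = {
            t: min(range(k), key=lambda c: sq_dist(vecs[t], centers[c]))
            for t in tokens
        }
        for c in range(k):
            members = [vecs[t] for t in tokens if assign[t] == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def detect_topics(text, k=2):
    """Return, for each cluster center, the token closest to it."""
    sentences = [tokenize(s) for s in re.split(r"[.!?।\n]+", text)]
    sentences = [s for s in sentences if s]
    vecs = cooccurrence_vectors(sentences)
    centers = kmeans(vecs, k)
    return [min(vecs, key=lambda t: sq_dist(vecs[t], c)) for c in centers]


# Hypothetical sample text, not taken from the paper's data.
SAMPLE = (
    "India won the cricket match. भारत ने क्रिकेट मैच जीता। "
    "The cricket team played well. बाजार में शेयर गिरे। "
    "Share market fell today."
)

if __name__ == "__main__":
    print(detect_topics(SAMPLE, k=2))
```

In a fuller pipeline, the co-occurrence vectors would be replaced by embeddings from a trained Word2Vec model, and stop words in both languages would likely be filtered before clustering; neither step is specified in the abstract.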
