Detection of Topic from Unstructured Text With Mixed Languages

2018 International Conference on Information Technology (ICIT) Pub Date : 2018-12-01 DOI:10.1109/ICIT.2018.00040

Suraj Sharma, Sabitra Sankalp Panigrahi, Biswajit Paul, N. Panigrahi

Citations: 1

Abstract

This paper proposes a design for a topic detector that combines the power of LDA and Word2Vec to detect topics in mixed-language text. The experiment is carried out on mixed English and Hindi text. The technique tokenizes the mixed Hindi and English text and models the tokens as feature vectors through Word2Vec. These vectors are clustered, and the cluster centers are identified as the topics of the clusters of tokens.
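
The pipeline outlined in the abstract (tokenize, embed, cluster, read topics off the cluster centers) can be illustrated with a minimal sketch. This is not the authors' implementation. To stay dependency-free it swaps in normalised co-occurrence counts as a stand-in for Word2Vec vectors, uses a plain k-means, and omits the LDA component entirely. All function names, the window size, the number of clusters, and the sample sentences are assumptions made for illustration.

```python
import math
import random
import re

# Latin words, or runs of Devanagari characters. The danda marks
# (U+0964, U+0965) are excluded so sentence punctuation is not a token.
TOKEN_RE = re.compile(r"[A-Za-z]+|[\u0900-\u0963\u0966-\u097F]+")


def tokenize(text):
    """Split mixed English/Hindi text into lower-cased tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def cooccurrence_vectors(sentences, window=2):
    """Stand-in for Word2Vec: L2-normalised neighbour counts per token."""
    vocab = sorted({t for s in sentences for t in s})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = {t: [0.0] * len(vocab) for t in vocab}
    for s in sentences:
        for i, t in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[t][index[s[j]]] += 1.0
    for t, v in vectors.items():
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors[t] = [x / norm for x in v]
    return vectors


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns the centres and each point's cluster index."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centres[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centre if a cluster empties out
                centres[c] = [sum(col) / len(members)
                              for col in zip(*members)]
    return centres, assign


def detect_topics(raw_sentences, k=2):
    """Map each topic label (token nearest its cluster centre) to its cluster."""
    sentences = [tokenize(s) for s in raw_sentences]
    vectors = cooccurrence_vectors(sentences)
    tokens = list(vectors)
    centres, assign = kmeans([vectors[t] for t in tokens], k)
    topics = {}
    for c, centre in enumerate(centres):
        members = [t for t, a in zip(tokens, assign) if a == c]
        if members:
            label = min(members, key=lambda t: dist2(vectors[t], centre))
            topics[label] = members
    return topics


if __name__ == "__main__":
    sample = [
        "Cricket मैच आज है।",
        "India won the cricket match",
        "चुनाव के नतीजे आज",
        "Election results declared आज",
    ]
    for label, members in detect_topics(sample, k=2).items():
        print(label, "->", members)
```

With a real corpus, the co-occurrence step would be replaced by trained Word2Vec embeddings. The clustering and nearest-to-center labelling steps would stay the same in outline.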

