{"title":"Automatic Topic Modeling for Single Document Short Texts","authors":"Anamta Sajid, S. Jan, I. Shah","doi":"10.1109/FIT.2017.00020","DOIUrl":null,"url":null,"abstract":"This paper presents a novel approach to automate the process of extracting topic and main title from a single-document short text. The proposed approach uses online text mining and Natural Language Processing techniques. The title of any text provides an efficient way to concisely grasp the overview of the contents in the text by giving a glance on its main heading only, which is quicker than reading the summary. In this paper, three different mechanisms have been proposed, implemented and compared to find the best approach for automatic extraction of a topic that is more relevant to the overall event explained in the text. The proposed system is evaluated against fifteen news articles from New York Times. The significance of the paper is twofold: Firstly, these automatic topic extraction techniques can be used further for document classification, document relevancy and similarity, summarization, comprehensive grasp of any event and finding novelty in out sized and scattered text data by scanning titles. Secondly, it can be used as a road map for the new researchers by using this detailed analysis of various data mining techniques. The experimental results show that the Nouns are more related, reliable, and suitable words for finding the topic of the text.","PeriodicalId":107273,"journal":{"name":"2017 International Conference on Frontiers of Information Technology (FIT)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Conference on Frontiers of Information Technology (FIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FIT.2017.00020","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8
Abstract
This paper presents a novel approach to automate the process of extracting topic and main title from a single-document short text. The proposed approach uses online text mining and Natural Language Processing techniques. The title of any text provides an efficient way to concisely grasp the overview of the contents in the text by giving a glance on its main heading only, which is quicker than reading the summary. In this paper, three different mechanisms have been proposed, implemented and compared to find the best approach for automatic extraction of a topic that is more relevant to the overall event explained in the text. The proposed system is evaluated against fifteen news articles from New York Times. The significance of the paper is twofold: Firstly, these automatic topic extraction techniques can be used further for document classification, document relevancy and similarity, summarization, comprehensive grasp of any event and finding novelty in out sized and scattered text data by scanning titles. Secondly, it can be used as a road map for the new researchers by using this detailed analysis of various data mining techniques. The experimental results show that the Nouns are more related, reliable, and suitable words for finding the topic of the text.