Title: Detection of topics from newspaper and its analysis of temporal variations in regions
Author: Taizo Yamada
Venue: 2017 Pacific Neighborhood Consortium Annual Conference and Joint Meetings (PNC)
Publication date: 2017-11-01
DOI: 10.23919/PNC.2017.8203520
URL: https://doi.org/10.23919/PNC.2017.8203520
Citations: 3
Abstract
In this paper, we introduce a method for detecting topics in Japanese newspapers using a topic model, and we propose visualizing how the detected topics change over time. In the study, we detected topics in newspapers published by Mainichi Newspapers from 2010 to 2015. The text data contains about six hundred articles (about 300 million characters). We extracted nouns as the characteristic words of each text. We characterized each text by its latent topics, which are hidden in the text and can be detected by LDA (Latent Dirichlet Allocation), a type of topic model. The detected topics are very diverse, including politics, sports, lotteries, Southeast Asian affairs, Japanese economics, academics, and so on. Among them, we noticed earthquake topics and focused on them. To grasp the characteristics of a topic, we visualized the monthly change in its frequency of occurrence and in its top words. To calculate the similarity between topics, we used cosine similarity over the per-topic word occurrence frequencies. Analyzing topics by region helps to grasp how the situation in that region fluctuates. By investigating the position at which an article was placed together with the topic variation, we can assess the importance of the topic or the article at that time.
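The cosine similarity between topics mentioned above can be illustrated with a minimal sketch. It assumes each topic is represented as a mapping from words to occurrence counts. The function name `cosine_similarity` and the word counts below are hypothetical, chosen only for illustration, and are not taken from the paper's data or implementation.

```python
import math
from collections import Counter


def cosine_similarity(freq_a, freq_b):
    """Cosine similarity between two word-frequency mappings (word -> count)."""
    # Only words present in both topics contribute to the dot product.
    shared = set(freq_a) & set(freq_b)
    dot = sum(freq_a[w] * freq_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in freq_a.values()))
    norm_b = math.sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty topic has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical per-topic word counts, for illustration only.
topic_quake_a = Counter({"earthquake": 8, "tsunami": 5, "evacuation": 3})
topic_quake_b = Counter({"earthquake": 6, "aftershock": 4, "evacuation": 2})
topic_sports = Counter({"match": 7, "team": 5, "victory": 2})

sim_related = cosine_similarity(topic_quake_a, topic_quake_b)
sim_unrelated = cosine_similarity(topic_quake_a, topic_sports)
```

Under these assumed counts, the two earthquake-like topics share vocabulary and score well above zero, while the sports topic shares no words with them and scores exactly zero. Because the counts are non-negative, the score always falls between 0 and 1.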