Analysis of Hot News Based on Big Data
Chengcheng Hu, Y. Li, Yongbin Wang, Lin Wu
2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS), June 2018
DOI: 10.1109/ICIS.2018.8466427
Citations: 1
Abstract
To analyze hot news about a cultural experimental area in China, this paper applies web crawling, text extraction, named entity recognition (NER), word clouds, and other techniques. First, news texts are collected using Berkeley DB, the Scrapy framework, and a web page text extraction algorithm; in total, 6.87 million news articles were crawled. Based on these data, news attention is analyzed and hot news related to the experimental area is tallied using NLTK's NER and the Weka toolkit. Industries related to the experimental area are also analyzed, and visualizations of the analysis results are presented.
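The abstract does not spell out how attention is counted. As a minimal sketch only, assuming the articles have already been crawled and their text extracted, and using a fixed list of hypothetical area names as a stand-in for NLTK's NER output, per-month attention and word-cloud weights could be computed with the standard library as follows. The names `Article`, `monthly_attention`, `word_frequencies`, and the stopword list are illustrative assumptions, not the authors' code.

```python
"""Sketch: per-month news attention and word-cloud term weights.

Assumptions (not from the paper): articles are already extracted to plain
text with an ISO date, and a fixed set of area names replaces NER output.
"""
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Article:
    date: str  # ISO date string, "YYYY-MM-DD"
    text: str  # extracted article body


# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on"}


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens made of letters, digits, and hyphens."""
    return re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())


def mentions_area(text: str, area_names: set[str]) -> bool:
    """True if any area name occurs in the text as a whole phrase."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(name.lower()) + r"\b", lowered)
        for name in area_names
    )


def monthly_attention(articles: list[Article], area_names: set[str]) -> dict[str, int]:
    """Count articles per month (YYYY-MM) that mention the area."""
    counts: Counter[str] = Counter()
    for art in articles:
        if mentions_area(art.text, area_names):
            counts[art.date[:7]] += 1
    return dict(sorted(counts.items()))


def word_frequencies(
    articles: list[Article], area_names: set[str], top_n: int = 10
) -> list[tuple[str, int]]:
    """Most frequent non-stopword terms in area-related articles."""
    freq: Counter[str] = Counter()
    for art in articles:
        if mentions_area(art.text, area_names):
            freq.update(t for t in tokenize(art.text) if t not in STOPWORDS)
    return freq.most_common(top_n)
```

For example, with `{"Example Zone"}` as a hypothetical area name, `monthly_attention` returns a month-to-count mapping that can be plotted as an attention curve, and the `(term, count)` pairs from `word_frequencies` can be passed as weights to any word-cloud renderer. Simple phrase matching is used here only to keep the sketch self-contained; a real pipeline would substitute the entities returned by an NER step.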