Research and Realization of Internet Public Opinion Analysis Based on Improved TF-IDF Algorithm
Yanxia Yang
2017 16th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES), 2017-10-01
DOI: 10.1109/DCABES.2017.24 (https://doi.org/10.1109/DCABES.2017.24)
Citations: 29
Abstract
At present, the main methods of Internet public opinion analysis include data acquisition, information extraction, spam filtering, similarity clustering, sentiment analysis, and positive/negative polarity judgment. Extracting information through text feature extraction is a key step. In this paper, the traditional TF-IDF method is improved by introducing a part-of-speech weight coefficient and a position weight (span weight) for each feature word. The experimental results show that the improved method clusters feature words more effectively and better reflects the characteristics of the text. Applying it to a public opinion analysis system has achieved good results.
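The abstract does not give the improved formula, so the following is only a minimal sketch of how a part-of-speech weight and a span weight could be folded into TF-IDF. The multiplicative combination, the `POS_WEIGHTS` values, the `span_weight` definition, and the smoothed IDF are all assumptions made for illustration; they are not the paper's actual method or parameters.

```python
import math
from collections import Counter

# Hypothetical part-of-speech weights; the abstract does not state the values.
POS_WEIGHTS = {"noun": 1.0, "verb": 0.8, "adj": 0.6}
DEFAULT_POS_WEIGHT = 0.4


def span_weight(positions, doc_len):
    """Assumed span weight: 1 + (last position - first position) / doc_len."""
    if doc_len <= 1:
        return 1.0
    return 1.0 + (max(positions) - min(positions)) / doc_len


def weighted_tfidf(docs):
    """Score words in each document.

    docs: list of documents, each a list of (word, pos_tag) pairs.
    Returns one {word: score} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter()
    for doc in docs:
        df.update({word for word, _ in doc})

    results = []
    for doc in docs:
        counts = Counter(word for word, _ in doc)
        positions, tags = {}, {}
        for i, (word, tag) in enumerate(doc):
            positions.setdefault(word, []).append(i)
            tags.setdefault(word, tag)  # keep the first tag seen for the word
        scores = {}
        for word, count in counts.items():
            tf = count / len(doc)
            # Smoothed IDF (an assumption; other IDF variants would also work).
            idf = math.log((1 + n_docs) / (1 + df[word])) + 1.0
            pos_w = POS_WEIGHTS.get(tags[word], DEFAULT_POS_WEIGHT)
            span_w = span_weight(positions[word], len(doc))
            scores[word] = tf * idf * pos_w * span_w
        results.append(scores)
    return results
```

Under these assumptions, a noun that recurs across a wide stretch of a document scores higher than a function word that appears once, which is the kind of effect the part-of-speech and span weights are meant to produce.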