Research and Realization of Internet Public Opinion Analysis Based on Improved TF-IDF Algorithm
Yanxia Yang
2017 16th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES), 2017-10-01
https://doi.org/10.1109/DCABES.2017.24
At present, network public opinion analysis mainly involves data acquisition, information extraction, spam filtering, similarity clustering, sentiment analysis, and polarity (positive or negative) judgment. Extracting information from the data by means of text feature extraction is a key step. This paper improves the traditional TF-IDF method by introducing a part-of-speech weight coefficient and a position weight (span weight) for each feature word. The experimental results show that the improved method improves the clustering of feature words and better reflects the characteristics of the text. Applying it to a public opinion analysis system has achieved good results.
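The abstract does not give the formula, so the following is only a minimal sketch of the general idea: a standard TF-IDF score multiplied by a part-of-speech weight and a position weight. The weight tables, the tag and position labels, the smoothed IDF, and the function name `weighted_tfidf` are all assumptions made for illustration and are not taken from the paper. In particular, the paper's "span weight" may be defined differently from the simple title/body weight used here.

```python
import math
from collections import Counter

# Hypothetical weights for illustration only; the abstract gives no values.
POS_WEIGHT = {"noun": 1.0, "verb": 0.8, "adj": 0.6}
DEFAULT_POS_WEIGHT = 0.4
POSITION_WEIGHT = {"title": 2.0, "body": 1.0}


def weighted_tfidf(docs):
    """Score every word of every document.

    docs: list of documents; each document is a list of
          (word, pos_tag, position) tuples.
    Returns a list of {word: score} dicts, one per document.
    """
    n_docs = len(docs)

    # Document frequency: number of documents that contain each word.
    df = Counter()
    for doc in docs:
        df.update({word for word, _, _ in doc})

    scores = []
    for doc in docs:
        tf = Counter(word for word, _, _ in doc)
        total = len(doc)

        # For each word, keep the largest POS and position weight observed.
        pos_w, loc_w = {}, {}
        for word, pos, position in doc:
            pos_w[word] = max(pos_w.get(word, 0.0),
                              POS_WEIGHT.get(pos, DEFAULT_POS_WEIGHT))
            loc_w[word] = max(loc_w.get(word, 0.0),
                              POSITION_WEIGHT.get(position, 1.0))

        doc_scores = {}
        for word, count in tf.items():
            # Smoothed IDF (assumption) so words shared by all documents
            # keep a small nonzero score.
            idf = math.log(n_docs / df[word]) + 1.0
            doc_scores[word] = (count / total) * idf * pos_w[word] * loc_w[word]
        scores.append(doc_scores)
    return scores
```

With these assumed weights, a noun that appears in a title is ranked above a body-only verb of similar frequency, which is the kind of reweighting the abstract describes. Real use would need a tokenizer and part-of-speech tagger for the target language, and weights tuned on data.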