Title: The Data Crawling and Hotspot Analyze of Social Q&A Site
Authors: Rui-Hui Jia
Published in: 2017 International Conference on Network and Information Systems for Computers (ICNISC)
Publication date: 2017-04-01
DOI: 10.1109/ICNISC.2017.00058 (https://doi.org/10.1109/ICNISC.2017.00058)
Citation count: 0
Abstract
With the rapid development of the Internet, more specialized and detailed information sources such as Q&A sites have emerged. On these social Q&A platforms, hot topics and news are discussed, and even created, every minute. Analyzing and parsing the content of such platforms is therefore a practical way to learn about hot social issues. Taking one social Q&A platform as the research subject, this paper analyzes the difficulties of crawling data from the platform and the relevant solutions, then designs and implements a data crawling system consisting of a user information storage module, a module for maintaining highly anonymous, available proxies, a node crawling and parsing module, and a data storage module. With these modules, the system can crawl and store data without being restricted by the platform. On this basis, the paper designs and implements a hotspot parsing and grading module. A historical hotspot display module and a trending hotspot display module, both built on ECharts, show the historical and trending hotspots on the platform. Using the proposed crawling system and the hotspot analysis and display system, the paper obtains data on 31,520 regularized independent topics and real-time data on 979,815 questions from the platform, and presents the historical and trending hotspot analysis based on these data. The experimental results show that the system has met its design objectives. Finally, the paper summarizes the proposed data crawling and hotspot analysis system and gives directions for future work.
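The abstract does not say how the proxy maintenance module works, so the following is only a minimal sketch of one common approach: keep a pool of proxies, rotate through them, and evict any proxy that fails too often. The names `ProxyPool`, `fetch_with_rotation`, and `max_failures` are hypothetical, and the `fetch` callable is an assumed stand-in for whatever HTTP client the crawler uses.

```python
import random
from typing import Callable, Dict, Iterable, Optional


class ProxyPool:
    """Hypothetical pool that drops proxies after repeated failures."""

    def __init__(self, proxies: Iterable[str], max_failures: int = 3) -> None:
        self.max_failures = max_failures
        # Map each proxy address to its count of consecutive failures.
        self.failures: Dict[str, int] = {p: 0 for p in proxies}

    def pick(self) -> Optional[str]:
        """Return a random proxy still in the pool, or None if it is empty."""
        if not self.failures:
            return None
        return random.choice(list(self.failures))

    def report(self, proxy: str, ok: bool) -> None:
        """Record the outcome of one request made through `proxy`."""
        if proxy not in self.failures:
            return
        if ok:
            self.failures[proxy] = 0  # a success resets the failure streak
            return
        self.failures[proxy] += 1
        if self.failures[proxy] >= self.max_failures:
            del self.failures[proxy]  # evict a proxy that keeps failing


def fetch_with_rotation(
    url: str,
    pool: ProxyPool,
    fetch: Callable[[str, str], str],
    attempts: int = 5,
) -> Optional[str]:
    """Try `fetch(url, proxy)` through different proxies until one succeeds.

    `fetch` is an assumed callable supplied by the caller; it should raise
    an exception when the request fails or is blocked.
    """
    for _ in range(attempts):
        proxy = pool.pick()
        if proxy is None:
            break  # no usable proxies left
        try:
            body = fetch(url, proxy)
        except Exception:
            pool.report(proxy, ok=False)
            continue
        pool.report(proxy, ok=True)
        return body
    return None
```

Passing the fetch function in as a parameter keeps the pool logic independent of any particular HTTP library and lets it be exercised without network access.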