Author: Siqi Peng
Venue: 2021 2nd International Conference on Computer Science and Management Technology (ICCSMT)
Published: 2021-11-01
DOI: https://doi.org/10.1109/ICCSMT54525.2021.00083
Medical Analysis of Social Media Data Based on Spark and Machine Learning in China
Social media carries a huge amount of real-time data of all kinds and plays an important role in data analysis in the era of big data. It lets medical workers and ordinary people exchange and popularize medical knowledge. At the same time, collecting and using medical data from social media can help track the public health situation and support efforts to improve people's health. From the perspective of medical care and health, this paper obtains its data from Weibo, the largest public social media platform in China. The study was developed under the Spark framework: the data were cleaned, pre-processed, and classified using naive Bayes and random forest classifiers combined with two different feature extraction methods. Accuracy and F1 score were then used to evaluate the models and identify the most appropriate method. The results show that data obtained from Weibo within certain age groups have good reference value for understanding public awareness and the current situation, and are useful for tracking disease trends.
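The two evaluation metrics named in the abstract can be illustrated with a small sketch. The abstract does not say whether the classification task is binary or multi-class, or how F1 was averaged, so the code below assumes the simplest binary case; the function names, the `positive` parameter, and the sample labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of items whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = health-related post, 0 = other.
sample_true = [1, 1, 1, 0, 0, 0, 1, 0]
sample_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(accuracy(sample_true, sample_pred))
print(f1_score(sample_true, sample_pred))
```

Accuracy treats every post equally, while F1 balances precision and recall for one class, which is why the two are often reported together when classes may be imbalanced. In a Spark setting the same quantities would typically come from the framework's own evaluators rather than hand-written functions.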