Authors: Shaimaa M. Mohamed, Mahmoud Hussien, A. Keshk
Journal: IJCI. International Journal of Computers and Information
Volume: 33 12
Publication date: 2020-05-13
DOI: 10.21608/ijci.2020.26841.1015
Classification and Prediction of Opinion Mining in Social Networks Data
Opinion mining on social network data is one of the most significant and challenging tasks today because of the huge volume of information shared each day. These opinions can be exploited through two important procedures: classification and prediction. Although many researchers have worked on this problem, there is still room for improvement. In this paper, we therefore present a method to improve the accuracy of both processes. The improvement comes from cleaning the data set: converting all words to lower case; removing usernames, mentions, links, repeated characters, numbers, runs of more than two spaces between words, empty tweets, punctuation, and stop words; and expanding contractions such as “isn't” to “is not”. We use both unigrams and bigrams as features. Our data set contains users' feelings about distributed products, with tweets labeled positive or negative and each product rated from one to five. We implemented this work using several supervised machine learning algorithms: Naive Bayes, Support Vector Machine, and MaxEntropy for classification, and Random Forest Regression, Logistic Regression, and Support Vector Regression for prediction. The resulting accuracy in both processes is better than that of existing works. In classification, we achieved an accuracy of 90%. In prediction, the Support Vector Regression model predicts future product ratings with a Mean Squared Error (MSE) of 0.4122, the Logistic Regression model with an MSE of 0.4986, and the Random Forest Regression model with an MSE of 0.4770.
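The cleaning steps and the unigram/bigram features described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`clean_tweet`, `clean_corpus`, `ngram_features`), the tiny stop-word and contraction lists, the order of the steps, and the choice to collapse repeated characters to two are all assumptions made for the example.

```python
import re

# Illustrative subsets only (assumed); a real pipeline would use fuller lists.
CONTRACTIONS = {"isn't": "is not", "don't": "do not", "can't": "can not"}
STOP_WORDS = {"a", "an", "the", "is", "it", "i", "to", "of", "and", "this"}


def clean_tweet(text):
    """Apply the cleaning steps listed in the abstract to one tweet."""
    text = text.lower()                                 # lower-case everything
    text = text.replace("\u2019", "'")                  # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():            # "isn't" -> "is not"
        text = text.replace(short, full)
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)                   # remove usernames/mentions
    text = re.sub(r"\d+", " ", text)                    # remove numbers
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)          # "loooove" -> "loove"
    text = re.sub(r"[^\w\s]", " ", text)                # remove punctuation
    # split() also collapses any run of whitespace into single separators
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)


def clean_corpus(tweets):
    """Clean every tweet and drop those that end up empty."""
    cleaned = (clean_tweet(t) for t in tweets)
    return [c for c in cleaned if c]


def ngram_features(tokens):
    """Return the unigrams followed by the bigrams of a token list."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams


if __name__ == "__main__":
    tweet = "@john I loooove this phone!!! It isn't bad 100% http://t.co/xyz"
    cleaned = clean_tweet(tweet)
    print(cleaned)
    print(ngram_features(cleaned.split()))
```

Expanding contractions before stripping punctuation keeps the negation word "not" as its own token, which the bigram features can then pair with the following word (for example "not bad").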