Opinion Mining: Is Feature Engineering Still Relevant?
Md. Ataur Rahman, Puja Chakraborty
2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), published 27 February 2021
DOI: 10.1109/ICICT4SD50815.2021.9396874
This paper reports experiments on sentiment polarity detection over Stanford's IMDB movie review dataset using a Support Vector Machine (SVM) classifier. Our main goal was to find the best combination of classic features and preprocessing techniques for classifying opinions as positive or negative. We also compared two kernel variants under numerous parameter settings to obtain the best SVM model. Our best model achieved an accuracy of 85.45%. The results indicate that a non-linear Radial Basis Function (RBF) kernel yields the highest accuracy, and that stemmed word n-grams are the features that contribute most.
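The two ingredients the abstract singles out, stemmed word n-gram features and an RBF kernel K(x, y) = exp(-gamma * ||x - y||^2), can be illustrated with a minimal Python sketch. It assumes a toy suffix-stripping stemmer (a hypothetical stand-in for a real stemmer such as Porter's), unigrams plus bigrams, raw counts as feature values, and an illustrative gamma of 0.1; none of these choices come from the paper, and the helper names `stem`, `stemmed_ngrams`, and `rbf_kernel` are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy stemmer: a stand-in for a real stemmer such as Porter's.
# It only strips a few common English suffixes, checked in this order.
_SUFFIXES = ("ing", "ed", "ly", "es", "s")


def stem(token: str) -> str:
    for suffix in _SUFFIXES:
        # Keep at least three characters so short words are not mangled.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stemmed_ngrams(text: str, n_max: int = 2) -> Counter:
    """Count stemmed word n-grams (n = 1..n_max) in one review."""
    tokens = [stem(t) for t in re.findall(r"[a-z']+", text.lower())]
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i : i + n])] += 1
    return counts


def rbf_kernel(x: Counter, y: Counter, gamma: float = 0.1) -> float:
    """RBF kernel exp(-gamma * ||x - y||^2) over sparse count vectors."""
    keys = set(x) | set(y)
    # Counter returns 0 for missing keys, so absent n-grams count as zero.
    sq_dist = sum((x[k] - y[k]) ** 2 for k in keys)
    return math.exp(-gamma * sq_dist)


if __name__ == "__main__":
    pos = stemmed_ngrams("Loved the acting, loved the plot")
    neg = stemmed_ngrams("Boring plot and weak acting")
    print(rbf_kernel(pos, pos))  # 1.0 for identical vectors
    print(rbf_kernel(pos, neg))  # smaller value: the reviews share few n-grams
```

In a full pipeline these vectors (often TF-IDF weighted rather than raw counts) would be handed to an SVM solver; scikit-learn's `sklearn.svm.SVC` with `kernel="rbf"` and its `C` and `gamma` parameters is one common choice, but the abstract does not say which toolkit or parameter values the authors used.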