Predictive Model for Regional Elections Results based on Candidate Profiles
Muhammad Fachrie, Farida Ardiani
2021 8th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), 2021-10-20
DOI: 10.23919/eecsi53397.2021.9624256 (https://doi.org/10.23919/eecsi53397.2021.9624256)
User-generated content from Twitter has been used for sentiment analysis to predict presidential election results. Researchers have successfully proposed text-mining and machine-learning methods that build sentiment-analysis models as a basis for prediction. However, Twitter-based prediction is hard to apply to regional elections, because large volumes of tweets are usually posted only about elections held in provinces, cities, or large districts. Twitter-based prediction must also cope with unstructured data, fake or bot accounts, misinformation, mixed languages, nonstandard writing styles, and subjectivity when labeling the dataset. This work therefore proposes an alternative model that predicts regional election results from candidate profiles officially published by the General Election Commission of the Republic of Indonesia. The work comprises four main tasks: data collection, data preprocessing, feature engineering, and classification with the C4.5 decision tree algorithm. After pre-pruning and post-pruning, the predictive model achieved an accuracy of 72.96%. This work also contributes a new dataset for predicting regional election results in Indonesia, containing features that affect whether a candidate wins.
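The abstract names C4.5 with pre- and post-pruning but gives no implementation details. The Python sketch below is only a rough illustration of C4.5's split criterion on categorical attributes, combined with a simple pre-pruning rule. The criterion is gain ratio, which is information gain divided by split information. Several parts of the sketch are hypothetical assumptions and do not come from the paper: the feature names `incumbent` and `coalition`, the thresholds `min_samples` and `min_gain_ratio`, and the toy rows. The sketch also leaves out parts of full C4.5, such as continuous-attribute thresholds, missing-value handling, the average-gain filter, and pessimistic-error post-pruning.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, feature):
    """C4.5 gain ratio for splitting on one categorical feature."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes features with many distinct values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, features, min_samples=2, min_gain_ratio=0.01):
    """Grow a tree; stop early (pre-pruning) when a node is pure, too small,
    has no features left, or its best split is below min_gain_ratio."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or len(labels) < min_samples or not features:
        return majority
    best = max(features, key=lambda f: gain_ratio(rows, labels, f))
    if gain_ratio(rows, labels, best) < min_gain_ratio:
        return majority
    remaining = [f for f in features if f != best]
    children = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            remaining, min_samples, min_gain_ratio,
        )
    # "default" is used when a prediction meets an unseen feature value.
    return {"feature": best, "children": children, "default": majority}


def predict(tree, row):
    """Walk from the root to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


# Invented toy rows for illustration only (NOT the paper's dataset).
rows = [
    {"incumbent": "yes", "coalition": "large"},
    {"incumbent": "yes", "coalition": "small"},
    {"incumbent": "no", "coalition": "large"},
    {"incumbent": "no", "coalition": "small"},
    {"incumbent": "no", "coalition": "small"},
    {"incumbent": "no", "coalition": "large"},
]
labels = ["win", "win", "win", "lose", "lose", "lose"]

tree = build_tree(rows, labels, ["incumbent", "coalition"])
print(predict(tree, {"incumbent": "no", "coalition": "small"}))  # → lose
```

Because gain ratio divides by split information, it offsets the bias of plain information gain (as used in ID3) toward attributes with many distinct values. Post-pruning is not shown here. In that step, subtrees grown this way would be collapsed afterward whenever doing so does not increase the estimated error.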