{"title":"Diagnosis of the type of delivery of pregnant women at Semen Padang Hospital Using the C4.5 Method","authors":"Rama Novialdi, D. Permana, Dodi Vionanda, F. Fitri","doi":"10.24036/ujsds/vol2-iss1/130","DOIUrl":"https://doi.org/10.24036/ujsds/vol2-iss1/130","url":null,"abstract":"The health of the mother and fetus is very important, but pregnancy and childbirth carry many challenges and risks. According to WHO, there were 287,000 cases of women dying during pregnancy and childbirth in 2020. Factors that affect the type of delivery include the age of the pregnant woman, MGG, systole, diastole, and pulse. Classification is one method that can be used to group pregnant women by type of delivery. C4.5 is a method for building decision trees, and its purpose here is to identify the attributes that serve as the main criteria in the classification. Based on the optimal tree, the main criterion for classifying deliveries at Semen Padang Hospital as cesarean section or normal delivery is MGG. Evaluation of the classification results using a confusion matrix gave an accuracy of 74%, a sensitivity of 80% for classifying cesarean deliveries, and a specificity of 66.67% for classifying normal deliveries.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"11 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140432170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of Stroke Disease at Dr. Drs. M. Hatta Brain Hospital Bukittinggi With Decision Tree Algorithm C4.5","authors":"Futiah Salsabila, Zamahsary Martha, Atus Amadi putra, Admi Salma","doi":"10.24036/ujsds/vol2-iss1/135","DOIUrl":"https://doi.org/10.24036/ujsds/vol2-iss1/135","url":null,"abstract":"Stroke is a vascular disorder in which brain function is impaired by problems with the blood vessels that carry blood to the brain. Several factors can influence stroke, including unhealthy eating habits, lack of physical activity, smoking, alcohol consumption, and obesity. Symptoms include headache, nausea, vomiting, blurred vision, and difficulty swallowing. This study aims to determine the risk factors that affect the incidence of stroke hospitalization, based on stroke diagnoses at the Dr. Drs. M. Hatta Brain Hospital in Bukittinggi, by classifying each variable using a decision tree. A decision tree is a flowchart that resembles a branching tree. The C4.5 algorithm is used in this research; it can process numerical and categorical data, can handle missing attribute values, and produces rules that are easy to interpret. The results of the analysis show that the attribute that is a risk factor for stroke is the heart. The model created using the C4.5 algorithm was tested using a confusion matrix, resulting in an accuracy of 64.54%, a precision of 53.34% for classifying ischemic stroke patients correctly, and a recall of 72.73% for classifying hemorrhagic patients correctly.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"20 11","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140432215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification the Characteristics of Traffic Accident Victims in Pariaman Using the Chi-square Automatic Interaction Detection Algorithm","authors":"Manja Danova, Dina Putri, Fitria, Yenni Kurniawati, Zilrahmi","doi":"10.24036/ujsds/vol2-iss1/127","DOIUrl":"https://doi.org/10.24036/ujsds/vol2-iss1/127","url":null,"abstract":"Traffic accidents are incidents that occur when motor vehicles collide on the road, resulting in damage to vehicles and road infrastructure, as well as the potential for material losses, injuries, physical damage, and even death for those involved. Data from the Indonesian National Police show that the number of traffic accident victims between 2010 and 2020 ranged from 147,798 to 197,560 people, with fatalities predominantly occurring among individuals aged 15-34. The high number of traffic accident victims has negative impacts on various aspects of life, ranging from material losses to physical damage to the victims. Classification is a technique used to group objects or data into pre-defined classes or categories based on their attributes or features. One method in the field of classification is Chi-Square Automatic Interaction Detection (CHAID). The results of the classification using this method indicate that the age of the victims and the type of accident are the most significant variables influencing the condition of traffic accident victims. The evaluation of the model using a confusion matrix yielded an accuracy rate of 92%. This indicates that the model performs well in overall data classification.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"11 12","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140433326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fuzzy K-Nearest Neighbor to Predict Rainfall in Padang Pariaman District","authors":"Annisa Rizki, N. Amalita, Yenni Kurniawati, Zamahsary Martha","doi":"10.24036/ujsds/vol2-iss1/126","DOIUrl":"https://doi.org/10.24036/ujsds/vol2-iss1/126","url":null,"abstract":"Rainfall is the amount of water that falls to the ground surface during a certain period, measured in millimeters. The amount of rainfall can be estimated or predicted. One approach to predicting rainfall is data mining, in which a computer analyzes data automatically so that a new model is obtained. One of the best prediction algorithms in data mining is Fuzzy K-Nearest Neighbor (FK-NN), which assigns each test observation to the class in which it has the largest membership degree. The rainfall classes in the Padang Pariaman Regency data were imbalanced. One way to handle class imbalance is the Synthetic Minority Over-sampling Technique (SMOTE), which produces as much minority data as majority data. This study applied FK-NN classification to a total of 343 test data, with parameter K=12 and the Euclidean distance. The accuracy was quite good, namely 76.38%.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"12 10","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140432500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of the C5.0 Algorithm and the CART Algorithm in Stroke Classification","authors":"Indah Lestari, Dina Fitria, dan Admi Syafriandi, Salma","doi":"10.24036/ujsds/vol2-iss1/144","DOIUrl":"https://doi.org/10.24036/ujsds/vol2-iss1/144","url":null,"abstract":"The C5.0 and CART algorithms are similar in terms of speed and in handling categorical and numeric data. However, they differ in that the CART algorithm is binary and handles categorical, numerical, and continuous response variables, resulting in classification and regression decision trees, whereas the C5.0 algorithm is non-binary and handles categorical response variables, resulting in a classification tree. This research aims to classify Kaggle’s Stroke Prediction Dataset to find the variables that most influence the risk of stroke, and to compare the classification accuracy of the two algorithms. The results showed that the CART algorithm has higher accuracy and precision than C5.0, but lower recall. For CART and C5.0 respectively, accuracy is 77.9% and 77.5%, precision is 89.5% and 83.2%, and recall is 67% and 71.4%. Overall, it can be concluded that there is no difference in classification between the two algorithms. In addition, CART identified three variables that most influence stroke risk: age, BMI, and average blood glucose level. C5.0 identified only two: age and average blood glucose level.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"12 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140432513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prediction of Palm Oil Production Results PT.KSI South Solok Using Ensemble k-Nearest Neighbor","authors":"Nilda Yanti, Atus Amadi putra, Dony Permana, Zilrahmi","doi":"10.24036/ujsds/vol2-iss1/136","DOIUrl":"https://doi.org/10.24036/ujsds/vol2-iss1/136","url":null,"abstract":"In 2022, PT. KSI experienced a decline in palm oil production due to replanting. In managing palm oil production, PT. KSI faces the problem of production results not reaching targets, which can affect the Company's Work Plan and Budget. It is therefore necessary to predict palm oil production results so that all production and processing activities can run according to plan. The ensemble technique is a method with good predictive accuracy that is efficient to use with the k-NN method, since there is no need to search for the optimal k value. Based on the analysis, the ensemble gives a level of accuracy of 9.36%, which is considered high accuracy compared to 10.84% for a single kNN with k = 1. It can be concluded that the model has worked well with the data.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"31 11","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140433343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of K-Means and Fuzzy C-Means Algorithms for Clustering Based on Happiness Index Components Across Provinces in Indonesia","authors":"Inna Auliya, Fadhilah Fitri, N. Amalita, dan Tessy, Octavia Mukhti","doi":"10.24036/ujsds/vol2-iss1/150","DOIUrl":"https://doi.org/10.24036/ujsds/vol2-iss1/150","url":null,"abstract":"Cluster analysis is a multivariate technique aimed at grouping objects into several clusters based on the characteristics they possess. This study aims to determine the clustering results of 34 provinces in Indonesia based on the indicators of the happiness index for the year 2021 by comparing non-hierarchical cluster analysis methods, namely K-Means and Fuzzy C-Means. K-Means is a non-hierarchical cluster analysis that divides objects into cluster groups based on the distance of objects to the nearest cluster center, while Fuzzy C-Means is a cluster analysis that uses a fuzzy grouping model where data becomes a member of a cluster formed based on membership degrees ranging from 0 to 1. Based on the research results, it is known that clustering with both K-Means and Fuzzy C-Means methods forms three clusters. Based on the standard deviation values between groups and the standard deviation ratio, the best method is the Fuzzy C-Means method because it has a larger standard deviation between groups and a smaller ratio compared to the K-Means method, which is 0.6680004. Therefore, this study concludes that the Fuzzy C-Means method is more optimal than the K-Means method.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"8 3","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140433567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Karakteristik Kondisi Air Minum Menurut Wilayah Perkotaan dan Perdesaan di Indonesia Menggunakan Metode CHAID","authors":"Aulia Wanda, Yenni Kurniawati","doi":"10.24036/ujsds/vol2-iss1/152","DOIUrl":"https://doi.org/10.24036/ujsds/vol2-iss1/152","url":null,"abstract":"The availability and quality of drinking water need attention, both in terms of quantity and of suitability, which must meet requirements. Using clean water as drinking water can reduce diseases such as diarrhea, cholera, dysentery, typhoid, intestinal worms, skin diseases, and poisoning. Safe and clean drinking water is protected drinking water, including piped (tap) water, public taps, public hydrants, water terminals, rainwater collection (PAH), or protected springs and wells, and boreholes/pumps located at least 10 meters from sewage disposal, waste storage, and garbage disposal sites. Access to drinking water differs between urban and rural areas. Chi-Square Automatic Interaction Detection (CHAID) analysis is used to determine the characteristics of drinking water in urban and rural areas. CHAID analysis is applied to categorical variables. Before the analysis stage, a data mining process is carried out to extract knowledge from the data set and to handle missing data in it. Missing data in categorical variables are handled by mode imputation. Using CHAID analysis, the drinking-water characteristics with the highest percentage in rural areas are water filtered through cloth and not boiled, with the water source located elsewhere. In urban areas, the highest percentage is for households whose drinking water is treated with bleach/chlorine, not filtered through cloth, and not boiled, with the water source located in their own yard.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140433829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of an Artificial Neural Network Based on the Backpropagation Algorithm in Forecasting the Closing Price of the Jakarta Composite Index (IHSG)","authors":"Muhammad Fadhil, Zilrahmi Aditya, Yenni Kurniawati, Tessy Octavia Mukhti","doi":"10.24036/ujsds/vol2-iss1/137","DOIUrl":"https://doi.org/10.24036/ujsds/vol2-iss1/137","url":null,"abstract":"Investing is highly common in Indonesia. Continuous investment activities carried out by the community will increase economic activity and employment opportunities, increase national income, and increase the level of prosperity of the community. The capital market is the means by which companies obtain funds from official financiers or investors through share buying and selling transactions. One of the indices issued by the IDX is the Jakarta Composite Index (IHSG). Statistics can be used to help investors, the government, or related institutions predict the value of the IHSG. One method that can be used for prediction is an Artificial Neural Network (ANN). Backpropagation is a multi-layer ANN method that works by supervised learning. The idea of the Backpropagation algorithm is that the network's output for a given input is evaluated against the desired output. The results of the research showed that the BP (4,6,1) model produced an RMSE value of 28.24024 and a MAPE value of 0.00342%. Based on the results of this research, an Artificial Neural Network model based on the Backpropagation Algorithm can be applied to predict the IHSG Closing Price value.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"1 5","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140432775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sentiment Analysis about Anti-LGBT Campaign using the Naïve Bayes Classifier","authors":"Rios Dacosta, Syafriandi, D. Permana, Dina Fitria","doi":"10.24036/ujsds/vol2-iss1/146","DOIUrl":"https://doi.org/10.24036/ujsds/vol2-iss1/146","url":null,"abstract":"Social media is growing, so the news discussed there quickly becomes known to everyone. One topic being discussed on social media is the anti-LGBT campaign. The conversation about the anti-LGBT campaign is expressed in the form of opinions that contain positive and negative feelings. These opinions are conveyed through Twitter, a microblogging social media platform that allows users to create short messages and share them easily and quickly. Sentiment analysis of these opinions helps show whether an opinion supports or rejects the anti-LGBT campaign. The algorithm used to perform sentiment analysis is the Naïve Bayes Classifier. The purpose of this study is to determine the sentiment of anti-LGBT campaign tweets on Twitter. The study was carried out using Google Colaboratory. The dataset consists of 3103 tweets, with 80% training data and 20% test data. The sentiment analysis results obtained in this study show that Twitter users in Indonesia give more positive opinions. The Naïve Bayes Classifier algorithm produces an accuracy of 68.75%, precision of 99.6%, and recall of 92.8%.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"5 2","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140432655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}