UNP Journal of Statistics and Data Science: Latest Articles (Page 3)

Comparison of Error Rate Prediction in CART for Imbalanced Data

UNP Journal of Statistics and Data Science Pub Date : 2023-11-30 DOI: 10.24036/ujsds/vol1-iss5/117

Lifia Zullani, Dodi Vionanda, Syafriandi, Dina Fitria

CART is a tree-based classification algorithm. A CART tree consists of a root node, internal nodes, and terminal nodes. The accuracy of a CART model can be assessed by measuring its prediction error, and a common way to estimate the error rate is cross-validation. Three cross-validation schemes are considered: leave-one-out, hold-out, and k-fold cross-validation. They divide the data into training and testing sets differently, so each has advantages and disadvantages. Hold-out cannot guarantee that the training set represents the entire dataset; leave-one-out is time-consuming and computationally expensive because the model must be trained as many times as there are data points; and k-fold takes longer to compute because the training algorithm must be run k times. In practice, data are often imbalanced, meaning that the classes contain different numbers of observations, and in CART imbalanced data affect the prediction results. This research compares error rate prediction methods for CART models fitted to imbalanced data. The study uses three types of data (univariate, bivariate, and multivariate), generated by varying the differences in population means and the correlations between independent variables. The results indicate that k-fold cross-validation is the most suitable error rate prediction method for CART with imbalanced data.
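The three error rate estimators differ only in how the observation indices are split and in how many times the model is refitted. The following is a minimal Python sketch of that bookkeeping, not the authors' code: the splitting functions, the 30% hold-out fraction, k = 5, the 90/10 label vector, and the majority-class stand-in (used in place of CART to keep the example dependency-free) are all illustrative assumptions.

```python
import random
from collections import Counter


def holdout_split(n, test_frac=0.3, seed=0):
    """Hold-out: one random train/test split, so the model is fitted once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    return [(idx[n_test:], idx[:n_test])]


def loocv_splits(n):
    """Leave-one-out: every observation is the test set once, so n fits."""
    return [([j for j in range(n) if j != i], [i]) for i in range(n)]


def kfold_splits(n, k=5, seed=0):
    """k-fold: shuffle, deal indices into k folds, fit k times."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    return [
        ([i for g in range(k) if g != f for i in folds[g]], folds[f])
        for f in range(k)
    ]


def cv_error(splits, y, fit, predict):
    """Misclassification rate pooled over all test observations."""
    errors = total = 0
    for train, test in splits:
        model = fit(train)
        for i in test:
            errors += predict(model, i) != y[i]
            total += 1
    return errors / total


# Hypothetical imbalanced labels: 90 majority (0), 10 minority (1).
y = [0] * 90 + [1] * 10


# Stand-in classifier (NOT CART): always predict the training majority class.
def fit_majority(train):
    return Counter(y[i] for i in train).most_common(1)[0][0]


def predict_majority(model, i):
    return model


n = len(y)
fits = {
    "hold-out": len(holdout_split(n)),
    "LOOCV": len(loocv_splits(n)),
    "5-fold": len(kfold_splits(n, k=5)),
}
kfold_err = cv_error(kfold_splits(n, k=5), y, fit_majority, predict_majority)
loocv_err = cv_error(loocv_splits(n), y, fit_majority, predict_majority)
```

Both k-fold and leave-one-out report an error rate of 0.1 here even though the stand-in never predicts the minority class, which illustrates why a single overall error rate can look good on imbalanced data. Stratified splitting (keeping class proportions equal in every fold) is a common refinement, but the abstract does not say whether it was used.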

Citations: 0

Penerapan Metode Regresi Kuantil pada Data yang Mengandung Outlier untuk Tingkat Kejahatan di Jabodetabek (Application of the Quantile Regression Method to Data Containing Outliers for Crime Rates in Jabodetabek)

UNP Journal of Statistics and Data Science Pub Date : 2023-11-30 DOI: 10.24036/ujsds/vol1-iss5/94

Arssita Nur Muharromah, Zamahsary Martha, Dony Permana, Tessy Octavia Mukhti

Crime is an increasingly widespread problem in Indonesia, and the crime rate in Jabodetabek is the second highest in the country. Because the data in this study contain outliers, quantile regression is an appropriate method. Quantile regression is an extension of median regression, or the Least Absolute Deviation (LAD) method, which divides the data into two parts to minimize error. However, LAD is considered inadequate for modeling, which motivated quantile regression. Quantile regression addresses a violated assumption of classical regression, namely heteroscedasticity, and can model data containing outliers. The approach divides the data into several parts, or specific quantiles, at which the estimated values are suspected to differ. Goodness of fit is measured with the coefficient of determination (R²) at each quantile. Five quantiles are used: 0.05, 0.25, 0.50, 0.75, and 0.95. The analysis shows that the best parameter estimates are obtained at the 0.95 quantile, where all independent variables have a significant effect on the dependent variable (crime rate), whereas at the 0.25 and 0.50 quantiles no independent variable is significant. This may be due to other factors, not included in the study, that affect each quantile.
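Quantile regression replaces squared error with the check (pinball) loss, and τ = 0.5 gives the LAD (median) case described in the abstract. As a hedged illustration, the Python sketch below fits only an intercept, which reduces to picking the sample τ-quantile; the data are hypothetical, and a full model with the study's covariates would normally be fitted by linear programming (for example with statsmodels' QuantReg), which is not shown.

```python
def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0)) used by quantile regression."""
    return u * (tau - (1 if u < 0 else 0))


def fit_constant_quantile(y, tau):
    """Intercept-only quantile regression: the constant c minimising
    sum(rho_tau(y_i - c)). The objective is convex and piecewise linear
    with kinks at the data points, so searching over them is enough."""
    return min(sorted(y), key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


# Hypothetical sample with one large outlier.
y = [1, 2, 3, 4, 100]
estimates = {tau: fit_constant_quantile(y, tau) for tau in (0.05, 0.50, 0.95)}
mean = sum(y) / len(y)
```

The outlier at 100 pulls the mean up to 22, while the 0.50 estimate stays at 3; the 0.05 and 0.95 fits track the lower and upper tails respectively.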

Citations: 0

Classification of Coronary Heart Disease at Semen Padang Hospital using Algorithm Classification And Regression Trees (CART)

UNP Journal of Statistics and Data Science Pub Date : 2023-11-30 DOI: 10.24036/ujsds/vol1-iss5/104

Defal Aditya Defran, Atus Amadi Putra, Dodi Vionanda, Tessy Octavia Mukhti

Citations: 0

Forecasting the Exchange Rate of Yen to Rupiah Using the Long Short-Term Memory Method

UNP Journal of Statistics and Data Science Pub Date : 2023-11-30 DOI: 10.24036/ujsds/vol1-iss5/114

Anggi Adrian Danis, Yenni Kurniawati, N. Amalita, F. Fitri

Citations: 0

Comparison of Error Prediction Methods in Classification Modeling with CHAID Methods for Balanced Data

UNP Journal of Statistics and Data Science Pub Date : 2023-11-30 DOI: 10.24036/ujsds/vol1-iss5/116

Findri Wara Putri, Dodi Vionanda, Atus Amadi Putra, Fadhilah Fitri

Chi-Squared Automatic Interaction Detection (CHAID) is an exploratory method that classifies data by building classification trees, and its results are displayed as a tree diagram. Once the model is built, its accuracy must be calculated to assess its performance, which can be done by estimating the model's prediction error rate. Error rate prediction methods work by dividing the data into training data and testing data. Three such methods are considered: leave-one-out cross-validation (LOOCV), hold-out, and k-fold cross-validation. Because they divide the data differently, each method has advantages and disadvantages. This study therefore compares the three methods to determine which is appropriate for CHAID. It is an experimental study that uses simulated data generated in RStudio, and the comparison considers several factors, namely the marginal probability matrix and different correlations. The results are compared with boxplots by examining the median error rate and the variance, with lower values preferred. The study finds that k-fold cross-validation is the most suitable error rate prediction method for CHAID with balanced data.
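CHAID grows its tree by testing each candidate predictor against the class with a chi-squared test of independence. The Python sketch below, written under simplifying assumptions, computes Pearson's statistic and, for 2 × 2 tables only, its p-value via the identity p = erfc(√(χ²/2)). The tables and predictor names are hypothetical, and full CHAID also merges categories and applies a Bonferroni adjustment, neither of which is shown.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for an r x c contingency table (list of rows)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df1(stat):
    """Upper-tail p-value of a chi-square with 1 degree of freedom (2 x 2 tables only)."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical 2 x 2 tables: rows = predictor category, columns = class.
tables = {
    "X1": [[30, 10], [10, 30]],  # strongly associated with the class
    "X2": [[20, 20], [20, 20]],  # no association
}
stats = {name: chi_square(t) for name, t in tables.items()}
# Split on the predictor with the smallest p-value (all tables here have 1 df).
best = min(tables, key=lambda name: p_value_df1(stats[name]))
```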

Citations: 0

Implementation Self Organizing Maps Method In Cluster Analysis Based on Achievement Sustainable Development Goal/SDG’s West Sumatera Province

UNP Journal of Statistics and Data Science Pub Date : 2023-11-30 DOI: 10.24036/ujsds/vol1-iss5/118

AL Rezki Ivansyah, Fadhilah Fitri, Yenni Kurniawati, Tessy Octavia Mukhti

The Indonesian government is committed to implementing the Sustainable Development Goals (SDGs) agenda, including in West Sumatra. The government of West Sumatra supports the SDG objectives and targets by optimizing the implementation of SDG indicators in the Rencana Aksi Daerah (RAD, Regional Action Plan) for SDGs of West Sumatra Province for 2022-2026. Executing the RAD requires annual monitoring and evaluation, and clustering is used here as an input to evaluating its implementation. The clustering method is the Self-Organizing Map (SOM), an effective tool for visualizing high-dimensional data that maps it onto one, two, or three dimensions represented by connected units, or neurons. The data consist of 14 SDG indicator variables for the 19 regencies/cities of West Sumatra in 2022, obtained from the official website and publications of the Badan Pusat Statistik (BPS) of West Sumatra Province. The analysis produces 3 clusters with different characteristics, which can serve as references for policy decisions and effective strategies to improve the implementation of SDG programs in West Sumatra Province.
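A self-organizing map keeps a small grid of units; each training step finds the best-matching unit (BMU) for a sample and pulls that unit and its grid neighbours toward the sample. The following is a minimal pure-Python sketch under stated assumptions: a 1 × 3 grid (chosen to mirror the three reported clusters, although the study's actual map size is not given), a linearly decaying learning rate, a Gaussian neighbourhood, and randomly generated stand-in data in place of the 14 BPS indicators.

```python
import math
import random


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def standardise(rows):
    """Column-wise z-scores so every indicator contributes on the same scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def train_som(data, rows, cols, n_iter=1000, lr0=0.5, sigma0=1.0, seed=0):
    """Online SOM training on a rows x cols grid of units."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(rng.choice(data)) for _ in grid]  # initialise from random samples
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1 - frac) + 1e-3  # neighbourhood radius shrinks
        x = rng.choice(data)
        bmu = min(range(len(grid)), key=lambda u: sq_dist(weights[u], x))
        for u, pos in enumerate(grid):
            # Gaussian neighbourhood measured on the map grid, not in data space.
            h = math.exp(-sq_dist(pos, grid[bmu]) / (2 * sigma ** 2))
            weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
    return weights


def assign(data, weights):
    """Cluster label of each row = index of its best-matching unit."""
    return [min(range(len(weights)), key=lambda u: sq_dist(weights[u], x))
            for x in data]


# Hypothetical stand-in data: 19 regions x 14 indicators (random, not BPS data).
gen = random.Random(42)
raw = [[gen.uniform(0, 100) for _ in range(14)] for _ in range(19)]
X = standardise(raw)
W = train_som(X, rows=1, cols=3)
labels = assign(X, W)
```

Because each update moves a unit only part of the way toward a sample, the unit weights stay inside the range of the standardized data, and the cluster profiles can be read directly from `W`.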

Citations: 0

Sentiment Analysis of Prabowo Subianto as 2024 Presidential Candidate on Twitter Using K-Nearest Neighbor Algorithm

UNP Journal of Statistics and Data Science Pub Date : 2023-11-30 DOI: 10.24036/ujsds/vol1-iss5/101

Aurumnisva Faturrahmi, Zamahsary Martha, Yenni Kurniawati, Fadhilah Fitri

Citations: 0

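Structural Equation Modeling Partial Least Square (SEM-PLS) Untuk Membandingkan Kondisi Public Speaking Anxiety Mahasiswa Soshum dan Saintek (Structural Equation Modeling Partial Least Squares (SEM-PLS) for Comparing Public Speaking Anxiety among Social Science and Natural Science Students)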

UNP Journal of Statistics and Data Science Pub Date : 2023-11-30 DOI: 10.24036/ujsds/vol1-iss5/132

Sabina Chairun Najwa, Natasya Dwi Ovalingga, Hanifah Nazhiroh, R. Akbar, Fadhilah Fitri

Public speaking is the communication skill of delivering an opinion or message to an audience, and public speaking anxiety is caused by various factors. Social science and natural science students differ in culture and learning systems, so students in the two educational clusters have their own ways of overcoming communication barriers. This study aimed to identify the factors that influence public speaking anxiety among social science and natural science students at Padang State University. The method used is Structural Equation Modeling Partial Least Squares (SEM-PLS), chosen to examine the influential factors in more detail and to minimize analysis errors caused by missing values and by multicollinearity arising from diverse samples. The analysis yields path diagrams for the structural models and outer loading tables. If an outer loading value is below 0.7, the model is recalculated, forming a new model. The model feasibility was 35% for the social science group and 36.5% for the natural science group, so the effect of the exogenous latent variables in this study is weak. Social science students have higher levels of speech anxiety than natural science students, influenced by the humiliation, unfamiliar role, and negative result factors. For natural science students, the influencing factors are humiliation, preparation, and unfamiliar role.
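In SEM-PLS, indicators whose outer loadings fall below 0.7 are commonly removed and the model re-estimated. The Python sketch below illustrates that loop under a loud simplification: each loading is approximated as the correlation between an indicator and an equally weighted composite of its block, rather than the iterative weights the PLS algorithm actually estimates. The indicator names and scores are hypothetical.

```python
import math


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def zscore(v):
    # Assumes the indicator is not constant (non-zero standard deviation).
    m = sum(v) / len(v)
    s = math.sqrt(sum((x - m) ** 2 for x in v) / len(v))
    return [(x - m) / s for x in v]


def approx_loadings(block):
    """Simplified outer loadings: correlation of each indicator with an
    equally weighted composite of the block (not the iterative PLS weights)."""
    z = {name: zscore(v) for name, v in block.items()}
    n = len(next(iter(z.values())))
    composite = [sum(col[i] for col in z.values()) for i in range(n)]
    return {name: pearson(col, composite) for name, col in z.items()}


def purify(block, threshold=0.7):
    """Drop the weakest indicator and re-estimate until every loading >= threshold."""
    block = dict(block)
    while True:
        loadings = approx_loadings(block)
        weakest = min(loadings, key=loadings.get)
        if loadings[weakest] >= threshold or len(block) == 1:
            return block, loadings
        del block[weakest]


# Hypothetical indicator scores for one latent construct (6 respondents).
block = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1, 2, 3, 4, 6, 5],
    "x3": [2, 1, 3, 4, 5, 6],
    "x4": [3, 1, 2, 2, 1, 3],  # unrelated to the other three
}
kept, final_loadings = purify(block)
```

Here `x4` has an approximate loading of about 0.32 and is dropped; after re-estimation the remaining three indicators all load above 0.9.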

Citations: 0

Naive Bayes Classifier Method on Sentiment Analysis of Bibit Application Users in Play Store

UNP Journal of Statistics and Data Science Pub Date : 2023-11-30 DOI: 10.24036/ujsds/vol1-iss5/102

Afifa Lufti Insani, Zamahsary Martha, Yenni Kurniawati, Zilrahmi

Growing public interest in investment, supported by technological advances, has led to investment applications that aim to make investing easier for the public. One widely used investment application is Bibit, which is popular with novice investors because of the ease of opening accounts, disbursing funds, and purchasing mutual funds, and because of its easy-to-understand design. Because investment applications are still new to the public, many people still doubt and worry about the quality of the Bibit application, as reflected in the number of reviews in the Play Store review section. Reviews serve as a forum for criticism and suggestions about the application and are one of the considerations for potential users. Since reviews of the Bibit application can be positive or negative, sentiment analysis is needed to determine whether sentiment tends to be positive or negative. Classification is then carried out with the Naive Bayes Classifier method to obtain a model that can predict user sentiment. The results show that Bibit application users tend to have positive sentiment, with a classification accuracy of 79.45%.
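A multinomial Naive Bayes classifier scores each class by its log prior plus the summed log-probabilities of the review's words. The following hedged Python sketch uses Laplace (add-one) smoothing and lowercase whitespace tokenisation; both choices, and the tiny English-language review set, are assumptions for illustration, since the abstract does not describe the study's preprocessing or which Naive Bayes variant was used.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""
    tokenised = [d.lower().split() for d in docs]
    vocab = {w for toks in tokenised for w in toks}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for toks, y in zip(tokenised, labels):
        word_counts[y].update(toks)
    log_prior = {c: math.log(k / len(labels)) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen words are skipped."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc.lower().split() if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy reviews (not the study's Play Store data).
docs = [
    "easy to invest great app",
    "great features easy",
    "slow app bad service",
    "bad login error",
]
labels = ["positive", "positive", "negative", "negative"]
model = train_nb(docs, labels)
```

Working in log space avoids numerical underflow when a review contains many words, and smoothing keeps a word seen in only one class from zeroing out the other class's score.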

Citations: 0

Classification of Nutrition Problems for Indonesian Toddler With Decision Tree Algorithm C4.5

UNP Journal of Statistics and Data Science Pub Date : 2023-11-30 DOI: 10.24036/ujsds/vol1-iss5/98

Nadhea Ovella Syaqhasdy, Zamahsary Martha, N. Amalita, D. Fitria

Indonesia continues to face numerous challenges, particularly in the health and economic sectors. The quality of human resources, as the future of the nation, is crucial for Indonesia's development; development is key to improving people's quality of life, and a focus on it can positively affect the health and economy of the community. A healthy and educated generation is fundamental to the country's expected progress, and nutritional status is one of the factors that significantly affect the quality of human resources. Nutritional problems can have serious consequences, such as impaired physical growth, reduced IQ, and even death. The goal of this study is to analyze the factors affecting the nutritional status of toddlers by classifying each variable with a decision tree, a flowchart-like model with a branching tree structure. The C4.5 algorithm was used in this study; it can process both numeric and categorical data, handle missing attribute values, and generate easy-to-interpret rules. The analysis found 392 districts/cities in Indonesia where the prevalence of stunting among toddlers is below 20%. The model built with the C4.5 algorithm achieved an accuracy of 99.8% and a kappa value close to 1, indicating that it can accurately classify toddler nutrition problems in Indonesia.
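C4.5 chooses splits by gain ratio (information gain divided by split information), and the kappa statistic reported in the abstract measures agreement beyond chance. A minimal Python sketch of both, for categorical attributes only, is given below; the toy labels and attribute values are hypothetical, and C4.5's handling of numeric thresholds, missing values, and pruning is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 split criterion for a categorical attribute: information gain / split info."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    # A constant attribute has zero split info and cannot be used to split.
    return gain / split_info if split_info > 0 else 0.0


def cohen_kappa(y_true, y_pred):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e); undefined if p_e == 1."""
    n = len(y_true)
    p_o = sum(a == b for a, b in zip(y_true, y_pred)) / n
    t, p = Counter(y_true), Counter(y_pred)
    p_e = sum(t[c] * p[c] for c in t) / n ** 2
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data (not the study's district-level data).
status = ["stunted", "stunted", "normal", "normal"]
area = ["rural", "rural", "urban", "urban"]  # perfectly informative attribute
coin = ["a", "b", "a", "b"]                  # uninformative attribute
pred = ["stunted", "normal", "normal", "normal"]
```

With these inputs `gain_ratio(area, status)` is 1.0, `gain_ratio(coin, status)` is 0.0, and `cohen_kappa(status, pred)` is 0.5 even though raw accuracy is 75%, which shows how kappa discounts agreement expected by chance.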

Citations: 0