UNP Journal of Statistics and Data Science: Latest Articles (Page 4)

Fuzzy Geographically Weighted Clustering Method for Grouping Provinces in Indonesia Based on Welfare Indicators Aspects of Information and Communication Technology (ICT)

UNP Journal of Statistics and Data Science Pub Date : 2023-11-30 DOI: 10.24036/ujsds/vol1-iss5/108

Hefiani Mustika Hasanah, Dina Fitria, Dony Permana, Zamahsary Martha

Abstract: Realizing the welfare of its people is a task and goal of the Republic of Indonesia. The condition of the welfare of the Indonesian people can be assessed across eight areas of Indonesia's welfare indicators. In 2021, these welfare indicators were undergoing a digital transformation in Information and Communication Technology (ICT). However, ICT development was uneven because of geographical conditions and the distribution and dynamics of each region's society. Cluster analysis supports target setting for better future decisions. Fuzzy Geographically Weighted Clustering (FGWC) is a fuzzy-logic clustering method that takes geographical and population elements into account when grouping targets. The study found three optimum clusters, each with different characteristics on the ICT indicators of people's welfare. Cluster 1 has a low status on these indicators and is located in the middle or at the end of the island; cluster 2 has a medium status with a medium area; cluster 3 has a high status with a large area or dense population.

Citations: 0

Fuzzy Geographically Weighted Clustering Analysis for Sectoral Potential Gross Regional Domestic Product in West Sumatera

UNP Journal of Statistics and Data Science Pub Date : 2023-11-30 DOI: 10.24036/ujsds/vol1-iss5/109

Syifa Nabilah Wandira, Zilrahmi, Fadhilah Syafriandi, Fitri

Abstract: Gross Regional Domestic Product (GRDP) is the sum of the added value of all goods and services produced in a region as a result of various economic activities in a given period. Each region has its own advantages and potential, for example in particular sectors or business fields. GRDP inequality arises from differences in geographical conditions and natural resources between regions. Cluster analysis is one method that can be used to address this inequality. Fuzzy Geographically Weighted Clustering integrates the classical fuzzy clustering method with geo-demographic elements, so the clusters formed are sensitive to geographic effects. The study obtained three optimum clusters with different characteristics: cluster 1 has high potential, cluster 2 has low potential, and cluster 3 has medium potential in forming GRDP.

Citations: 0
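The geographic adjustment that the two FGWC abstracts above describe can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: it assumes the commonly cited form in which each area's membership is blended with a weighted average of the other areas' memberships, with weights proportional to the product of populations divided by distance. The function name `fgwc_adjust` and the parameters `alpha`, `a`, and `b` are hypothetical choices for this example.

```python
def fgwc_adjust(memberships, populations, distances, alpha=0.5, a=1.0, b=1.0):
    """Blend each area's fuzzy memberships with those of the other areas.

    memberships[i][c] is the membership of area i in cluster c (rows sum to 1).
    Assumed weight between areas i and j: (pop_i * pop_j)**b / dist_ij**a.
    alpha weighs the area's own membership; beta = 1 - alpha weighs neighbours.
    """
    beta = 1.0 - alpha
    n = len(memberships)
    n_clusters = len(memberships[0])
    adjusted = []
    for i in range(n):
        # An area does not influence itself, so its own weight is zero.
        weights = [
            0.0 if j == i
            else (populations[i] * populations[j]) ** b / distances[i][j] ** a
            for j in range(n)
        ]
        total_w = sum(weights)
        row = []
        for c in range(n_clusters):
            if total_w > 0:
                neighbour = sum(w * memberships[j][c]
                                for j, w in enumerate(weights)) / total_w
            else:
                neighbour = 0.0
            row.append(alpha * memberships[i][c] + beta * neighbour)
        # Renormalise as a safeguard so the memberships of area i sum to 1.
        s = sum(row)
        adjusted.append([v / s for v in row])
    return adjusted
```

With `alpha=1.0` the adjustment leaves the memberships unchanged, which is the classical fuzzy clustering case; smaller values of `alpha` pull each area towards its populous, nearby neighbours.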

Pemodelan Waktu Survival Pasien Tuberkulosis menggunakan Regresi Cox Proportional Hazard dengan Data Tersensor (Modeling the Survival Time of Tuberculosis Patients Using Cox Proportional Hazard Regression with Censored Data)

UNP Journal of Statistics and Data Science Pub Date : 2023-08-28 DOI: 10.24036/ujsds/vol1-iss4/65

Elsa Oktaviani, Nonong Amalita, Atus Amadi Putra, Dony Permana

Citations: 0

Comparison of Error Rate Prediction Methods in Classification Modeling with the CHAID Method for Imbalanced Data

UNP Journal of Statistics and Data Science Pub Date : 2023-08-28 DOI: 10.24036/ujsds/vol1-iss4/81

Seif Adil El-Muslih, Dodi Vionanda, Nonong Amalita, Admi Salma

Abstract: CHAID (Chi-Square Automatic Interaction Detection) is one of the classification algorithms in the decision tree family. Its classification results are displayed as a tree diagram. Once the model is built, its accuracy must be calculated to assess its performance. This can be done by calculating the predicted error rate of the model. Three methods are available: leave-one-out cross-validation (LOOCV), hold-out, and k-fold cross-validation. These methods divide the data into training and testing sets in different ways, so each has advantages and disadvantages. Imbalanced data are data in which the classes have different numbers of observations. In the CHAID method, imbalanced data affect the prediction results: as the data become more imbalanced, the prediction results approach the number of minority classes. The three error rate prediction methods were therefore compared to determine which is appropriate for CHAID on imbalanced data. The study is experimental and uses simulated data generated in RStudio. The comparison considers several factors: the marginal probability matrix, different correlations, and several observation ratios. The results are examined with boxplots, looking at the median error rate and the lowest variance. The study finds that k-fold cross-validation is the most suitable error rate prediction method for CHAID on imbalanced data.

Citations: 0
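To make the difference between the three error rate prediction methods named in the abstract above concrete, the sketch below generates the training and testing index sets each one would use. It is a minimal standard-library illustration, not the study's RStudio code; the function names, the default 20% test fraction, and the default of five folds are assumptions made for this example, and no stratification by class is attempted.

```python
import random


def hold_out_split(n, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) for one random hold-out split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return idx[n_test:], idx[:n_test]


def k_fold_splits(n, k=5, seed=0):
    """Yield k (train_idx, test_idx) pairs; each observation is tested once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def loocv_splits(n):
    """Yield n splits; LOOCV is k-fold with k = n, one test observation each."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]
```

Hold-out fits the model once, k-fold fits it k times, and LOOCV fits it n times, which is why LOOCV is the slowest of the three on large samples.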

Sentiment Analysis of Electric Cars Using Naive Bayes Classifier Method

UNP Journal of Statistics and Data Science Pub Date : 2023-08-28 DOI: 10.24036/ujsds/vol1-iss4/68

Nurul Afifah, Dony Permana, Dodi Vionanda, Dina Fitria

Abstract: In recent years, electric cars have become increasingly popular as an environmentally friendly alternative in the automotive industry. These vehicles run on electric power, which reduces dependence on fossil fuels and so contributes to efforts to cut greenhouse gas emissions and air pollution. However, electric cars have drawn both supportive and opposing opinions from the public, and they have become one of the most discussed topics on Twitter, a microblogging-based social media platform that lets users write and share short messages easily and quickly. These opinions call for sentiment analysis, whose purpose is to find out whether people's perceptions and opinions of electric cars lean in a positive or a negative direction. Sentiment analysis can therefore help companies design marketing strategies, develop products, and make better business decisions. The opinions are classified into positive and negative categories. This research uses the naive Bayes classifier to classify sentiment towards electric cars on Twitter as positive or negative. The accuracy of the naive Bayes classifier, obtained from a confusion matrix, is 78.57% with an 80%:20% dataset split.

Citations: 0
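The abstract above reports accuracy derived from a confusion matrix. As a reminder of how that figure is obtained for a two-class problem, here is a minimal sketch. The function name `accuracy` is an assumption for this example, and the counts in the usage note are hypothetical; they are not the study's confusion matrix.

```python
def accuracy(tp, tn, fp, fn):
    """Share of correct predictions in a 2x2 confusion matrix.

    tp/tn are correctly classified positive/negative items;
    fp/fn are items wrongly classified as positive/negative.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total
```

For example, with hypothetical counts of 40 true positives, 30 true negatives, 20 false positives, and 10 false negatives, `accuracy(40, 30, 20, 10)` gives 0.7.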

Application of the Fuzzy Time Series-Markov Chain Method to the Rupiah Exchange Rate Against the US Dollar (USD)

UNP Journal of Statistics and Data Science Pub Date : 2023-08-28 DOI: 10.24036/ujsds/vol1-iss4/91

Rahmad Revi Fadillah, Dony Permana, Yenni Kurniawati, Admi Salma

Citations: 0

Analysis of the Poverty Level Model for West Sumatra Province Using Geographically Weighted Binary Logistic Regression

UNP Journal of Statistics and Data Science Pub Date : 2023-08-28 DOI: 10.24036/ujsds/vol1-iss4/80

April Leniati, Dony Permana, Nonong Amalita, Zamahsary Martha

Abstract: Poverty is a widespread social problem that affects many developing countries, including Indonesia. The province of West Sumatra has a relatively low poverty rate of around 5.92 percent, the third lowest on the island of Sumatra. However, several districts and cities in the province still have many people living in poverty. Factors such as income levels, social conditions, and access to education can contribute to the poverty gap between regions. Geographically Weighted Binary Logistic Regression (GWBLR) is used to examine the relationship between poverty and geographic factors. GWBLR is a statistical technique that takes geographic variables into account when the response variable is categorical with two classes (dichotomous). The approach incorporates a bandwidth-dependent weighting function. A fit test conducted in R shows that the computed F value is greater than the table F value, indicating a significant difference between the logistic regression model and GWBLR. The results show that the GWBLR model with fixed Gaussian kernel weights is the most effective for analyzing poverty in the province, as it has the lowest Akaike Information Criterion (AIC) value. The study also identifies life expectancy as a significant factor affecting poverty in certain districts and cities of West Sumatra Province in 2022.

Citations: 0
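The bandwidth-dependent weighting function mentioned in the abstract above can be illustrated with the usual fixed Gaussian kernel, in which the weight of an observation decays smoothly with its distance from the location being fitted. The sketch below assumes the standard form w = exp(-0.5 (d / h)^2) with a single bandwidth h shared by all locations; the function name and the sample distances are hypothetical and do not come from the study.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Fixed Gaussian kernel weight for an observation `distance` away.

    The same bandwidth is used at every location (hence "fixed").
    Weight is 1 at distance 0 and decays towards 0 as distance grows.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


# Hypothetical distances from one regression point to three other regions.
weights = [gaussian_weight(d, bandwidth=50.0) for d in (0.0, 25.0, 100.0)]
```

A larger bandwidth flattens the decay, so the local model behaves more like an ordinary (global) logistic regression; a smaller bandwidth makes each local fit depend mostly on its nearest neighbours.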

Comparison of Error Rate Prediction Methods in Binary Logistic Regression Model for Balanced Data

UNP Journal of Statistics and Data Science Pub Date : 2023-08-28 DOI: 10.24036/ujsds/vol1-iss4/90

Shavira Asysyifa S, Dodi Vionanda, Nonong Amalita, Dina Fitria

Abstract: Binary logistic regression is a statistical method for examining the relationship between a dependent variable and several independent variables, where the dependent variable falls into two categories: one denoting a successful event and one denoting a failed event. The performance of binary logistic regression can be judged from the accuracy of the model, which can be measured by predicting the error rate. One method for predicting the error rate is cross-validation, which works by dividing the data into two parts, testing data and training data. Commonly used cross-validation methods are Leave One Out (LOO), hold-out, and k-fold cross-validation. LOO gives an unbiased estimate of accuracy but takes a long time; hold-out can avoid overfitting and runs faster because it involves no iterations; and k-fold cross-validation gives a smaller predicted error rate. Data cases with different correlations are used to find out how correlation affects the performance of the error rate prediction methods. The study uses artificially generated, normally distributed data, including univariate, bivariate, and multivariate datasets with various combinations of mean differences and correlations. Taking these factors into account, the study compares the three cross-validation methods for predicting the error rate in binary logistic regression. It finds that k-fold cross-validation is the most suitable method for predicting the error rate in logistic regression modeling for balanced data.

Citations: 0

Comparing Classification and Regression Tree and Logistic Regression Algorithms Using 5×2cv Combined F-Test on Diabetes Mellitus Dataset

UNP Journal of Statistics and Data Science Pub Date : 2023-08-28 DOI: 10.24036/ujsds/vol1-iss4/84

Fashihullisan, Dodi Vionanda, Yenni Kurniawati, Fadhilah Fitri

Abstract: Classification is the process of finding a model that describes and distinguishes data classes so that it can be used to predict the class of objects whose labels are unknown. Classification algorithms include classification and regression trees (CART) and logistic regression. For algorithm comparison, k-fold cross-validation has a weakness: different folds can produce different error predictions, so the results of comparing algorithm performance can also differ. Therefore, for the algorithm comparison problem, the researchers apply the 5×2cv t test and the 5×2cv combined F test. Out of 100 iterations, 10-fold cross-validation was consistent only three times, which shows that k-fold cross-validation has poor consistency in comparing CART and logistic regression on the diabetes mellitus data. In addition, the results show that the 5×2cv combined F test is the better choice for drawing a conclusion from the comparison of the two algorithms because it produces a single decision, whereas the 5×2cv t test can yield different decisions across its 10 test statistics, which makes it difficult for researchers to draw a conclusion when comparing CART and logistic regression.

Citations: 0
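The reason the 5×2cv combined F test in the abstract above yields a single decision is that it pools all ten fold-level differences into one statistic. The sketch below assumes the usual definition: with p_ij the difference in error rate between the two algorithms on fold j of replication i, the statistic is the sum of all ten squared differences divided by twice the sum of the five per-replication variance estimates, and it is compared against an F distribution with 10 and 5 degrees of freedom. The function name is hypothetical and the code is not taken from the paper.

```python
def combined_f_5x2cv(diffs):
    """Combined F statistic from five replications of 2-fold cross-validation.

    diffs holds five (p_i1, p_i2) pairs: the differences in error rate
    between the two algorithms on the two folds of replication i.
    """
    if len(diffs) != 5 or any(len(pair) != 2 for pair in diffs):
        raise ValueError("expected five replications of two folds each")
    # Numerator: sum of all ten squared differences.
    numerator = sum(p * p for pair in diffs for p in pair)
    # Denominator: twice the sum of the per-replication variance estimates.
    variance_sum = 0.0
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2.0
        variance_sum += (p1 - mean) ** 2 + (p2 - mean) ** 2
    if variance_sum == 0.0:
        raise ValueError("fold differences are identical; statistic undefined")
    return numerator / (2.0 * variance_sum)
```

The 5×2cv t test, by contrast, can be computed with any one of the ten differences in its numerator, which is how it can produce ten statistics that do not all lead to the same decision.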

Perbandingan Metode Prediksi Laju Galat dalam Pemodelan Klasifikasi Algoritma C4.5 untuk Data Tidak Seimbang (Comparison of Error Rate Prediction Methods in C4.5 Algorithm Classification Modeling for Imbalanced Data)

UNP Journal of Statistics and Data Science Pub Date : 2023-08-28 DOI: 10.24036/ujsds/vol1-iss4/89

Yunistika Ilanda, Dodi Vionanda, Yenni Kurniawati, Dina Fitria

Abstract: Classification models can be built using the C4.5 algorithm. The predictive accuracy of a model built by C4.5 needs to be assessed using an error rate prediction method. Error rate prediction methods that separate training data from testing data perform better. Three commonly used methods that split the data into training and testing sets are Hold Out (HO), Leave One Out Cross Validation (LOOCV), and K-Fold Cross Validation (K-Fold CV). This study focuses on comparing HO, LOOCV, and K-Fold CV for the C4.5 algorithm on imbalanced data, because such data are often encountered in real classification problems. Imbalanced data increase the misclassification of the C4.5 algorithm because the predictions do not represent the whole data, and they worsen the performance of error rate prediction methods. In addition, data cases with different correlations are examined to find out whether differences in correlation affect the performance of the error rate prediction methods. The aim of the study is to identify the error rate prediction method best suited to the C4.5 algorithm on imbalanced data and the effect of differing correlation. The results show that K-Fold CV is the most suitable prediction method for the C4.5 algorithm on imbalanced data compared with HO and LOOCV. In addition, high correlation can worsen the performance of error rate prediction methods.

Citations: 0