Suwanda Risky, Syafriandi, D. Permana, Dina Fitria
{"title":"Self Organizing Maps Method for Grouping Provinces in Indonesia Based on the Landslide Impact","authors":"Suwanda Risky, Syafriandi, D. Permana, Dina Fitria","doi":"10.24036/ujsds/vol1-iss3/15","DOIUrl":"https://doi.org/10.24036/ujsds/vol1-iss3/15","url":null,"abstract":"Indonesia is a disaster-prone country due to its climatic, soil, hydrological, geological, and geomorphological conditions. A disaster is an event or chain of events that threatens and disrupts people's lives and livelihoods. A natural disaster is a disaster caused by a natural event or series of events, such as a landslide. The number of landslide events in Indonesia varies from province to province because the provinces differ in their characteristics, so the impact of landslides also differs. Therefore, provinces need to be grouped and profiled to identify which of them are most affected by landslides. This study used the Self Organizing Maps method for the grouping. The number of clusters formed is 3, based on the optimal values of internal cluster validation (Dunn, Connectivity, and Silhouette Index). Cluster 1 consists of 31 provinces, where the average impact of landslides is small. Cluster 2 consists of 2 provinces, where 4 impacts are dominantly larger. Cluster 3 consists of 1 province, where 1 impact is dominantly larger. It can be concluded that most provinces in Indonesia experience a relatively small impact from landslides. However, some provinces experience a very large impact, namely West Java, Central Java, and East Java.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130077276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jimmi Darma, Dina Putra, Fitria, Dodi Vionanda, Admi Salma
{"title":"Geographically Weighted Panel Regression for Modeling The Percentage of Poor Population in West Sumatra","authors":"Jimmi Darma, Dina Putra, Fitria, Dodi Vionanda, Admi Salma","doi":"10.24036/ujsds/vol1-iss3/64","DOIUrl":"https://doi.org/10.24036/ujsds/vol1-iss3/64","url":null,"abstract":"The Geographically Weighted Panel Regression (GWPR) model applies panel regression to spatial data, with parameters estimated using spatial weights at each observation point. The purpose of this study is to determine the GWPR model and the factors that influence the percentage of poor people in each district/city in West Sumatra Province from 2015 to 2021. The adaptive bisquare kernel function was used for spatial weighting, and the Cross-Validation (CV) criterion was used to identify the optimal bandwidth. The research data were secondary data sourced from the official website and from the Sumatera Barat Dalam Angka publications from 2015 to 2021. The GWPR model is created by combining the GWR model with the FEM panel data regression model. The results show that the models and the factors affecting the percentage of poor people differ across the 19 districts/cities of West Sumatra.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115896445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stock Price Prediction for PT Bank Syariah Indonesia Tbk Using Support Vector Regression","authors":"Isra Miraltamirus, Fadhilah Fitri, Dodi Vionanda, Dony Permana","doi":"10.24036/ujsds/vol1-iss3/43","DOIUrl":"https://doi.org/10.24036/ujsds/vol1-iss3/43","url":null,"abstract":"A company needs external funding so that every aspect of its required development can be met. A company that needs capital can make a public offering (go public) and sell securities on a stock exchange. Stock prices tend to fluctuate, which affects the income received by the company and its investors. PT BSI Tbk currently faces this problem, so its stock price needs to be modeled to predict the price over the coming days. Support vector regression (SVR) is a machine learning method that can handle fluctuating data and produce a good predictive model. SVR aims to find the optimal hyperplane for the predictive model. It uses a kernel function to handle nonlinear data by mapping the data from the input space to a higher-dimensional feature space, which makes it easier to form the optimal hyperplane. The kernel function used in this study is the radial basis function. The best parameters obtained are C = 100, ϵ = 0.01, and γ = 0.001, giving a model error of 0.87%.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116638302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
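The radial basis function kernel named in the abstract above is commonly written K(x, x') = exp(-gamma * ||x - x'||^2). A minimal Python sketch under that standard definition, using the reported gamma = 0.001; the function name and the input vectors are hypothetical and are not taken from the paper:

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

gamma = 0.001  # value reported in the abstract

# Hypothetical feature vectors, for illustration only.
x1 = [1700.0, 1710.0]
x2 = [1705.0, 1720.0]

# Identical points give a similarity of exactly 1; similarity decays with distance.
print(rbf_kernel(x1, x1, gamma))
print(rbf_kernel(x1, x2, gamma))
```

A small gamma such as this one makes the kernel decay slowly, so points that are fairly far apart still count as similar.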
{"title":"Sentiment Analysis of Goride Services on Twitter Social Media Using Naive Bayes Algorithm","authors":"Puti Utari Maharani, Nonong amalita, Atus Amadi putra, Fadhilah Fitri","doi":"10.24036/ujsds/vol1-iss3/41","DOIUrl":"https://doi.org/10.24036/ujsds/vol1-iss3/41","url":null,"abstract":"Online motorcycle taxis are an application-based transportation technology innovation. They offer relatively low prices and discount features. However, the existence of online motorcycle taxis creates congestion problems and conflicts with conventional transport. Various speculations about Goride have arisen among the public, prompting people to express their opinions and judgments openly through social media, one of which is Twitter. The opinions given by the public are textual and can be analyzed. Sentiment analysis is used to detect opinions in the form of a person's judgment, evaluation, attitude, and emotion. The text classification algorithm used in this study was Naive Bayes. This research aims to determine public sentiment towards Goride's service as an online motorcycle taxi in positive and negative categories and to determine the accuracy of the Naive Bayes algorithm on this data. The data used in this study are secondary data obtained by crawling with an API provided by Twitter developer. The analysis consists of text preprocessing, data labelling, word weighting, classification, and evaluation of classification performance. The classification yields 698 positive and 517 negative data, which means that positive sentiment outweighs negative sentiment. The Naive Bayes algorithm achieves an accuracy of 77.78%.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130039599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nazifatul Azizah, F. Fitri, Dodi Vionanda, Zamahsary Martha
{"title":"Application of singular spectrum analysis method to forecast rice production in west sumatra","authors":"Nazifatul Azizah, F. Fitri, Dodi Vionanda, Zamahsary Martha","doi":"10.24036/ujsds/vol1-iss3/58","DOIUrl":"https://doi.org/10.24036/ujsds/vol1-iss3/58","url":null,"abstract":"The imbalance between the population and rice production will cause various negative impacts such as food crises and increasing poverty, so forecasting needs to be done to maintain food availability in the future. This study aims to determine the results of rice production in West Sumatra Province for 12 periods in 2023 using the Singular Spectrum Analysis (SSA) method. Based on the results of the analysis, rice production in 2023 for 12 periods tends to decrease compared to the previous year. Forecasting rice production using the SSA method with L=21 can be said to be accurate, with a MAPE of 17.69%.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115697178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Muhammad Alif Yustin, Zilrahmi, Atus Amadi putra, Fadhilah Fitri
{"title":"Comparison Fuzzy Time Series Cheng and Ruey Chyn Tsaur Model for Forecasting Sales at Empat Saudara Store","authors":"Muhammad Alif Yustin, Zilrahmi, Atus Amadi putra, Fadhilah Fitri","doi":"10.24036/ujsds/vol1-iss3/56","DOIUrl":"https://doi.org/10.24036/ujsds/vol1-iss3/56","url":null,"abstract":"A trading business focuses on buying goods and reselling them for profit without changing the condition of the goods being sold. A problem that often occurs at the Empat Saudara Store is excess or shortage of stock: goods are insufficient when consumer demand is high and available when consumer demand is low. One way to overcome this problem is to stabilize sales by forecasting future sales. Forecasting is an activity that aims to estimate what will happen in the future using historical data. The research method used is Fuzzy Time Series (FTS), because it captures patterns from past data and uses them to project future data based on linguistic values. The FTS models used are FTS Cheng and FTS Ruey Chyn Tsaur. The five-period forecasting results for FTS Cheng are 200,668.2, 171,761.5, 222,412.6, 214,507.4, and 216,294.3, and for the FTS Ruey Chyn Tsaur model are 198,600 , 229,094.2 , 202,203.05, 230,804.80 ,6. The MAPE value of the FTS Cheng model is 9.904% and that of the FTS Ruey Chyn Tsaur model is 14.01%. From the forecasting results it can be concluded that the FTS Cheng model is better than the FTS Ruey Chyn Tsaur model in predicting sales at the Empat Saudara Store.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127278570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
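The two models in the abstract above are compared by MAPE (mean absolute percentage error), where the lower value indicates the better fit. A minimal Python sketch of that metric and comparison; the function name and the sales figures are hypothetical placeholders, not data from the paper:

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Hypothetical values, for illustration only.
actual = [200.0, 250.0, 400.0]
forecast_a = [180.0, 275.0, 400.0]
forecast_b = [150.0, 300.0, 440.0]

mape_a = mape(actual, forecast_a)
mape_b = mape(actual, forecast_b)

# The model with the smaller MAPE is preferred.
better = "A" if mape_a < mape_b else "B"
print(f"MAPE A = {mape_a:.2f}%, MAPE B = {mape_b:.2f}%, lower error: model {better}")
```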
{"title":"Classification for Covid-19 Affected Family Cash Aid Recipients Using Naïve Bayes Algorithm","authors":"Mutiara Amazona Sosiawati, Syafriandi Syafriandi, Dony Permana, Zilrahmi","doi":"10.24036/ujsds/vol1-iss3/53","DOIUrl":"https://doi.org/10.24036/ujsds/vol1-iss3/53","url":null,"abstract":"The COVID-19 pandemic in Indonesia had a huge impact on the country's economy. One of the government's responses to COVID-19 is to use APBD funds for social assistance in the form of cash, namely \"Village Direct Cash Assistance\" (BLT DD), in the hope that it will help the people affected by COVID-19. There are several problems in the distribution of social assistance, one of which is that it reaches recipients who are not on target. Therefore, methods are needed to classify recipients correctly. This study uses the Naïve Bayes method to classify people who receive and do not receive aid. According to the confusion matrix, 33 people/families (KK) received BLT DD and were predicted to receive it, 34 did not receive BLT DD and were predicted not to receive it, 2 received BLT DD but were predicted not to receive it, and 6 did not receive BLT DD but were predicted to receive it. The classification accuracy obtained using the Naïve Bayes method is 89%, and the error rate is 11%.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121738985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
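The accuracy and error rate quoted in the abstract above follow from its four confusion-matrix counts. A minimal Python sketch of that arithmetic, assuming "receives BLT DD" is treated as the positive class; the variable names are illustrative and not from the paper:

```python
# Confusion-matrix counts as reported in the abstract.
# Assumption: "receives BLT DD" is the positive class.
tp = 33  # received, predicted to receive
tn = 34  # did not receive, predicted not to receive
fn = 2   # received, predicted not to receive
fp = 6   # did not receive, predicted to receive

total = tp + tn + fn + fp

# Accuracy is the share of correct predictions; error rate is the remainder.
accuracy = (tp + tn) / total
error_rate = (fp + fn) / total

print(f"n = {total}, accuracy = {accuracy:.1%}, error rate = {error_rate:.1%}")
```

With 67 of 75 cases classified correctly, the result rounds to the reported 89% accuracy and 11% error rate.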
{"title":"Comparison of the Performance of the K-Means and K-Medoids Algorithms in Grouping Regencies/Cities in Sumatera Based on Poverty Indicators","authors":"Mardhiatul Azmi, Atus Amadi putra, Dodi Vionanda, Admi Salma","doi":"10.24036/ujsds/vol1-iss2/25","DOIUrl":"https://doi.org/10.24036/ujsds/vol1-iss2/25","url":null,"abstract":"K-Means is a non-hierarchical clustering method that separates data into a number of groups according to the distance of each object from the closest centroid. K-Medoids is a non-hierarchical clustering technique that separates data into a number of groups according to the distance of each object from the closest medoid. The two approaches were tested using data on poverty in Sumatra in 2021, when the Covid-19 outbreak had caused the poverty rate to increase from the year before. This is applied research that begins by studying relevant theories. The data used in this study are secondary data from the BPS website regarding poverty indicators. This study aims to determine regional groups and compare the grouping results of the k-means and k-medoids methods. The better-performing method is the one with the lowest Davies Bouldin Index (DBI). The k-means algorithm places 34 districts/cities in cluster 1, 52 in cluster 2, 23 in cluster 3, and 45 in cluster 4, while k-medoids places 53, 40, 37, and 24 districts/cities in clusters 1, 2, 3, and 4, respectively. The resulting DBI is 1.584 for k-means and 2.359 for k-medoids. This means that the k-means algorithm is better than k-medoids, because its DBI is smaller.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129732090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application of Random Forest for The Classification Diabetes Mellitus Disease in RSUP Dr. M. Jamil Padang","authors":"Fazhira Anisha, Dodi Vionanda, Nonong amalita, Zilrahmi","doi":"10.24036/ujsds/vol1-iss2/30","DOIUrl":"https://doi.org/10.24036/ujsds/vol1-iss2/30","url":null,"abstract":"Diabetes Mellitus is a disease in which blood sugar levels exceed normal (GDS>200 mg/dl). Diabetes Mellitus may be defined as an insulin function disorder in the pancreatic organ. Diabetes Mellitus is a world health problem, as its incidence is increasing in every part of the world, including Indonesia. The disease needs to be prevented and controlled so that it does not cause complications in other organs or even death. It is therefore necessary to study a method that predicts the occurrence of this disease and identifies the variables that most affect whether a person suffers from it. This can be accomplished with a classification method, one of which is Random Forest. This case study uses the randomForest package in RStudio software. In general, the results of this study are the smallest OOB error rate (%) and the Variable Importance Measure (VIM) using Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDG) values. Classification with the Random Forest method on the incidence of Diabetes Mellitus in RSUP Dr. M. Jamil Padang results in an OOB error rate of 1.2%, or an accuracy rate of 98.8%. The most optimal model was produced using mtry = 4 and ntree = 1000. Based on MDA, the most influential variables are Age, Polyphagia, Polyuria, HB, and BMI, while based on MDG they are Age, Polyphagia, BMI, HB, and Delayed Healing.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116661681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparison of Naïve Bayes and K-Nearest Neighbor for DKI Jakarta Air Pollution Standard Index Classification","authors":"Nurdalia, Zilrahmi, D. Permana, Admi Salma","doi":"10.24036/ujsds/vol1-iss2/29","DOIUrl":"https://doi.org/10.24036/ujsds/vol1-iss2/29","url":null,"abstract":"Data mining is the process of extracting and searching for useful knowledge and information using certain algorithms or methods. The data mining classification methods used in this study are Naïve Bayes and K-Nearest Neighbor. Using these two methods, it is possible to classify the DKI Jakarta air pollution standard index in 2021 based on six air pollutants, namely dust particles (PM10), dust particles (PM2.5), sulfur dioxide (SO2), carbon monoxide (CO), ozone (O3), and nitrogen dioxide (NO2). The test was carried out to determine the accuracy in predicting the DKI Jakarta air pollution standard index in 2021 using confusion matrix evaluation values. The better performance of the two methods is found in the Naïve Bayes algorithm, which has high sensitivity values for all categories, including minority categories. Although the frequency of data in each category is not balanced, the Naïve Bayes algorithm shows good performance in accuracy, sensitivity, and specificity.","PeriodicalId":220933,"journal":{"name":"UNP Journal of Statistics and Data Science","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125250819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}