N. Zendrato, M. Zarlis, O. S. Sitompul, E. M. Zamzami
{"title":"Forecasting Acceleration of Data Transfer with Fog Computing for Resource Efficiency in Data Centers","authors":"N. Zendrato, M. Zarlis, O. S. Sitompul, E. M. Zamzami","doi":"10.1109/DATABIA50434.2020.9190326","DOIUrl":"https://doi.org/10.1109/DATABIA50434.2020.9190326","url":null,"abstract":"Accelerating data transfer is a persistent problem in fog computing, especially for data center workloads. This research predicts server performance data in fog computing using linear regression. Predictions are made on the variables that affect data transfer speed, namely the number of CPU cores, CPU capacity, and memory used; these variables serve as attributes and data transfer as the label. With this research, data transfer speed can be predicted before use. This method provides an improvement in the error value compared with other forecasting methods. Thus the process of data transfer in fog computing can be more effective and efficient.","PeriodicalId":165106,"journal":{"name":"2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117321978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Rachmawati, Herriyance, Frederik Yan Putra Pakpahan
{"title":"Comparative Analysis of the Kruskal and Boruvka Algorithms in Solving Minimum Spanning Tree on Complete Graph","authors":"D. Rachmawati, Herriyance, Frederik Yan Putra Pakpahan","doi":"10.1109/DATABIA50434.2020.9190504","DOIUrl":"https://doi.org/10.1109/DATABIA50434.2020.9190504","url":null,"abstract":"The problem that is often encountered in daily life is connecting all points in one work domain with a low optimization value, for example, the most economical cost required to connect a water pipe to each house in an area. To solve this problem, a system that can find a path that connects all points in one work domain with the lowest optimization is needed. In this study, the system was built using two algorithms, namely, Kruskal and Boruvka algorithms, and a complete graph is used as a modeling of the problem. Using these two algorithms, the system will find the optimum path that connects all points in the complete graph; then, the system also displays a comparison between the two algorithms in finding the optimum route. The data used is dynamic, meaning the users can enter and change the value of the side of the complete graph as needed. 
From the tests that have been done, it is found that the Kruskal algorithm is more effective than the Boruvka algorithm at finding the minimum spanning tree in a complete graph with 15 nodes (points) and 105 edges (sides).","PeriodicalId":165106,"journal":{"name":"2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127906973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Emotion Analysis and Classification of Movie Reviews Using Data Mining","authors":"Kamoltep Moolthaisong, Wararat Songpan","doi":"10.1109/DATABIA50434.2020.9190363","DOIUrl":"https://doi.org/10.1109/DATABIA50434.2020.9190363","url":null,"abstract":"This paper proposes a model for classifying movie reviews using data mining. The paper also proposes a method of creating a word cloud from word frequencies in movie reviews, to help analyze the topics of interest and the opinions of reviewers. The research uses movie review data from the Metacritic website. The data consists of reviews of 21 movies, separated into a training set and a test set. The training set has 462 reviews and the test set has 238 reviews. Data preparation started with collecting the review data, removing special symbols and case differences, and preprocessing in the Weka program. The review text is converted into structured data using the StringToWordVector filter. This process includes removing stop words with the Rainbow stop word list, reducing words that share the same root to their stem using the Snowball Stemmer algorithm, and then assigning weights using the TF-IDF technique. After that, the Naïve Bayes, Random Forest and J48 algorithms were used to classify the review data into positive and negative groups. 
The experimental results are 80.25%, 79.83% and 68.06%, respectively.","PeriodicalId":165106,"journal":{"name":"2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)","volume":"32 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116415323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Text Classification Using fastText for Bahasa Indonesia Documents Classification","authors":"A. Amalia, O. S. Sitompul, E. Nababan, T. Mantoro","doi":"10.1109/DATABIA50434.2020.9190447","DOIUrl":"https://doi.org/10.1109/DATABIA50434.2020.9190447","url":null,"abstract":"Text classification using a simple word representation with a linear classifier is often considered a strong baseline. However, a simple word representation like Bag of Words (BOW) suffers from the curse of dimensionality, so it is only suitable for small datasets. BOW also needs language-dependent pre-processing steps like stopword removal and stemming. Therefore, the BOW model cannot be applied automatically because of its dependency on a specific language. On the other hand, deep neural network classifiers can eliminate the pre-processing prerequisite, but these models are not time-efficient and need a large dataset for learning. This is a challenge for languages with limited resources like Bahasa Indonesia. Another approach is to use the fastText model for text classification. This model can minimize pre-processing dependencies and is more efficient in training time. However, there has not been much study of whether the fastText model outperforms the BOW model on small datasets. This paper compares text classification using the TFIDF model, as one of the BOW models, with a fastText model on 500 news articles in Bahasa Indonesia. The results show that both models achieve an outstanding performance of 0.97 F-Score. The TFIDF model needs longer pre-processing stages and more training time. Meanwhile, the fastText model only needs some hyperparameter tuning and achieves performance similar to the TFIDF model. 
Based on this study, we can conclude that the fastText model is an efficient text classifier.","PeriodicalId":165106,"journal":{"name":"2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123637166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dijkstra's and A-Star in Finding the Shortest Path: a Tutorial","authors":"Ade Candra, M. A. Budiman, Kevin Hartanto","doi":"10.1109/DATABIA50434.2020.9190342","DOIUrl":"https://doi.org/10.1109/DATABIA50434.2020.9190342","url":null,"abstract":"As one form of the greedy algorithm, Dijkstra's can handle the shortest path search with optimum result in longer search time. Dijkstra's is contrary to A-Star, a best-first search algorithm, which can handle the shortest path search with a faster time but not always optimum. By looking at the advantages and disadvantages of Dijkstra's and A-Star, this tutorial discusses the implementation of the two algorithms in finding the shortest path in routes selection between 24 SPBU (gas stations). The routes are located in Medan City and represented in a directed graph. Moreover, the authors compare Dijkstra's and A-star based on the complexity of Big-Theta (Θ) and running time. The results show that the shortest path search between SPBU can be solved with Dijkstra's and A-Star, where in some cases, the routes produced by the two algorithms are different so that the total distance generated is also different. In this case, the running time of A-Star is proven to be faster than Dijkstra's, and it is following A-Star principle which selects the location point based on the best heuristic value while Dijkstra's does not. 
For the complexity, Dijkstra's is $\\Theta(\\mathrm{n}^{2})$ and A-Star is $\\Theta(\\mathrm{m}\\ast \\mathrm{n})$, where $0\\leq \\mathrm{m}\\leq \\mathrm{n}$.","PeriodicalId":165106,"journal":{"name":"2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117278955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Use of Meteorology Data in Short-Term Prediction of Wind Speed for Wind Turbine Using Elman Recurrent Neural Network","authors":"R. Dinzi, Muhammad Yusuf, F. Fahmi","doi":"10.1109/DATABIA50434.2020.9190628","DOIUrl":"https://doi.org/10.1109/DATABIA50434.2020.9190628","url":null,"abstract":"Wind energy is a promising renewable energy source that is well suited to daily use, especially in areas with sufficient wind such as Indonesia. Wind speed is a driving force for wind turbines to produce electrical power. One problem in wind turbine management is predicting short-term wind speed for efficiency. In this research, short-term wind speed forecasting was done for the city of Sibolga using an Elman recurrent neural network based on meteorological data (temperature, humidity, and air pressure) to predict the next ten days. Four prediction models were developed for this purpose based on the training parameters and dataset used. The wind speed forecasts produce MAPE values of 20.02% in the first model, 23.31% in the second model, 18.15% in the third model, and 12.51% in the fourth model. The fourth model predicted with the lowest error and is therefore considered useful for wind turbine management.","PeriodicalId":165106,"journal":{"name":"2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121146977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accuracy Analysis on Images Retrieval System using Radial Basis Function Algorithm and Coefficient Correlation","authors":"Khairul Abdi Sinuraya, S. Suwilo, M. S. Lydia","doi":"10.1109/DATABIA50434.2020.9190227","DOIUrl":"https://doi.org/10.1109/DATABIA50434.2020.9190227","url":null,"abstract":"The image retrieval system is a system used to retrieve images based on information contained in the image files. Radial Basis Function (RBF), one of the neural network methods used in image retrieval systems, is known for its capability to search image information well. To determine the initial centroid values, the RBF method uses K-Means Clustering. This algorithm has a weakness in determining the right initial centroid values to get proper classification results in image retrieval. In this paper, the Coefficient Correlation (CC) method is used to determine the initial centroid values of the input data according to the similarity of the data. The data with the highest degree of similarity to the other data are used as the initial centroid values. The data used in this study are 500 leaf images in 10 categories of leaf types, with 50 images per category. 
Based on the testing results, image retrieval accuracy increased to an average of 90.92% using the RBF and CC methods, compared with an average accuracy of 85.96% using the RBF and K-Means Clustering methods.","PeriodicalId":165106,"journal":{"name":"2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132283848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing Cosine Similarity Algorithm to Increase the Flexibility of Hematology Text Report Generation","authors":"Aulia Amirullah, I. Aulia, Dedy Arisandy","doi":"10.1109/DATABIA50434.2020.9190549","DOIUrl":"https://doi.org/10.1109/DATABIA50434.2020.9190549","url":null,"abstract":"The previous hematology textual summary representation system, which applies a template-based method of Natural Language Generation to produce hematology laboratory test results in natural language, was at the cutting edge of generating more detailed hematology reports. The produced reports provide texts which break down the critical and abnormal blood components found in conventional hematology test results. The natural-language reports aim to help patients easily identify which blood components are abnormal. Templates provide slots in every sentence to be filled with the provided data. However, the previous system, named the T-Gen System, is only able to produce a fixed, inflexible set of slots for the blood components defined by the system. It barely got off the ground, as it is very inflexible: the produced templates cannot hold all of the critical and abnormal components found in a laboratory examination result. Therefore, this research project implements the cosine similarity algorithm to expand template flexibility. Testing and evaluation were carried out manually by feeding components into the system consecutively. 
The testing shows that every blood component that was added consecutively successfully appeared in the produced texts.","PeriodicalId":165106,"journal":{"name":"2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115190277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey on The Accuracy of Machine Learning Techniques for Intrusion and Anomaly Detection on Public Data Sets","authors":"R. T. Adek, M. Ula","doi":"10.1109/DATABIA50434.2020.9190436","DOIUrl":"https://doi.org/10.1109/DATABIA50434.2020.9190436","url":null,"abstract":"Machine learning (ML) is growing in popularity due to its ability to solve problems in many areas. In the digital world, including information security, some intrusion detection systems (IDS) are being upgraded with machine learning elements to improve system performance. Very few real data sets are available for information security (IS) research. Therefore, much IS research relies on public data sets. However, public data sets have many limitations. The aim of this paper is to analyze the accuracy and performance of machine learning in intrusion detection systems and to highlight some recommendations for future research. This study involves a systematic literature review of academic papers on intrusion detection that apply machine learning methods to public data sets. This paper elaborates on the use of machine learning algorithms in intrusion detection systems, highlighting the accuracy and limitations of the methods for detecting attackers. 
The goal of this research is to provide an academic base for future research in the adoption of machine learning methods for IDS.","PeriodicalId":165106,"journal":{"name":"2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121932416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Analysis of FIFO and Round Robin Scheduling Process Algorithm in IoT Operating System for Collecting Landslide Data","authors":"Hayatunnufus, M. Riasetiawan, A. Ashari","doi":"10.1109/DATABIA50434.2020.9190608","DOIUrl":"https://doi.org/10.1109/DATABIA50434.2020.9190608","url":null,"abstract":"Scheduling is one of the most important factors in running processes inside the CPU. CPU scheduling is a concept of multiprogramming, where the CPU is used to schedule the incoming processes alternately. Many algorithms can be used to schedule processes inside the CPU, but not all can be real-time. Long waiting times and response times are often problems when scheduling processes in real time. FIFO and Round Robin algorithms can be implemented to schedule processes in real time. In this paper, the authors schedule the processes of several sensors that are used to collect landslide data. The data processing is scheduled with FIFO and Round Robin separately on the Internet of Things (IoT) device. The authors analyze only the performance of the FIFO and Round Robin algorithms in scheduling the incoming processes in real time on the IoT operating system, considering the waiting time and response time. The analysis is expected to yield short response and waiting times so that the proper algorithm can be chosen to complement the IoT architecture in landslide detection. 
The FIFO and Round Robin algorithms are implemented on the Raspbian and Arch Linux operating systems on the IoT device, a Raspberry Pi 3 Model B, which uses a 64-bit ARM Cortex-A53 processor at 1.2GHz.","PeriodicalId":165106,"journal":{"name":"2020 International Conference on Data Science, Artificial Intelligence, and Business Analytics (DATABIA)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126642261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}