Basic Knowledge Construction Technique to Reduce The Volume of Low-Dimensional Big Data

G. Karya, B. Sitohang, Saiful Akbar, V. Moertini
DOI: 10.1109/ICIC50835.2020.9288550
Published in: 2020 Fifth International Conference on Informatics and Computing (ICIC), 2020-11-03
Citations: 2

Abstract

Big data has the characteristics of high volume, velocity, and variety (3V) and continues to grow exponentially with the worldwide adoption of information and communication technology. The main problem in the use of big data is the data deluge: the technology needed for big-data storage and processing must keep pace with a potentially unbounded exponential data growth rate, so the technology requirements themselves grow exponentially. In this paper, we propose a new approach to big-data analysis that separates the construction of basic knowledge from the original data, producing knowledge with much smaller velocity and volume. Three problems must be solved: formulating basic knowledge, developing a method for constructing basic knowledge from the initial data, and developing a technique for analyzing basic knowledge into final knowledge. In this study, the technique used to build basic knowledge is clustering-based, and the analysis of basic knowledge into final knowledge is likewise limited to a clustering-based process. The main contributions of this paper are the basic-knowledge formulation, a new big-data analytic architecture, a basic-knowledge construction algorithm (DSC4BKC), and an algorithm for analyzing basic knowledge into final knowledge (BDAfBK). To test the proposed method, we use the BIRCH clustering algorithm, with O(n) complexity, as the baseline. We also used artificial test data generated with WEKA, along with the IRIS4D and Diabetes data sets from the UCI Machine Learning Repository, for validation. Our tests show that the proposed method is much more efficient in data-storage usage (84.69% up to 99.80%), faster in processing (20.84% up to 86.91%), and produces final knowledge similar to the baseline's.
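The core idea of the abstract (replacing the raw data stream with compact, incrementally maintained cluster summaries, in the spirit of BIRCH's clustering features) can be illustrated with a minimal single-pass sketch. This is a hypothetical illustration only: the paper's DSC4BKC and BDAfBK algorithms are not specified in the abstract, and the `summarize_stream` function and its `threshold` parameter are assumptions for the sketch.

```python
import numpy as np

def summarize_stream(points, threshold):
    """Single-pass summary of a point stream into clustering features.

    Each summary is a pair [N, LS] (point count, linear sum), from which
    a centroid is recovered as LS / N. Points within `threshold` of an
    existing centroid are absorbed; otherwise a new summary is opened.
    The summaries, not the raw points, are kept for later analysis.
    """
    summaries = []  # list of [N, LS]
    for x in points:
        x = np.asarray(x, dtype=float)
        best, best_d = None, None
        for s in summaries:
            centroid = s[1] / s[0]
            d = np.linalg.norm(x - centroid)
            if best_d is None or d < best_d:
                best, best_d = s, d
        if best is not None and best_d <= threshold:
            best[0] += 1          # absorb the point into the summary
            best[1] = best[1] + x
        else:
            summaries.append([1, x.copy()])
    # Return (count, centroid) pairs: the compact "basic knowledge".
    return [(n, ls / n) for n, ls in summaries]

# Two well-separated groups collapse to two small summaries,
# regardless of how many raw points arrive.
stream = [(0, 0), (0.5, 0), (10, 10), (10.5, 10), (0, 0.5)]
basic_knowledge = summarize_stream(stream, threshold=3.0)
```

Storage then scales with the number of summaries rather than the number of raw points, which is the mechanism behind the storage-reduction figures quoted in the abstract.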