Database Transformation to Build Data-Set for Data Mining Analysis - A Review

2015 International Conference on Computing Communication Control and Automation
Pub Date: 2015-02-26
DOI: 10.1109/ICCUBEA.2015.81

A. Chaudhari, H. Khanuja


Citations: 3

Abstract



In a data mining project, the most time-consuming task is preparing a normalized data set, suitable for analysis, from a relational database. In general, the database holds many tables and views that must be joined, aggregated and transformed to build the required data set. This results in long, complex SQL queries that are written many times, independently and in a disorganized manner. As a result, the database grows with many tables and views that are not present as entities in the ER model, and similar SQL queries are written multiple times, which complicates database management, software development and maintenance. In this paper, we propose simple methods that generate SQL code to return aggregated columns in a horizontal tabular layout, where every row corresponds to an observation and every column is associated with a dimension. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets in the standard normalized layout required by most data mining algorithms. We introduce three fundamental methods to evaluate this database transformation: SPJ, based on standard relational algebra operators (SPJ queries); CASE, using the CASE programming construct available in SQL; and PIVOT, using the PIVOT operator, which is a built-in operator in a commercial DBMS.
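To make the horizontal layout concrete, the following is a minimal sketch of how SQL for the CASE and SPJ methods might be generated and run. It is not the authors' implementation. The `sales` table, its columns (`store_id`, `dow`, `amount`), the sample rows, and the helper names `case_query` and `spj_query` are all hypothetical. SQLite is used only because it ships with Python; it has no PIVOT operator, so the PIVOT method is not shown.

```python
import sqlite3

# Hypothetical fact table: one row per sale (store, day of week, amount).
rows = [
    (1, "Mon", 10.0),
    (1, "Mon", 5.0),
    (1, "Tue", 7.0),
    (2, "Mon", 3.0),
    (2, "Wed", 8.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, dow TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# The distinct values of the transposed column become the output columns.
dims = [r[0] for r in conn.execute("SELECT DISTINCT dow FROM sales ORDER BY dow")]


def case_query(dims):
    """CASE method: a single scan with one SUM(CASE ...) per dimension value.

    Values are interpolated directly into the SQL text, which is acceptable
    only because this toy example controls the data.
    """
    cols = ", ".join(
        f"SUM(CASE WHEN dow = '{d}' THEN amount END) AS amt_{d}" for d in dims
    )
    return f"SELECT store_id, {cols} FROM sales GROUP BY store_id ORDER BY store_id"


def spj_query(dims):
    """SPJ method: one aggregated subquery per value, left-joined on the key."""
    select = ", ".join(f"t_{d}.amt AS amt_{d}" for d in dims)
    joins = " ".join(
        f"LEFT JOIN (SELECT store_id, SUM(amount) AS amt FROM sales "
        f"WHERE dow = '{d}' GROUP BY store_id) AS t_{d} "
        f"ON t_{d}.store_id = k.store_id"
        for d in dims
    )
    return (
        f"SELECT k.store_id, {select} "
        f"FROM (SELECT DISTINCT store_id FROM sales) AS k {joins} "
        f"ORDER BY k.store_id"
    )


# Each result row is one observation (a store); each column after the key
# holds the aggregate for one dimension value, or NULL when there is no data.
case_rows = conn.execute(case_query(dims)).fetchall()
spj_rows = conn.execute(spj_query(dims)).fetchall()
print(case_rows)
print(spj_rows)
```

In this sketch both generated queries return the same rows, one per store. The CASE form reads the table once, while the SPJ form builds one subquery and one join per dimension value.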

