Database Transformation to Build Data-Set for Data Mining Analysis - A Review
Authors: A. Chaudhari, H. Khanuja
Published in: 2015 International Conference on Computing Communication Control and Automation (2015-02-26)
DOI: 10.1109/ICCUBEA.2015.81 (https://doi.org/10.1109/ICCUBEA.2015.81)
In a data mining project, the most time-consuming task is preparing a normalized data set, suitable for analysis, from a relational database. A database typically contains many tables and views that must be joined, aggregated and transformed to build the required data set. The result is long, complex SQL queries that are written many times, independently and in a disorganized manner. The database therefore grows with many tables and views that are not present as entities in the ER model, and similar SQL queries are written repeatedly, which complicates database management, software development and maintenance. In this paper, we propose simple methods that generate SQL code to return aggregated columns in a horizontal tabular layout, where every row corresponds to an observation and every column is associated with a dimension. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets in the standard normalized layout required by most data mining algorithms. We introduce three fundamental methods to evaluate database transformation: SPJ, based on standard relational algebra operators (SPJ queries); CASE, using the CASE programming construct available in SQL; and PIVOT, using the PIVOT operator, which is a built-in operator in a commercial DBMS.
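
To make the idea of a horizontal aggregation concrete, the following is a minimal sketch, not the authors' implementation. It generates SQL for two of the three methods named above (CASE and SPJ) and runs it with Python's built-in `sqlite3` module. The `sales` table, its columns (`store_id`, `dow`, `amount`) and the helper names are hypothetical and chosen only for illustration. The PIVOT method is omitted because SQLite, used here purely for convenience, does not provide a PIVOT operator.

```python
import sqlite3

# Hypothetical vertical fact table: one row per (store, day-of-week) sale.
ROWS = [
    (1, "Mon", 10.0),
    (1, "Tue", 5.0),
    (1, "Mon", 2.5),
    (2, "Tue", 7.0),
    (3, "Mon", 4.0),
]


def make_db():
    """Create an in-memory database holding the illustrative sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (store_id INTEGER, dow TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", ROWS)
    return conn


def pivot_values(conn):
    """Distinct values of the column that will be spread across output columns."""
    cur = conn.execute("SELECT DISTINCT dow FROM sales ORDER BY dow")
    return [row[0] for row in cur]


def case_query(values):
    """CASE method: one SUM(CASE ...) expression per distinct value.

    Values are interpolated directly because they are short, trusted labels
    in this sketch; real code would need to validate or quote them.
    """
    cols = ", ".join(
        f"SUM(CASE WHEN dow = '{v}' THEN amount END) AS amount_{v}"
        for v in values
    )
    return f"SELECT store_id, {cols} FROM sales GROUP BY store_id ORDER BY store_id"


def spj_query(values):
    """SPJ method: one aggregated subquery per value, left-joined on the key."""
    select_cols = ", ".join(
        f"f{i}.total AS amount_{v}" for i, v in enumerate(values, start=1)
    )
    joins = " ".join(
        f"LEFT JOIN (SELECT store_id, SUM(amount) AS total FROM sales "
        f"WHERE dow = '{v}' GROUP BY store_id) AS f{i} "
        f"ON f{i}.store_id = f0.store_id"
        for i, v in enumerate(values, start=1)
    )
    return (
        f"SELECT f0.store_id, {select_cols} "
        f"FROM (SELECT DISTINCT store_id FROM sales) AS f0 {joins} "
        f"ORDER BY f0.store_id"
    )


if __name__ == "__main__":
    conn = make_db()
    values = pivot_values(conn)
    # Each output row is one observation (a store); each column is one dimension.
    for name, sql in (("CASE", case_query(values)), ("SPJ", spj_query(values))):
        print(name, conn.execute(sql).fetchall())
```

Both generated queries return one row per store with one aggregated column per day of the week, and a store with no sales on a given day gets NULL in that column. The CASE form reads the table once, whereas the SPJ form builds one subquery per output column.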