A global and comprehensive approach for XML data warehouse design
Zoubir Ouaret, Omar Boussaïd, R. Chalal
2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA), November 2014
DOI: 10.1109/AICCSA.2014.7073251 (https://doi.org/10.1109/AICCSA.2014.7073251)
Citation count: 2
Abstract
The growing amount of interesting data stored in XML format is the most challenging issue for the BI community, so it is desirable to extract, store, and integrate these large information sources into special-purpose systems called “data warehouses” for further analysis and decision-making. However, compared with a company's well-structured relational databases, XML data has a complex hierarchical structure that makes existing traditional data warehouse approaches and techniques inappropriate. In this paper, we propose a semi-automatic approach to XML data warehouse design that takes XML schemas as data sources. The first step automatically generates a UML class diagram from a W3C XML Schema (XSD). However, the resulting diagram can be very large and hard to understand. To address this, we use a set of rules based on basic object-oriented design quality techniques to develop a simplification algorithm that efficiently produces high-quality diagrams with a limited number of classes. Next, we propose a multidimensional (MD) element extraction algorithm that automatically identifies facts, measures, and their corresponding dimensions. We also present a new metric for ranking the obtained MD schemas by relevance. The final step automatically generates the star XML schema that corresponds to the XML data warehouse schema. Finally, we implemented our approach in Java and evaluated the resulting tool on several XML schemas.
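The abstract does not say how facts, measures, and dimensions are identified from the simplified class diagram. The Java sketch below illustrates one common heuristic under stated assumptions; it is not the authors' algorithm. It assumes that attributes with a numeric XSD built-in type are candidate measures, that outgoing many-to-one associations point to candidate dimensions, and that a class with at least one measure and at least two dimensions is a fact candidate. The names `MdExtractionSketch`, `UmlClass`, `Association`, `FactCandidate`, the numeric type list, and the `score()` method are all hypothetical. In particular, `score()` is a placeholder (a plain count of measures plus dimensions) and not the relevance metric proposed in the paper.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MdExtractionSketch {

    // Hypothetical simplified UML class: a name plus attribute name -> XSD type name.
    record UmlClass(String name, Map<String, String> attributes) {}

    // Hypothetical many-to-one association from `source` to `target` (e.g. Sale -> Product).
    record Association(String source, String target) {}

    // A candidate star schema: one fact class, its measures, and its dimensions.
    record FactCandidate(String fact, List<String> measures, List<String> dimensions) {
        // Placeholder relevance score (hypothetical, NOT the paper's ranking metric).
        int score() {
            return measures.size() + dimensions.size();
        }
    }

    // XSD numeric built-in types treated as potential measures (assumption).
    static final Set<String> NUMERIC_XSD_TYPES = Set.of(
            "xs:decimal", "xs:integer", "xs:int", "xs:long", "xs:float", "xs:double");

    // Returns fact candidates sorted by descending placeholder score.
    static List<FactCandidate> extract(List<UmlClass> classes, List<Association> associations) {
        List<FactCandidate> result = new ArrayList<>();
        for (UmlClass c : classes) {
            // Numeric attributes become candidate measures.
            List<String> measures = new ArrayList<>();
            for (Map.Entry<String, String> attr : c.attributes().entrySet()) {
                if (NUMERIC_XSD_TYPES.contains(attr.getValue())) {
                    measures.add(attr.getKey());
                }
            }
            // Targets of outgoing many-to-one associations become candidate dimensions.
            List<String> dimensions = new ArrayList<>();
            for (Association a : associations) {
                if (a.source().equals(c.name())) {
                    dimensions.add(a.target());
                }
            }
            // Heuristic threshold: at least one measure and two dimensions.
            if (!measures.isEmpty() && dimensions.size() >= 2) {
                result.add(new FactCandidate(c.name(), measures, dimensions));
            }
        }
        result.sort(Comparator.comparingInt(FactCandidate::score).reversed());
        return result;
    }

    public static void main(String[] args) {
        List<UmlClass> classes = List.of(
                new UmlClass("Sale", Map.of("quantity", "xs:integer", "amount", "xs:decimal", "saleId", "xs:ID")),
                new UmlClass("Product", Map.of("name", "xs:string", "price", "xs:decimal")),
                new UmlClass("Store", Map.of("city", "xs:string")),
                new UmlClass("Day", Map.of("date", "xs:date")));
        List<Association> associations = List.of(
                new Association("Sale", "Product"),
                new Association("Sale", "Store"),
                new Association("Sale", "Day"),
                new Association("Product", "Category"));
        for (FactCandidate f : extract(classes, associations)) {
            System.out.println(f.fact() + " measures=" + f.measures() + " dimensions=" + f.dimensions());
        }
    }
}
```

On the sample in `main`, the sketch reports `Sale` as the only fact candidate, with `quantity` and `amount` as measures (in unspecified order, since `Map.of` does not guarantee iteration order) and `Product`, `Store`, and `Day` as dimensions. `Product` is rejected because it has only one outgoing association. Records require Java 16 or later.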