{"title":"Simultaneous Feature Selection and Tuple Selection for Efficient Classification","authors":"M. Dash, V. Gopalkrishnan","doi":"10.4018/978-1-60566-748-5.CH012","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH012","url":null,"abstract":"It is no longer news that data are increasing very rapidly day by day. Particularly with the Internet becoming so prevalent, the sources of data have become numerous. Data are increasing in both directions: dimensions (or features) and instances (or examples, or tuples). Not all the data are relevant, though. While gathering data on any particular aspect, one usually tends to gather as much information as will be required for various tasks. One may not explicitly have any particular task, for example classification, in mind. So it behooves a data mining expert to remove the noisy, irrelevant and redundant data before proceeding with classification, because many traditional algorithms fail in the presence of such noisy and irrelevant data (Blum and Langley 1997). As an example, consider microarray gene expression data, where there are thousands of features (or genes) and only tens of tuples (or sample tests). For example, Leukemia cancer data (Alon, Barkai et al. 1999) has 7129 genes and 72 sample tests. It has been shown that even with very few genes one can achieve the same or even better prediction accuracy","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115780590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decisional Annotations","authors":"G. Cabanac, M. Chevalier, F. Ravat, O. Teste","doi":"10.4018/978-1-60566-748-5.CH004","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH004","url":null,"abstract":"This chapter deals with an annotation-based decisional system. The decisional system we present is based on multidimensional databases, which are composed of facts and dimensions. The expertise of decision-makers is modelled, shared and stored through annotations. These annotations allow decision-makers to carry on active analysis and to collaborate with other decision-makers on a common analysis.","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124569071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conceptual Data Warehouse Design Methodology for Business Process Intelligence","authors":"Svetlana Mansmann, T. Neumuth, O. Burgert, Matthias Röger","doi":"10.4018/978-1-60566-748-5.CH007","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH007","url":null,"abstract":"The emerging area of business process intelligence aims at enhancing the analysis power of business process management systems by employing performance-oriented technologies of data warehousing and mining. However, the differences in the assumptions and objectives of the underlying models, namely the business process model and the multidimensional data model, impede a straightforward and meaningful convergence of the two concepts. The authors present an approach to designing a data warehouse for enabling the multidimensional analysis of business processes and their execution. The aims of such analysis are manifold, from quantitative and qualitative assessment to process discovery, pattern recognition and mining. The authors demonstrate that business processes and workflows represent a non-conventional application scenario for the data warehousing approach and that multiple challenges arise at various design stages. They describe deficiencies of the conventional OLAP technology with respect to business process modeling and formulate the requirements for an adequate multidimensional presentation of process descriptions. Modeling extensions proposed at the conceptual level are verified by implementing them in a relational OLAP system, accessible via state-of-the-art visual frontend tools. The authors demonstrate the benefits of the proposed modeling framework by presenting relevant analysis tasks from the domain of medical engineering and showing the type of decision support provided by their solution.","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117255707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Workload for Schema Evolution in Data Warehouses","authors":"F. Bentayeb, Cécile Favre, Omar Boussaïd","doi":"10.4018/978-1-60566-748-5.CH002","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH002","url":null,"abstract":"A data warehouse allows the integration of heterogeneous data sources for identified analysis purposes. The data warehouse schema is designed according to the available data sources and the users' analysis requirements. In order to provide an answer to new individual analysis needs, we proposed, in recent work, a solution for on-line analysis personalization. We based our solution on a user-driven approach for data warehouse schema evolution which consists of creating new hierarchy levels in OLAP (On-Line Analytical Processing) dimensions. One of the main objectives of OLAP, as the acronym suggests, is performance during the analysis process. Since data warehouses contain a large volume of data, answering decision queries efficiently requires particular access methods. The main issue is to use redundant optimization structures such as views and indices. This implies selecting an appropriate set of materialized views and indices that minimizes total query response time, given a limited storage space. A judicious choice in this selection must be cost-driven and based on a workload which represents a set of users' queries on the data warehouse. In this chapter, we address the issues related to the workload’s evolution and maintenance in data warehouse systems in response to new requirements modeling resulting from users’ personalized analysis needs. The main issue is to avoid generating the workload from scratch. Hence, we propose a workload management system which helps the administrator to dynamically maintain and adapt the workload according to changes arising in the data warehouse schema. To achieve this maintenance, we propose two types of workload updates: (1) keeping existing queries consistent with respect to the new data warehouse schema and (2) creating new queries based on the new dimension hierarchy levels. Our system helps the administrator in adopting a pro-active behaviour in the management of the data warehouse performance. In order to validate our workload management system, we address the implementation issues of our proposed prototype. The prototype has been developed within a client/server architecture with a web client interfaced with the Oracle 10g Database Management System.","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128780952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Warehouse Facilitating Evidence-Based Medicine","authors":"N. Stolba, Tho Manh Nguyen, A. Tjoa","doi":"10.4018/978-1-60566-748-5.CH008","DOIUrl":"https://doi.org/10.4018/978-1-60566-748-5.CH008","url":null,"abstract":"In the past, much of the effort in healthcare decision support systems was focused on data acquisition and storage, in order to allow the use of this data at some later point in time. Medical data was used in a static manner, for analytical purposes, in order to verify decisions already taken. Due to the immense volumes of medical data, the architecture of future healthcare decision support systems focuses more on interoperability than on integration. With the rising need for the creation of a unified knowledge base, the federated approach to distributed data warehouses (DWH) is receiving increasing attention. The exploitation of evidence-based guidelines becomes a priority concern, as the awareness of the importance of knowledge management rises. Consequently, interoperability between medical information systems is becoming a necessity in modern health care. Under strong security measures, health care organizations are striving to unite and share their (partly highly sensitive) data assets in order to achieve a wider knowledge base and to provide a mature decision support service for the decision makers. Ontological integration of the very complex and heterogeneous medical data structures is a challenging task. The authors’ objective is to point out the advantages of the deployment of a federated data warehouse approach for the integration of the wide range of different medical data sources and for distribution of evidence-based clinical knowledge, to support clinical decision makers, primarily clinicians at the point of care.","PeriodicalId":255230,"journal":{"name":"Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130628093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}