An Approach to Document Warehousing System Lifecycle from Textual ETL to Multidimensional Queries: A Proof-of-Concept Prototype
Authors: A. Cembalo, F. M. Pisano, G. Romano
Published in: 2012 Sixth International Conference on Complex, Intelligent, and Software Intensive Systems (2012-07-04)
DOI: 10.1109/CISIS.2012.185 (https://doi.org/10.1109/CISIS.2012.185)
For years, businesses have used ad hoc technologies to analyze huge amounts of data related to their domain of interest, aiming to extract relevant information for elaborating successful company strategies. Such technologies have focused essentially on structured data. In particular, Data Warehousing systems are the decision support systems on which academia and industry have focused their attention. It is believed that "about 80% of the information of any organization is contained in unstructured and semi-structured documents" [1], so limiting the analysis to structured data only, as has been done so far, is likely to lose a high percentage of potentially useful knowledge. Since text is the primary means of disseminating information and knowledge, it is necessary to introduce concepts related to text-oriented Business Intelligence and Document Warehousing systems, which could have many useful applications in industry or in large domains. In this paper we present a prototype application of a Document Warehousing system, highlighting challenges and solutions for each phase of its lifecycle. The prototype targets the Security and Prevention domain and is built with a set of open-source tools whose features and limitations are highlighted. To the best of our knowledge, the organization and configuration of the fundamental elements of a Document Warehouse system lifecycle have not yet been explored in depth. Furthermore, we have not yet found a Document Warehousing application implemented by integrating the open-source tools that we use for our prototype.