An Approach to Document Warehousing System Lifecycle from Textual ETL to Multidimensional Queries: A Proof-of-Concept Prototype

A. Cembalo, F. M. Pisano, G. Romano
Published in: 2012 Sixth International Conference on Complex, Intelligent, and Software Intensive Systems
Publication date: 2012-07-04
DOI: 10.1109/CISIS.2012.185 (https://doi.org/10.1109/CISIS.2012.185)
Citations: 3

Abstract

An Approach to Document Warehousing System Lifecycle from Textual ETL to Multidimensional Queries: A Proof-of-Concept Prototype
For years, businesses have used ad hoc technologies to analyze huge amounts of data related to their domain of interest, aiming to extract relevant information for developing successful company strategies. Such technologies focused essentially on structured data. In particular, Data Warehousing systems are the decision support systems on which academia and industry have focused their attention. It is believed that "about 80% of the information of any organization is contained in unstructured and semi-structured documents" [1], so limiting the analysis to structured data alone, as has been done so far, is likely to lose a high percentage of potentially useful knowledge. Since text is the primary means of disseminating information and knowledge, it is necessary to introduce concepts related to text-oriented Business Intelligence and Document Warehousing systems, which could have many useful applications in industry or in large domains. In this paper we present a prototype application of a Document Warehousing system, highlighting challenges and solutions for each phase of its lifecycle. The prototype relates to the Security and Prevention domain and is built with a set of open-source tools whose features and limitations are highlighted. To the best of our knowledge, the organization and configuration of the fundamental elements of a Document Warehouse system lifecycle have not yet been examined in depth. Furthermore, we have not yet found a Document Warehousing application implemented by integrating the open-source tools that we use in our prototype.