Algebraic operator support for semantic data fusion in extended SQL

S. Hosain, H. Jamil
DOI: 10.1109/UKRICIS.2010.5898129 (https://doi.org/10.1109/UKRICIS.2010.5898129)
Published in: 2010 IEEE 9th International Conference on Cybernetic Intelligent Systems
Publication date: 2010-09-01
Citation count: 2

Abstract

One of the basic operations required to gather more information about an object is called information aggregation, or data fusion. The process requires recognizing a semantic object and gathering the new information into the collection that already exists for that object. A related operation is collecting a set of distinct semantic objects that are similar. These operations become complicated in the presence of schema and extent heterogeneity and semantic similarity. Although a rich body of research has addressed these issues, database language support is not yet available, possibly because an algebraic formulation of these concepts has been absent. An algebraic characterization is needed for query plan generation, optimization, and query processing. In this paper, we propose two new binary operators called link (λ) and combine (χ) that capture the spirit of vertical and horizontal data fusion. The proposed operators leverage developments in schema matching and key identification technologies by casting them as user-selectable functions μ and κ. We show that link and combine are generalized versions of the traditional join and union operations. We also propose two extensions of SQL that exploit these two operators and open up many optimization possibilities. Finally, we point out that link and combine are useful for semantic data integration and are currently being used in the LifeDB data management system for Life Sciences applications.
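To make the join and union analogy concrete, the following is a minimal Python sketch and not the paper's definition of λ and χ. It assumes that μ can be modeled as an attribute-renaming map from one schema to the other and that κ can be modeled as a function returning an object identifier for a row. The function names `link`, `combine`, and `align`, their signatures, and the toy data are all hypothetical.

```python
from typing import Callable, Dict, Hashable, List

Row = Dict[str, str]
Relation = List[Row]

def align(s: Relation, mu: Dict[str, str]) -> Relation:
    """Rename the attributes of s into the other schema using the match map mu (assumed form of μ)."""
    return [{mu.get(attr, attr): value for attr, value in row.items()} for row in s]

def link(r: Relation, s: Relation, mu: Dict[str, str],
         kappa: Callable[[Row], Hashable]) -> Relation:
    """Join-like sketch: merge rows of r and s that kappa (assumed form of κ) maps to the same object."""
    index: Dict[Hashable, List[Row]] = {}
    for row in align(s, mu):
        index.setdefault(kappa(row), []).append(row)
    fused = []
    for row in r:
        for match in index.get(kappa(row), []):
            merged = dict(match)
            merged.update(row)  # on attribute clashes, the value from r is kept
            fused.append(merged)
    return fused

def combine(r: Relation, s: Relation, mu: Dict[str, str]) -> Relation:
    """Union-like sketch: align s to r's schema, then take the union without exact duplicates."""
    seen = set()
    result = []
    for row in r + align(s, mu):
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(row)
    return result

# Toy data for illustration only.
genes = [{"gene": "BRCA1", "chrom": "17"}, {"gene": "TP53", "chrom": "17"}]
proteins = [{"symbol": "brca1", "protein": "P38398"}]
more_genes = [{"symbol": "TP53", "chrom": "17"}, {"symbol": "EGFR", "chrom": "7"}]

mu = {"symbol": "gene"}                      # hypothetical schema match: symbol corresponds to gene
kappa = lambda row: row["gene"].upper()      # hypothetical key function: case-insensitive gene name

linked = link(genes, proteins, mu, kappa)
combined = combine(genes, more_genes, mu)
```

Under these assumptions, `link` reduces to an ordinary equi-join when μ is the identity map and κ returns the join attribute unchanged, and `combine` reduces to an ordinary set union when both inputs already share a schema.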