Contextualized Linguistic Matching for Heterogeneous Data Source Integration

Y. B. Idrissi, J. Vachon
DOI: 10.1109/MCETECH.2008.33 (https://doi.org/10.1109/MCETECH.2008.33)
Published in: 2008 International MCETECH Conference on e-Technologies (mcetech 2008)
Publication date: 2008-01-23
Citations: 2

Abstract

Members of a common marketplace are likely to work in quite different domains and to describe their business data with different kinds of schemas. Ideally, such heterogeneous data could be integrated into a single (concrete or virtual) business data repository. It is unrealistic, however, to expect every member joining the marketplace to provide data that already adheres to a predefined federated data schema (e.g., a standard ontology); more flexibility is usually required. In practice, integrating a new data source requires a mapping step that computes semantic equivalences between the schema concepts to be merged. Semi-automatic mapping tools, which evaluate semantic similarities between concepts, can ease this task. Most of these tools currently rely on linguistic matching and are not very effective on highly heterogeneous data sources. Some of them rely on general-purpose dictionaries that do not take the specifics of each data source's domain into account. To cope better with data source heterogeneity, this article presents INDIGO, a system that computes semantic matches by taking the context of the data sources into account. INDIGO's distinctive feature is that it enriches data sources with semantic information extracted from their individual development artifacts. Thanks to this enrichment step, INDIGO can compute a more accurate mapping between the two enriched data sources. INDIGO was evaluated on two case studies presented in the paper, and its performance is compared with the results of three matching systems often cited in this research domain.
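
The abstract does not describe INDIGO's algorithm, so the following is only a minimal sketch of the general idea of context-enriched linguistic matching, not the authors' method. It assumes that each schema element can be paired with a few context terms (here hand-written, standing in for terms that would be extracted from development artifacts) and it substitutes plain Jaccard token overlap for whatever similarity measure INDIGO actually uses. All function names, element names, and context terms are hypothetical.

```python
import re


def tokenize(name):
    """Split a schema element name into a set of lowercase word tokens."""
    # Insert a space at camelCase boundaries, then split on non-alphanumerics.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t}


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def enrich(name, context):
    """Union the name's own tokens with its context terms.

    `context` is a hypothetical dict mapping element names to terms that,
    in a real system, would come from the data source's artifacts.
    """
    return tokenize(name) | {t.lower() for t in context.get(name, ())}


def match(source, target, source_ctx, target_ctx, threshold=0.3):
    """Return (source, target, score) pairs scoring at or above threshold."""
    pairs = []
    for s in source:
        for t in target:
            score = jaccard(enrich(s, source_ctx), enrich(t, target_ctx))
            if score >= threshold:
                pairs.append((s, t, score))
    # Best-scoring candidate mappings first.
    return sorted(pairs, key=lambda p: -p[2])


# Illustrative (made-up) schema elements and context terms.
source_ctx = {"custAddr": ["customer", "address"]}
target_ctx = {"clientAddress": ["customer"]}

plain_score = jaccard(tokenize("custAddr"), tokenize("clientAddress"))
enriched = match(["custAddr"], ["clientAddress"], source_ctx, target_ctx)
```

In this toy case the two names share no tokens, so purely name-based matching scores them at zero, while the enriched token sets overlap on "customer" and "address" and the pair is proposed as a candidate mapping for a human to confirm.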