Contextualized Linguistic Matching for Heterogeneous Data Source Integration
Y. B. Idrissi, J. Vachon
2008 International MCETECH Conference on e-Technologies (mcetech 2008), published 2008-01-23
DOI: 10.1109/MCETECH.2008.33 (https://doi.org/10.1109/MCETECH.2008.33)
As one can expect, members of a common market are likely to work in quite different domains and to use different kinds of schemas to describe their own business data. Ideally, such heterogeneous data would be integrated into a (concrete or virtual) business data repository. It seems illusory, however, to expect that all members joining a common marketplace can directly provide data adhering to a predefined federative data schema (e.g. a standard ontology). More flexibility is usually required. Realistically, integrating a new data source requires a mapping step that computes semantic equivalences between the schema concepts to be merged. This task can be eased by semi-automatic mapping tools that evaluate semantic similarities between concepts. Most of these mapping systems currently rely on linguistic matching and are not very effective when dealing with highly heterogeneous data sources. Some of them rely on general-purpose dictionaries that do not take into account the specificity of the data sources' domain. To better cope with data source heterogeneity, this article presents INDIGO, a system that computes semantic matching by taking the data sources' context into account. The distinctive feature of INDIGO is that it enriches data sources with semantic information extracted from their individual development artifacts. Thanks to this enrichment step, INDIGO can then compute a more accurate mapping between the two enriched data sources. INDIGO was evaluated on two case studies presented in this paper. Its performance is also compared with the results of three matching systems often cited in this research domain.