Fahad Ahmed Satti, Musarrat Hussain, Sungyoung Lee, T. Chung
Significance of Syntactic Type Identification in Embedding Vector based Schema Matching
2022 16th International Conference on Ubiquitous Information Management and Communication (IMCOM), published 2022-01-03
DOI: https://doi.org/10.1109/imcom53663.2022.9721780
Citation count: 0
Abstract
Data interoperability provides a bridge between information systems to store, exchange, and consume heterogeneous data. Schema maps provide the necessary foundation for achieving this goal. Traditional solutions rely on expert-generated rules, ontologies, and syntactic matching to identify the similarity between attributes in the various data schemas. While we have previously presented the effectiveness of transformer-based models and unsupervised learning for calculating attribute similarities, in this paper we present the additional application of a naive syntactic similarity measurement. We evaluate the agreement between the computed and human-annotated results in terms of the Matthews Correlation Coefficient (MCC). These results indicate that, on weighted comparison, there is no positive effect of including naive syntactic similarity in addition to semantic similarity.
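To make the evaluation described in the abstract concrete, the following is a minimal sketch of how a naive syntactic score could be mixed with a semantic score and how agreement with human annotation could be measured with MCC. The abstract does not specify the syntactic measure, the weighting scheme, or any function names, so `syntactic_similarity` (a character-level ratio from Python's `difflib`), `combined_similarity`, and its `weight` parameter are assumptions made for illustration only; only the MCC formula itself is standard.

```python
from difflib import SequenceMatcher
from math import sqrt


def syntactic_similarity(a: str, b: str) -> float:
    """Naive character-level similarity in [0, 1].

    Assumption: a stand-in measure, not necessarily the one used in the paper.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def combined_similarity(semantic: float, syntactic: float, weight: float = 0.5) -> float:
    """Weighted mix of semantic and syntactic scores.

    `weight` is a hypothetical parameter: 1.0 means semantic only.
    """
    return weight * semantic + (1.0 - weight) * syntactic


def mcc(predicted: list[bool], annotated: list[bool]) -> float:
    """Matthews Correlation Coefficient between computed and human labels."""
    pairs = list(zip(predicted, annotated))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical attribute pair and semantic score, for illustration only.
    syn = syntactic_similarity("patient_id", "PatientID")
    score = combined_similarity(semantic=0.9, syntactic=syn, weight=0.7)
    print(f"syntactic={syn:.3f} combined={score:.3f}")

    computed = [True, False, True, False]
    human = [True, False, True, False]
    print("MCC:", mcc(computed, human))
```

Under these assumptions, thresholding the combined score for each attribute pair yields the computed match labels, and comparing the MCC for different values of `weight` is one way to check whether the syntactic term helps or hurts agreement with the annotators.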