OntoAugment
Authors: Fabio Maresca, Gürkan Solmaz, Flavio Cirillo
Published in: Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, 2021-11-15
DOI: 10.1145/3485730.3493445 (https://doi.org/10.1145/3485730.3493445)
Citations: 1
Abstract
Ontology matching enables the harmonization of heterogeneous data models. Existing ontology matching approaches include machine learning methods. In particular, recent works leverage weak supervision (WS) through programmatic labeling to avoid intensive hand-labeling of large ontologies. Programmatic labeling relies on heuristics and rules, called Labeling Functions (LFs), that generate noisy and incomplete labels. However, to cover a reasonable portion of the dataset, programmatic labeling might require a large number of LFs, which can be time-consuming and not always straightforward to program. This paper proposes a novel system, OntoAugment, that augments LF labels for the ontology matching problem, starting from the outcomes of the LFs. Our solution leverages the "similarity of similarities" between ontology concept bipairs, where a bipair consists of two pairs of concepts. OntoAugment projects a label yielded by an LF for a concept pair onto a similar pair that the same LF does not label. Thus, a wider portion of the dataset is covered even with a limited set of LFs. Experimental results show that OntoAugment provides significant improvements (up to 11 F1 points) over the state-of-the-art WS approach when fewer LFs are used, whereas it maintains performance without introducing additional noise when a larger number of LFs already achieves high performance.
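The label-projection idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or the projection rule, so everything here is an assumption made for illustration. That includes the names `bipair_similarity` and `augment_labels`, the use of 1 minus the mean absolute difference as the "similarity of similarities", the nearest-labeled-pair rule, the `threshold` value, and the convention that `-1` means an LF abstains.

```python
ABSTAIN = -1  # assumed convention: the LF gives no label for this pair


def bipair_similarity(sims_a, sims_b):
    """Compare two concept pairs, each described by a vector of similarity
    scores in [0, 1]. Returns 1 minus the mean absolute difference
    (a hypothetical choice of "similarity of similarities")."""
    diffs = [abs(a - b) for a, b in zip(sims_a, sims_b)]
    return 1.0 - sum(diffs) / len(diffs)


def augment_labels(pair_sims, lf_labels, threshold=0.9):
    """For each LF (column), copy the label of the most similar labeled pair
    onto every pair the LF abstained on, if the similarity >= threshold.
    Returns a new label matrix; the input is left unchanged."""
    n_pairs = len(lf_labels)
    n_lfs = len(lf_labels[0])
    augmented = [row[:] for row in lf_labels]
    for j in range(n_lfs):
        labeled = [i for i in range(n_pairs) if lf_labels[i][j] != ABSTAIN]
        for i in range(n_pairs):
            if lf_labels[i][j] != ABSTAIN:
                continue  # this LF already labels pair i
            best_sim, best_label = 0.0, ABSTAIN
            for k in labeled:
                s = bipair_similarity(pair_sims[i], pair_sims[k])
                if s > best_sim:
                    best_sim, best_label = s, lf_labels[k][j]
            if best_sim >= threshold:
                augmented[i][j] = best_label
    return augmented


# Toy data (made up): one similarity vector per concept pair,
# e.g. [string similarity, embedding similarity].
pair_sims = [
    [0.95, 0.90],  # pair 0
    [0.93, 0.88],  # pair 1, close to pair 0
    [0.10, 0.20],  # pair 2
    [0.50, 0.55],  # pair 3, not close to any labeled pair
]
# Rows are pairs, columns are LFs; 1 = match, 0 = no match.
lf_labels = [
    [1, ABSTAIN],
    [ABSTAIN, ABSTAIN],
    [0, 0],
    [ABSTAIN, ABSTAIN],
]

augmented = augment_labels(pair_sims, lf_labels, threshold=0.9)
print(augmented)
```

In this toy run, the first LF's "match" label for pair 0 is projected onto pair 1, whose similarity vector is nearly identical, while pair 3 stays unlabeled because no labeled pair is similar enough. The threshold is what keeps the projection from adding noise; how the paper actually controls that trade-off is not stated in the abstract.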