OntoAugment

Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems. Pub Date: 2021-11-15. DOI: 10.1145/3485730.3493445

Fabio Maresca, Gürkan Solmaz, Flavio Cirillo

Citations: 1

Abstract

Ontology matching enables the harmonization of heterogeneous data models. Existing ontology matching approaches include machine learning. In particular, recent works leverage weak supervision (WS) through programmatic labeling to avoid intensive hand-labeling of large ontologies. Programmatic labeling relies on heuristics and rules, called Labeling Functions (LFs), that generate noisy and incomplete labels. However, to cover a reasonable portion of the dataset, programmatic labeling might require a significant number of LFs, which can be time-consuming and not always straightforward to program. This paper proposes a novel system, OntoAugment, that augments LF labels for the ontology matching problem, starting from the outcomes of the LFs. The solution leverages the "similarity of similarities" between ontology concept bipairs, where a bipair consists of two pairs of concepts. OntoAugment projects a label yielded by an LF for a concept pair onto a similar pair that the same LF does not label. Thus, a wider portion of the dataset is covered even with a limited set of LFs. Experimental results show that OntoAugment provides significant improvements (up to 11 F1 points) over the state-of-the-art WS approach when fewer LFs are used, while it maintains performance without introducing additional noise when a larger number of LFs already achieves high performance.
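The label projection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual algorithm, so everything below is an assumption made for illustration: each concept pair is represented by a hypothetical vector of similarity scores, the "similarity of similarities" is approximated by cosine similarity between two such vectors, and a label is projected only when that similarity reaches a hypothetical `threshold`. The names `augment_labels`, `pair_features`, `lf_labels`, and `ABSTAIN` are not taken from the paper.

```python
import math

ABSTAIN = -1  # assumed convention: an LF that does not label a pair returns -1


def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def augment_labels(pair_features, lf_labels, threshold=0.95):
    """Project LF labels onto similar concept pairs the same LF left unlabeled.

    pair_features[i]: similarity-score vector of concept pair i (assumed input).
    lf_labels[i][j]:  label of LF j for pair i, or ABSTAIN.
    Returns a new label matrix; the input matrix is not modified.
    """
    n_pairs = len(lf_labels)
    n_lfs = len(lf_labels[0]) if n_pairs else 0
    augmented = [row[:] for row in lf_labels]
    for j in range(n_lfs):
        # Pairs that LF j labeled itself are the only projection sources.
        labeled = [i for i in range(n_pairs) if lf_labels[i][j] != ABSTAIN]
        for i in range(n_pairs):
            if lf_labels[i][j] != ABSTAIN:
                continue  # keep labels the LF produced on its own
            best_sim, best_label = threshold, ABSTAIN
            for k in labeled:
                # Compare the two pairs of the bipair (i, k).
                sim = cosine(pair_features[i], pair_features[k])
                if sim >= best_sim:
                    best_sim, best_label = sim, lf_labels[k][j]
            augmented[i][j] = best_label
    return augmented


if __name__ == "__main__":
    # Toy data: three concept pairs, two LFs.
    features = [[0.9, 0.8], [0.88, 0.82], [0.1, 0.9]]
    labels = [[1, ABSTAIN], [ABSTAIN, ABSTAIN], [ABSTAIN, 0]]
    print(augment_labels(features, labels))
```

In this toy run, the second pair is nearly identical to the first in its similarity scores, so it inherits the first LF's label, while the dissimilar third pair stays unlabeled by that LF. A stricter threshold would presumably trade coverage for less projected noise, which matches the trade-off the abstract reports only qualitatively.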
