Multi-domain and Explainable Prediction of Changes in Web Vocabularies

Proceedings of the 11th on Knowledge Capture Conference. Pub Date: 2021-12-02. DOI: 10.1145/3460210.3493583

Albert Meroño-Peñuela, Romana Pernisch, Christophe Guéret, S. Schlobach


Citations: 2

Abstract


Multi-domain and Explainable Prediction of Changes in Web Vocabularies

Web vocabularies (WV) have become a fundamental tool for structuring Web data: over 10 million sites use structured data formats and ontologies to mark up content. Maintaining these vocabularies and keeping up with their changes are manual tasks with very limited automated support, which affects both publishers and users. Existing work shows that machine learning can reliably predict vocabulary changes, but only in specific domains (e.g. biomedicine) and with limited explanations of the impact of changes (e.g. their type, frequency, etc.). In this paper, we describe a framework that uses various supervised learning models to learn and predict changes in versioned vocabularies, independent of their domain. Using well-established results in ontology evolution, we extract domain-agnostic and human-interpretable features and explain their influence on change predictability. Applying our method to 139 WV from 9 different domains, we find that ontology structural and instance data, the number of versions, and the release frequency highly correlate with predictability of change. These results can pave the way towards integrating predictive models into knowledge engineering practices and methods.
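
As a rough illustration of the kind of input such a framework needs, the sketch below turns a versioned vocabulary into (features, label) rows that any supervised classifier could be trained on. Everything in it is an assumption made for illustration: the toy `versions` data, the three feature names, and the "did any term change in the next version" label are hypothetical and are not the feature set or prediction target used by the authors.

```python
from datetime import date

# Hypothetical toy data: each version is (release date, set of term names).
versions = [
    (date(2020, 1, 1), {"Person", "name"}),
    (date(2020, 7, 1), {"Person", "name", "email"}),
    (date(2021, 1, 1), {"Person", "name", "email"}),
    (date(2021, 7, 1), {"Agent", "name", "email"}),
]


def build_training_rows(versions):
    """Return one (features, label) pair per version that has a successor."""
    rows = []
    for i in range(len(versions) - 1):
        released, terms = versions[i]
        _, next_terms = versions[i + 1]
        # Days elapsed since the previous release (0 for the first version).
        days_since_prev = (released - versions[i - 1][0]).days if i > 0 else 0
        features = {
            "n_terms": len(terms),           # crude proxy for structural size
            "n_versions_so_far": i + 1,      # how many versions exist up to now
            "days_since_prev": days_since_prev,  # proxy for release frequency
        }
        # Assumed label: 1 if the next version adds or removes any term.
        label = int(terms != next_terms)
        rows.append((features, label))
    return rows


rows = build_training_rows(versions)
for features, label in rows:
    print(features, "->", label)
```

Keeping the features as plain, named numbers mirrors the abstract's emphasis on human-interpretable inputs: each column can be read directly, and its influence on a trained model can be inspected afterwards.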

