Albert Meroño-Peñuela, Romana Pernisch, Christophe Guéret, S. Schlobach
Proceedings of the 11th on Knowledge Capture Conference, 2 December 2021. DOI: 10.1145/3460210.3493583
Multi-domain and Explainable Prediction of Changes in Web Vocabularies
Web vocabularies (WV) have become a fundamental tool for structuring Web data: over 10 million sites use structured data formats and ontologies to mark up their content. Maintaining these vocabularies and keeping up with their changes are manual tasks with very limited automated support, which affects both publishers and users. Existing work shows that machine learning can reliably predict vocabulary changes, but only within specific domains (e.g., biomedicine) and with limited explanation of the impact of those changes (e.g., their type and frequency). In this paper, we describe a framework that uses various supervised learning models to learn and predict changes in versioned vocabularies, independent of their domain. Building on well-established results in ontology evolution, we extract domain-agnostic, human-interpretable features and explain their influence on change predictability. Applying our method to 139 WV from 9 different domains, we find that the ontology's structural and instance data, the number of versions, and the release frequency correlate strongly with the predictability of change. These results can pave the way toward integrating predictive models into knowledge engineering practices and methods.
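The pipeline described above (versioned vocabularies in, interpretable features out, a supervised model that predicts change) can be illustrated with a minimal sketch. This is not the authors' implementation, and the feature set is an assumption: each version is modelled as a set of triples plus a release day (the hypothetical `Version` class), features are counts of classes, properties and instances together with the number of versions so far and the days since the previous release, and a version is labelled "changed" when the next release differs. Versions from several vocabularies are pooled into one dataset, and a plain standard-library logistic regression stands in for the various models the paper evaluates. Because features are standardized, the absolute size of each weight gives a rough ranking of influence.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
PROPERTY_TYPES = {"owl:ObjectProperty", "owl:DatatypeProperty"}
# Hypothetical, human-readable feature names (NOT the paper's actual feature set).
FEATURE_NAMES = ["classes", "properties", "instances", "versions_so_far", "days_since_last_release"]


@dataclass
class Version:
    """One release of a vocabulary: its triples and release day (hypothetical model)."""
    triples: Set[Triple]
    release_day: int


def extract_features(v: Version, index: int, prev_day: int) -> List[float]:
    """Structural, instance and release-history counts for one version."""
    classes = {s for s, p, o in v.triples if p == "rdf:type" and o == "owl:Class"}
    props = {s for s, p, o in v.triples if p == "rdf:type" and o in PROPERTY_TYPES}
    instances = {s for s, p, o in v.triples if p == "rdf:type" and o in classes}
    return [float(len(classes)), float(len(props)), float(len(instances)),
            float(index + 1), float(v.release_day - prev_day)]


def build_dataset(vocabs: Dict[str, List[Version]]) -> Tuple[List[List[float]], List[int]]:
    """Pool all vocabularies; label a version 1 if the next release changes any triple.
    Versions of each vocabulary are assumed to be listed in release order."""
    X, y = [], []
    for versions in vocabs.values():
        for i in range(len(versions) - 1):
            prev_day = versions[i - 1].release_day if i > 0 else versions[i].release_day
            X.append(extract_features(versions[i], i, prev_day))
            y.append(int(versions[i + 1].triples != versions[i].triples))
    return X, y


def standardize(X: List[List[float]]) -> List[List[float]]:
    """Z-score each column so learned weights are comparable across features."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 so it maps to all zeros.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the logistic loss; a stand-in for the paper's models."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for row, t in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - t
            for j, xj in enumerate(row):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w, b, row) -> int:
    """Predict 1 (next release will change) when the probability is at least 0.5."""
    return int(sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) >= 0.5)


def explain(w, names=FEATURE_NAMES):
    """Rank features by absolute standardized weight, most influential first."""
    return sorted(zip(names, w), key=lambda nw: abs(nw[1]), reverse=True)
```

In use, `X, y = build_dataset(vocabs)`, then `w, b = train_logistic(standardize(X), y)` and `explain(w)` lists the hypothetical features from most to least influential. Logistic regression was picked here only because its standardized coefficients are directly readable; a tree ensemble with feature importances would be an equally plausible choice.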