{"title":"Validating Ontology-based Annotations of Biomedical Resources using Zero-shot Learning","authors":"Dimitrios A. Koutsomitropoulos","doi":"10.1145/3486713.3486730","DOIUrl":null,"url":null,"abstract":"Authoritative thesauri in the form of web ontologies offer a sound representation of domain knowledge and can act as a reference point for automated semantic tagging. On the other hand, current language models achieve to capture contextualized semantics of text corpora and can be leveraged towards this goal. We present an approach for injecting subject annotations using query term expansion against such ontologies in the biomedical domain. For the user to have an indication of the usefulness of these suggestions we further propose an online method for validating the quality of annotations using NLI models such as BART and XLM-R. To circumvent training barriers posed by very large label sets and scarcity of data we rely on zero-shot classification and show that semantic matching can contribute above-average thematic annotations. Also, a web-based validation service can be attractive for human curators vs. the overhead of pretraining large, domain-tailored classification models.","PeriodicalId":268366,"journal":{"name":"The 12th International Conference on Computational Systems-Biology and Bioinformatics","volume":"543 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 12th International Conference on Computational Systems-Biology and Bioinformatics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3486713.3486730","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
Authoritative thesauri in the form of web ontologies offer a sound representation of domain knowledge and can act as a reference point for automated semantic tagging. On the other hand, current language models achieve to capture contextualized semantics of text corpora and can be leveraged towards this goal. We present an approach for injecting subject annotations using query term expansion against such ontologies in the biomedical domain. For the user to have an indication of the usefulness of these suggestions we further propose an online method for validating the quality of annotations using NLI models such as BART and XLM-R. To circumvent training barriers posed by very large label sets and scarcity of data we rely on zero-shot classification and show that semantic matching can contribute above-average thematic annotations. Also, a web-based validation service can be attractive for human curators vs. the overhead of pretraining large, domain-tailored classification models.