Using lexicography to characterise relations between species mentions in the biodiversity literature
Sandra Young
Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage
Published: 2019-05-08
DOI: 10.1145/3322905.3322918 (https://doi.org/10.1145/3322905.3322918)
Citations: 0
Abstract
The biodiversity literature is one of the longest-standing examples of recording heritage in the world. Today there are many efforts to standardise and integrate this literature to ensure access to the information, for both heritage and research purposes. Ontologies are increasingly used as knowledge representation tools in these efforts. However, the validity of using ontological frameworks to represent biological taxonomies has been questioned. Biological taxonomies use scientific nomenclature to assign names to described species. While the nomenclature is a useful classification tool, it can also be a source of confusion because of its synonymous, homonymous and fluid nature. Despite this, no empirical evaluation of scientific nomenclature use in the literature has ever been performed. Corpus-based analysis is already used in automatic ontology extraction, and this study explores whether recently developed lexicography techniques can be applied to the problem, both to evaluate the empirical data in the literature and to provide a comparison with existing ontologies. This paper focuses on the workflow, parameters and preliminary findings of research into how to extract structures from the literature to perform these comparisons. It combines corpus analysis techniques with visualisation and filtering methods, and evaluates the potential classification and disambiguation qualities of the resulting graphs for future work. Preliminary results on the effects of frequency and salience when filtering the graphs indicate that these filter parameters could be used for different purposes in revealing relationships between organism mentions.
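The contrast between frequency-based and salience-based filtering can be illustrated with a minimal sketch. The abstract does not say which salience measure the study uses, so logDice (a common lexicographic association score) is assumed here. The toy sentences, the function names and the thresholds are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count name occurrences and sentence-level pair co-occurrences."""
    name_freq = Counter()
    pair_freq = Counter()
    for names in sentences:
        unique = sorted(set(names))  # count each name once per sentence
        name_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return name_freq, pair_freq


def log_dice(f_xy, f_x, f_y):
    """logDice association score (assumed salience measure); maximum is 14."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def filter_edges(name_freq, pair_freq, min_freq=1, min_salience=float("-inf")):
    """Keep graph edges that pass both the frequency and the salience threshold."""
    edges = {}
    for (a, b), f_ab in pair_freq.items():
        score = log_dice(f_ab, name_freq[a], name_freq[b])
        if f_ab >= min_freq and score >= min_salience:
            edges[(a, b)] = (f_ab, score)
    return edges


# Hypothetical toy corpus: each list holds the organism names found in one sentence.
sentences = [
    ["Apis mellifera", "Varroa destructor"],
    ["Apis mellifera", "Varroa destructor"],
    ["Apis mellifera", "Bombus terrestris"],
    ["Apis mellifera", "Bombus terrestris"],
    ["Apis mellifera", "Bombus terrestris"],
    ["Apis mellifera"],
    ["Apis mellifera"],
    ["Osmia bicornis", "Osmia cornuta"],
]

name_freq, pair_freq = cooccurrence_counts(sentences)

# Frequency filter: keeps pairs that are mentioned together often.
by_frequency = filter_edges(name_freq, pair_freq, min_freq=2)

# Salience filter: keeps pairs that occur together more than their
# individual frequencies would suggest, even if they are rare.
by_salience = filter_edges(name_freq, pair_freq, min_salience=13.5)

print(sorted(by_frequency))
print(sorted(by_salience))
```

In this toy data the frequency threshold retains the two pairs involving the heavily mentioned *Apis mellifera*, while the salience threshold retains only the rare pair whose members always appear together. This is one way in which the two filter parameters could reveal different kinds of relationship between organism mentions.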