Biodiversity Information Science and Standards: Latest Articles

Filling Gaps in Earthworm Digital Diversity in Northern Eurasia from Russian-language Literature
Biodiversity Information Science and Standards Pub Date : 2023-09-20 DOI: 10.3897/biss.7.112957
Maxim Shashkov, Natalya Ivanova, Sergey Ermolov
{"title":"Filling Gaps in Earthworm Digital Diversity in Northern Eurasia from Russian-language Literature","authors":"Maxim Shashkov, Natalya Ivanova, Sergey Ermolov","doi":"10.3897/biss.7.112957","DOIUrl":"https://doi.org/10.3897/biss.7.112957","url":null,"abstract":"Data availability for certain groups of organisms (ecosystem engineers, invasive or protected species, etc.) is important for monitoring and making predictions in changing environments. One of the most promising directions for research on the impact of changes is species distribution modelling. Such technologies are highly dependent on occurrence data of high quality (Van Eupen et al. 2021). Earthworms (order Crassiclitellata) are a key group of organisms (Lavelle 2014), but their distribution around the globe is underrepresented in digital resources. Dozens of earthworm species, both widespread and endemic, inhabit the territory of Northern Eurasia (Perel 1979), but very little data on them is available through global biodiversity repositories (Cameron 2018). There are two main obstacles to data mobilisation. Firstly, studies of the diversity of earthworms in Northern Eurasia have a long history (since the end of the nineteenth century) and were conducted by several generations of Soviet and Russian researchers. Most of the collected data have been published in \"grey literature\", now stored only in a few libraries. Until recently, most of these remained largely undigitised, and some are probably irretrievably lost. The second problem is the difference in the taxonomic checklists used by Soviet and European researchers. Not all species and synonyms are included in the GBIF (Global Biodiversity Information Facility) Backbone Taxonomy. As a result, existing earthworm species distribution models (Phillips 2019) potentially miss a significant amount of data and may underestimate biodiversity and predict distributions inaccurately.
To fill this gap, we collected occurrence data from the Russian language literature (published by Soviet and Russian researchers) and digitised species checklists, keeping the original scientific names. To find relevant literature, we conducted a keyword search for \"earthworms\" and \"Lumbricidae\" through the Russian national scientific online library eLibrary and screened reference lists from the monographs of leading Soviet and Russian soil zoologist Tamara Perel (Vsevolodova-Perel 1997, Perel 1979). As a result, about 1,000 references were collected, of which 330 papers had titles indicating the potential to contain data on earthworm occurrences. Among these, 219 were found as PDF files or printed papers. For dataset compilation, 159 papers were used; the others had no exact location data or duplicated data contained in other papers. Most of the sources were peer-reviewed articles (Table 1). A reference list is available through Zenodo (Ivanova et al. 2023). The earliest publication we could find dates back to 1899, by Wilhelm Michaelsen. The most recent publication is 2023. About a third of the sources were written by systematists Iosif Malevich and Tamara Perel. Occurrence data were extracted and structured according to the Darwin Core standard (Wieczorek et al. 2012). During the data digitisation process, we tried to","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136308970","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
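The abstract above describes extracting occurrence records from the literature and structuring them according to Darwin Core while keeping the original scientific names. A minimal sketch of what one such record might look like follows. The column names are standard Darwin Core terms, but the choice of terms, the helper names (`to_dwc_row`, `write_dwc_csv`), and every value in the example record are assumptions made for illustration, not the authors' actual workflow or data.

```python
import csv
import io

# Darwin Core terms used as CSV column headers.
DWC_FIELDS = [
    "occurrenceID", "basisOfRecord", "scientificName",
    "verbatimIdentification", "decimalLatitude", "decimalLongitude",
    "eventDate", "bibliographicCitation",
]


def to_dwc_row(record_id, name_as_published, lat, lon, year, citation):
    """Build one occurrence row; the published name is kept unchanged."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    return {
        "occurrenceID": record_id,
        # Records taken from publications rather than observed directly.
        "basisOfRecord": "MaterialCitation",
        "scientificName": name_as_published,
        "verbatimIdentification": name_as_published,
        "decimalLatitude": f"{lat:.5f}",
        "decimalLongitude": f"{lon:.5f}",
        "eventDate": str(year),
        "bibliographicCitation": citation,
    }


def write_dwc_csv(rows):
    """Serialise rows to CSV text with Darwin Core headers."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical record: identifier, coordinates, year, and citation are invented.
example = to_dwc_row(
    "example:0001", "Eisenia nordenskioldi", 54.83, 37.58, 1975,
    "placeholder citation",
)
print(write_dwc_csv([example]))
```

Storing the published name in both `scientificName` and `verbatimIdentification` is one possible way to keep the original name available if the name is later reconciled against a taxonomic backbone; other mappings are equally plausible.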
Citations: 0
Robot-in-the-loop: Prototyping robotic digitisation at the Natural History Museum
Biodiversity Information Science and Standards Pub Date : 2023-09-20 DOI: 10.3897/biss.7.112947
Ben Scott, Arianna Salili-James, Vincent Smith
{"title":"Robot-in-the-loop: Prototyping robotic digitisation at the Natural History Museum","authors":"Ben Scott, Arianna Salili-James, Vincent Smith","doi":"10.3897/biss.7.112947","DOIUrl":"https://doi.org/10.3897/biss.7.112947","url":null,"abstract":"The Natural History Museum, London (NHM) is home to an impressive collection of over 80 million specimens, of which just 5.5 million have been digitised. Like all similar collections, digitisation of these specimens is very labour intensive, requiring time-consuming manual handling. Each specimen is extracted from its curatorial unit, placed for imaging, labels are manually manipulated, and then returned to storage. Thanks to the NHM’s team of digitisers, workflows are becoming more efficient as they are refined. However, many of these workflows are highly repetitive and ideally suited to automation. The museum is now exploring integrating robots into the digitisation process. The NHM has purchased a Techman TM5 900 robotic arm, equipped with integrated Artificial Intelligence (AI) software and additional features such as custom grippers and a 3D scanner. This robotic arm combines advanced imaging technologies, machine learning algorithms, and robotic manipulation capabilities to capture high-quality specimen data, making it possible to digitise vast collections efficiently (Fig. 1). We showcase the NHM's application of robotics for digitisation, outlining the use cases developed for implementation and the prototypical workflows already in place at the museum. 
We will explore our invasive and non-invasive digitisation experiments, the many challenges, and the initial results of our early experiments with this transformative technology.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136308760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
What Can You Do With 200 Million Newspaper Articles: Exploring GLAM data in the Humanities
Biodiversity Information Science and Standards Pub Date : 2023-09-19 DOI: 10.3897/biss.7.112935
Tim Sherratt
{"title":"What Can You Do With 200 Million Newspaper Articles: Exploring GLAM data in the Humanities","authors":"Tim Sherratt","doi":"10.3897/biss.7.112935","DOIUrl":"https://doi.org/10.3897/biss.7.112935","url":null,"abstract":"I’m a historian who works with data from the GLAM sector (galleries, libraries, archives and museums). When I talk about GLAM data, I’m usually talking about things like newspapers, government documents, photographs, letters, websites, and books. Some of it is well-described, structured, and easily accessible, and some is not. All of it offers us the chance to ask new questions of our past, to see things differently. But what tools, what examples, what documentation, and what support are needed to encourage researchers to explore these possibilities—to engage with collections as data? In this talk, I’ll be describing some of my own adventures amidst GLAM data, before focusing on questions of access, infrastructure, and skills development. In particular, I’ll be introducing the GLAM Workbench—a collection of tools, tutorials, examples, and hacks aimed at helping humanities researchers navigate the world of data. What pathways do we need, and how can we build them?","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135061374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Using ChatGPT with Confidence for Biodiversity-Related Information Tasks
Biodiversity Information Science and Standards Pub Date : 2023-09-19 DOI: 10.3897/biss.7.112926
Michael Elliott, José Fortes
{"title":"Using ChatGPT with Confidence for Biodiversity-Related Information Tasks","authors":"Michael Elliott, José Fortes","doi":"10.3897/biss.7.112926","DOIUrl":"https://doi.org/10.3897/biss.7.112926","url":null,"abstract":"Recent advancements in conversational Artificial Intelligence (AI), such as OpenAI's Chat Generative Pre-Trained Transformer (ChatGPT), present the possibility of using large language models (LLMs) as tools for retrieving, analyzing, and transforming scientific information. We have found that ChatGPT (GPT 3.5) can provide accurate biodiversity knowledge in response to questions about species descriptions, occurrences, and taxonomy, as well as structure information according to data sharing standards such as Darwin Core. A rigorous evaluation of ChatGPT's capabilities in biodiversity-related tasks may help to inform viable use cases for today's LLMs in research and information workflows. In this work, we test the extent of ChatGPT's biodiversity knowledge, characterize its mistakes, and suggest how LLM-based systems might be designed to complete knowledge-based tasks with confidence. To test ChatGPT's biodiversity knowledge, we compiled a question-and-answer test set derived from Darwin Core records available in Integrated Digitized Biocollections (iDigBio). Each question focuses on one or more Darwin Core terms to test the model’s ability to recall species occurrence information and its understanding of the standard. The test set covers a range of locations, taxonomic groups, and both common and rare species (defined by the number of records in iDigBio). The results of the tests will be presented. We also tested ChatGPT on generative tasks, such as creating species occurrence maps. A visual comparison of the maps with iDigBio data shows that for some species, ChatGPT can generate fairly accurate representations of their geographic ranges (Fig. 1). ChatGPT's incorrect responses in our tests show several patterns of mistakes.
First, responses can be self-conflicting. For example, when asked \"Does Acer saccharum naturally occur in Benton, Oregon?\", ChatGPT responded \"YES, Acer saccharum DOES NOT naturally occur in Benton, Oregon\". ChatGPT can also be misled by semantics in species names. For Rafinesquia neomexicana, the word \"neomexicana\" leads ChatGPT to believe that the species primarily occurs in New Mexico, USA. ChatGPT may also confuse species, such as when attempting to describe a lesser-known species (e.g., a rare bee) within the same genus as a better-known species. Other causes of mistakes include hallucination (Ji et al. 2023), memorization (Chang and Bergen 2023), and user deception (Li et al. 2023). Some mistakes may be avoided by prompt engineering, e.g., few-shot prompting (Chang and Bergen 2023) and chain-of-thought prompting (Wei et al. 2022). These techniques assist Large Language Models (LLMs) by clarifying expectations or by guiding recollection. However, such methods cannot help when LLMs lack required knowledge. In these cases, alternative approaches are needed. A desired reliability can be theoretically guaranteed if responses that contain mistakes are discarded or corrected. This requires either detecting or predicting mistake","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"178 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135061369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
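The abstract above notes that some responses are self-conflicting and that reliability can improve if mistaken responses are detected and discarded. As a rough illustration only, the sketch below flags one narrow pattern: a response that opens with "yes" and then negates itself, as in the Acer saccharum example. The regular expressions and function names are assumptions for this sketch. It is a simple heuristic, not the detection or prediction method the authors propose.

```python
import re

# Matches "not" as a word or a contracted "n't" (e.g. "doesn't").
NEGATION = re.compile(r"\bnot\b|n't\b", re.IGNORECASE)
# Matches a leading yes/no verdict plus any punctuation that follows it.
VERDICT = re.compile(r"^\s*(yes|no)\b[\s,.:;-]*", re.IGNORECASE)


def is_self_conflicting(response):
    """Flag responses that open with 'yes' but then contain a negation."""
    match = VERDICT.match(response)
    if match is None or match.group(1).lower() != "yes":
        return False  # only the affirmative-then-negated pattern is checked
    rest = response[match.end():]
    return bool(NEGATION.search(rest))


def keep_consistent(responses):
    """Discard responses flagged as self-conflicting."""
    return [r for r in responses if not is_self_conflicting(r)]


answers = [
    "YES, Acer saccharum DOES NOT naturally occur in Benton, Oregon",
    "No, Acer saccharum does not naturally occur in Benton, Oregon.",
]
print(keep_consistent(answers))
```

A heuristic like this catches only surface contradictions. It cannot tell whether a consistent answer is factually correct, which is the harder problem the abstract points to.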
Citations: 0
NBN Atlas: Our transformation and re-alignment with the Living Atlas community
Biodiversity Information Science and Standards Pub Date : 2023-09-18 DOI: 10.3897/biss.7.112813
Helen Manders-Jones, Keith Raven
{"title":"NBN Atlas: Our transformation and re-alignment with the Living Atlas community","authors":"Helen Manders-Jones, Keith Raven","doi":"10.3897/biss.7.112813","DOIUrl":"https://doi.org/10.3897/biss.7.112813","url":null,"abstract":"The National Biodiversity Network (NBN) Atlas is the largest repository of publicly available biodiversity data in the United Kingdom (UK). Built on the open-source Atlas of Living Australia (ALA) platform, it was launched in 2017 and is part of a global network of over 20 Living Atlases (live or in development). Notably, the NBN Atlas is the largest, with almost twice the number of records as the Atlas of Living Australia. In order to meet the needs of the UK biological recording community, the NBN Atlas was considerably customised. Regrettably, these customisations were directly applied to the platform code, resulting in divergence from the parent ALA platform and creating major obstacles to upgrading. To address these challenges, we initiated the Fit for the Future Project. We will outline our journey to decouple the customizations, realign with the ALA, upgrade the NBN Atlas, regain control of the infrastructure and modernize DevOps practices. Each of these steps played a crucial role in our overall transformation. Additionally, we will discuss a new project that will allow data providers to set the public resolution of all records in a dataset and give individuals and organisations access to the supplied location information. 
We will also highlight our efforts to leverage contributions from volunteer developers.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135203125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AI-Accelerated Digitisation of Insect Collections: The next generation of Angled Label Image Capture Equipment (ALICE)
Biodiversity Information Science and Standards Pub Date : 2023-09-15 DOI: 10.3897/biss.7.112742
Arianna Salili-James, Ben Scott, Laurence Livermore, Ben Price, Steen Dupont, Helen Hardy, Vincent Smith
{"title":"AI-Accelerated Digitisation of Insect Collections: The next generation of Angled Label Image Capture Equipment (ALICE)","authors":"Arianna Salili-James, Ben Scott, Laurence Livermore, Ben Price, Steen Dupont, Helen Hardy, Vincent Smith","doi":"10.3897/biss.7.112742","DOIUrl":"https://doi.org/10.3897/biss.7.112742","url":null,"abstract":"The digitisation of natural science specimens is a shared ambition of many of the largest collections, but the scale of these collections, estimated at at least 1.1 billion specimens (Johnson et al. 2023), continues to challenge even the most resource-rich organisations. The Natural History Museum, London (NHM) has been pioneering work to accelerate the digitisation of its 80 million specimens. Since the inception of the NHM Digital Collection Programme in 2014, more than 5.5 million specimen records have been made digitally accessible. This has enabled the museum to deliver a tenfold increase in digitisation, compared to when rates were first measured by the NHM in 2008. Even with this investment, it will take circa 150 years to digitise its remaining collections, leading the museum to pursue technology-led solutions alongside increased funding to deliver the next increase in digitisation rate. Insects comprise approximately half of all described species and, at the NHM, represent more than one-third (c. 30 million specimens) of the NHM’s overall collection. Their most common preservation method, attached to a pin alongside a series of labels with metadata, makes insect specimens challenging to digitise. Early Artificial Intelligence (AI)-led innovations (Price et al. 2018) resulted in the development of ALICE, the museum's Angled Label Image Capture Equipment, in which a pinned specimen is placed inside a multi-camera setup, which captures a series of partial views of a specimen and its labels. 
Centred around the pin, these images can be digitally combined and reconstructed, using the accompanying ALICE software, to provide a clean image of each label. To do this, a Convolutional Neural Network (CNN) model is incorporated, to locate all labels within the images. This is followed by various image processing tools to transform the labels into a two-dimensional viewpoint, align the associated label images together, and merge them into one label. This allows users to manually, or computationally (e.g., using Optical Character Recognition [OCR] tools) extract label data from the processed label images (Salili-James et al. 2022). With the ALICE setup, a user might average imaging 800 digitised specimens per day, and exceptionally, up to 1,300. This compares with an average of 250 specimens or fewer daily, using more traditional methods involving separating the labels and photographing them off of the pin. Despite this, our original version of ALICE was only suited to a small subset of the collection. In situations when the specimen is very large, there are too many labels, or these labels are too close together, ALICE fails (Dupont and Price 2019). Using a combination of updated AI processing tools, we hereby present ALICE version 2. This new version of ALICE provides faster rates, improved software accuracy, and a more streamlined pipeline. 
It includes the following updates: Hardware: after conducting various tests, we have opti","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135436718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Mapping between Darwin Core and the Australian Biodiversity Information Standard: A linked data example
Biodiversity Information Science and Standards Pub Date : 2023-09-15 DOI: 10.3897/biss.7.112722
Mieke Strong, Piers Higgs
{"title":"Mapping between Darwin Core and the Australian Biodiversity Information Standard: A linked data example","authors":"Mieke Strong, Piers Higgs","doi":"10.3897/biss.7.112722","DOIUrl":"https://doi.org/10.3897/biss.7.112722","url":null,"abstract":"The Australian Biodiversity Information Standard (ABIS) is a data standard that has been developed to represent and exchange biodiversity data expressed using the Resource Description Framework (RDF). ABIS has the TERN ontology at its core, which is a conceptual information model that represents plot-based ecological surveys. The RDF-linked data structure is self-describing, composed of “triples”. This format is quite different from tabular data. During the Australian federal government Biodiversity Data Repository pilot project, occurrence data in tabular Darwin Core format was converted into ABIS linked data. This lightning talk will describe the approach taken, the challenges that arose, and the ways in which data using Darwin Core terms can be represented in a different way using linked data technologies.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135436489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
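The abstract above contrasts tabular Darwin Core data with linked data composed of triples. The sketch below shows the general idea by turning one tabular row into N-Triples statements. It deliberately uses only Darwin Core term IRIs for predicates; it does not use the TERN ontology or any ABIS classes, whose details are not given here. The base IRI, the function names, and the example row are hypothetical.

```python
DWC = "http://rs.tdwg.org/dwc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def row_to_triples(row, base_iri="https://example.org/occurrence/"):
    """Convert one Darwin Core row (dict of term -> string) into N-Triples lines.

    Assumes occurrenceID is already safe to append to an IRI.
    """
    subject = f"<{base_iri}{row['occurrenceID']}>"
    triples = [f"{subject} <{RDF_TYPE}> <{DWC}Occurrence> ."]
    for term, value in row.items():
        if term == "occurrenceID" or value == "":
            continue  # the identifier becomes the subject; skip empty cells
        triples.append(f'{subject} <{DWC}{term}> "{escape_literal(value)}" .')
    return triples


# Hypothetical row: identifier and coordinates are invented for illustration.
row = {
    "occurrenceID": "occ-1",
    "scientificName": "Acer saccharum",
    "decimalLatitude": "44.50",
    "decimalLongitude": "-123.40",
}
for line in row_to_triples(row):
    print(line)
```

Each table cell becomes its own subject-predicate-object statement, which is why the linked data form is described as self-describing: every value carries the IRI of the term that defines it.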
Citations: 0
Lognom, Assisting in the Decision-Making and Management of Zoological Nomenclature
Biodiversity Information Science and Standards Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112710
Elie Saliba, Régine Vignes Lebbe, Annemarie Ohler
{"title":"Lognom, Assisting in the Decision-Making and Management of Zoological Nomenclature","authors":"Elie Saliba, Régine Vignes Lebbe, Annemarie Ohler","doi":"10.3897/biss.7.112710","DOIUrl":"https://doi.org/10.3897/biss.7.112710","url":null,"abstract":"Nomenclature is the discipline of taxonomy responsible for managing the scientific names of groups of organisms. It ensures continuity in the transmission of all kinds of data and knowledge accumulated about taxa. Zoologists use the International Code of Zoological Nomenclature (International Commission on Zoological Nomenclature 1999), currently in its fourth edition. The Code contains the rules that allow the correct understanding and application of nomenclature, e.g., how to choose between two names applying to the same taxon. Nomenclature became more complex over the centuries, as rules appeared, disappeared, or evolved to adapt to scientific and technological changes (e.g., the inclusion of digital media) (International Commission on Zoological Nomenclature 2012). By adhering to nomenclatural rules, taxonomic databases, such as the Catalogue of Life (Bánki et al. 2023), can maintain the integrity and accuracy of taxon names, preventing confusion and ambiguity. Nomenclature also facilitates the linkage and integration of data across different databases, allowing for seamless collaboration and information exchange among researchers. However, unlike its final result, which is also called a nomenclature, the discipline itself has remained relatively impervious to computerization, until now. Lognom is a free web application based on algorithms that facilitate decision-making in zoological nomenclature. It is not based on a pre-existing database, but instead provides an answer based on the user input, and relies on interactive form-based queries.
This software aims to help taxonomists determine whether a name or work is available, whether spelling rules have been correctly applied, and whether all the relevant rules have been respected before a new name or work is published. Lognom also allows the user to obtain the valid name between several pre-registered candidate names, including the list of synonyms and the reason for their synonymy. It also includes tools for answering various nomenclatural questions, such as determining if two different species names with the same derivation and meaning should be treated as homonyms; if a name should be treated as a nomen oblitum under Art. 23.9 of the Code; and another tool to determine a genus-series name's grammatical gender. Lognom includes most of the rules regarding availability and validity, with the exception of those needing human interpretation, usually pertaining to Latin grammar. At this point of its development, homonymy is not completely included in the web app, nor are the rules linked to the management of type-specimens (e.g., lectotypification, neotypification), outside of their use in determining the availability of a name. With enough data entered by the users, Lognom should be able to model a modification of the rules and calculate its impact on the potential availability or spelling of existing names. 
Other prospects include the possibility of working simultaneously on common proj","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134912432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Leveraging Multimodality for Biodiversity Data: Exploring joint representations of species descriptions and specimen images using CLIP
Biodiversity Information Science and Standards Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112666
Maya Sahraoui, Youcef Sklab, Marc Pignal, Régine Vignes Lebbe, Vincent Guigue
{"title":"Leveraging Multimodality for Biodiversity Data: Exploring joint representations of species descriptions and specimen images using CLIP","authors":"Maya Sahraoui, Youcef Sklab, Marc Pignal, Régine Vignes Lebbe, Vincent Guigue","doi":"10.3897/biss.7.112666","DOIUrl":"https://doi.org/10.3897/biss.7.112666","url":null,"abstract":"In recent years, the field of biodiversity data analysis has witnessed significant advancements, with a number of models emerging to process and extract valuable insights from various data sources. One notable area of progress lies in the analysis of species descriptions, where structured knowledge extraction techniques have gained prominence. These techniques aim to automatically extract relevant information from unstructured text, such as taxonomic classifications and morphological traits (Sahraoui et al. 2022, Sahraoui et al. 2023). By applying natural language processing (NLP) and machine learning methods, structured knowledge extraction enables the conversion of textual species descriptions into a structured format, facilitating easier integration, searchability, and analysis of biodiversity data. Furthermore, object detection on specimen images has emerged as a powerful tool in biodiversity research. By leveraging computer vision algorithms (Triki et al. 2020, Triki et al. 2021, Ott et al. 2020), researchers can automatically identify and classify objects of interest within specimen images, such as organs, anatomical features, or specific taxa. Object detection techniques allow for the efficient and accurate extraction of valuable information, contributing to tasks like species identification, morphological trait analysis, and biodiversity monitoring. These advancements have been particularly significant in the context of herbarium collections and digitization efforts, where large volumes of specimen images need to be processed and analyzed.
On the other hand, multimodal learning, an emerging field in artificial intelligence (AI), focuses on developing models that can effectively process and learn from multiple modalities, such as text and images (Li et al. 2020, Li et al. 2021, Li et al. 2019, Radford et al. 2021, Sun et al. 2021, Chen et al. 2022). By incorporating information from different modalities, multimodal learning aims to capture the rich and complementary characteristics present in diverse data sources. This approach enables the model to leverage the strengths of each modality, leading to enhanced understanding, improved performance, and more comprehensive representations. Structured knowledge extraction from species descriptions and object detection on specimen images synergistically enhances biodiversity data analysis. This integration leverages textual and visual data strengths, gaining deeper insights. Extracted structured information from descriptions improves search, classification, and correlation of biodiversity data. Object detection enriches textual descriptions, providing visual evidence for the verification and validation of species characteristics. To tackle the challenges posed by the massive volume of specimen images available at the Herbarium of the National Museum of Natural History in Paris, we have chosen to implement the CLIP (Contrastive Language-Image Pretraining) model (Radford et al. 
2021) developed by Ope","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134912495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
How Reproducible are the Results Gained with the Help of Deep Learning Methods in Biodiversity Research?
Biodiversity Information Science and Standards Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112698
Waqas Ahmed, Vamsi Krishna Kommineni, Birgitta Koenig-ries, Sheeba Samuel
{"title":"How Reproducible are the Results Gained with the Help of Deep Learning Methods in Biodiversity Research?","authors":"Waqas Ahmed, Vamsi Krishna Kommineni, Birgitta Koenig-ries, Sheeba Samuel","doi":"10.3897/biss.7.112698","DOIUrl":"https://doi.org/10.3897/biss.7.112698","url":null,"abstract":"In recent years, deep learning methods in the biodiversity domain have gained significant attention due to their ability to handle the complexity of biological data and to make processing of large volumes of data feasible. However, these methods are not easy to interpret, so the opacity of new scientific research and discoveries makes them somewhat untrustworthy. Reproducibility is a fundamental aspect of scientific research, which enables validation and advancement of methods and results. If results obtained with the help of deep learning methods were reproducible, this would increase their trustworthiness. In this study, we investigate the state of reproducibility of deep learning methods in biodiversity research. We propose a pipeline to investigate the reproducibility of deep learning methods in the biodiversity domain. In our preliminary work, we systematically mined the existing literature from Google Scholar to identify publications that employ deep-learning techniques for biodiversity research. By carefully curating a dataset of relevant publications, we extracted reproducibility-related variables for 61 publications using a manual approach, such as the availability of datasets and code that serve as fundamental criteria for reproducibility assessment. Moreover, we extended our analysis to include advanced reproducibility variables, such as the specific deep learning methods, models, hyperparameters, etc., employed in the studies. To facilitate the automatic extraction of information from publications, we plan to leverage the capabilities of large language models (LLMs). 
By using the latest natural language processing (NLP) techniques, we aim to identify and extract relevant information pertaining to the reproducibility of deep learning methods in the biodiversity domain. This study seeks to contribute to the establishment of robust and reliable research practices. The findings will not only aid in validating existing methods but also guide the development of future approaches, ultimately fostering transparency and trust in the application of deep learning techniques in biodiversity research.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134912496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0