Biodiversity Information Science and Standards: latest articles, page 2

NBN Atlas: Our transformation and re-alignment with the Living Atlas community

Biodiversity Information Science and Standards Pub Date : 2023-09-18 DOI: 10.3897/biss.7.112813

Helen Manders-Jones, Keith Raven

Abstract: The National Biodiversity Network (NBN) Atlas is the largest repository of publicly available biodiversity data in the United Kingdom (UK). Built on the open-source Atlas of Living Australia (ALA) platform, it was launched in 2017 and is part of a global network of over 20 Living Atlases (live or in development). Notably, the NBN Atlas is the largest, with almost twice as many records as the Atlas of Living Australia. In order to meet the needs of the UK biological recording community, the NBN Atlas was considerably customised. Regrettably, these customisations were directly applied to the platform code, resulting in divergence from the parent ALA platform and creating major obstacles to upgrading. To address these challenges, we initiated the Fit for the Future Project. We will outline our journey to decouple the customisations, realign with the ALA, upgrade the NBN Atlas, regain control of the infrastructure and modernise DevOps practices. Each of these steps played a crucial role in our overall transformation. Additionally, we will discuss a new project that will allow data providers to set the public resolution of all records in a dataset and give individuals and organisations access to the supplied location information. We will also highlight our efforts to leverage contributions from volunteer developers.

Citations: 0

AI-Accelerated Digitisation of Insect Collections: The next generation of Angled Label Image Capture Equipment (ALICE)

Biodiversity Information Science and Standards Pub Date : 2023-09-15 DOI: 10.3897/biss.7.112742

Arianna Salili-James, Ben Scott, Laurence Livermore, Ben Price, Steen Dupont, Helen Hardy, Vincent Smith

Abstract: The digitisation of natural science specimens is a shared ambition of many of the largest collections, but the scale of these collections, estimated to number at least 1.1 billion specimens (Johnson et al. 2023), continues to challenge even the most resource-rich organisations. The Natural History Museum, London (NHM) has been pioneering work to accelerate the digitisation of its 80 million specimens. Since the inception of the NHM Digital Collection Programme in 2014, more than 5.5 million specimen records have been made digitally accessible. This has enabled the museum to deliver a tenfold increase in digitisation, compared to when rates were first measured by the NHM in 2008. Even with this investment, it will take circa 150 years to digitise its remaining collections, leading the museum to pursue technology-led solutions alongside increased funding to deliver the next increase in digitisation rate. Insects comprise approximately half of all described species and represent more than one-third (c. 30 million specimens) of the NHM's overall collection. Their most common preservation method, attached to a pin alongside a series of labels with metadata, makes insect specimens challenging to digitise. Early Artificial Intelligence (AI)-led innovations (Price et al. 2018) resulted in the development of ALICE, the museum's Angled Label Image Capture Equipment, in which a pinned specimen is placed inside a multi-camera setup, which captures a series of partial views of a specimen and its labels. Centred around the pin, these images can be digitally combined and reconstructed, using the accompanying ALICE software, to provide a clean image of each label. To do this, a Convolutional Neural Network (CNN) model is incorporated to locate all labels within the images. This is followed by various image processing tools to transform the labels into a two-dimensional viewpoint, align the associated label images, and merge them into one label. This allows users to extract label data from the processed label images manually or computationally, e.g., using Optical Character Recognition (OCR) tools (Salili-James et al. 2022). With the ALICE setup, a user can image an average of 800 specimens per day and, exceptionally, up to 1,300. This compares with an average of 250 specimens or fewer daily using more traditional methods, which involve separating the labels and photographing them off the pin. Despite this, our original version of ALICE was only suited to a small subset of the collection. When the specimen is very large, there are too many labels, or the labels are too close together, ALICE fails (Dupont and Price 2019). Using a combination of updated AI processing tools, we hereby present ALICE version 2. This new version of ALICE provides faster rates, improved software accuracy, and a more streamlined pipeline. It includes the following updates: Hardware: after conducting various tests, we have opti…

Citations: 0

Mapping between Darwin Core and the Australian Biodiversity Information Standard: A linked data example

Biodiversity Information Science and Standards Pub Date : 2023-09-15 DOI: 10.3897/biss.7.112722

Mieke Strong, Piers Higgs

Citations: 0

Lognom, Assisting in the Decision-Making and Management of Zoological Nomenclature

Biodiversity Information Science and Standards Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112710

Elie Saliba, Régine Vignes Lebbe, Annemarie Ohler

Abstract: Nomenclature is the discipline of taxonomy responsible for managing the scientific names of groups of organisms. It ensures continuity in the transmission of all kinds of data and knowledge accumulated about taxa. Zoologists use the International Code of Zoological Nomenclature (International Commission on Zoological Nomenclature 1999), currently in its fourth edition. The Code contains the rules that allow the correct understanding and application of nomenclature, e.g., how to choose between two names applying to the same taxon. Nomenclature became more complex over the centuries, as rules appeared, disappeared, or evolved to adapt to scientific and technological changes (e.g., the inclusion of digital media) (International Commission on Zoological Nomenclature 2012). By adhering to nomenclatural rules, taxonomic databases, such as the Catalogue of Life (Bánki et al. 2023), can maintain the integrity and accuracy of taxon names, preventing confusion and ambiguity. Nomenclature also facilitates the linkage and integration of data across different databases, allowing for seamless collaboration and information exchange among researchers. However, unlike its final result, which is also called a nomenclature, the discipline itself has remained relatively impervious to computerization, until now. Lognom is a free web application based on algorithms that facilitate decision-making in zoological nomenclature. It is not based on a pre-existing database, but instead provides an answer based on the user input, and relies on interactive form-based queries. This software aims to help taxonomists determine whether a name or work is available, whether spelling rules have been correctly applied, and whether all the relevant rules have been respected before a new name or work is published. Lognom also allows the user to obtain the valid name among several pre-registered candidate names, including the list of synonyms and the reason for their synonymy. It also includes tools for answering various nomenclatural questions, such as determining whether two different species names with the same derivation and meaning should be treated as homonyms, whether a name should be treated as a nomen oblitum under Art. 23.9 of the Code, and what the grammatical gender of a genus-series name is. Lognom includes most of the rules regarding availability and validity, with the exception of those needing human interpretation, usually pertaining to Latin grammar. At this point in its development, homonymy is not completely included in the web app, nor are the rules linked to the management of type specimens (e.g., lectotypification, neotypification), outside of their use in determining the availability of a name. With enough data entered by the users, Lognom should be able to model a modification of the rules and calculate its impact on the potential availability or spelling of existing names. Other prospects include the possibility of working simultaneously on common proj…
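The choice of a valid name among candidate names, as described in the abstract above, can be illustrated with a minimal sketch. It is not Lognom's algorithm: it assumes only the Code's Principle of Priority (the oldest available name is the valid one) and ignores exceptions such as nomen oblitum under Art. 23.9. The `CandidateName` class, the `resolve_valid_name` function and the placeholder names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateName:
    """Hypothetical record for one candidate name of a taxon."""
    name: str
    year: int        # year of publication
    available: bool  # whether the name meets the availability criteria


def resolve_valid_name(candidates):
    """Return (valid_name, synonyms) using simple priority by year.

    Simplification: ties within a year and all exceptions to priority
    are ignored; unavailable names are never considered.
    """
    available = [c for c in candidates if c.available]
    if not available:
        return None, []
    valid = min(available, key=lambda c: c.year)
    synonyms = [
        (c.name, f"junior synonym of {valid.name}: published {c.year}, after {valid.year}")
        for c in available
        if c is not valid
    ]
    return valid, synonyms


# Placeholder names, not real taxa.
candidates = [
    CandidateName("Aus bus", 1850, True),
    CandidateName("Aus cus", 1820, True),
    CandidateName("Aus dus", 1800, False),  # oldest, but unavailable
]
valid, synonyms = resolve_valid_name(candidates)
```

Here the oldest name is skipped because it is flagged as unavailable, so availability has to be settled before priority is applied.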

Citations: 0

Leveraging Multimodality for Biodiversity Data: Exploring joint representations of species descriptions and specimen images using CLIP

Biodiversity Information Science and Standards Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112666

Maya Sahraoui, Youcef Sklab, Marc Pignal, Régine Vignes Lebbe, Vincent Guigue

Abstract: In recent years, the field of biodiversity data analysis has witnessed significant advancements, with a number of models emerging to process and extract valuable insights from various data sources. One notable area of progress lies in the analysis of species descriptions, where structured knowledge extraction techniques have gained prominence. These techniques aim to automatically extract relevant information from unstructured text, such as taxonomic classifications and morphological traits (Sahraoui et al. 2022, Sahraoui et al. 2023). By applying natural language processing (NLP) and machine learning methods, structured knowledge extraction enables the conversion of textual species descriptions into a structured format, facilitating easier integration, searchability, and analysis of biodiversity data. Furthermore, object detection on specimen images has emerged as a powerful tool in biodiversity research. By leveraging computer vision algorithms (Triki et al. 2020, Triki et al. 2021, Ott et al. 2020), researchers can automatically identify and classify objects of interest within specimen images, such as organs, anatomical features, or specific taxa. Object detection techniques allow for the efficient and accurate extraction of valuable information, contributing to tasks like species identification, morphological trait analysis, and biodiversity monitoring. These advancements have been particularly significant in the context of herbarium collections and digitization efforts, where large volumes of specimen images need to be processed and analyzed. On the other hand, multimodal learning, an emerging field in artificial intelligence (AI), focuses on developing models that can effectively process and learn from multiple modalities, such as text and images (Li et al. 2020, Li et al. 2021, Li et al. 2019, Radford et al. 2021, Sun et al. 2021, Chen et al. 2022). By incorporating information from different modalities, multimodal learning aims to capture the rich and complementary characteristics present in diverse data sources. This approach enables the model to leverage the strengths of each modality, leading to enhanced understanding, improved performance, and more comprehensive representations. Structured knowledge extraction from species descriptions and object detection on specimen images synergistically enhance biodiversity data analysis. This integration leverages the strengths of both textual and visual data, yielding deeper insights. Structured information extracted from descriptions improves search, classification, and correlation of biodiversity data. Object detection enriches textual descriptions, providing visual evidence for the verification and validation of species characteristics. To tackle the challenges posed by the massive volume of specimen images available at the Herbarium of the National Museum of Natural History in Paris, we have chosen to implement the CLIP (Contrastive Language-Image Pretraining) model (Radford et al. 2021) developed by Ope…
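The joint text-image representation mentioned in the abstract above can be illustrated with a minimal sketch of how a CLIP-style model scores an image against candidate descriptions: both are embedded in one vector space, compared by cosine similarity, and the scaled similarities are turned into probabilities with a softmax. This is not the authors' pipeline. The three-dimensional embeddings are made-up toy vectors standing in for encoder outputs, and the temperature of 0.07 and the function names are illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_descriptions(image_embedding, text_embeddings, temperature=0.07):
    """Return cosine similarities and softmax probabilities per description."""
    sims = [cosine_similarity(image_embedding, t) for t in text_embeddings]
    probs = softmax([s / temperature for s in sims])
    return sims, probs


# Toy embeddings (assumed values, not produced by a real encoder).
image_embedding = [0.9, 0.1, 0.0]
text_embeddings = {
    "leaves opposite, ovate": [0.8, 0.2, 0.1],
    "leaves alternate, linear": [0.1, 0.9, 0.2],
}
labels = list(text_embeddings)
sims, probs = rank_descriptions(image_embedding, [text_embeddings[l] for l in labels])
best_label = labels[probs.index(max(probs))]
```

In a real setting the vectors would come from the model's image and text encoders; only the comparison step is sketched here.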

Citations: 0

How Reproducible are the Results Gained with the Help of Deep Learning Methods in Biodiversity Research?

Biodiversity Information Science and Standards Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112698

Waqas Ahmed, Vamsi Krishna Kommineni, Birgitta Koenig-ries, Sheeba Samuel

Abstract: In recent years, deep learning methods in the biodiversity domain have gained significant attention due to their ability to handle the complexity of biological data and to make processing of large volumes of data feasible. However, these methods are not easy to interpret, so the opacity of new scientific research and discoveries makes them somewhat untrustworthy. Reproducibility is a fundamental aspect of scientific research, which enables validation and advancement of methods and results. If results obtained with the help of deep learning methods were reproducible, this would increase their trustworthiness. In this study, we investigate the state of reproducibility of deep learning methods in biodiversity research. We propose a pipeline to investigate the reproducibility of deep learning methods in the biodiversity domain. In our preliminary work, we systematically mined the existing literature from Google Scholar to identify publications that employ deep-learning techniques for biodiversity research. By carefully curating a dataset of relevant publications, we extracted reproducibility-related variables for 61 publications using a manual approach, such as the availability of datasets and code that serve as fundamental criteria for reproducibility assessment. Moreover, we extended our analysis to include advanced reproducibility variables, such as the specific deep learning methods, models, hyperparameters, etc., employed in the studies. To facilitate the automatic extraction of information from publications, we plan to leverage the capabilities of large language models (LLMs). By using the latest natural language processing (NLP) techniques, we aim to identify and extract relevant information pertaining to the reproducibility of deep learning methods in the biodiversity domain. This study seeks to contribute to the establishment of robust and reliable research practices. The findings will not only aid in validating existing methods but also guide the development of future approaches, ultimately fostering transparency and trust in the application of deep learning techniques in biodiversity research.

Citations: 0

A Simple Recipe for Cooking your AI-assisted Dish to Serve it in the International Digital Specimen Architecture

Biodiversity Information Science and Standards Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112678

Wouter Addink, Sam Leeflang, Sharif Islam

Abstract: With the rise of Artificial Intelligence (AI), a large set of new tools and services is emerging that supports specimen data mapping, standards alignment, quality enhancement and enrichment of the data. These tools currently operate in isolation, targeted to individual collections, collection management systems and institutional datasets. To address this challenge, DiSSCo, the Distributed System of Scientific Collections, is developing a new infrastructure for digital specimens, transforming them into actionable information objects. This infrastructure incorporates a framework for annotation and curation that allows the objects to be enriched or enhanced by both experts and machines. This creates the unique possibility to plug in AI-assisted services that can then leverage digital specimens through this infrastructure, which serves as a harmonised Findable, Accessible, Interoperable and Reusable (FAIR) abstraction layer on top of individual institutional systems or datasets. Early examples of such services are those developed in the Specimen Data Refinery workflow (Hardisty et al. 2022). The new architecture, DS Arch or Digital Specimen Architecture, is built on the concept of FAIR Digital Objects (FDO) (Islam et al. 2020). All digital specimens and related objects are served with persistent identifiers and machine-readable FDO records with information for machines about the object together with a pointer to its machine-readable type description. The type describes the structure of the object, its attributes and the allowed operations. The digital specimen type and specimen media type are based on existing Biodiversity Information Standards (TDWG) such as Darwin Core, Access to Biological Collection Data (ABCD) Schema and Audiovisual Core Multimedia Resources Metadata Schema, and include support for annotation operations based on the World Wide Web Consortium (W3C) Annotations Data Model. This enables AI-assisted services registered with DS Arch to autonomously discover digital specimen objects and determine the actions they are authorised to perform. AI-assisted services can facilitate various tasks such as digitisation, extract new information from specimen images, create relations with other objects or standardise data. These operations can be done autonomously, upon user request, or in tandem with expert validation. AI-assisted services registered with DS Arch can interact in the same way with all digital specimens worldwide when served through DS Arch with their uniform FDO representation, even if the content richness, level of standardisation and scope of the specimen is different. DS Arch has been designed to serve digital specimens for living and preserved specimens, and preserved environmental, earth system and astrogeology samples. With the AI-assisted services, data can be annotated with new data, alternative values, corrections, and with new entity relationships. As a result, the digital specimens become Digital Extended S…
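The annotation operations mentioned in the abstract above can be illustrated with a minimal sketch of an annotation shaped after the W3C Web Annotation Data Model, in which a machine agent proposes a value for a digital specimen. It shows only the generic W3C structure (`type`, `motivation`, `creator`, `body`, `target`); it is not the DS Arch or DiSSCo annotation schema. The specimen identifier, the service name and the `make_annotation` helper are hypothetical.

```python
import json


def make_annotation(target_id, proposed_value, agent_name, motivation="editing"):
    """Build a Web Annotation-style dictionary for a proposed value.

    target_id and agent_name are illustrative placeholders; a real
    infrastructure would define its own target selectors and agent records.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "motivation": motivation,  # one of the motivations defined by the W3C model
        "creator": {"type": "Software", "name": agent_name},
        "body": {
            "type": "TextualBody",
            "value": proposed_value,
            "format": "text/plain",
        },
        "target": target_id,
    }


annotation = make_annotation(
    target_id="https://example.org/digital-specimen/123",  # hypothetical identifier
    proposed_value="Netherlands",
    agent_name="hypothetical-georeferencing-service",
)
annotation_json = json.dumps(annotation, indent=2)
```

Keeping the proposed value in an annotation rather than overwriting the specimen record is what allows expert validation to happen afterwards.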

Citations: 0

Combining Ecological and Socio-Environmental Data and Networks to Achieve Sustainability

Biodiversity Information Science and Standards Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112703

Laure Berti-Equille, Rafael L. G. Raimundo

Abstract: Environmental degradation in Brazil has been recently amplified by the expansion of agribusiness, livestock and mining activities with dramatic repercussions on ecosystem functions and services. The anthropogenic degradation of landscapes has substantial impacts on indigenous peoples and small organic farmers whose lifestyles are intimately linked to diverse and functional ecosystems. Understanding how we can apply science and technology to benefit from biodiversity and promote socio-ecological transitions ensuring equitable and sustainable use of common natural resources is a critical challenge brought on by the Anthropocene. We present our approach to combine biodiversity and environmental data, supported by two funded research projects: DATAPB (Data of Paraíba) to develop tools for FAIR (Findable, Accessible, Interoperable and Reusable) data sharing for governance and educational projects and the International Joint Laboratory IDEAL (artificial Intelligence, Data analytics, and Earth observation applied to sustAinability Lab) launched in 2023 by the French Institute for Sustainable Development (IRD, Institut de Recherche pour le Développement) and co-coordinated by the authors, with 50 researchers in 11 Brazilian and French institutions working on Artificial Intelligence and socio-ecological research in four Brazilian Northeast states: Paraíba, Rio Grande do Norte, Pernambuco, and Ceará (Berti-Equille and Raimundo 2023). As the keystone of these transdisciplinary projects, the concept-paradigm of socio-ecological coviability (Barrière et al. 2019) proposes that we should explore multiple ways by which relationships between humans and nonhumans (fauna, flora, natural resources) can reach functional and persistent states. Transdisciplinary approaches to agroecological transitions are urgently needed to address questions such as: How can researchers, local communities, and policymakers co-produce participatory diagnoses that depict the coviability of a territory? How can we conserve biodiversity and ecosystem functions, promote social inclusion, value traditional knowledge, and strengthen bioeconomies at local and regional scales? How can biodiversity, social and environmental data, and networks help local communities in shaping adaptation pathways towards sustainable agroecological practices? These questions require transdisciplinary approaches and effective collaboration among environmental, social, and computer scientists, with the involvement of local stakeholders (Biggs et al. …

Citations: 0

Leveraging AI in Biodiversity Informatics: Ethics, privacy, and broader impacts

Biodiversity Information Science and Standards Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112701

Kristen "Kit" Lewers

Abstract: Artificial Intelligence (AI) has been heralded as a hero by some and rejected as a harbinger of destruction by others. While many in the community are excited about the functionality and promise AI brings to the field of biodiversity informatics, others have reservations regarding its widespread use. This talk will specifically address Large Language Models (LLMs), highlighting both the pros and cons of using them. Like any tool, LLMs are neither good nor bad in and of themselves, but AI does need to be used properly and within the appropriate scope of its ability. Topics to be covered include model opacity (Franzoni 2023), privacy concerns (Wu et al. 2023), potential for algorithmic harm (Marjanovic et al. 2021) and model bias (Wang et al. 2020) in the context of generative AI, along with how these topics differ from similar concerns when using traditional ML (Machine Learning) applications. Potential for implementation and training to ensure the fairest environment when leveraging AI, keeping the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles in mind, will also be discussed. The topics covered will be mainly framed through the Biodiversity Information Standards (TDWG) community, focusing on sociotechnical aspects and implications of implementing LLMs and generative AI. Finally, this talk will explore the potential applicability of TDWG standards pertaining to uniform prompting vocabulary when using generative AI and employing it as a tool for biodiversity informatics.

Citations: 0

Mapping across Standards to Calculate the MIDS Level of Digitisation of Natural Science Collections

Biodiversity Information Science and Standards Pub Date : 2023-09-14 DOI: 10.3897/biss.7.112672

Elspeth Haston, Mathias Dillen, Sam Leeflang, Wouter Addink, Claus Weiland, Dagmar Triebel, Eirik Rindal, Anke Penzlin, Rachel Walcott, Josh Humphries, Caitlin Chapman

Abstract: The Minimum Information about a Digital Specimen (MIDS) standard is being developed within Biodiversity Information Standards (TDWG) to provide a framework for organisations, communities and infrastructures to define, measure, monitor and prioritise the digitisation of specimen data to achieve increased accessibility and scientific use. MIDS levels indicate different levels of completeness in digitisation and range from Level 0: not yet meeting minimal required information needs for scientific use to Level 3: fulfilling the requirements for Digital Extended Specimens (Hardisty et al. 2022) by inclusion of persistent identifiers (PIDs) that connect the specimen with derived and related data. MIDS Levels 0–2 are generic for all specimens. From MIDS Level 2 onwards we make a distinction between biological, geological and palaeontological specimens. While MIDS represents a minimum specification, defining and publishing more extensive sets of information elements (extensions) is readily feasible and explicitly recommended. The MIDS level of a digital specimen can be calculated based on the availability of certain information elements. The MIDS standard applies to published data. The ability to map from, to and between TDWG standards is key to being able to measure the MIDS level of the digitised specimen(s). Each MIDS term is being mapped across TDWG standards involving Darwin Core (DwC), the Access to Biological Collections Data (ABCD) Schema and Latimer Core (LtC, Woodburn et al. 2022), using mapping properties provided by the Simple Knowledge Organization System (SKOS) ontology. In this presentation, we will show selected case studies that demonstrate the implementation of the MIDS standard supplemented by MIDS mappings to ABCD, to LtC, and to the Distributed System of Scientific Collections' (DiSSCo) Open Digital Specimen specification. The studies show the mapping exercise in practice, with the aim of enabling fully automated and accurate calculations. To provide a reliable indicator for the level of digitisation completeness, it is important that calculations are done consistently in all implementations.
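The calculation described in the abstract above, deriving a MIDS level from the availability of information elements after mapping from another standard, can be illustrated with a minimal sketch. It is not the standard's actual rule set. The MIDS element names, the per-level requirement lists and the Darwin Core mapping table are hypothetical placeholders, and returning `None` when even the lowest tier is unmet is an assumption. Only the Darwin Core term names and the SKOS mapping properties `skos:exactMatch` and `skos:closeMatch` are real identifiers.

```python
# Hypothetical requirement lists per level (placeholders, not the MIDS standard).
REQUIRED_ELEMENTS = {
    0: {"PhysicalSpecimenId", "Institution"},
    1: {"Name"},
    2: {"CollectingDate", "CollectingLocation"},
    3: {"PersistentIdentifier"},
}

# Illustrative mapping: Darwin Core term -> (SKOS mapping property, MIDS element).
DWC_TO_MIDS = {
    "dwc:catalogNumber": ("skos:closeMatch", "PhysicalSpecimenId"),
    "dwc:institutionCode": ("skos:closeMatch", "Institution"),
    "dwc:scientificName": ("skos:closeMatch", "Name"),
    "dwc:eventDate": ("skos:closeMatch", "CollectingDate"),
    "dwc:country": ("skos:closeMatch", "CollectingLocation"),
    "dwc:occurrenceID": ("skos:closeMatch", "PersistentIdentifier"),
}


def mids_elements(record):
    """Return the MIDS elements for which the record holds a non-empty value."""
    return {
        mids_element
        for term, (_relation, mids_element) in DWC_TO_MIDS.items()
        if str(record.get(term, "")).strip()
    }


def mids_level(record):
    """Return the highest level whose requirements, and all lower ones, are met.

    Returns None if even the lowest tier of requirements is not satisfied.
    """
    present = mids_elements(record)
    level = None
    for candidate in sorted(REQUIRED_ELEMENTS):
        if REQUIRED_ELEMENTS[candidate] <= present:
            level = candidate
        else:
            break
    return level


example_record = {
    "dwc:catalogNumber": "E001",
    "dwc:institutionCode": "RBGE",
    "dwc:scientificName": "Quercus robur",
}
example_level = mids_level(example_record)
```

Keeping the mapping table separate from the level rules means the same `mids_level` function could be reused with a mapping from ABCD or Latimer Core, which is one way to keep calculations consistent across implementations.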

Citations: 0