{"title":"The Role of the CLIP Model in Analysing Herbarium Specimen Images","authors":"Vamsi Krishna Kommineni, Jens Kattge, Jitendra Gaikwad, Susanne Tautenhahn, Birgitta Koenig-ries","doi":"10.3897/biss.7.112566","DOIUrl":"https://doi.org/10.3897/biss.7.112566","url":null,"abstract":"The number of openly-accessible digital plant specimen images is growing tremendously and available through data aggregators: Global Biodiversity Information Facility (GBIF) contains 43.2 million images, and Intergrated Digitized Biocollections (iDigBio) contains 32.4 million images (Accessed on 29.06.2023). All these images contain great ecological (morphological, phenological, taxonomic etc.) information, which has the potential to facilitate the conduct of large-scale analyses. However, extracting this information from these images and making it available to analysis tools remains challenging and requires more advanced computer vision algorithms. With the latest advancements in the natural language processing field, it is becoming possible to analyse images with text prompts. For example, with the Contrastive Language-Image Pre-Training (CLIP) model, which was trained on 400 million image-text pairs, it is feasible to classify day-to-day life images by providing different text prompts and an image as an input to the model, then the model can predict the most suitable text prompt for the input image. We explored the feasibility of using the CLIP model to analyse digital plant specimen images. A particular focus of this study was on the generation of appropriate text prompts. This is important as the prompt has a large influence on the results of the model. We experimented with three different methods: a) automatic text prompt based on metadata of the specific image or other datasets, b) automatic generic text prompt of the image (describing what is in the image) and c) manual text prompt by annotating the image. We investigated the suitability of these prompts with an experiment, where we tested whether the CLIP model could recognize a herbarium specimen image using digital plant specimen images and semantically disparate text prompts. Our ultimate goal is to filter the digital plant specimen images based on the availability of intact leaves and measurement scale to reduce the number of specimens that reach the downstream pipeline, for instance, the segmentation task for the leaf trait extraction process. To achieve the goal, we are fine-tuning the CLIP model with a dataset of around 20,000 digital plant specimen image-text prompt pairs, where the text prompts were generated using different datasets, metadata and generic text prompt methods. Since the text prompts can be created automatically, it is possible to eradicate the laborious manual annotating process. In conclusion, we present our experimental testing of the CLIP model on digital plant specimen images with varied settings and how the CLIP model can act as a potential filtering tool. 
In future, we plan to investigate the possibility of using text prompts to do the instance segmentation to extract leaf trait information using Large Language Models (LLMs).","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135879248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
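A minimal sketch of the zero-shot prompt test this abstract describes, assuming the openly released openai/clip-vit-base-patch32 checkpoint via Hugging Face transformers; the prompts and file name are illustrative, not the authors' fine-tuned model or prompt set:

```python
# Zero-shot classification of a specimen image with CLIP: the model
# scores each candidate text prompt against the image.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical prompts aimed at the filtering goal (intact leaves, scale).
prompts = [
    "a herbarium specimen with intact leaves and a measurement scale",
    "a herbarium specimen with damaged or missing leaves",
    "a photograph that is not a herbarium specimen",
]

image = Image.open("specimen.jpg")  # any local specimen image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)  # one score per prompt

for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{p:.3f}  {prompt}")
```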
{"title":"Community Curation of Nomenclatural and Taxonomic Information in the Context of the Collection Management System JACQ","authors":"Heimo Rainer, Andreas Berger, Tanja Schuster, Johannes Walter, Dieter Reich, Kurt Zernig, Jiří Danihelka, Hana Galušková, Patrik Mráz, Natalia Tkach, Jörn Hentschel, Jochen Müller, Sarah Wagner, Walter Berendsohn, Robert Lücking, Robert Vogt, Lia Pignotti, Francesco Roma-Marzio, Lorenzo Peruzzi","doi":"10.3897/biss.7.112571","DOIUrl":"https://doi.org/10.3897/biss.7.112571","url":null,"abstract":"Nomenclatural and taxonomic information are crucial for curating botanical collections. In the course of changing methods for systematic and taxonomic studies, classification systems changed considerably over time (Dalla Torre and Harms 1900, Durand and Bentham 1888, Endlicher 1836, Angiosperm Phylogeny Group et al. 2016). Various approaches to store preserved material have been implemented, most of them based on scientific names (e.g., families, genera, species) often in combination with other criteria such as geographic provenance or collectors. The collection management system, JACQ, was established in the early 2000s then developed to support multiple institutions. It features a centralised data storage (with mirror sites) and access via the Internet. Participating collections can download their data at any time in a comma-separated values (CSV) format. From the beginning, JACQ was conceived as a collaboration platform for objects housed in botanical collections, i.e., plant, fungal and algal groups. For these groups, various sources of taxonomic reference exist, nowadays online resources are preferred, e.g., Catalogue of Life, AlgaeBase, Index Fungorum, Mycobank, Tropicos, Plants of the World Online, International Plant Names Index (IPNI), World Flora Online, Euro+Med, Anthos, Flora of Northamerica, REFLORA, Flora of China, Flora of Cuba, Australian Virtual Herbarium (AVH). Implementation and (re)use of PIDs Persistent identifiers (PIDs) for names (at any taxonomic rank) apart from PIDs for taxa, are essential to allow and support reliable referencing across institutions and thematic research networks (Agosti et al. 2022). For this purpose we have integrated referencing to several of the above mentioned resources and populate the names used inside JACQ with those external PIDs. For example, Salix rosmarinifolia is accepted in Plants of the World Online while Euro+Med Plantbase considers it a synonym of Salix repens subsp. rosmarinifolia. Either one can be an identification of a specimen in the JACQ database. Retrieval of collection material One strong use case is the curation of material in historic collections. On the basis of outdated taxon concepts that were applied to the material in history, \"old\" synonyms are omnipresent in historical collections. In order to retrieve all material of a given taxon, it is necessary to know all relevant names. 
Future outlook In combination with the capability of Linked Data and the IIIF (International Image Interoperability Framework) technology, these PIDs serve as crucial elements for the integration of decentralized information systems and reuse of (global) taxonomic backbones in combination with collection management systems (Gamer and Kreyenbühl 2022, Hyam 2022, Loh 2017).","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135879283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
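The "retrieve all material of a given taxon" use case amounts to expanding a name to every name under which material may be filed. A sketch under stated assumptions: GBIF's public backbone API stands in for the PID sources the abstract lists, and its species match and synonyms endpoints are assumed to suffice for the example:

```python
# Expand an accepted name to its synonyms via a public taxonomic backbone
# (GBIF here, purely as an illustrative stand-in for JACQ's PID sources).
import requests

API = "https://api.gbif.org/v1/species"

def all_names(name: str) -> list[str]:
    match = requests.get(f"{API}/match", params={"name": name}).json()
    key = match["usageKey"]  # the backbone's identifier for this name
    syns = requests.get(f"{API}/{key}/synonyms").json().get("results", [])
    return [match["scientificName"]] + [s["scientificName"] for s in syns]

# Querying a collection database with every returned name, old or current,
# also retrieves material filed under outdated taxon concepts.
print(all_names("Salix repens"))
```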
{"title":"Extracting Reproductive Condition and Habitat Information from Text Using a Transformer-based Information Extraction Pipeline","authors":"Roselyn Gabud, Nelson Pampolina, Vladimir Mariano, Riza Batista-Navarro","doi":"10.3897/biss.7.112505","DOIUrl":"https://doi.org/10.3897/biss.7.112505","url":null,"abstract":"Understanding the biology underpinning the natural regeneration of plant species in order to make plans for effective reforestation is a complex task. This can be aided by providing access to databases that contain long-term and wide-scale geographical information on species distribution, habitat, and reproduction. Although there exists widely-used biodiversity databases that contain structured information on species and their occurrences, such as the Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA), the bulk of knowledge about biodiversity still remains embedded in textual documents. Unstructured information can be made more accessible and useful for large-scale studies if there are tools and services that automatically extract meaningful information from text and store it in structured formats, e.g., open biodiversity databases, ready to be consumed for analysis (Thessen et al. 2022). We aim to enrich biodiversity occurrence databases with information on species reproductive condition and habitat, derived from text. In previous work, we developed unsupervised approaches to extract related habitats and their locations, and related reproductive condition and temporal expressions (Gabud and Batista-Navarro 2018). We built a new unsupervised hybrid approach for relation extraction (RE), which is a combination of classical rule-based pattern-matching methods and transformer-based language models that framed our RE task as a natural language inference (NLI) task. Using our hybrid approach for RE, we were able to extract related biodiversity entities from text even without a large training dataset. In this work, we implement an information extraction (IE) pipeline comprised of a named entity recognition (NER) tool and our hybrid relation extraction (RE) tool. The NER tool is a transformer-based language model that was pretrained on scientific text and then fine-tuned using COPIOUS (Conserving Philippine Biodiversity by Understanding big data; Nguyen et al. 2019), a gold standard corpus containing named entities relevant to species occurrence. We applied the NER tool to automatically annotate geographical location, temporal expression and habitat information contained within sentences. A dictionary-based approach is then used to identify mentions of reproductive conditions in text (e.g., phrases such as \"fruited heavily\" and \"mass flowering\"). We then use our hybrid RE tool to extract reproductive condition - temporal expression and habitat - geographical location entity pairs. We test our IE pipeline on the forestry compendium available in the CABI Digital Library (Centre for Agricultural and Biosciences International), and show that our work enables the enrichment of descriptive information on reproductive and habitat conditions of species. 
This work is a step towards enhancing a biodiversity database with the inclusion of habitat and reproductive condition information extracted from text.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135980580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
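The NLI framing of relation extraction can be illustrated with an off-the-shelf zero-shot classifier; the model choice, example sentence and hypothesis template below are assumptions for demonstration, not the authors' hybrid tool:

```python
# Relation extraction framed as natural language inference: candidate
# relations are phrased as hypotheses and scored against the sentence.
from transformers import pipeline

nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

sentence = ("Mass flowering was observed in March in lowland "
            "dipterocarp forest near the riverbanks.")

labels = [
    "a reproductive condition is linked to a temporal expression",
    "a habitat is linked to a geographical location",
]
result = nli(sentence, labels,
             hypothesis_template="In this text, {}.", multi_label=True)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{score:.2f}  {label}")
```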
{"title":"Application of AI-Helped Image Classification of Fish Images: An iDigBio dataset example","authors":"Bahadir Altintas, Yasin Bakış, Xiojun Wang, Henry Bart","doi":"10.3897/biss.7.112438","DOIUrl":"https://doi.org/10.3897/biss.7.112438","url":null,"abstract":"Artificial Intelligence (AI) becomes more prevalent in data science as well as in areas of computational science. Commonly used classification methods in AI can also be used for unorganized databases, if a proper model is trained. Most of the classification work is done on image data for purposes such as object detection and face recognition. If an object is well detected from an image, the classification may be done to organize image data. In this work, we try to identify images from an Integrated Digitized Biocollections (iDigBio) dataset and to classify these images to generate metadata to use as an AI-ready dataset in the future. The main problem of the museum image datasets is the lack of metadata information on images, wrong categorization, or poor image quality. By using AI, it maybe possible to overcome these problems. Automatic tools can help find, eliminate or fix these problems. For our example, we trained a model for 10 classes (e.g., complete fish, photograph, notes/labels, X-ray, CT (computerized tomotography) scan, partial fish, fossil, skeleton) by using a manually tagged iDigBio image dataset. After training a model for each for class, we reclassified the dataset by using these trained models. Some of the results are given in Table 1. As can be seen in the table, even manually classified images can be identified as different classes, and some classes are very similar to each other visually such as CT scans and X-rays or fossils and skeletons. Those kind of similarities are very confusing for the human eye as well as AI results.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135980908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An AI-based Wild Animal Detection System and Its Application","authors":"Congtian Lin, Jiangning Wang, Liqiang Ji","doi":"10.3897/biss.7.112456","DOIUrl":"https://doi.org/10.3897/biss.7.112456","url":null,"abstract":"Rapid accumulation of biodiversity data and development of deep learning methods bring the opportunities for detecting and identifying wild animals automatically, based on artificial intelligence. In this paper, we introduce an AI-based wild animal detection system. It is composed of acoustic and image sensors, network infrastructures, species recognition models, and data storage and visualization platform, which go through the technical chain learned from Internet of Things (IOT) and applied to biodiversity detection. The workflow of the system is as follows: Deploying sensors for different detection targets . The acoustic sensor is composed of two microphones for picking up sounds from the environment and an edge computing box for judging and sending back the sound files. The acoustic sensor is suitable for monitoring birds, mammals, chirping insects and frogs. The image sensor is composed of a high performance camera that can be controlled to record surroundings automatically and a video analysis edge box running a model for detecting and recording animals. The image sensor is suitable for monitoring waterbirds in locations without visual obstructions. Adopting different networks according to signal availability . Network infrastructures are critical for the detection system and the task of transferring data collected by sensors. We use the existing network when 4/5G signals are available, and build special networks using Mesh Networking technology for the areas without signals. Multiple network strategies lower the cost for monitoring jobs. Recognizing species from sounds, images or videos . AI plays a key role in our system. We have trained acoustic models for more than 800 Chinese birds and some common chirping insects and frogs, which can be identified from sound files recorded by acoustic sensors. For video and image data, we also have trained models for recognizing 1300 Chinese birds and 400 mammals, which help to discover and count animals captured by image sensors. Moreover, we propose a special method for detecting species through features of voices, images and niche features of animals. It is a flexible framework to adapt to different combinations of acoustic and image sensors. All models were trained with labeled voices, images and distribution data from Chinese species database, ESPECIES. Saving and displaying machine observations . The original sound, image and video files with identified results were stored in the data platform deployed on the cloud for extensible computing and storage. We have developed visualization modules in the platform for displaying sensors on maps using WebGIS to show curves of the number of records and species for each day, real time alerts from sensors capturing animals, and other parameters. Deploying sensors for different detection targets . 
The acoustic sensor is composed of two microphones for picking up sounds from the environment and an edge computing box for judging and sending back the sound fil","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135981784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
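The sensor-to-platform step of this workflow might look like the following sketch, in which an edge box attaches its recognition result to the captured file and posts it to the cloud platform; the endpoint and record schema are hypothetical, as the paper does not publish its API:

```python
# An edge box reports one machine observation to the data platform.
import json
import time

import requests

detection = {
    "sensor_id": "acoustic-042",                 # hypothetical sensor name
    "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "media_type": "sound",
    "species": "Garrulax canorus",               # recognition model output
    "confidence": 0.93,
    "file": "rec_20230911_0612.wav",
}
resp = requests.post("https://example.org/api/detections",  # hypothetical URL
                     data=json.dumps(detection),
                     headers={"Content-Type": "application/json"})
resp.raise_for_status()  # the platform stores the file and result for display
```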
{"title":"High Throughput Information Extraction of Printed Specimen Labels from Large-Scale Digitization of Entomological Collections using a Semi-Automated Pipeline","authors":"Margot Belot, Leonardo Preuss, Joël Tuberosa, Magdalena Claessen, Olha Svezhentseva, Franziska Schuster, Christian Bölling, Théo Léger","doi":"10.3897/biss.7.112466","DOIUrl":"https://doi.org/10.3897/biss.7.112466","url":null,"abstract":"Insects account for half of the total described living organisms on Earth, with a vast number of species awaiting description. Insects play a major role in ecosystems but are yet threatened by habitat destruction, intensive farming, and climate change. Museum collections around the world house millions of insect specimens and large-scale digitization initiatives, such as the digitization street digitize! at the Museum für Naturkunde, have been undertaken recently to unlock this data. Accurate and efficient extraction of insect specimen label information is vital for building comprehensive databases and facilitating scientific investigations, sustainability of the collected data, and efficient knowledge transfer. Despite the advancements in high-throughput imaging techniques for specimens and their labels, the process of transcribing label information remains mostly manual and lags behind the pace of digitization efforts. In order to address this issue, we propose a three step semi-automated pipeline that focuses on extracting and processing information from individual insect labels. Our solution is primarily designed for printed insect labels, as the OCR (optical character recognition) technology performs well for printed text while handwritten texts still yield mixed results. The pipeline incorporates computer vision (CV) techniques, OCR, and a clustering algorithm. The initial stage of our pipeline involves image analysis using a convolutional neural network (CNN) model. The model was trained using 2100 images from three distinct insect label datasets, namely AntWeb (ant specimen labels from various collections), Bees & Bytes (bee specimen labels from the Museum für Naturkunde), and LEP_PHIL (Lepidoptera specimen labels from the Museum für Naturkunde). The first model enables the identification and isolation of single labels within an image, effectively segmenting the label region from the rest of the image, and crops them into multiple new, single-label image files. It also assigns the labels to different classes, i.e., printed text or handwritten, with handwritten labels sorted out from the printed ones. In the second step, labels classified as “printed” are then parsed by an OCR engine to extract the text information from the labels. Tesseract and Google Vision OCRs were both tested to assess their performance. While Google Vision OCR is a cloud-based service with limited configurability, Tesseract provides the flexibility to fine-tune settings and enhance its performance for our specific use cases. In the third step, the OCR outputs are aggregated by similarity using a clustering algorithm. This step allows for the identification and formation of clusters that consist of labels sharing identical or highly similar content. Ultimately, these clusters are compared against a curated database of labels and are assigned to a known label or highlighted as new and manually added to the database. 
In order to assess the efficiency of our pipeline","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135982444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
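Steps two and three of the pipeline, OCR followed by similarity clustering, can be sketched as follows; Tesseract is called via pytesseract, and the file names and similarity threshold are illustrative assumptions:

```python
# OCR cropped single-label images, then group identical or near-identical
# label texts so each distinct label is transcribed and curated only once.
from difflib import SequenceMatcher
from PIL import Image
import pytesseract

crops = ["label_001.png", "label_002.png", "label_003.png"]  # hypothetical crops
texts = [pytesseract.image_to_string(Image.open(p)).strip() for p in crops]

clusters: list[list[str]] = []
for text in texts:
    for cluster in clusters:
        if SequenceMatcher(None, text, cluster[0]).ratio() > 0.9:
            cluster.append(text)  # same or highly similar label content
            break
    else:
        clusters.append([text])   # new label content

print(f"{len(texts)} labels -> {len(clusters)} distinct label texts")
```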
{"title":"Improving Biological Collections Data through Human-AI Collaboration","authors":"Alan Stenhouse, Nicole Fisher, Brendan Lepschi, Alexander Schmidt-Lebuhn, Juanita Rodriguez, Federica Turco, Emma Toms, Andrew Reeson, Cécile Paris, Pete Thrall","doi":"10.3897/biss.7.112488","DOIUrl":"https://doi.org/10.3897/biss.7.112488","url":null,"abstract":"Biological collections play a crucial role in our understanding of biodiversity and inform research in areas such as biosecurity, conservation, human health and climate change. In recent years, the digitisation of biological specimen collections has emerged as a vital mechanism for preserving and facilitating access to these invaluable scientific datasets. However, the growing volume of specimens and associated data presents significant challenges for curation and data management. By leveraging human-Artificial Intelligence (AI) collaborations, we aim to transform the way biological collections are curated and managed, unlocking their full potential in addressing global challenges. We present our initial contribution to this field through the development of a software prototype to improve metadata extraction from digital specimen images in biological collections. The prototype provides an easy-to-use platform for collaborating with web-based AI services, such as Google Vision and OpenAI Generative Pre-trained Transformer (GPT) Large Language Models (LLM). We demonstrate its effectiveness when applied to herbarium and insect specimen images. Machine-human collaboration may occur at various points within the workflows and can significantly affect outcomes. Initial trials suggest that the visual display of AI model uncertainty could be useful during expert data curation. While much work remains to be done, our results indicate that collaboration between humans and AI models can significantly improve the digitisation rate of biological specimens and thereby enable faster global access to this vital data. Finally, we introduce our broader vision for improving biological collection curation and management using human-AI collaborative methods. We explore the rationale behind this approach and the potential benefits of adding AI-based assistants to collection teams. We also examine future possibilities and the concept of creating 'digital colleagues' for seamless collaboration between human and digital curators. This ‘collaborative intelligence’ will enable us to make better use of both human and machine capabilities to achieve the goal of unlocking and improving our use of these vital biodiversity data to tackle real-world problems.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"139 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135980919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Swedish Biodiversity Data Infrastructure (SBDI): Insights from the Swedish ALA installation","authors":"Margret Steinthorsdottir, Veronika Johansson, Manash Shah","doi":"10.3897/biss.7.112429","DOIUrl":"https://doi.org/10.3897/biss.7.112429","url":null,"abstract":"The Swedish Biodiversity Data Infrastructure (SBDI) is a biodiversity informatics infrastructure and is the key national resource for data-driven biodiversity and ecosystems research. SBDI rests on three pillars: mobilisation and access to biodiversity data; development and operation of tools for analysing these data; and user support. SBDI is funded by the Swedish Research Council (VR) and eleven of Sweden’s major universities and research government authorities (Fig. 1). mobilisation and access to biodiversity data; development and operation of tools for analysing these data; and user support. SBDI is funded by the Swedish Research Council (VR) and eleven of Sweden’s major universities and research government authorities (Fig. 1). SBDI was formed in early 2021 and represents the final step in an amalgamation of national infrastructures for biodiversity and ecosystems research. SBDI includes the Swedish node of the Global Biodiversity Information Facility (GBIF), the key international infrastructure for sharing biodiversity data. SBDI's predecessor Biodiversity Atlas Sweden (BAS) was an early adopter of the Atlas of Living Australia (ALA) platform. SBDI pioneered the container-based deployment of the platform using Docker and Docker Swarm. This container-based approach helps simplify deployment of the platform, which is characterised by a microservice architecture with loosely coupled services. This enables scalability, modularity, integration of services, and new technology insertions. SBDI has customised the BioCollect module to remove region-specific constraints so that it can be more readily improved for environmental monitoring in Sweden. To further support this, there are plans to develop services for the distribution of terrestrial map layers, which will provide important habitat information for artificial intelligence and machine learning research projects. The Amplicon Sequence Variants (ASVs) portal, an interface to sequence-based observations, is an example of integration and new technology insertion. The portal developed in SBDI and seamlessly integrated with the ALA platform provides basic functionalities for searching ASVs and occurrence records using the Basic Local Alignment Search Tool (BLAST) or filters on sequencing details and taxonomy and for submitting metabarcoding dataset Fig. 2. Future developments for SBDI include a continued focus on eDNA and monitoring data as well as the implementation of procedures for handling sensitive data.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135982102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Practice, Pathways and Lessons Learned from Building a Digital Data Flow with Tools: Focusing on alien invasive species, from occurrence via measures to documentation","authors":"Mora Aronsson, Malin Strand, Holger Dettki, Hanna Illander, Johan Olsson","doi":"10.3897/biss.7.112337","DOIUrl":"https://doi.org/10.3897/biss.7.112337","url":null,"abstract":"The SLU Swedish Species Information Centre (SSIC, SLU Artdatabanken) accumulates, analyses and disseminates information concerning species and habitats occurring in Sweden. The SSIC provides an open access biodiversity reporting and analysis infrastructure including the Swedish Species Observation System, the Swedish taxonomic backbone Dyntaxa, and tools for species information including traits, terminology, quality assurance and species identification.*1 The content is available to scientists, conservationists and the public. All systems, databases, APIs and web applications, rely on recognized standards to ensure interoperability. The SSIC is a leading partner within the Swedish Biodiversity Data Infrastructure (SBDI). Here we present a data flow (Fig. 1) that exemplifies the strengthening of the cooperation and transfer of experiences between research, community, non-governmental organizations (NGOs), citizen science and governmental agencies, and also presents solutions to current data challenges (e.g., data fragmentation, taxonomic issues or platform relations). This data flow aimed to facilitate the process for evaluating and understanding the distribution and spread of species (e.g., invasive alien species). It provides Findable, Accessible, Interoperable and Reusable (FAIR) data and links related information between different parties such as universities, NGOs, county administrative boards (CABs) and environmental protection agencies (EPAs). The digital structure is built on the national Swedish taxonomic backbone Dyntaxa, which prevents data fragmentation due to taxonomic issues and acts as a common standard for all users. The chain of information contains systems, tools and a linked data flow for reporting observations, verification procedures, and it can work as an early warning system for surveillance regarding certain species. After an observation is reported, an alert can be activated, field checks can be carried out, and if necessary, eradication measures can be activated. The verification tool that traditionally has been focused on the quality of species identification has been improved, providing verification of geographic precision. This is equally important for eradication actions as is species accuracy. A digital catalogue of eradication methods is in use by the CABs but there are also recommendations on methods for ‘public’ use, and collaboration between Invasive Alien Species (IAS) coordinators in regional CABs is currently being developed. The CABs have a separate tool for documentation of eradication measures and, if/when measures are carried out (by CABs), this information can be fed back from the CAB-tool into the database in SSIC where it is possible to search for, and visualize, this information. Taxonomic integrity over time should be intact and related to the taxon identifier (ID) provided by Dyntaxa. 
However, metadata, such as geographic position, date, verification status, mitigation results, etc., will be fully us","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135981960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
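The early-warning step of this data flow reduces to a watch-list check keyed on the Dyntaxa taxon identifier; the sketch below uses hypothetical IDs and a print statement in place of the real CAB notification:

```python
# Trigger an alert when a verified observation matches an invasive alien
# species on the watch list, keyed by its Dyntaxa taxon identifier.
WATCH_LIST = {233885: "Impatiens glandulifera"}  # hypothetical Dyntaxa IDs

def on_verified_observation(obs: dict) -> None:
    taxon_id = obs["dyntaxa_id"]
    if taxon_id in WATCH_LIST and obs["verification"] == "approved":
        # in the real flow: notify the county administrative board (CAB)
        print(f"ALERT: {WATCH_LIST[taxon_id]} at {obs['location']}")

on_verified_observation({
    "dyntaxa_id": 233885,
    "verification": "approved",
    "location": "Uppsala län",
})
```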
{"title":"Documenting Biodiversity in Underrepresented Languages using Crowdsourcing","authors":"Mohammed Kamal-Deen Fuseini, Agnes Abah, Andra Waagmeester","doi":"10.3897/biss.7.112431","DOIUrl":"https://doi.org/10.3897/biss.7.112431","url":null,"abstract":"Biodiversity is the variety of life on Earth, and it is essential for our planet's health and well-being. Language is also a powerful medium for documenting and preserving cultural heritage, including knowledge about biodiversity. However, many indigenous and underrepresented languages are at risk of disappearing, taking with them valuable information about local ecosystems. Also, many species are at risk of extinction, and much of our knowledge about biodiversity is in underrepresented languages. (Cardoso et al. 2019). This can make it challenging to document and protect biodiversity, as well as to share this knowledge with others. Crowdsourcing is a way to collect information from a large number of people, and it can be a valuable tool for documenting biodiversity in underrepresented languages. By crowdsourcing, leveraging the iNaturalist platform, and volunteer contributors in the open movement including the Dagbani*1 and Igbo*2 Wikimedian communities, we can reach people who have knowledge about local biodiversity, but who may not have been able to share this knowledge before. For instance, the Dagbani and Igbo Wikimedia contributors did not have enough content on biodiversity data until they received education about the need. This can help us to fill in the gaps in our knowledge about biodiversity, and to protect species that are at risk of extinction. In this presentation, we will discuss the use of crowdsourcing to document biodiversity in underrepresented languages, the challenges and opportunities of using crowdsourcing for this purpose, and some examples of successful projects. We will also discuss the importance of sharing knowledge about biodiversity with others and share some ideas on how to do this. We believe that crowdsourcing has the potential to be a powerful tool for documenting biodiversity in underrepresented languages. By working together, we can help protect our planet's biodiversity and ensure that this knowledge is available to future generations.","PeriodicalId":9011,"journal":{"name":"Biodiversity Information Science and Standards","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135980578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}