GigaScience | Pub Date: 2026-05-08 | DOI: 10.1093/gigascience/giag048
Jia-Yuan Zhang, Changjiu Miao, Teng Qiu, Junyi He, Wenqi Cao, Wei Lin, Xiaoshuang Xia, Lei He, Chunlei Yang, Yuhui Sun, Tao Zeng, Yuxiang Li, Xun Xu, Yijun Ruan, Yuliang Dong
{"title":"FEDRANN: effective long-read overlap detection based on dimensionality reduction and approximate nearest neighbors.","authors":"Jia-Yuan Zhang, Changjiu Miao, Teng Qiu, Junyi He, Wenqi Cao, Wei Lin, Xiaoshuang Xia, Lei He, Chunlei Yang, Yuhui Sun, Tao Zeng, Yuxiang Li, Xun Xu, Yijun Ruan, Yuliang Dong","doi":"10.1093/gigascience/giag048","DOIUrl":"https://doi.org/10.1093/gigascience/giag048","url":null,"abstract":"<p><p>Overlap detection is a key step in de novo genome assembly pipelines based on the Overlap-Layout-Consensus (OLC) paradigm. Existing methods for overlap detection either rely on heuristic seed-and-extension strategies or locality-sensitive hashing (LSH), both of which struggle to handle repetitive genomic regions and the computational burden of large-scale datasets. Here, we present FEDRANN, a novel strategy for overlap graph construction that integrates feature extraction, dimensionality reduction (DR), and approximate nearest neighbor (ANN) search. We find the pipeline combining inverse document frequency (IDF) transformation, sparse random projection (SRP), and NNDescent enables accurate detection of overlaps across diverse datasets. We developed an efficient open-source implementation of this pipeline named Fedrann (https://github.com/jzhang-dev/fedrann). Through systematic benchmarking on real long-read sequencing data, we demonstrate that Fedrann produces overlap graphs comparable to or better than those generated by existing state-of-the-art tools, including MECAT2, minimap2, and wtdbg2, while maintaining competitive runtime. By integrating Fedrann into the Shasta assembler, we successfully reconstructed human whole genomes, achieving high assembly contiguity and quality. Despite being implemented primarily in Python, Fedrann achieves performance parity with tools written in compiled languages by leveraging C-accelerated numerical libraries and optimized batch-based matrix operations. 
Our results suggest that the combination of dimensionality reduction and ANN techniques offers a robust, scalable framework for accurate overlap detection in long-read assembly and broader sequence similarity search tasks.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":" ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2026-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147856134","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
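The FEDRANN abstract describes a pipeline of feature extraction, IDF transformation, sparse random projection (SRP), and approximate nearest-neighbor search. What follows is a minimal, stdlib-only Python sketch of that general idea, not Fedrann's implementation. It assumes forward-strand k-mer counts as features, a smoothed IDF (the formula scikit-learn uses by default), and an Achlioptas-style sparse projection with density 1/3. It also replaces NNDescent with exact brute-force cosine search. All function names are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical sketch of a FEDRANN-style pipeline (not Fedrann's code):
# k-mer counts -> IDF weighting -> sparse random projection -> nearest neighbours.


def kmer_counts(read: str, k: int) -> Counter:
    """Count every k-mer in a read (forward strand only, for simplicity)."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


def idf_weights(counts: list) -> dict:
    """Smoothed IDF, ln((1 + n) / (1 + df)) + 1: k-mers found in many reads get lower weight."""
    n = len(counts)
    df = Counter()
    for c in counts:
        df.update(c.keys())  # each k-mer counted once per read
    return {kmer: math.log((1 + n) / (1 + d)) + 1.0 for kmer, d in df.items()}


def sparse_projection(vocab: list, dim: int, seed: int = 0) -> dict:
    """Achlioptas-style sparse random projection.

    Each k-mer gets a random column whose entries are +/- sqrt(3 / dim) with
    probability 1/6 each and 0 otherwise; only nonzero (row, value) pairs are stored.
    """
    rng = random.Random(seed)
    scale = math.sqrt(3.0 / dim)
    columns = {}
    for kmer in sorted(vocab):  # sorted so the matrix is reproducible for a given seed
        col = []
        for j in range(dim):
            r = rng.random()
            if r < 1 / 6:
                col.append((j, -scale))
            elif r < 1 / 3:
                col.append((j, scale))
        columns[kmer] = col
    return columns


def embed_reads(reads: list, k: int = 15, dim: int = 512, seed: int = 0) -> list:
    """Map each read to a dense `dim`-dimensional vector via IDF-weighted k-mers and SRP."""
    counts = [kmer_counts(r, k) for r in reads]
    idf = idf_weights(counts)
    columns = sparse_projection(list(idf), dim, seed)
    embeddings = []
    for c in counts:
        vec = [0.0] * dim
        for kmer, tf in c.items():
            w = tf * idf[kmer]
            for j, v in columns[kmer]:
                vec[j] += w * v
        embeddings.append(vec)
    return embeddings


def cosine(u: list, v: list) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(embeddings: list, n_neighbors: int = 1) -> list:
    """Exact brute-force cosine kNN, standing in for an ANN method such as NNDescent."""
    result = []
    for i, u in enumerate(embeddings):
        scored = [(cosine(u, v), j) for j, v in enumerate(embeddings) if j != i]
        scored.sort(reverse=True)
        result.append([j for _, j in scored[:n_neighbors]])
    return result


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(2400))
    # Reads 0/1 overlap by 300 bp and reads 2/3 overlap by 300 bp; the two pairs do not overlap.
    reads = [genome[0:600], genome[300:900], genome[1200:1800], genome[1500:2100]]
    print(nearest_neighbors(embed_reads(reads)))
```

In the toy example, each overlapping pair should be reported as mutual nearest neighbors. A realistic tool would also need reverse-complement handling and an approximate index to scale to millions of reads. This sketch omits both.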
GigaScience | Pub Date: 2026-05-07 | DOI: 10.1093/gigascience/giag054
Elio Escamilla-Vega, Ann-Katrin Koch, Louk W G Seton, Andrea P Murillo-Rincón, Stella Kyomen, Jörg U Hammel, Timo Moritz, Markéta Kaucká
{"title":"Synchrotron radiation micro-computed tomography of the small-spotted catshark embryonic development (Chondrichthyes: Scyliorhinus canicula).","authors":"Elio Escamilla-Vega, Ann-Katrin Koch, Louk W G Seton, Andrea P Murillo-Rincón, Stella Kyomen, Jörg U Hammel, Timo Moritz, Markéta Kaucká","doi":"10.1093/gigascience/giag054","DOIUrl":"https://doi.org/10.1093/gigascience/giag054","url":null,"abstract":"<p><strong>Background: </strong>Sharks occupy a key position on vertebrate phylogeny, making them essential for understanding the early origins of jawed vertebrates (gnathostomes) and the functional adaptation of vertebrate traits like jaws or complex sensory systems. As such, sharks are important model organisms in evolutionary developmental biology (evo-devo), but sparse data and limited availability of samples hinder their inclusion in contemporary evo-devo research. The knowledge of their distinctive morphology will not only shed light on the anatomical architecture and physiology of basal living gnathostomes but also reveal the evolutionary divergence of developmental processes that establish the foundational vertebrate blueprint.</p><p><strong>Findings: </strong>We performed synchrotron radiation micro-computed tomography (SRµCT) scanning of the small-spotted catshark (Scyliorhinus canicula) embryonic development, spanning from gastrulation (stage 12) to late-organogenesis (stage 31), enhanced by tissue contrasting with phosphotungstic acid. We obtained 36 whole-embryo scans that encompass the formation of key embryonic structures, such as sensory organs, fins, muscles and skeletal elements. The achieved resolution allows for segmentation of all tissue types and both internal and external structures.</p><p><strong>Conclusions: </strong>We present a comprehensive dataset of 4D high-resolution SRµCT of the small-spotted catshark embryonic development. 
The dataset spans consecutive embryonic stages, allowing the reconstruction and morphometric analyses of tissues, organs, and structures, along with the tracking of their development. The deposited data are publicly available and provide a valuable resource for comparative research, additionally allowing the identification of conserved and derived developmental processes and features and a better understanding of vertebrate evolution.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":" ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2026-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147836687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2026-05-07 | DOI: 10.1093/gigascience/giag053
Sven Hauns, Frederico G Pinto, Costerwell Khyriem, Ankita Singh, Azzat Al-Sadi, Talal Al Yazeedi, Rasheed Mohammad, Babacar Cisse, Timothy J Garrett, Mohammed Uddin, Nelson C Soares, Rolf Backofen, Omer S Alkhnbashi
{"title":"Autoencoder/RandomForest-TabPFN for Cross-Cancer Metabolomics: Prostate and Breast Cancer Diagnosis Using Paper Spray and Ion Mobility-Mass Spectrometry Techniques.","authors":"Sven Hauns, Frederico G Pinto, Costerwell Khyriem, Ankita Singh, Azzat Al-Sadi, Talal Al Yazeedi, Rasheed Mohammad, Babacar Cisse, Timothy J Garrett, Mohammed Uddin, Nelson C Soares, Rolf Backofen, Omer S Alkhnbashi","doi":"10.1093/gigascience/giag053","DOIUrl":"https://doi.org/10.1093/gigascience/giag053","url":null,"abstract":"<p><p>Accurate and rapid disease diagnosis, particularly in prostate cancer (PC) and breast cancer (BC), is critical for early intervention and improved patient outcomes. Metabolomic signatures represent a robust molecular framework for elucidating cancer-associated biochemical reprogramming. In recent years, the use of Artificial Intelligence (AI) in biology has become widespread and promising. This study introduces a novel predictive method that integrates an Autoencoder, random forest-based feature selection, and a Tabular Prior-data Fitted Network (TabPFN) to achieve high diagnostic accuracy from metabolomics data of prostate and breast cancer patients. The datasets were acquired from individuals diagnosed with PC and BC using Paper Spray Ionization Mass Spectrometry (PSI-MS) and Flow Injection Traveling-Wave Ion Mobility-Mass Spectrometry (FI-TWIM-MS). When leveraging metabolomic profiling data from two distinct sources, prostate cancer urine and serum samples, the proposed model achieved an accuracy of up to 98.75% in distinguishing diseased from healthy conditions. Additionally, we employed a breast cancer dataset containing metabolic and lipidomic signatures acquired from core needle biopsies using a miniature MS platform coupled with PSI to assess the fidelity of our implementation across distinct cancer types. Our results on a well-characterized targeted dataset show that we can effectively reduce high-dimensional data into latent feature representations. 
At the same time, TabPFN captures tumor progression-related changes and feature interactions, suggesting that the model could be an effective tool for precise, stage-specific diagnosis. Most existing machine learning approaches for disease diagnosis primarily rely on imaging, genomics, or clinical parameters, often overlooking the critical role of metabolites in identifying disease-specific biochemical signatures. By integrating metabolite-specific data with a robust deep-learning approach, this study demonstrates the transformative potential of AI in metabolomics-based diagnostics. The proposed model offers scalability and versatility, with applications extending beyond oncology to broader disease profiling. These findings emphasise the value of combining multi-source metabolomic data with deep learning to advance personalised medicine and enhance diagnostic efficiency in clinical practice.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":" ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2026-05-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147836684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2026-05-05 | DOI: 10.1093/gigascience/giag052
{"title":"Unlocking the Potential of Asian Genomic Data: A Collaborative Framework for Precision Medicine Innovation.","authors":"","doi":"10.1093/gigascience/giag052","DOIUrl":"https://doi.org/10.1093/gigascience/giag052","url":null,"abstract":"<p><p>Asian genomic datasets possess unparalleled potential to advance global understanding of human genetic diversity. Encompassing the world's largest population pool with diverse ethnicities, these datasets capture comprehensive genomic variations shaped by heterogeneous socioeconomic conditions, climate exposures, and clinical environments. However, current national genome initiatives across Asia demonstrate substantial disunity, stemming from limited cross-border communication and collaborative infrastructure, thereby diminishing their collective impact on biomedical research and precision medicine development. The MedHackathon Asia 2025 catalyzed crucial dialogues toward establishing a regional community dedicated to three pillars: harmonized biobank collaboration, standardized genomic data protocols, and cooperative governance frameworks. This multidisciplinary convening brought together researchers, clinicians, bioinformaticians, and national precision medicine program leaders from across Asia to share best practices, identify implementation challenges, and formulate foundational strategies for sustained cooperation. This community review synthesizes critical outcomes from these deliberations, emphasizing the imperative for continuous regional collaboration while advocating for the development of sustainable architectures enabling: (1) equitable biobank resource sharing, (2) genomic data standardization, and (3) ethical governance models. Through consolidation and expansion of this emerging network, Asian nations are expected to lead transformative contributions to global genomic science while ensuring appropriate representation in biomedical innovation. 
Such coordinated efforts promise to accelerate healthcare advancements with equitable benefits extending throughout the region and worldwide.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":" ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2026-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147836650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2026-04-29 | DOI: 10.1093/gigascience/giag051
Disha Tandon, Tarcisio Mendes De Farias, Pierre-Marie Allard, Emmanuel Defossez
{"title":"METRIN-KG: A knowledge graph integrating plant metabolites, traits, and biotic interactions.","authors":"Disha Tandon, Tarcisio Mendes De Farias, Pierre-Marie Allard, Emmanuel Defossez","doi":"10.1093/gigascience/giag051","DOIUrl":"https://doi.org/10.1093/gigascience/giag051","url":null,"abstract":"<p><strong>Background: </strong>In recent years, biodiversity data management has emerged as a critical pillar in global conservation efforts. Today, the ability to efficiently collect, structure, and analyze biodiversity data is central to breakthroughs in conservation, drug development, disease monitoring, ecological forecasting, and agri-tech innovation. However, because biodiversity data are vast and heterogeneous, they are often confined to databases for specific research areas, stored in isolated formats, and disconnected from other relevant resources. Crucial components of such data in the kingdom Plantae include metabolomes (the vast array of compounds produced by plants); traits (measurable characteristics of plants that influence their growth, survival, and reproduction, and that affect ecosystem processes); and biotic interactions (relationships of plants with other living organisms, which affect ecosystem functions).</p><p><strong>Results: </strong>In this work, we present METRIN-KG (MEtabolomes, TRaits, and INteractions-Knowledge Graph), a powerful data resource simplifying the integration of diverse and heterogeneous data resources such as plant metabolomes, traits, and biotic interactions.</p><p><strong>Conclusions: </strong>The proposed knowledge graph provides an interface to interactively search for data relating to plant metabolomes, traits, and interactions. This, in turn, will facilitate the development of research questions in the life sciences. 
In this context, we provide representative case studies on how to frame queries that can be used to search for relevant data in the knowledge graph.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":" ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2026-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147769038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2026-04-20 | DOI: 10.1093/gigascience/giag049
Yunyun Fu, Yi Liu, Mingwen Xu, Gaojun Liu, Jianzhi Sun, Fanyu Bu, Wenqing Xie, Jiayi Zhao, Jun Luo, Qiang Guo, Yinghua Huang, Fengping Xu, Siqi Liu, Longqi Liu, Ying Fu, Xuan Dong
{"title":"Integrative single-cell transcriptomics and proteomics reveal an immunometabolic framework for MSC-exosome-mediated remodeling of expanded NK cells.","authors":"Yunyun Fu, Yi Liu, Mingwen Xu, Gaojun Liu, Jianzhi Sun, Fanyu Bu, Wenqing Xie, Jiayi Zhao, Jun Luo, Qiang Guo, Yinghua Huang, Fengping Xu, Siqi Liu, Longqi Liu, Ying Fu, Xuan Dong","doi":"10.1093/gigascience/giag049","DOIUrl":"https://doi.org/10.1093/gigascience/giag049","url":null,"abstract":"<p><strong>Background: </strong>Natural killer (NK) cells play a central role in anti-tumor immunity and immunosurveillance of senescence, yet their clinical performance is frequently limited by functional exhaustion during ex vivo expansion. Mesenchymal stem cell-derived exosomes (MSC-Exos) are increasingly recognized as immunomodulators, but their broader effects on NK cell fitness and functional states remain incompletely characterized.</p><p><strong>Results: </strong>Here, we assessed MSC-Exos-mediated regulation of human NK cells using a standardized ex vivo priming platform integrated with single-cell transcriptomics and proteomic profiling. MSC-Exos significantly improved NK cell viability in a dose- and time-dependent manner while preserving a CD56⁺CD3⁻ NK-cell-enriched phenotype. MSC-Exos-treated NK cells showed enhanced cytotoxicity against K562 tumor cells and senescent fibroblasts. This phenotype was accompanied by increased expression of the activating receptors NKG2D and CD16, reduced LAG3 expression, and enhanced granzyme B expression and degranulation. Consistent with improved NK cell fitness, MSC-Exos treatment was also associated with upregulated expression of genes involved in NRF2-linked redox programs and improved mitochondrial readouts in NK cells. 
Single-cell analyses of MSC-Exos-treated NK cells revealed enhanced immune-effector programs and reduced inflammatory stress, while trajectory inference indicated that MSC-Exos may bias the NK cell state distribution toward more cytotoxic effector-like states. Proteomic profiling of MSC-Exos identified enrichment of FcγR-associated signaling components, supporting the hypothesis that exosomal composition may be related to the FcγR/CD16-associated transcriptional and phenotypic features observed in MSC-Exos-treated NK cells.</p><p><strong>Conclusions: </strong>Our data indicate that MSC-Exos improve NK cell viability and functional fitness during ex vivo expansion and bias NK cells toward a more effector-cytotoxic state. Together, these findings provide an immunometabolic framework for MSC-Exos-assisted NK cell manufacturing, while underscoring the need for further causal validation.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":" ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2026-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147729001","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2026-04-16 | DOI: 10.1093/gigascience/giag046
Xiaomin Zhang, Hongjie Liu, Siyue Xu, Shuang Zhang, Tingting Yang, Zhongyi Lei, Weili Xu, Xiaochen Bo, Chenghai Yang, Ming Ni
{"title":"Within-host diversity and phased variant analysis reveal structures and recombination of Helicobacter pylori subpopulations in stomach.","authors":"Xiaomin Zhang, Hongjie Liu, Siyue Xu, Shuang Zhang, Tingting Yang, Zhongyi Lei, Weili Xu, Xiaochen Bo, Chenghai Yang, Ming Ni","doi":"10.1093/gigascience/giag046","DOIUrl":"https://doi.org/10.1093/gigascience/giag046","url":null,"abstract":"<p><p>Helicobacter pylori (H. pylori) has a highly plastic genome and can generate substantial within-host diversity during chronic gastric colonization. However, the delineation of its within-host subpopulations, particularly regarding the emergence and spread of antibiotic resistance-conferring mutations, remains poorly understood. In this study, we enrolled 25 chronic gastritis patients from southern China, collecting multiple isolates from distinct gastric regions. Among them, 14 patients exhibited heterogeneity in antibiotic susceptibility across isolates (heteroresistant), while the remaining 11 showed consistent profiles (homoresistant). Using ultra-deep short- and long-read sequencing, we showed that co-existing H. pylori subpopulations were prevalent in these patients, particularly within the same anatomical niche. Two patients presented mixed infections involving different strains as subpopulations, while others exhibited microevolution from a common ancestor. We reconstructed the subpopulation structures and found that isolates from heteroresistant patients had greater within-host diversity than those from homoresistant patients. Notably, subpopulations in the antrum demonstrated higher diversity than those in the gastric corpus and incisura angularis. Through a custom-developed phasing bioinformatics workflow, we resolved subpopulation-level genomic regions and directly observed extensive homologous recombination among them. 
Importantly, we traced the distribution of levofloxacin- and clarithromycin-associated resistance mutations across subpopulations and found that their spread was mainly mediated by recombination. To our knowledge, this study provides the first detailed depiction of H. pylori subpopulation distribution within the human stomach, illustrating how recombination drives within-host diversification and contributes to the spread of antibiotic resistance mutations.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":" ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2026-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147698556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GigaDB: A redesigned repository for data publishing and management.","authors":"Xiaoqiang Li, Cong Hua, Qian Yue, Zhiyong Li, Jiawei Tong, Ziheng Luo, Tao Yang, Lijin You, Hongfang Zhang, Dongni Ma, Xiaofeng Wei, Hongling Zhou","doi":"10.1093/gigascience/giag047","DOIUrl":"https://doi.org/10.1093/gigascience/giag047","url":null,"abstract":"<p><p>GigaDB is a repository that links research articles with the underlying datasets, software, and metadata, helping to support open and reproducible research. As of February 2026, it contains 2,710 published datasets, covering 90.56 TB of data. Over the past year, the platform has been rebuilt to better support the growing scale of the repository and to improve data submission, management, discoverability, and reuse. The updated system is organized around four core modules: role-based permission control, dataset management, workflow management, and dataset retrieval. These improvements make submission and curation more efficient and transparent, strengthen access to published data and related resources, and provide a strong foundation for the future development of GigaDB services.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":" ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2026-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147698487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning inherent genetic patterns and trait associations with deep generative models for discrete genotype simulation.","authors":"Sihan Xie, Thierry Tribout, Didier Boichard, Blaise Hanczar, Julien Chiquet, Eric Barrey","doi":"10.1093/gigascience/giag044","DOIUrl":"https://doi.org/10.1093/gigascience/giag044","url":null,"abstract":"<p><strong>Background: </strong>Deep generative models open new avenues for simulating realistic genomic data while preserving privacy and addressing data accessibility constraints. While previous studies have primarily focused on generating gene expression or haplotype data, this study explores generating genotype data in both unconditioned and phenotype-conditioned settings, which is inherently more challenging due to the discrete nature of genotype data.</p><p><strong>Results: </strong>We developed and evaluated commonly used generative models, including Variational Autoencoders (VAEs), Diffusion Models, and Generative Adversarial Networks (GANs), and proposed adaptations tailored to discrete genotype data. We conducted extensive experiments on large-scale datasets, including all chromosomes from cattle and multiple chromosomes from humans. Model performance was assessed using a well-established set of metrics drawn from both the deep learning and quantitative genetics literature. Our results show that these models can effectively capture genetic patterns and preserve genotype-phenotype associations.</p><p><strong>Conclusions: </strong>As deep generative models are able to reproduce key characteristics of genotype data, they can serve as direct tools for genotype-phenotype simulation, while also enabling privacy-preserving data sharing. 
Our findings provide a comprehensive evaluation of these models and offer practical guidance for future research in genotype simulation.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":" ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2026-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147689684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GigaScience | Pub Date: 2026-04-09 | DOI: 10.1093/gigascience/giag038
Georgios K Georgakilas, Brett Metcalfe, Ariane Bize, Matthew Crowther, Emilie Fernandez, Susana Maria Alonso Villela, Stuart Owen, Rudolf Wittner, David Camilo Corrales, Anselm von Gladiss, Peter Blomberg, Munazah Andrabi, Cesar Arturo Aceves Lara, Hans Mattila, Marily Wiebe, Theodore Dalamagas, Jasper J Koehorst
{"title":"MIFE and MIFD: Minimum information for fermentation experiments and devices.","authors":"Georgios K Georgakilas, Brett Metcalfe, Ariane Bize, Matthew Crowther, Emilie Fernandez, Susana Maria Alonso Villela, Stuart Owen, Rudolf Wittner, David Camilo Corrales, Anselm von Gladiss, Peter Blomberg, Munazah Andrabi, Cesar Arturo Aceves Lara, Hans Mattila, Marily Wiebe, Theodore Dalamagas, Jasper J Koehorst","doi":"10.1093/gigascience/giag038","DOIUrl":"https://doi.org/10.1093/gigascience/giag038","url":null,"abstract":"<p><strong>Background: </strong>As technological advances of the early 21st century push industrial biotechnology (IB) into the realm of Big Data-driven innovation, trustworthy data management, annotation and standardization are becoming a necessity. Minimum information models (MIMs) have long been used across disciplines as the backbone of good data management practices, providing the scaffold upon which standardized metadata can adequately and succinctly describe a phenomenon under study.</p><p><strong>Findings: </strong>Here we present a minimum set of metadata, named the minimum information for fermentation experiments (MIFE) and devices (MIFD), that has been specifically designed to accommodate the data management and annotation needs of IB-related fermentation experiments. Although the proposed schema is tailored to IB applications, MIFE and MIFD build upon well-established models and community standards to facilitate easier integration into existing infrastructure and easier adoption by the community, and aim to integrate the Findable, Accessible, Interoperable and Reusable (FAIR) principles into the IB field. In addition, we showcase integration with FAIR Data Station (FAIR DS), a tool that offers metadata validation and enables the automated uptake of (meta)data from data management repositories such as FAIRDOM-SEEK. 
The proposed models are accompanied by a Python package that enables their programmatic use by creating a Linked Data Modeling Language (LinkML) schema that can support downstream analyses.</p><p><strong>Conclusions: </strong>By promoting and simplifying knowledge discovery, we believe that MIFE and MIFD can accelerate the application of state-of-the-art artificial intelligence (AI) methods and the adoption of explainable AI (XAI) to better understand bioprocesses at scale.</p>","PeriodicalId":12581,"journal":{"name":"GigaScience","volume":" ","pages":""},"PeriodicalIF":11.8,"publicationDate":"2026-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147672281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Biology","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}