GigaScience: latest articles

An analysis of performance bottlenecks in MRI preprocessing.

IF 11.8 · Zone 2, Biology

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giae098

Mathieu Dugré, Yohan Chatelain, Tristan Glatard

Magnetic resonance imaging (MRI) preprocessing is a critical step for neuroimaging analysis. However, the computational cost of MRI preprocessing pipelines is a major bottleneck for large cohort studies and some clinical applications. While high-performance computing and, more recently, deep learning have been adopted to accelerate the computations, these techniques require costly hardware and are not accessible to all researchers. Understanding the performance bottlenecks of MRI preprocessing pipelines is therefore important for improving their performance. Using the Intel VTune profiler, we characterized the bottlenecks of several commonly used MRI preprocessing pipelines from the Advanced Normalization Tools (ANTs), FMRIB Software Library (FSL), and FreeSurfer toolboxes. We found that a few functions accounted for most of the CPU time and that linear interpolation was the largest contributor. Data access was also a substantial bottleneck. We identified a bug in the Insight Segmentation and Registration Toolkit (ITK) library that affects the performance of the ANTs pipeline in single precision, and a potential issue with OpenMP scaling in FreeSurfer recon-all. Our results provide a reference for future efforts to optimize MRI preprocessing pipelines.

Volume 14 · Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11899568/pdf/
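
The abstract names linear interpolation as the largest contributor to CPU time. As a rough illustration of what that operation computes, here is a minimal Python sketch of trilinear interpolation at one point of a 3D volume. It is not the ITK, FSL, or FreeSurfer implementation (which this listing does not describe), and the function name `trilinear` is hypothetical. Each sample blends the 8 voxels surrounding the point, which hints at why resampling can also put pressure on memory access; that link is an illustration, not the authors' analysis.

```python
import numpy as np


def trilinear(volume, x, y, z):
    """Trilinearly interpolate a 3D array at continuous voxel coordinates (x, y, z).

    Illustrative sketch only: coordinates are in voxel units and must lie
    inside the volume; no padding or physical-space transform is applied.
    """
    vol = np.asarray(volume, dtype=float)
    nx, ny, nz = vol.shape
    if not (0 <= x <= nx - 1 and 0 <= y <= ny - 1 and 0 <= z <= nz - 1):
        raise ValueError("point lies outside the volume")

    # Lower corner of the enclosing cell; clamp the upper corner at the edge.
    x0, y0, z0 = int(np.floor(x)), int(np.floor(y)), int(np.floor(z))
    x1, y1, z1 = min(x0 + 1, nx - 1), min(y0 + 1, ny - 1), min(z0 + 1, nz - 1)
    dx, dy, dz = x - x0, y - y0, z - z0

    # Interpolate along x on the four cell edges parallel to x...
    c00 = vol[x0, y0, z0] * (1 - dx) + vol[x1, y0, z0] * dx
    c10 = vol[x0, y1, z0] * (1 - dx) + vol[x1, y1, z0] * dx
    c01 = vol[x0, y0, z1] * (1 - dx) + vol[x1, y0, z1] * dx
    c11 = vol[x0, y1, z1] * (1 - dx) + vol[x1, y1, z1] * dx
    # ...then along y...
    c0 = c00 * (1 - dy) + c10 * dy
    c1 = c01 * (1 - dy) + c11 * dy
    # ...and finally along z.
    return float(c0 * (1 - dz) + c1 * dz)


vol = np.arange(8).reshape(2, 2, 2)  # vol[i, j, k] = 4*i + 2*j + k
print(trilinear(vol, 0.5, 0.5, 0.5))  # 3.5
```

Because the toy volume is itself a linear function of the indices, trilinear interpolation reproduces it exactly, which makes the result easy to check by hand.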

Citations: 0

Similar, but not the same: multiomics comparison of human valve interstitial cells and osteoblast osteogenic differentiation expanded with an estimation of data-dependent and data-independent PASEF proteomics.

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giae110

Arseniy Lobov, Polina Kuchur, Nadezhda Boyarskaya, Daria Perepletchikova, Ivan Taraskin, Andrei Ivashkin, Daria Kostina, Irina Khvorova, Vladimir Uspensky, Egor Repkin, Evgeny Denisov, Tatiana Gerashchenko, Rashid Tikhilov, Svetlana Bozhkova, Vitaly Karelkin, Chunli Wang, Kang Xu, Anna Malashicheva

Osteogenic differentiation is crucial in normal bone formation and in pathological calcification, such as calcific aortic valve disease (CAVD). Understanding the proteomic and transcriptomic landscapes underlying this differentiation can unveil potential therapeutic targets for CAVD. In this study, we employed RNA sequencing transcriptomics and proteomics on a timsTOF Pro platform to explore the multiomics profiles of valve interstitial cells (VICs) and osteoblasts during osteogenic differentiation. For proteomics, we used 3 data acquisition/analysis techniques: data-dependent acquisition (DDA)-parallel accumulation serial fragmentation (PASEF), and data-independent acquisition (DIA)-PASEF with either a classic library-based search (DIA) or a machine learning-based library-free search (DIA-ML). Using RNA sequencing data as a biological reference, we compared these 3 analytical techniques in the context of actual biological experiments. We used this comprehensive dataset to reveal distinct proteomic and transcriptomic profiles between VICs and osteoblasts, highlighting specific biological processes in their osteogenic differentiation pathways. The study identified potential therapeutic targets specific to VIC osteogenic differentiation in CAVD, including MAOA and the ERK1/2 pathway. From a technical perspective, we found that the advantage of DIA-based methods over DDA is even greater for more complex human primary cell cultures than previously shown for HeLa samples. While the classic library-based DIA approach has proved to be the gold standard for shotgun proteomics research, DIA-ML offers significant advantages at the cost of a relatively minor compromise in data reliability, making it the method of choice for routine proteomics.

Volume 14 · Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11724719/pdf/

Citations: 0

How to select predictive models for decision-making or causal inference.

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf016

Matthieu Doutreligne, Gaël Varoquaux

Background: We investigate which procedure selects the most trustworthy predictive model to explain the effect of an intervention and support decision-making.

Methods: We study a large variety of model selection procedures in practical settings: finite-sample settings without the theoretical assumption of well-specified models. Beyond standard cross-validation or internal validation procedures, we also study elaborate causal risks. These build proxies of the causal error using "nuisance" reweighting to compute it on the observed data. We evaluate whether empirically estimated nuisances, which are necessarily noisy, add noise to model selection, and we compare different metrics for causal model selection in an extensive empirical study based on a simulation and 3 health care datasets with real covariates.

Results: Among all metrics, the mean squared error, classically used to evaluate predictive models, performs worst. Reweighting it with a propensity score does not bring much improvement in most cases. On average, the $R\text{-risk}$, which uses a model of the mean outcome and propensity scores as nuisances, leads to the best performance. Nuisance corrections are best estimated with flexible estimators such as a super learner.

Conclusions: When predictive models are used to explain the effect of an intervention, they must be evaluated with procedures different from those used in standard predictive settings, namely with the $R\text{-risk}$ from causal inference.

Volume 14 · Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11927402/pdf/
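
The $R\text{-risk}$ recommended in this abstract is computed from two nuisance estimates: a model of the mean outcome $m(x) = \mathbb{E}[Y \mid X = x]$ and the propensity score $e(x) = P(A = 1 \mid X = x)$. The Python sketch below assumes the standard R-loss formulation used by the R-learner for a binary treatment $A$, namely $\frac{1}{n}\sum_i \big((y_i - \hat m(x_i)) - (a_i - \hat e(x_i))\,\hat\tau(x_i)\big)^2$, and selects the candidate with the lowest value. The function names and the toy simulation are hypothetical, and the authors' exact estimator (for example, how the nuisances are cross-fitted) may differ.

```python
import numpy as np


def r_risk(y, a, tau_hat, m_hat, e_hat):
    """Empirical R-risk of a candidate model's treatment-effect predictions.

    y       : observed outcomes
    a       : binary treatment indicators (0 or 1)
    tau_hat : candidate model's predicted individual effects tau(x_i)
    m_hat   : nuisance estimate of the mean outcome E[Y | X = x_i]
    e_hat   : nuisance estimate of the propensity score P(A = 1 | X = x_i)
    """
    y, a, tau_hat, m_hat, e_hat = (
        np.asarray(v, dtype=float) for v in (y, a, tau_hat, m_hat, e_hat)
    )
    outcome_residual = y - m_hat
    treatment_residual = a - e_hat
    return float(np.mean((outcome_residual - treatment_residual * tau_hat) ** 2))


def select_model(candidates, y, a, m_hat, e_hat):
    """Return the name of the candidate whose predicted effects minimise the R-risk."""
    return min(candidates, key=lambda name: r_risk(y, a, candidates[name], m_hat, e_hat))


# Toy simulation (illustrative only): constant true effect of 2 and a randomised
# treatment, so the nuisances are known exactly: e(x) = 0.5 and m(x) = x + 1.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(size=n)
a = rng.binomial(1, 0.5, size=n)
y = x + 2.0 * a + rng.normal(scale=0.5, size=n)
m_hat = x + 1.0
e_hat = np.full(n, 0.5)

candidates = {"effect_2": np.full(n, 2.0), "effect_0": np.zeros(n)}
best = select_model(candidates, y, a, m_hat, e_hat)
print(best)  # effect_2
```

In real data, `m_hat` and `e_hat` are not known and would have to be estimated, which is where the abstract's advice to use flexible estimators such as a super learner applies.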

Citations: 0

The haplotype-resolved T2T genome for Bauhinia × blakeana sheds light on the genetic basis of flower heterosis.

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf044

Weixue Mu, Joshua Casey Darian, Wing-Kin Sung, Xing Guo, Tuo Yang, Mandy Wai Man Tang, Ziqiang Chen, Steve Kwan Hok Tong, Irene Wing Shan Chik, Robert L Davidson, Scott C Edmunds, Tong Wei, Stephen Kwok-Wing Tsui

Background: The Hong Kong orchid tree Bauhinia × blakeana Dunn has long been proposed to be a sterile interspecific hybrid that exhibits flower heterosis compared with its likely parental species, Bauhinia purpurea L. and Bauhinia variegata L. Here, we report comparative genomic and transcriptomic analyses of the 3 Bauhinia species.

Findings: We generated chromosome-level assemblies for the parental species and applied a trio-binning approach to construct a haplotype-resolved telomere-to-telomere (T2T) genome for B. blakeana. Comparative chloroplast genome analysis confirmed B. purpurea as the maternal parent. Transcriptome profiling of flower tissues showed that B. blakeana more closely resembles its maternal parent. Differential gene expression analyses revealed distinct expression patterns among the 3 species, particularly in biosynthetic and metabolic processes. To investigate the genetic basis of flower heterosis in B. blakeana, we focused on gene expression patterns within pigment biosynthesis-related pathways. High-parent dominance and overdominance expression patterns were observed, particularly in genes associated with carotenoid biosynthesis. Additionally, allele-specific expression analysis revealed a balanced contribution of maternal and paternal alleles to the gene expression patterns of B. blakeana.

Conclusions: Our study offers valuable insights into the genome architecture of the hybrid B. blakeana and establishes a comprehensive genomic and transcriptomic resource for future functional genetics research within the Bauhinia genus. It also serves as a model for exploring the characteristics of hybrid species using haplotype-resolved T2T genomes, providing a novel approach to understanding genetic interactions and evolutionary mechanisms in complex, highly heterozygous genomes.

Volume 14 · Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12012898/pdf/
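
Trio binning, used in this study to build the haplotype-resolved B. blakeana genome, separates the hybrid's long reads by parental origin before assembly, using k-mers found in only one parent. The following is a simplified Python sketch of that idea, not the pipeline the authors ran: it works on exact k-mer sets from toy parental sequences and ignores reverse complements, sequencing errors, and the k-mer count thresholds that real tools apply. All function names are hypothetical.

```python
def kmers(seq, k):
    """All overlapping k-mers of a sequence (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def parent_specific_kmers(maternal_seqs, paternal_seqs, k):
    """k-mers present in one parent's data but absent from the other's."""
    maternal = {km for s in maternal_seqs for km in kmers(s, k)}
    paternal = {km for s in paternal_seqs for km in kmers(s, k)}
    return maternal - paternal, paternal - maternal


def bin_read(read, maternal_only, paternal_only, k):
    """Assign a hybrid read to the parent whose specific k-mers it carries more of."""
    read_kmers = kmers(read, k)
    mat_hits = sum(km in maternal_only for km in read_kmers)
    pat_hits = sum(km in paternal_only for km in read_kmers)
    # Normalise by the number of parent-specific k-mers so that the parent
    # with the larger specific set is not favoured by chance.
    mat_score = mat_hits / len(maternal_only) if maternal_only else 0.0
    pat_score = pat_hits / len(paternal_only) if paternal_only else 0.0
    if mat_score > pat_score:
        return "maternal"
    if pat_score > mat_score:
        return "paternal"
    return "unassigned"


# Toy example with k = 3.
mat_only, pat_only = parent_specific_kmers(["AAAAACCCCC"], ["AAAAAGGGGG"], k=3)
for read in ["ACCCC", "AGGGG", "AAAAA"]:
    print(read, bin_read(read, mat_only, pat_only, k=3))
# prints: ACCCC maternal / AGGGG paternal / AAAAA unassigned
```

Reads that carry only k-mers shared by both parents stay unassigned; each binned read set is then assembled separately to give one haplotype per parent.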

Citations: 0

Overture: an open-source genomics data platform.

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf038

Mitchell Shiell, Rosi Bajari, Dusan Andric, Jon Eubank, Brandon F Chan, Anders J Richardsson, Azher Ali, Bashar Allabadi, Yelizar Alturmessov, Jared Baker, Ann Catton, Kim Cullion, Daniel DeMaria, Patrick Dos Santos, Henrich Feher, Francois Gerthoffert, Minh Ha, Robin A Haw, Atul Kachru, Alexandru Lepsa, Alexis Li, Rakesh N Mistry, Hardeep K Nahal-Bose, Aleksandra Pejovic, Samantha Rich, Leonardo Rivera, Ciarán Schütte, Edmund Su, Robert Tisma, Jaser Uddin, Chang Wang, Alex N Wilmer, Linda Xiang, Junjun Zhang, Lincoln D Stein, Vincent Ferretti, Mélanie Courtot, Christina K Yung

Background: Next-generation sequencing has created many new technological challenges in organizing and distributing genomics datasets, which can now routinely reach petabyte scale. Coupled with data-hungry artificial intelligence and machine learning applications, findable, accessible, interoperable, and reusable (FAIR) genomics datasets have never been more valuable. While major archives such as the Genomic Data Commons, Sequence Read Archive, and European Genome-Phenome Archive have improved researchers' ability to share and reuse data, and general-purpose repositories such as Zenodo and Figshare provide valuable platforms for publishing research data, the diversity of genomics research precludes any one-size-fits-all approach. In many cases, bespoke solutions are required, and although funding agencies and journals increasingly mandate reusable data practices, researchers still lack the technical support needed to meet the multifaceted challenges of data reuse.

Findings: Overture bridges this gap by providing open-source software for building and deploying customizable genomics data platforms. Its architecture consists of modular, generalized microservices, each with a narrow responsibility, that combine to form complete data management systems. These systems enable researchers to organize, share, and explore their genomics data at any scale. Through Overture, researchers can connect their data to both humans and machines, fostering reproducibility and enabling new insights through controlled data sharing and reuse.

Conclusions: By making these tools freely available, we can accelerate the development of reliable genomic data management across the research community, flexibly and at multiple scales. Overture is an open-source project licensed under AGPLv3.0, with all source code publicly available from https://github.com/overture-stack and documentation on development, deployment, and usage available from www.overture.bio.

Volume 14 · Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12020472/pdf/

Citations: 0

Telomere-to-telomere genome assembly of Electrophorus electricus provides insights into the evolution of electric eels.

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf024

Zan Qi, Qun Liu, Haorong Li, Yaolei Zhang, Ziwei Yu, Wenkai Luo, Kun Wang, Yuxin Zhang, Shoupeng Pan, Chao Wang, Hui Jiang, Qiang Qiu, Wen Wang, Guangyi Fan, Yongxin Li

Background: Electric eels evolved remarkable electric organs that enable them to instantaneously discharge hundreds of volts for predation, defense, and communication. However, the absence of a high-quality reference genome has severely constrained many aspects of electric eel research.

Results: Using high-depth, multiplatform sequencing data, we assembled the first telomere-to-telomere high-quality reference genome of Electrophorus electricus, which has a genome size of 833.43 Mb and comprises 26 chromosomes. Multiple evaluations, including N50 statistics (30.38 Mb), BUSCO scores (97.30%), and the mapping ratio of short-insert sequencing data (99.91%), demonstrate the high contiguity and completeness of the assembly. Genome annotation predicted 396.63 Mb of repetitive sequences and 20,992 protein-coding genes. Evolutionary analyses indicate that Gymnotiformes, the order to which the electric eel belongs, is more closely related to Characiformes than to Siluriformes and diverged from Characiformes 95.00 million years ago. Pairwise sequentially Markovian coalescent (PSMC) analysis revealed a sharp decline in the population size of E. electricus over the past few hundred thousand years. In addition, many regulatory factors related to neurotransmitters and classical signaling pathways during embryonic development were significantly expanded, potentially contributing to the generation of high-voltage electricity.

Conclusions: This study provides the first high-quality telomere-to-telomere reference genome of E. electricus and greatly enhances our understanding of electric eels.

Volume 14 · Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11959694/pdf/
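
N50, one of the contiguity measures quoted for this assembly, is the length L such that contigs (or scaffolds) of length at least L together cover at least half of the total assembly length. A minimal Python sketch of the standard computation follows; the function name and toy lengths are illustrative and unrelated to the eel data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    covered = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer-safe test for covered >= total / 2
            return length


print(n50([2, 3, 4, 5, 6]))  # 5: 6 + 5 = 11 is the first running total >= 20 / 2
```

Sorting dominates the cost, so the computation is O(n log n) in the number of sequences.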

Citations: 0

New implementation of data standards for AI in oncology: Experience from the EuCanImage project.

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giae101

Teresa García-Lezana, Maciej Bobowicz, Santiago Frid, Michael Rutherford, Mikel Recuero, Katrine Riklund, Aldar Cabrelles, Marlena Rygusik, Lauren Fromont, Roberto Francischello, Emanuele Neri, Salvador Capella, Arcadi Navarro, Fred Prior, Jonathan Bona, Pilar Nicolas, Martijn P A Starmans, Karim Lekadir, Jordi Rambla

Background: An unprecedented amount of personal health data, with the potential to revolutionize precision medicine, is generated at health care institutions worldwide. Exploiting such data with artificial intelligence (AI) relies on the ability to combine heterogeneous, multicentric, multimodal, and multiparametric data, as well as on thoughtful knowledge representation and data availability. Despite these possibilities, significant methodological challenges and ethicolegal constraints still impede the real-world implementation of data models.

Technical details: EuCanImage is an international consortium aimed at developing AI algorithms for precision medicine in oncology and at enabling secondary use of the data, subject to the necessary ethical approvals. The use of well-defined clinical data standards to allow interoperability was a central element of the initiative. The consortium focuses on 3 cancer types and addresses 7 unmet clinical needs. We conceived and implemented an innovative process to capture clinical data from hospitals, transform them into the newly developed EuCanImage data models, and store the standardized data in permanent repositories. This new workflow combines recognized software (REDCap for data capture), data standards (FHIR for data structuring), and an existing repository (EGA for permanent data storage and sharing) with newly developed custom tools for data transformation and quality control (ETL pipeline, QC scripts) that fill the remaining gaps.

Conclusion: This article synthesizes our experience and procedures for health care data interoperability, standardization, and reproducibility.

Volume 14 · Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12071370/pdf/
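
The workflow described here turns REDCap captures into FHIR-structured data through an ETL step followed by QC scripts. The Python sketch below illustrates that pattern for a single record: it maps a flat REDCap-style export row to a FHIR `Patient` resource (using the standard `resourceType`, `id`, `gender`, and `birthDate` elements) and then checks it. The REDCap field names (`record_id`, `sex`, `birth_date`), the sex coding, and the function names are hypothetical; the actual EuCanImage data models and ETL pipeline are not described in this abstract.

```python
import re

# Hypothetical REDCap coding for a "sex" radio field (not EuCanImage's real data dictionary).
SEX_CODES = {"1": "male", "2": "female", "3": "other", "9": "unknown"}
# Values allowed for FHIR Patient.gender.
FHIR_GENDERS = {"male", "female", "other", "unknown"}
# FHIR date: YYYY, YYYY-MM or YYYY-MM-DD (simplified; month/day ranges are not checked).
FHIR_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def redcap_row_to_fhir_patient(row):
    """Transform one flat REDCap export row (dict of strings) into a FHIR Patient dict."""
    return {
        "resourceType": "Patient",
        "id": row["record_id"],
        # Unmapped codes become None so that QC can flag them.
        "gender": SEX_CODES.get(row.get("sex", "")),
        "birthDate": row.get("birth_date") or None,
    }


def qc_patient(resource):
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []
    if resource.get("gender") not in FHIR_GENDERS:
        problems.append("gender is missing or not a FHIR code")
    birth = resource.get("birthDate")
    if birth is not None and not FHIR_DATE.fullmatch(birth):
        problems.append(f"birthDate {birth!r} is not in FHIR date format")
    return problems


good = redcap_row_to_fhir_patient({"record_id": "101", "sex": "2", "birth_date": "1970-05-01"})
bad = redcap_row_to_fhir_patient({"record_id": "102", "sex": "5", "birth_date": "01/05/1970"})
print(qc_patient(good))  # []
print(qc_patient(bad))   # two problems: unmapped sex code and non-FHIR date
```

Keeping the transformation and the QC check as separate functions mirrors the split between ETL pipeline and QC scripts named in the abstract, so failing records can be reported back to data managers instead of being silently corrected.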

Citations: 0

Best-practice guidance for Earth BioGenome Project sample collection and processing: progress and challenges in biodiverse reference genome creation.

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf041

Mara K N Lawniczak, Kevin M Kocot, Jonas J Astrin, Mark Blaxter, Cibele G Sotero-Caio, Katharine B Barker, Anna K Childers, Jonathan Coddington, Paul Davis, Kerstin Howe, Warren E Johnson, Duane D McKenna, Jeremy G Wideman, Olga Vinnere Pettersson, Verena Ras, Bernardo F Santos

Citations: 0

Spatial integration of multi-omics data from serial sections using the novel Multi-Omics Imaging Integration Toolset.

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf035

Maximilian Wess, Maria K Andersen, Elise Midtbust, Juan Carlos Cabellos Guillem, Trond Viset, Øystein Størkersen, Sebastian Krossa, Morten Beck Rye, May-Britt Tessem

Background: Understanding the cancer biology of heterogeneous tumors in precision medicine requires capturing the complexity of multiple omics levels and the spatial heterogeneity of cancer tissue. Techniques such as mass spectrometry imaging (MSI) and spatial transcriptomics (ST) achieve this by spatially detecting metabolites and RNA, but they are often applied to serial sections. To fully leverage such multi-omics data, the individual measurements need to be integrated into a single dataset.

Results: We present the Multi-Omics Imaging Integration Toolset (MIIT), a Python framework for integrating spatially resolved multi-omics data. A key component of MIIT's integration is the registration of serial sections, for which we developed a nonrigid registration algorithm, GreedyFHist. We validated GreedyFHist on 244 images from fresh-frozen serial sections, achieving state-of-the-art performance. As a proof of concept, we used MIIT to integrate ST and MSI data from prostate tissue samples and assessed the correlation between an ST-derived gene signature for citrate-spermine secretion and metabolic measurements from MSI.

Conclusion: MIIT is a highly accurate, customizable, open-source framework for integrating spatial omics technologies performed on different serial sections.

Volume 14 · Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12077394/pdf/
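
Once serial sections are registered, integrating ST and MSI data amounts to bringing both measurements onto shared locations. The Python sketch below illustrates one simple way to do this under stated assumptions: it maps MSI pixel coordinates with a 2x3 affine matrix (a stand-in for GreedyFHist's nonrigid transform, which is not reproduced here), averages the intensities of pixels within a fixed radius of each ST spot, and then computes a Pearson correlation with a per-spot signature score. The function names, radius, and toy data are hypothetical and do not reflect MIIT's actual API or the correlation measure the authors used.

```python
import numpy as np


def apply_affine(points, matrix):
    """Map (N, 2) coordinates with a 2x3 affine matrix [A | t]: p' = A p + t."""
    pts = np.asarray(points, dtype=float)
    m = np.asarray(matrix, dtype=float)
    return pts @ m[:, :2].T + m[:, 2]


def aggregate_pixels_per_spot(spot_xy, pixel_xy, pixel_values, radius):
    """Mean pixel intensity within `radius` of each spot (NaN where no pixel falls)."""
    spots = np.asarray(spot_xy, dtype=float)
    pixels = np.asarray(pixel_xy, dtype=float)
    values = np.asarray(pixel_values, dtype=float)
    # Pairwise pixel-to-spot distances, shape (n_pixels, n_spots).
    dist = np.linalg.norm(pixels[:, None, :] - spots[None, :, :], axis=2)
    result = np.full(len(spots), np.nan)
    for s in range(len(spots)):
        inside = dist[:, s] <= radius
        if inside.any():
            result[s] = values[inside].mean()
    return result


# Toy data: the MSI section is offset by +5 in x relative to the ST section.
shift_back = np.array([[1.0, 0.0, -5.0], [0.0, 1.0, 0.0]])
msi_pixels = apply_affine([[5, 1], [6, 0], [15, 1], [25, 0], [26, 0], [60, 50]], shift_back)
msi_values = [2.0, 4.0, 6.0, 8.0, 10.0, 100.0]
spots = [[0, 0], [10, 0], [20, 0], [30, 0]]

metabolite = aggregate_pixels_per_spot(spots, msi_pixels, msi_values, radius=2)
signature = np.array([0.1, 0.2, 0.35, 0.5])  # hypothetical per-spot gene signature scores
valid = ~np.isnan(metabolite)  # skip spots with no MSI pixel nearby
r = np.corrcoef(signature[valid], metabolite[valid])[0, 1]
print(metabolite, r)
```

The dense distance matrix keeps the sketch short but grows as pixels × spots; a spatial index (for example a k-d tree) would be the usual choice for full-size sections.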

Citations: 0

Health Data Nexus: an open data platform for AI research and education in medicine.

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf050

January Adams, Rafal Cymerys, Karol Szuster, Daniel Hekman, Zoryana Salo, Rutvik Solanki, Muhammad Mamdani, Alistair Johnson, Katarzyna Ryniak, Tom Pollard, David Rotenberg, Benjamin Haibe-Kains

We outline the development of the Health Data Nexus, a data platform that combines data storage and access management with a cloud-based computational environment. We describe the importance of this secure platform in an evolving public-sector research landscape that uses large quantities of data, particularly clinical data acquired from health systems, and the importance of providing meaningful benefits to three targeted user groups: data providers, researchers, and educators. We then describe the governance practices, technical standards, data security measures, and privacy protections needed to build this platform, as well as example use cases highlighting the platform's strengths in facilitating dataset acquisition, novel research, and hosting educational courses, workshops, and datathons. Finally, we discuss the key principles that informed the platform's development, highlighting the importance of flexible uses, collaborative development, and open-source science.

Volume 14 · Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12131319/pdf/

Citations: 0