GigaScience: Latest Articles, Page 7

VCF2Dis: an ultra-fast and efficient tool to calculate pairwise genetic distance and construct population phylogeny from VCF files.

IF 11.8, Zone 2 (Biology)

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf032

Lian Xu, Weiming He, Shuaishuai Tai, Xiaoli Huang, Mumu Qin, Xun Liao, Yi Jing, Jian Yang, Xiaodong Fang, Jianhua Shi, Nana Jin

Abstract:

Background: Genetic distance metrics are crucial for understanding the evolutionary relationships and population structure of organisms. Progress in next-generation sequencing technology has given rise to genotyping data for thousands of individuals. The standard Variant Call Format (VCF) is widely used to store genomic variation information, but calculating genetic distance and constructing population phylogeny directly from large VCF files can be challenging. Moreover, the existing tools that implement such functions remain limited and have low performance in processing large-scale genotype data, especially in the area of memory efficiency.

Findings: To address these challenges, we introduce VCF2Dis, an ultra-fast and efficient tool that calculates pairwise genetic distance directly from large VCF files and then constructs distance-based population phylogeny using the ape package. Benchmarking results demonstrate the tool's efficiency, with rapid processing times, minimal memory usage (e.g., 0.37 GB for the complete analysis of 2,504 samples with 81.2 million variants), and high accuracy, even when handling datasets with millions of variants from thousands of individuals. Its straightforward command-line interface, compatibility with downstream phylogenetic analysis tools (e.g., MEGA, Phylip, and FastTree), and support for multithreading make it a valuable tool for researchers studying population relationships. These advantages mean that VCF2Dis has already been widely utilized in many published genomic studies.

Conclusion: We present VCF2Dis, a straightforward and efficient tool for calculating genetic distance and constructing population phylogeny directly from large-scale genotype data. VCF2Dis has been widely applied, facilitating the exploration of population relationships in extensive genome sequencing studies.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11970368/pdf/
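As a rough illustration of what a pairwise genetic distance computed from genotype calls can look like, the sketch below computes a simple p-distance (mean allelic difference per shared site) for every pair of samples. It is not VCF2Dis's implementation or its exact formula. The genotype encoding (alternate-allele counts 0, 1, or 2, with `None` for a missing call), the function name `pairwise_p_distance`, and the toy data are all assumptions made for this example.

```python
from itertools import combinations

def pairwise_p_distance(genotypes):
    """Return {(i, j): distance} for every pair of samples.

    genotypes: one list per sample of alternate-allele counts (0, 1, 2),
    with None marking a missing call (hypothetical encoding).
    """
    distances = {}
    for i, j in combinations(range(len(genotypes)), 2):
        diff_sum = 0.0
        n_sites = 0
        for a, b in zip(genotypes[i], genotypes[j]):
            if a is None or b is None:
                continue  # skip sites missing in either sample
            diff_sum += abs(a - b) / 2.0  # fraction of differing alleles
            n_sites += 1
        # With no shared sites the distance is undefined; report None
        distances[(i, j)] = diff_sum / n_sites if n_sites else None
    return distances

# Toy data: 3 samples x 4 sites, illustrative only
toy = [
    [0, 1, 2, None],
    [0, 1, 0, 2],
    [2, 1, 0, 2],
]
result = pairwise_p_distance(toy)
```

The resulting dictionary can be arranged into a square distance matrix, which is the kind of input that distance-based tree builders such as neighbor joining expect.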

Citations: 0

MGMA-PPIS: Predicting the protein-protein interaction site with multiview graph embedding and multiscale attention fusion.

IF 11.8, Zone 2 (Biology)

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf114

Yong Han, Shao-Wu Zhang, Qing-Qing Zhang, Ming-Hui Shi

Abstract:

Background: Protein-protein interactions (PPIs) play a crucial role in numerous biological processes. Accurate identification of protein-protein interaction sites is critical for a comprehensive understanding of protein functions and pathological mechanisms. However, conventional experimental approaches for detecting PPIs are often time-consuming and labor-intensive, thereby motivating the development of efficient computational methods to identify PPI sites.

Results: In this work, we propose a novel graph neural network-based method (called MGMA-PPIS) to predict PPI sites by adopting multiview graph embedding and multiscale attention fusion. MGMA-PPIS integrates global node features extracted by an equivariant graph neural network and multiscale local node features extracted by an edge graph attention network across different neighborhood scales, thereby constructing a multiview graph feature representation. Then, a multiscale attention network is employed to perform deep feature fusion across multiple scales for achieving high-precision prediction of PPI sites.

Conclusions: Experimental results on benchmark datasets show that our MGMA-PPIS outperforms other state-of-the-art methods, and it can effectively predict PPI sites.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12486388/pdf/

Citations: 0

Galaxy as a gateway to bioinformatics: Multi-Interface Galaxy Hands-on Training Suite (MIGHTS) for scRNA-seq.

IF 11.8, Zone 2 (Biology)

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giae107

Camila L Goclowski, Julia Jakiela, Tyler Collins, Saskia Hiltemann, Morgan Howells, Marisa Loach, Jonathan Manning, Pablo Moreno, Alex Ostrovsky, Helena Rasche, Mehmet Tekman, Graeme Tyson, Pavankumar Videm, Wendi Bacon

Abstract:

Background: Bioinformatics is fundamental to biomedical sciences, but its mastery presents a steep learning curve for bench biologists and clinicians. Learning to code while analyzing data is difficult. The curve may be flattened by separating these two aspects and providing intermediate steps for budding bioinformaticians. Single-cell analysis is in great demand from biologists and biomedical scientists, as evidenced by the proliferation of training events, materials, and collaborative global efforts like the Human Cell Atlas. However, iterative analyses lacking reinstantiation, coupled with unstandardized pipelines, have made effective single-cell training a moving target.

Findings: To address these challenges, we present a Multi-Interface Galaxy Hands-on Training Suite (MIGHTS) for single-cell RNA sequencing (scRNA-seq) analysis, which offers parallel analytical methods using a graphical interface (buttons) or code. With clear, interoperable materials, MIGHTS facilitates smooth transitions between environments. Bridging the biologist-programmer gap, MIGHTS emphasizes interdisciplinary communication for effective learning at all levels. Real-world data analysis in MIGHTS promotes critical thinking and best practices, while FAIR data principles ensure validation of results. MIGHTS is freely available, hosted on the Galaxy Training Network, and leverages Galaxy interfaces for analyses in both settings. Given the ongoing popularity of Python-based (Scanpy) and R-based (Seurat & Monocle) scRNA-seq analyses, MIGHTS enables analyses using both.

Conclusions: MIGHTS consists of 11 tutorials, including recordings, slide decks, and interactive visualizations, and has a demonstrated track record of sustainability via regular updates and community collaborations. Parallel pathways in MIGHTS enable concurrent training of scientists at any programming level, addressing the heterogeneous needs of novice bioinformaticians.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11707610/pdf/

Citations: 0

Multiomics analysis provides insights into musk secretion in muskrat and musk deer.

IF 11.8, Zone 2 (Biology)

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf006

Tao Wang, Maosen Yang, Xin Shi, Shilin Tian, Yan Li, Wenqian Xie, Zhengting Zou, Dong Leng, Ming Zhang, Chengli Zheng, Chungang Feng, Bo Zeng, Xiaolan Fan, Huimin Qiu, Jing Li, Guijun Zhao, Zhengrong Yuan, Diyan Li, Hang Jie

Abstract:

Background: Musk, secreted by the musk gland of adult male musk-secreting mammals, holds significant pharmaceutical and cosmetic potential. However, understanding the molecular mechanisms of musk secretion remains limited, largely due to the lack of comprehensive multiomics analyses and available platforms for relevant species, such as muskrat (Ondatra zibethicus Linnaeus) and Chinese forest musk deer (Moschus berezovskii Flerov).

Results: We generated chromosome-level genome assemblies for the 2 species of muskrat (Ondatra zibethicus Linnaeus) and musk deer (Moschus berezovskii Flerov), along with 168 transcriptomes from various muskrat tissues. Comparative analysis with 11 other vertebrate genomes revealed genes and amino acid sites with signs of adaptive convergent evolution, primarily linked to lipid metabolism, cell cycle regulation, protein binding, and immunity. Single-cell RNA sequencing in muskrat musk glands identified increased acinar/glandular epithelial cells during secretion, highlighting the role of lipometabolism in gland development and evolution. Additionally, we developed MuskDB (http://muskdb.cn/home/), a freely accessible multiomics database platform for musk-secreting mammals.

Conclusions: The study concludes that the evolution of musk secretion in muskrats and musk deer is likely driven by lipid metabolism and cell specialization. This underscores the complexity of the musk gland and calls for further investigation into musk secretion-specific genetic variants.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11878540/pdf/

Citations: 0

Dimensionality reduction for visualizing spatially resolved profiling data using SpaSNE.

IF 11.8, Zone 2 (Biology)

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf002

Yuansheng Zhou, Chen Tang, Xue Xiao, Xiaowei Zhan, Tao Wang, Guanghua Xiao, Lin Xu

Abstract:

Background: Spatially resolved profiling technologies to quantify transcriptomes, epigenomes, and proteomes have been emerging as groundbreaking methods for comprehensive molecular characterizations. Dimensionality reduction and visualization is an essential step to analyze and interpret spatially resolved profiling data. However, state-of-the-art dimensionality reduction methods for single-cell sequencing data, such as the t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), were not tailored for spatially resolved profiling data.

Results: Here we developed a spatially resolved t-SNE (SpaSNE) method to integrate both spatial and molecular information. We applied it to a variety of public spatially resolved profiling datasets that were generated from 3 experimental platforms and consisted of cells from different diseases, tissues, and cell types. To compare the performances of SpaSNE, t-SNE, and UMAP, we applied them to 4 spatially resolved profiling datasets obtained from 3 distinct experimental platforms (Visium, STARmap, and MERFISH) on both diseased and normal tissues. Comparisons between SpaSNE and these state-of-the-art approaches reveal that SpaSNE achieves more accurate and meaningful visualization that better elucidates the underlying spatial and molecular data structures.

Conclusions: This work demonstrates the broad application of SpaSNE for reliable and robust interpretation of cell types based on both molecular and spatial information, which can set the foundation for many subsequent analysis steps, such as differential gene expression and trajectory or pseudotime analysis on the spatially resolved profiling data.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11831803/pdf/

Citations: 0

AEnet: a practical tool to construct the splicing-associated phenotype atlas at a single cell level.

IF 11.8, Zone 2 (Biology)

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf110

Shang Liu, Xi Chen, Xiaohu Huang, Yuhang Wang, Waidong Huang, Pengfei Qin, Rui Li, Xuanxuan Zou, Wending Pang, Xiaoyun Huang, Shiping Liu, Yinqi Bai, Liang Wu

Abstract:

Alternative splicing (AS), a crucial driver of proteomic diversity, is a fundamental source of cellular heterogeneity alongside gene expression levels. AS is closely linked to various physiological and pathological processes, including tumor progression and embryonic development. Single-cell RNA sequencing (scRNA-seq) technologies capture AS events through junction reads at cellular resolution, enabling the identification of core AS events that regulate specific cell types or states. However, single-cell sequencing technologies and their data are plagued by inherent limitations, such as shallow sequencing depth, high dropout rates, and batch effects. Furthermore, previous clustering approaches have overlooked the crucial interplay between AS and gene expression in defining distinct "cell types," posing ongoing challenges in this field. In this study, we present a novel method called Alternative Splicing-Gene Expression Network (AEnet), which combines gene expression levels with AS patterns to profile cellular heterogeneity and define what we term "cell subpopulations." AEnet also identifies key AS events and infers the regulatory mechanisms underlying these events. By applying AEnet to tumor cells, pan-cancer immune cells, and embryonic cells, we demonstrate enhanced cell clustering, the identification of novel AS events with potential functional importance, and the discovery of the key splicing factors involved in cell state transitions. The application of AEnet provides new insights into cellular heterogeneity and its role in both physiological and pathological processes.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12457822/pdf/

Citations: 0

Integrating comparative genomics and risk classification by assessing virulence, antimicrobial resistance, and plasmid spread in microbial communities with gSpreadComp.

IF 11.8, Zone 2 (Biology)

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf072

Jonas Coelho Kasmanas, Stefanía Magnúsdóttir, Junya Zhang, Kornelia Smalla, Michael Schloter, Peter F Stadler, André Carlos Ponce de Leon Ferreira de Carvalho, Ulisses Rocha

Abstract:

Background: Comparative genomics, genetic spread analysis, and context-aware ranking are crucial in understanding microbial dynamics' impact on public health. gSpreadComp streamlines the path from in silico analysis to hypothesis generation. By integrating comparative genomics, genome annotation, normalization, plasmid-mediated gene transfer, and microbial resistance-virulence risk-ranking into a unified workflow, gSpreadComp facilitates hypothesis generation from complex microbial datasets.

Findings: The gSpreadComp workflow works through 6 modular steps: taxonomy assignment, genome quality estimation, antimicrobial resistance (AMR) gene annotation, plasmid/chromosome classification, virulence factor annotation, and downstream analysis. Our workflow calculates gene spread using normalized weighted average prevalence and ranks potential resistance-virulence risk by integrating microbial resistance, virulence, and plasmid transmissibility data and producing an HTML report. As a use case, we analyzed 3,566 metagenome-assembled genomes recovered from human gut microbiomes across diets. Our findings indicated consistent AMR across diets, with diet-specific resistance patterns, such as increased bacitracin in vegans and tetracycline in omnivores. Notably, ketogenic diets showed a slightly higher resistance-virulence rank, while vegan and vegetarian diets encompassed more plasmid-mediated gene transfer.

Conclusions: The gSpreadComp workflow aims to facilitate hypothesis generation for targeted experimental validations by the identification of concerning resistant hotspots in complex microbial datasets. Our study calls for a more thorough investigation of the critical role of diet in microbial community dynamics and the spread of AMR. This research underscores the importance of integrating genomic data into public health strategies to combat AMR. The gSpreadComp workflow is available at https://github.com/mdsufz/gSpreadComp/.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12199706/pdf/

Citations: 0

Correction to: The telomere-to-telomere gapless genome of grass carp provides insights for genetic improvement.

IF 11.8, Zone 2 (Biology)

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf088

Citations: 0

Comparing linear and nonlinear finite element models of vertebral strength across the thoracolumbar spine: a benchmark from density-calibrated computed tomography.

IF 11.8, Zone 2 (Biology)

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf094

Matthias Walle, Bryn E Matheson, Steven K Boyd

Abstract:

Background: Opportunistic assessment of vertebral strength from clinical computed tomography (CT) scans holds substantial promise for fracture risk stratification, yet variability in calibration methods and finite element (FE) modeling approaches has led to limited comparability across studies. In this work, we provide a publicly available benchmark dataset that supports standardized biomechanical analysis of the thoracic and lumbar spine using density-calibrated CT data. We extended the VerSe 2019 dataset to include phantomless quantitative CT calibration, automated vertebral substructure segmentation, and vertebral strength estimates derived from both linear and nonlinear FE models. The cohort comprises 141 patients scanned across 5 CT systems, including contrast-enhanced protocols.

Results: Phantomless calibration was performed using automatically segmented tissue references and validated against synchronous calibration phantoms in 17 scans. To evaluate model performance, we implemented a nonlinear elastoplastic FE model and compared it to 2 linear estimates. A displacement-calibrated linear model (0.2% axial strain) demonstrated excellent agreement with nonlinear failure loads (R = 0.96; mean difference = -0.07 kN), while a stiffness-based approach showed similarly strong correlation (R = 0.92). We evaluated vertebral strength at all thoracic and lumbar levels, enabling level-wise normalization and comparison. Strength ratios revealed consistent anatomical trends and identified T12 and T9 as reliable alternatives to L1 for opportunistic screening and model standardization.

Conclusions: All calibrated scans, segmentations, software, and modeling outputs are publicly released, providing a benchmark resource for validation and development of FE models, radiomics tools, and other quantitative imaging applications in musculoskeletal research.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395960/pdf/
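The agreement statistics quoted in the abstract above (a correlation coefficient R and a mean difference in kN between linear and nonlinear failure-load estimates) can be computed as in the following sketch. This is not the authors' analysis code: the function names and the four paired values are hypothetical and chosen only to make the arithmetic easy to follow, and the abstract does not state which correlation coefficient was used, so Pearson's is an assumption here.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_difference(x, y):
    """Mean of the paired differences x - y (the bias between two methods)."""
    return sum(a - b for a, b in zip(x, y)) / len(x)

# Hypothetical failure loads in kN, illustrative only (not study data)
linear_kn = [2.0, 3.0, 4.0, 5.0]
nonlinear_kn = [2.1, 3.1, 4.1, 5.1]

r = pearson_r(linear_kn, nonlinear_kn)
bias = mean_difference(linear_kn, nonlinear_kn)
```

A high R alone does not rule out a systematic offset between two methods, which is why the mean difference is reported alongside it.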

Citations: 0

WaveSeekerNet: accurate prediction of influenza A virus subtypes and host source using attention-based deep learning.

IF 11.8, Zone 2 (Biology)

GigaScience Pub Date : 2025-01-06 DOI: 10.1093/gigascience/giaf089

Hoang-Hai Nguyen, Josip Rudar, Nathaniel Lesperance, Oksana Vernygora, Graham W Taylor, Chad Laing, David Lapen, Carson K Leung, Oliver Lung

Abstract:

Background: Influenza A virus (IAV) poses a significant threat to animal health globally, with its ability to overcome species barriers and cause pandemics. Rapid and accurate prediction of IAV subtypes and host source is crucial for effective surveillance and pandemic preparedness. Deep learning has emerged as a powerful tool for analyzing viral genomic sequences, offering new ways to uncover hidden patterns associated with viral characteristics and host adaptation.

Findings: We introduce WaveSeekerNet, a novel deep learning model for accurate and rapid prediction of IAV subtypes and host source. The model leverages attention-based mechanisms and efficient token mixing schemes, including the Fourier Transform and the Wavelet Transform, to capture intricate patterns within viral RNA and protein sequences. Extensive experiments on diverse datasets demonstrate WaveSeekerNet's superior performance to existing models that use the traditional self-attention mechanism. Notably, WaveSeekerNet rivals VADR (Viral Annotation DefineR) in subtype prediction using the high-quality RNA sequences, achieving the maximum score of 1.0 on metrics, including the Balanced Accuracy, F1-score (Macro Average), and Matthews Correlation Coefficient. Our approach to subtype and host source prediction also exceeds the pretrained ESM-2 (Evolutionary Scale Modeling) models with respect to generalization performance and computational cost. Furthermore, WaveSeekerNet exhibits remarkable accuracy in distinguishing between human, avian, and other mammalian hosts. The ability of WaveSeekerNet to flag potential cross-species transmission events underscores its significant value for real-time surveillance and proactive pandemic preparedness efforts.

Conclusions: WaveSeekerNet's superior performance, efficiency, and ability to flag potential cross-species transmission events highlight its potential for real-time surveillance and pandemic preparedness. This model represents a significant advancement in applying deep learning for IAV classification and holds promise for future epidemiological and veterinary studies and public health interventions.

PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395966/pdf/
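To give a sense of what Fourier-based token mixing means in general, the sketch below replaces attention with a discrete Fourier transform taken along the sequence axis and keeps the real part, so every output position depends on every input token. This is a generic, simplified illustration and not WaveSeekerNet's architecture. The function names, the choice to transform only along the sequence axis, and the toy input are assumptions made for this example.

```python
import cmath

def dft_real(values):
    """Real part of the discrete Fourier transform of a 1D sequence."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)).real
        for k in range(n)
    ]

def fourier_token_mixing(tokens):
    """Mix information across tokens with a DFT along the sequence axis.

    tokens: list of token embeddings, each a list of floats of equal length.
    Returns a list with the same shape as the input.
    """
    seq_len, hidden = len(tokens), len(tokens[0])
    # Transform each feature channel independently across the sequence
    channels = [dft_real([tokens[t][d] for t in range(seq_len)]) for d in range(hidden)]
    # Re-assemble into (seq_len, hidden) layout
    return [[channels[d][k] for d in range(hidden)] for k in range(seq_len)]

# Toy input: 4 identical tokens with 2 features each, illustrative only
toy_tokens = [[1.0, 2.0] for _ in range(4)]
mixed = fourier_token_mixing(toy_tokens)
```

The transform has no learned parameters, which is one reason mixing schemes of this kind are generally cheaper than self-attention. A practical model would use a fast Fourier transform from a numerical library rather than this direct O(n^2) sum.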

Citations: 0