Bioinformatics advances: latest articles, page 2

Opportunities and considerations for using artificial intelligence in bioinformatics education.

IF 2.8

Bioinformatics advances Pub Date : 2025-09-01 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf169

Stephen R Piccolo, Aparna Nathan, Michelle D Brazas, Manoj Kandpal, Aida T Miró-Herrans, Adam J Kleinschmit, Susan McClatchy, Pertunia Mutheiwana, Dusanka Nikolic, Luciana I Gallo, Rolanda Sunaye Julius, Marta Lloret-Llinares, Nicola Mulder, Danielle Presgraves, Sonal Shewaramani, Jorge Xool-Tamayo, Frédéric J J Chain, Silvia Arantza Sanchez Guerrero

Abstract: Artificial intelligence (AI) tools and techniques are undoubtedly being used in bioinformatics education, reflecting broader trends in education. However, many instructors and learners may be unaware of the full scope of potential uses for these tools within bioinformatics education, as well as effective practices for using them. Building on discussions held at the 6th Global Bioinformatics Education Summit, this perspective article provides insights about ways that AI might be used to generate or adapt instructional content, provide personalized help for learners, and automate assessment and grading. Additionally, we highlight AI skills that are important for bioinformatics learners to develop in order to effectively use AI as a bioinformatics learning tool. We highlight currently available tools in the quickly evolving AI landscape and suggest ways that instructors or learners might use such tools. Furthermore, we discuss key considerations and challenges associated with integrating AI into bioinformatics education, including ethical implications, potential biases, and the need to critically evaluate AI-generated content. Finally, we highlight the need for further research to better understand how AI tools are being used in practice and empower their effective and responsible use in bioinformatics education.

Bioinformatics advances 5(1), vbaf169. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12401575/pdf/

Citations: 0

Unifying DNA methylation-based in silico cell-type deconvolution with deconvMe.

IF 2.8

Bioinformatics advances Pub Date : 2025-09-01 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf201

Alexander Dietrich, Lina-Liv Willruth, Korbinian Pürckhauer, Carlos Oltmanns, Moana Witte, Sebastian Klein, Anke R M Kraft, Markus Cornberg, Markus List

Citations: 0

A graph attention-based deep learning network for predicting biotech-small-molecule drug interactions.

IF 2.8

Bioinformatics advances Pub Date : 2025-09-01 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf192

Fatemeh Nasiri, Mohsen Hooshmand

Abstract: Motivation: The increasing demand for effective drug combinations has made drug-drug interaction prediction a critical task in modern pharmacology. While most existing research focuses on small-molecule drugs, the role of biotech drugs in complex disease treatments remains relatively unexplored. Biotech drugs, derived from biological sources, have unique molecular structures that differ significantly from those of small molecules, making their interactions more challenging to predict. Results: This study introduces a novel graph attention network-based deep learning framework that improves interaction prediction between biotech and small-molecule drugs. Experimental results demonstrate that the proposed method outperforms existing methods in multiclass drug-drug interaction prediction, achieving superior performance across various evaluation types, including micro, macro, and weighted assessments. These findings highlight the potential of deep learning and graph-based models in uncovering novel interactions between biotech and small-molecule drugs, paving the way for more effective combination therapies in drug discovery. Availability and implementation: The datasets and source code of this study are available in the GitHub repository: https://github.com/BioinformaticsIASBS/BSI-Net.

Bioinformatics advances 5(1), vbaf192. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12408249/pdf/
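The abstract reports results under micro, macro, and weighted assessments without defining them. As a rough illustration only, the sketch below computes the three standard F1 averages for a multiclass problem in plain Python. The function name `f1_averages` and the toy labels are hypothetical and are not taken from the BSI-Net repository; the authors' own evaluation code may differ.

```python
from collections import Counter

def f1_averages(y_true, y_pred):
    """Return (micro, macro, weighted) F1 for multiclass labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)
    # Micro: pool all counts first, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean of the per-class F1 scores.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: per-class F1 weighted by the number of true samples.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return micro, macro, weighted

# Hypothetical toy labels, for illustration only.
y_true = ["a", "a", "a", "b", "b", "c"]
y_pred = ["a", "a", "b", "b", "b", "c"]
micro, macro, weighted = f1_averages(y_true, y_pred)
```

Macro averaging treats every interaction class equally, while micro and weighted averaging are dominated by the frequent classes, which is why multiclass studies commonly report all three.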

Citations: 0

GCompip: a pipeline for estimating the gene abundance in microbial communities.

IF 2.8

Bioinformatics advances Pub Date : 2025-08-29 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf207

Xiang Zhou, Qiushuang Li, Shizhe Zhang, Wenxing Wang, Rong Wang, Xiumin Zhang, Zhiliang Tan, Min Wang

Abstract: Motivation: Gene abundance in metagenome datasets is commonly represented in terms of Counts or Copies Per Million. However, these measures do not account for the size of the microbial communities. To reflect the gene abundance in the microbial communities (GAM), GCompip, a comprehensive pipeline for estimating GAM, was developed based on a specialized universal single-copy gene (USCG) database, stringent alignment parameters, and rigorous filtering criteria. Results: GCompip showed high specificity without compromising computational efficiency, and improved the precision of downstream GAM estimations across six diverse ecological environments (i.e. human gut, rumen, freshwater, marine, hydrothermal sediment, and glacier). In contrast, the comparative annotation tools (i.e. KofamScan, eggNOG-mapper and HUMAnN3) showed larger error intervals, higher susceptibility to false positives, or overestimation of USCG abundance, primarily due to more relaxed thresholds, multifamily matches, or less stringent alignment settings. To facilitate the use of GCompip, we provide both Linux command line and R package versions. Overall, GCompip is an accurate, robust, user-friendly, and efficient computational pipeline designed to calculate GAM using metagenomic sequencing data. The pipeline is accessible to researchers seeking to evaluate the metabolic capabilities of microbial communities, and improves the capacity to interpret metagenomic data related to microbial communities. Availability and implementation: GCompip package source code and documentation are freely available for download at https://github.com/XiangZhouCAS/GCompip. A separate Linux command line version is available at https://github.com/XiangZhouCAS/GCompip_onlinux.

Bioinformatics advances 5(1), vbaf207. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12460045/pdf/
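To make the motivation concrete, the sketch below shows one common way Copies Per Million is computed: read counts are length-normalized to reads per kilobase (RPK) and then rescaled to sum to one million. This is an assumption about the conventional measure the abstract refers to, not GCompip's own method. The function name `copies_per_million` and the toy genes are hypothetical.

```python
def copies_per_million(read_counts, gene_lengths_bp):
    """Length-normalize counts to RPK, then rescale so the values sum to 1e6."""
    # Reads per kilobase: corrects for longer genes attracting more reads.
    rpk = {g: read_counts[g] / (gene_lengths_bp[g] / 1000) for g in read_counts}
    total = sum(rpk.values())
    return {g: value / total * 1_000_000 for g, value in rpk.items()}

# Hypothetical toy sample, for illustration only.
read_counts = {"geneA": 100, "geneB": 300}
gene_lengths_bp = {"geneA": 1000, "geneB": 3000}
cpm = copies_per_million(read_counts, gene_lengths_bp)
```

Because the result always sums to one million, it describes relative composition only and carries no information about how many genomes are present in the sample, which is the limitation the abstract points to.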

Citations: 0

IGD: a simple, efficient genotype data format.

IF 2.8

Bioinformatics advances Pub Date : 2025-08-26 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf205

Drew DeHaas, Xinzhu Wei

Citations: 0

NFEmbed: modeling nitrogenase activity via classification and regression with pretrained protein embeddings.

IF 2.8

Bioinformatics advances Pub Date : 2025-08-23 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf204

Md Muhaiminul Islam Nafi, Abdullah Al Mohaimin

Abstract: Motivation: Heavy usage of synthetic nitrogen fertilizers to satisfy the increasing demands for food has led to severe environmental impacts like decreasing crop yields and eutrophication. One promising alternative is using nitrogen-fixing microorganisms as biofertilizers, which use the nitrogenase enzyme. This could also be achieved by expressing a functional nitrogenase enzyme in the cells of the cereal crops. Results: In this study, we predicted microbial strains with a high potential for nitrogenase activity using machine learning techniques. The objective was to enable the screening and ranking of potential strains based on genomic information. We explored several protein language model embeddings for this prediction task and built two stacking ensemble models. One of them, NFEmbed-C, used k-Nearest Neighbors and Random Forest as base and meta learners, respectively. The other one, NFEmbed-R, combined Decision Tree Regressor and eXtreme Gradient Boosting Regressor as base learners, with Support Vector Regressor as the meta learner. On the test set, both NFEmbed-C and NFEmbed-R performed better than the state-of-the-art methods with improvements ranging from 0% to 11.2% and from 30% to 51%, respectively. While NFEmbed-R got a 0.783 R² score, 0.158 MSE, and 0.398 RMSE, NFEmbed-C acquired 0.949 sensitivity, 0.892 F1 score, and 0.784 Matthews Correlation Coefficient on the test set. Availability and implementation: We performed our analysis in Python; code is available at https://github.com/nafcoder/NFEmbed.

Bioinformatics advances 5(1), vbaf204. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12417089/pdf/
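For readers unfamiliar with the regression metrics quoted for NFEmbed-R, the following minimal sketch computes R², MSE, and RMSE from their textbook definitions. The function name `regression_metrics` and the toy values are hypothetical; this is not the authors' evaluation code.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R^2, MSE, and RMSE using the standard definitions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variance around the mean of the true values.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return {"r2": 1 - ss_res / ss_tot, "mse": mse, "rmse": math.sqrt(mse)}

# Hypothetical toy values, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 2.0])
```

Since RMSE is the square root of MSE, the two reported figures can be cross-checked: the square root of 0.158 is about 0.3975, which agrees with the reported 0.398 to within the rounding of the published MSE.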

Citations: 0

An explainable machine learning pipeline for prediction of antimicrobial resistance in Pseudomonas aeruginosa.

IF 2.8

Bioinformatics advances Pub Date : 2025-08-22 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf190

Aakriti Jain, Govinda Rao Dabburu, Bishal Samanta, Neelja Singhal, Manish Kumar

Abstract: Motivation: Prediction of antimicrobial resistance in Pseudomonas aeruginosa using machine learning and genomic sequences has the potential to serve as an alternative comparable to, if not better than, laboratory-based detection. Additionally, model interpretability can further enhance the potential of these models, paving the way for their reproducibility. Results: We have developed a machine learning-based two-tier pipeline to predict resistance phenotype in P. aeruginosa using only genomic sequences as input in the form of k-mers. Our Decision Tree Model yields an accuracy of 79% and area under the receiver operating characteristic curve of 0.77 with a 70% specificity and 84% sensitivity. We have interpreted the model's predictions using explainable AI as an attempt to bridge the gap between computational prediction and biological insight. Through these interpretations we have gathered antibiotic-specific k-mer signatures that push the phenotype towards resistance. Availability and implementation: The curated dataset and related codes are available on request.

Bioinformatics advances 5(1), vbaf190. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12380447/pdf/
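The abstract states that genomic sequences enter the pipeline in the form of k-mers but gives no further detail. A minimal sketch of k-mer counting is shown below. The function name `kmer_counts`, the choice of k, and the rule of skipping windows with ambiguous bases are all assumptions made for illustration, not the authors' preprocessing.

```python
from collections import Counter

def kmer_counts(sequence, k):
    """Count every overlapping k-mer, skipping windows with non-ACGT bases."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):  # assumption: drop ambiguous bases such as N
            counts[kmer] += 1
    return counts

# Hypothetical toy sequence, for illustration only.
toy_counts = kmer_counts("ACGTACGTN", k=3)
```

Counts like these can be arranged as one feature vector per genome, which is the kind of input a decision tree can split on and an explanation method can then attribute to individual k-mers.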

Citations: 0

ResLysEmbed: a ResNet-based framework for succinylated lysine residue prediction using sequence and language model embeddings.

IF 2.8

Bioinformatics advances Pub Date : 2025-08-22 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf198

Souvik Ghosh, Md Muhaiminul Islam Nafi, M Saifur Rahman

Abstract: Motivation: Lysine (K) succinylation is a crucial post-translational modification involved in cellular homeostasis and metabolism, and has been linked to several diseases in recent research. Despite its emerging importance, current computational methods are limited in performance for predicting succinylation sites. Results: We propose ResLysEmbed, a novel ResNet-based architecture that combines traditional word embeddings with per-residue embeddings from protein language models for succinylation site prediction. We also compared multiple protein language models to identify the most effective one for this task. Additionally, we experimented with several deep learning architectures to find the most suitable one for processing word embedding features and developed three hybrid architectures: ConvLysEmbed, InceptLysEmbed, and ResLysEmbed. Among these, ResLysEmbed achieved superior performance with accuracy, MCC, and F1 scores of 0.81, 0.39, 0.40 and 0.72, 0.44, 0.67 on two independent test sets, outperforming existing methods. Furthermore, we applied Shapley additive explanations analysis to interpret the influence of each residue within the 33-length window around the target site on the model's predictions. This analysis helps understand how the sequential position and structural distance of residues from the target site affect their contribution to succinylation prediction. Availability: The implementation details and code are available at https://github.com/Sheldor7701/ResLysEmbed.

Bioinformatics advances 5(1), vbaf198. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12413228/pdf/
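The abstract mentions a 33-length window around each target lysine. One plausible way to extract such windows is sketched below. The function name `lysine_windows`, the padding character "X" for positions beyond the sequence ends, and the 0-based positions are assumptions for illustration; the ResLysEmbed repository may handle sequence ends differently.

```python
def lysine_windows(sequence, window=33, pad="X"):
    """Return (position, window) pairs centred on each lysine (K)."""
    half = window // 2
    # Pad both ends so residues near the termini still get a full window.
    padded = pad * half + sequence + pad * half
    return [
        (i, padded[i:i + window])  # the K at position i sits at the window centre
        for i, residue in enumerate(sequence)
        if residue == "K"
    ]

# Hypothetical toy peptide with a short window, for readability only.
toy_windows = lysine_windows("MKTAYIAKQR", window=5)
full_windows = lysine_windows("MKTAYIAKQR")  # default 33-residue window
```

With an odd window length, the target lysine always sits at the centre, with 16 residues on each side in the 33-residue case.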

Citations: 0

Direct inference of haplotypes from sequencing data.

IF 2.8

Bioinformatics advances Pub Date : 2025-08-20 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf195

Zhen Zhang, Bencong Zhu, Yongyi Luo, Jiandong Shi, Sheng Lian, Jingyu Hao, Taobo Hu, Toyotaka Ishibashi, Depeng Wang, Shu Wang, Weichuan Yu, Xiaodan Fan

Abstract: Motivation: Haplotypes are crucial for various genetic analyses, but reconstructing haplotypes from sequencing data remains a significant challenge. Current methods for haplotype reconstruction typically rely on a procedure of two separate stages, variant calling and phasing, but phasing overlooks the errors in variant calling. Additionally, the complexity of haplotype reconstruction increases with the number of homologous chromosomes in the sample, a common scenario in polyploid species or cell mixture sequencing. Results: To address the challenges above, we propose a unified probabilistic framework that directly utilizes sequencing reads to estimate haplotypes and sequencing error profiles. Rather than focusing solely on variant loci used by traditional phasing methods, our approach models all loci covered by any sequencing read to enhance the estimation of error profiles in sequencing data, thereby increasing the statistical power of haplotype inference, especially for low-coverage datasets. Evaluations on both simulated and real sequencing data demonstrate the superior performance of our method, particularly in scenarios characterized by high sequencing error rates, low coverage, or polyploidy. Availability and implementation: Related code and datasets can be found at: https://github.com/new-zbc/DIHap.

Bioinformatics advances 5(1), vbaf195. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12448230/pdf/

Citations: 0

DgeaHeatmap: an R package for transcriptomic analysis and heatmap generation.

IF 2.8

Bioinformatics advances Pub Date : 2025-08-20 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf194

Leonie J Lancelle, Phani S Potru, Björn Spittau, Susanne Wiemann

Abstract: Motivation: The growing use of transcriptomic data from platforms like Nanostring GeoMx DSP demands accessible and flexible tools for differential gene expression analysis and heatmap generation. Current web-based tools often lack transparency, modifiability, and independence from external servers, creating barriers for researchers seeking customizable workflows, as well as data privacy and security. Additionally, tools that can be utilized by individuals with minimal bioinformatics expertise provide an inclusive solution, empowering a broader range of users to analyze complex data effectively. Results: Here, we introduce Differential Gene Expression Analysis and Heatmaps (DgeaHeatmap), an R package offering streamlined and user-friendly functions for the analysis of transcriptomic data, particularly data generated by Nanostring GeoMx DSP instruments. The package supports both normalized and raw count data, providing tools to preprocess, filter, and annotate datasets. DgeaHeatmap leverages Z-score scaling and k-means clustering for customizable heatmap generation and incorporates a workflow adapted from GeoMxTools for handling raw Nanostring GeoMx DSP data. By enabling server-independent analyses, the package enhances flexibility, transparency, and reproducibility in transcriptomic research. Availability and implementation: The package DgeaHeatmap is freely available on GitLab (https://gitlab.ub.uni-bielefeld.de/spittaulab/Dgea_Heatmap_Package.git).

Bioinformatics advances 5(1), vbaf194. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12401572/pdf/
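The Z-score scaling named in the abstract can be illustrated with a short sketch. DgeaHeatmap itself is an R package, so the Python below only demonstrates the idea and does not use its API. The function name `zscore_rows` and the toy matrix are hypothetical, and the use of the population standard deviation is an assumption (R's `scale()` uses the sample standard deviation).

```python
from statistics import mean, pstdev

def zscore_rows(matrix):
    """Scale each row (one gene across samples) to mean 0 and unit variance."""
    scaled = []
    for row in matrix:
        mu, sigma = mean(row), pstdev(row)
        # Rows with no variation are mapped to zeros to avoid division by zero.
        scaled.append([(x - mu) / sigma if sigma else 0.0 for x in row])
    return scaled

# Hypothetical toy expression matrix: rows are genes, columns are samples.
expression = [[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]
scaled = zscore_rows(expression)
```

Scaling each gene separately puts highly and lowly expressed genes on the same colour scale in a heatmap, so that clustering responds to expression patterns across samples and not to absolute abundance.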

Citations: 0