Bioinformatics advances: latest articles, page 9

A fast method for extracting essential and synthetic lethality genes in GEM models.

IF 2.4

Bioinformatics advances Pub Date : 2025-06-06 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf127

Francisco Guil, José M García

Citations: 0

Pharmacological assessment of Coffea arabica compounds as potential therapeutics for cervical cancer.

IF 2.4

Bioinformatics advances Pub Date : 2025-06-05 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf132

Victor Omoboyede, Nwachukwu Christiana Okonkwo, Jimoh Olayemi Balogun, Onyekachi Victor Onyedikachi, Rita Ononiwu, Daniel Okpaise, Sarah Olanrewaju Oladejo, Christopher Busayo Olowosoke, Haruna Isiyaku Umar, Prosper Obed Chukwuemeka

Abstract

Motivation: Cervical cancer remains a leading cause of gynecological mortality, with existing treatments often limited by resistance and suboptimal efficacy. While Coffea arabica is rich in phytochemicals with reported anticancer properties, their relevance to cervical cancer-specific molecular targets remains underexplored. Here, we integrated transcriptomic profiling, cheminformatics, and survival modeling to evaluate the therapeutic potential of C. arabica compounds in cervical cancer.

Results: From 158 bioactive compounds with favorable pharmacokinetic and drug-likeness properties, we predicted gene targets and intersected them with 1779 differentially expressed genes identified from bulk RNA-sequencing of 304 cervical cancer tumors and 47 normal cervical tissues. This yielded 43 C. arabica gene targets that were significantly dysregulated in cervical cancer. Pathway enrichment revealed involvement in tumorigenesis, immune modulation, and cell cycle regulation, with fold enrichment computed as the ratio of observed-to-expected gene overlap. Survival analysis identified 14 of these genes as markers of poor prognosis, with matrix metalloproteinase-7 (MMP7) emerging as an independent prognostic marker of adverse outcome. A Random-Forest-Regression model trained on 499 experimentally validated MMP7 inhibitors identified carnosol, a C. arabica compound, as a top-ranking candidate with high predicted activity. These findings nominate carnosol as a promising therapeutic lead for cervical cancer and lay the groundwork for future experimental validation.

Availability and implementation: The data supporting the findings of this study, including bulk RNA-seq gene expression data, survival, and phenotype data, are available through the TCGA database. These data can be accessed via the Xenabrowser platform (https://xenabrowser.net) using the reference identifier [TCGA Cervical Cancer (CESC)]. Corresponding healthy cervical tissue RNA-seq data are available through the Genotype-Tissue Expression (GTEx) project (https://www.gtexportal.org/home/). The code used for differential gene expression (DGE) analysis, pathway enrichment, and survival analysis, as well as scripts for generating volcano plots (DGE analysis), Kaplan-Meier survival plots, and boxplots (gene expression), and machine learning implementations are available on GitHub (https://github.com/Ponaskillzyy/Coffea_arabica_Potential_in_Cervical_Cancer).

Journal Article, Bioinformatics advances 5(1), vbaf132. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12212767/pdf/
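
The abstract above defines fold enrichment as the ratio of observed to expected gene overlap. A minimal Python sketch of that ratio follows; the function name, its parameters, and the example numbers are illustrative assumptions and are not taken from the authors' repository.

```python
def fold_enrichment(overlap: int, query_size: int,
                    pathway_size: int, background_size: int) -> float:
    """Observed-to-expected overlap between a query gene list and a pathway."""
    if min(query_size, pathway_size, background_size) <= 0:
        raise ValueError("set sizes must be positive")
    # Expected overlap if the query genes were drawn at random from the background.
    expected = query_size * pathway_size / background_size
    return overlap / expected


# Hypothetical numbers: 43 query genes, a 200-gene pathway,
# 20,000 background genes, and 5 genes shared between query and pathway.
print(fold_enrichment(overlap=5, query_size=43,
                      pathway_size=200, background_size=20000))
```

A value above 1 means the pathway shares more genes with the query list than random sampling from the background would predict; a significance test (for example a hypergeometric test) would still be needed to judge whether the excess is meaningful.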

Citations: 0

Harpy: a pipeline for processing haplotagging linked-read data.

IF 2.4

Bioinformatics advances Pub Date : 2025-06-05 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf133

Pavel V Dimens, Ryan P Franckowiak, Azwad Iqbal, Jennifer K Grenier, Paul R Munn, Nina Overgaard Therkildsen

Citations: 0

SynDRep: a synergistic partner prediction tool based on knowledge graph for drug repurposing.

IF 2.4

Bioinformatics advances Pub Date : 2025-06-05 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf092

Karim S Shalaby, Sathvik Guru Rao, Bruce Schultz, Martin Hofmann-Apitius, Alpha Tom Kodamullil, Vinay Srinivas Bharadhwaj

Abstract

Motivation: Drug repurposing is gaining interest due to its high cost-effectiveness, low risks, and improved patient outcomes. However, most drug repurposing methods depend on drug-disease-target semantic connections of a single drug rather than insights from drug combination data. In this study, we propose SynDRep, a novel drug repurposing tool based on enriching knowledge graphs (KG) with drug combination effects. It predicts the synergistic drug partner with a commonly prescribed drug for the target disease, leveraging graph embedding and machine learning (ML) techniques. This partner drug is then repurposed as a single agent for this disease by exploring pathways between them in the KG.

Results: HolE was the best-performing embedding model (with 84.58% of true predictions for all relations), and random forest emerged as the best ML model with an area under the receiver operating characteristic curve (ROC-AUC) value of 0.796. Some of our selected candidates, such as miconazole and albendazole for Alzheimer's disease, have been validated through literature, while others lack either a clear pathway or literature evidence for their use for the disease of interest. Therefore, complementing SynDRep with more specialized KGs and additional training data would enhance its efficacy and offer cost-effective and timely solutions for patients.

Availability and implementation: SynDRep is available as an open-source Python package at https://github.com/SynDRep/SynDRep under the Apache 2.0 License.

Journal Article, Bioinformatics advances 5(1), vbaf092. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12148216/pdf/
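
The abstract above names HolE (holographic embeddings) as the best-performing embedding model. As general background, HolE scores a (head, relation, tail) triple by taking the circular correlation of the head and tail vectors and then the dot product with the relation vector. The sketch below illustrates only that scoring function with NumPy; it is not SynDRep's code, and the function names and toy vectors are assumptions made for illustration.

```python
import numpy as np


def circular_correlation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """[a * b]_k = sum_i a_i * b_((i + k) mod d), computed with the FFT."""
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real


def hole_score(head: np.ndarray, relation: np.ndarray, tail: np.ndarray) -> float:
    """Raw HolE score of a triple; higher means the triple is judged more plausible."""
    return float(relation @ circular_correlation(head, tail))


# Toy 3-dimensional embeddings (hypothetical values, not trained).
head = np.array([1.0, 2.0, 3.0])
tail = np.array([4.0, 5.0, 6.0])
relation = np.array([1.0, 0.0, 0.0])
print(hole_score(head, relation, tail))
```

In a trained model the raw score is usually passed through a sigmoid to obtain a probability, and the embeddings are learned so that observed triples score higher than corrupted ones.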

Citations: 0

OReO: optimizing read order for practical compression.

IF 2.4

Bioinformatics advances Pub Date : 2025-06-03 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf128

Mathilde Girard, Léa Vandamme, Bastien Cazaux, Antoine Limasset

Abstract

Motivation: Recent advances in high-throughput and third-generation sequencing technologies have created significant challenges in storing and managing the rapidly growing volume of read datasets. Although more than 50 specialized compression tools have been developed, employing methods such as reference-based approaches, customized generic compressors, and read reordering, many users still rely on common generic compressors (e.g. gzip, zstd, xz) for convenience, portability, and reliability, despite their low compression ratios. Here, we introduce Optimizing Read Order (OReO), a simple read-reordering framework that achieves high compression performance without requiring specialized software for decompression. By grouping overlapping reads together before applying generic compressors, OReO exploits inherent redundancies in sequencing data and achieves compression ratios on par with state-of-the-art tools. Moreover, because it relies only on standard decompressors, OReO avoids the need for dedicated installations and maintenance, removing a key barrier to practical adoption.

Results: We evaluated OReO on both Oxford Nanopore Technologies (ONT) and HiFi genomic and metagenomic datasets of varying sizes and complexities. Our results demonstrate that OReO provides substantial compression gains with comparable resource usage and outperforms dedicated methods in decompression speed. We propose that future compression strategies should focus on reordering as a means to let generic compression tools fully exploit data redundancy, offering an efficient, sustainable, and user-friendly solution to the growing challenges of sequencing data storage.

Availability and implementation: The OReO code is open source and available at github.com/girunivlille/oreo.

Journal Article, Bioinformatics advances 5(1), vbaf128. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12185860/pdf/
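
The idea described above, placing similar reads next to each other so that a generic compressor can exploit the redundancy, can be illustrated with a toy Python sketch. This is not OReO's algorithm: sorting by each read's smallest k-mer (a minimizer) is a swapped-in, much cruder proxy for overlap grouping, and all function names and the synthetic data are assumptions.

```python
import random
import zlib


def minimizer(read: str, k: int = 8) -> str:
    """Lexicographically smallest k-mer of a read (the whole read if it is not longer than k)."""
    if len(read) <= k:
        return read
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def reorder_reads(reads: list[str], k: int = 8) -> list[str]:
    """Place reads that share a minimizer next to each other (a crude overlap proxy)."""
    return sorted(reads, key=lambda r: (minimizer(r, k), r))


def compressed_size(reads: list[str]) -> int:
    """Size in bytes of the newline-joined reads after generic zlib compression."""
    return len(zlib.compress("\n".join(reads).encode("ascii"), 9))


# Toy data: overlapping windows sampled from a random 'genome', in random order.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(5000))
reads = [genome[s:s + 200] for s in (rng.randrange(0, 4800) for _ in range(500))]

reordered = reorder_reads(reads)
print("original order:", compressed_size(reads), "bytes")
print("reordered:     ", compressed_size(reordered), "bytes")

# Any standard decompressor recovers the reordered reads; no custom tool is needed.
blob = zlib.compress("\n".join(reordered).encode("ascii"), 9)
restored = zlib.decompress(blob).decode("ascii").split("\n")
```

The reordering changes only the order of the reads, so the output remains an ordinary compressed text file. How much the size drops depends on the data and on the compressor's window size, so the two printed numbers should be compared rather than assumed.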

Citations: 0

Predicting gene expression using millions of yeast promoters reveals cis-regulatory logic.

IF 2.4

Bioinformatics advances Pub Date : 2025-06-02 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf130

Tirtharaj Dash, Susanne Bornelöv

Abstract

Motivation: Gene regulation involves complex interactions between transcription factors. While early attempts to predict gene expression were trained using naturally occurring promoters, gigantic parallel reporter assays have vastly expanded potential training data. Despite this, it is still unclear how to best use deep learning to study gene regulation. Here, we investigate the association between promoters and expression using Camformer, a residual convolutional neural network that ranked fourth in the Random Promoter DREAM Challenge 2022. We present the original model trained on 6.7 million sequences and investigate 270 alternative models to find determinants of model performance. Finally, we use explainable AI to uncover regulatory signals.

Results: Camformer accurately decodes the association between promoters and gene expression ($r^2 = 0.914 \pm 0.003$, $\rho = 0.962 \pm 0.002$) and provides a substantial improvement over previous state of the art. Using Grad-CAM and in silico mutagenesis, we demonstrate that our model learns both individual motifs and their hierarchy. For example, while an IME1 motif on its own increases gene expression, a co-occurring UME6 motif instead strongly reduces gene expression. Thus, deep learning models such as Camformer can provide detailed insights into cis-regulatory logic.

Availability and implementation: Data and code are available at: https://github.com/Bornelov-lab/Camformer.

Journal Article, Bioinformatics advances 5(1), vbaf130. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12188188/pdf/
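
In silico mutagenesis, mentioned in the abstract above, measures how a model's prediction changes when each base of an input sequence is substituted in turn. The Python sketch below shows the general procedure only; `toy_predict` and its motif are hypothetical stand-ins for a trained model and have nothing to do with Camformer's actual interface.

```python
from typing import Callable

BASES = "ACGT"


def in_silico_mutagenesis(seq: str,
                          predict: Callable[[str], float]) -> list[dict[str, float]]:
    """For each position, the change in prediction when the base is swapped for each alternative."""
    baseline = predict(seq)
    effects = []
    for i, ref in enumerate(seq):
        row = {}
        for alt in BASES:
            if alt == ref:
                continue
            mutant = seq[:i] + alt + seq[i + 1:]
            row[alt] = predict(mutant) - baseline
        effects.append(row)
    return effects


def toy_predict(seq: str) -> float:
    """Hypothetical stand-in for a trained model: counts occurrences of a made-up motif."""
    return float(seq.count("TGACT"))


effects = in_silico_mutagenesis("AATGACTAA", toy_predict)
for position, row in enumerate(effects):
    print(position, row)
```

With a real model, positions whose substitutions cause large changes mark candidate regulatory motifs, and mutating two motifs jointly is one way to probe interactions such as the IME1 and UME6 example in the abstract.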

Citations: 0

Causeway: a pipeline for genome-wide effector gene screening with Mendelian Randomization and colocalization.

IF 2.4

Bioinformatics advances Pub Date : 2025-05-29 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf110

Julia A de Amorim, João Vitor F Cavalcante, Diego Marques-Coelho, Rodrigo J S Dalmolin, Vasiliki Lagou

Citations: 0

Using pseudotime derivative on single-cell RNA sequencing data to identify genes undergoing cell cycle regulation.

IF 2.4

Bioinformatics advances Pub Date : 2025-05-29 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf123

Yohan Lefol, Geir Amund Svan Hasle, Siv Anita Hegre, Helle Samdal, Pål Sætrom

Abstract

Motivation: The cell cycle is a critical part of cellular life, one that has long been studied both directly and through its regulatory components. Commonly, cell cycle synchronization or selection experiments are performed in order to study the cell cycle, thus chemically modifying the cells or selecting them for specific phases. We seek to develop a means to study the cell cycle through the use of single-cell RNA sequencing, effectively circumventing the need for such experiments.

Results: We utilize a well-established pseudotime method, along with the predicted and real expression of genes, to calculate the velocity of individual genes. We then utilize statistics and expected biological behaviour to identify genes with significant shifts in velocity within the pseudotime. Additionally, we show the ability to observe gene regulatory behaviour such as mRNA splicing and degradation rates. As much cell-line-based research uses multiple replicates, we implement a merger method for technical replicates to adjust for technical variations, creating a more robust analysis. In summary, our study develops a robust approach to map the velocities of individual biologically and statistically significant genes throughout the cell cycle's phases within a cell line experiment.

Availability and implementation: Data and code are available at: https://github.com/Ylefol/CC_vel.

Journal Article, Bioinformatics advances 5(1), vbaf123. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12255884/pdf/
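
The title's "pseudotime derivative" can be pictured as the rate of change of each gene's expression along the pseudotime axis. The sketch below uses a plain finite-difference estimate with NumPy; this is a simplification assumed for illustration, not the authors' velocity calculation (which the abstract says combines predicted and real expression), and the function name and toy data are hypothetical.

```python
import numpy as np


def pseudotime_derivative(expression: np.ndarray, pseudotime: np.ndarray) -> np.ndarray:
    """Finite-difference rate of change of each gene along pseudotime.

    expression: cells x genes matrix; pseudotime: one distinct value per cell.
    Returns a cells x genes matrix of derivatives, with cells ordered by pseudotime.
    """
    order = np.argsort(pseudotime)
    t = np.asarray(pseudotime, dtype=float)[order]
    x = np.asarray(expression, dtype=float)[order]
    # np.gradient accepts the (possibly uneven) coordinates of the samples along axis 0.
    return np.gradient(x, t, axis=0)


# Toy data: 4 cells, 2 genes; gene 0 rises and gene 1 falls linearly with pseudotime.
pseudotime = np.array([0.3, 0.0, 1.0, 0.6])
expression = np.column_stack([2.0 * pseudotime, 1.0 - pseudotime])
print(pseudotime_derivative(expression, pseudotime))
```

On real single-cell data the expression values would normally be smoothed along pseudotime first, since finite differences amplify noise.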

Citations: 0

ReSort enhances reference-based cell type deconvolution for spatial transcriptomics through regional information integration.

IF 2.4

Bioinformatics advances Pub Date : 2025-05-27 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf091

Linhua Wang, Ling Wu, Guantong Qi, Chaozhong Liu, Wanli Wang, Xiang H-F Zhang, Zhandong Liu

Citations: 0

Predictive machine learning model for 30-day hospital readmissions in a tertiary healthcare setting.

IF 2.4

Bioinformatics advances Pub Date : 2025-05-24 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf121

Diego Halac, Cecilia Cocucci, Sebastian Camerlingo

Abstract

Motivation: Hospital readmissions represent a major challenge for healthcare systems due to their impact on patient outcomes and associated costs. As many readmissions are considered preventable, predictive modeling offers a valuable tool for early identification and intervention. This study aimed to develop and validate a predictive model for 30-day readmissions in a 200-bed community hospital in Argentina. A retrospective analysis was conducted on 3388 adult admissions. The primary endpoint was readmission within 30 days of discharge. Predictor variables included demographic and clinical factors such as age, length of stay, hypertension, diabetes, heart failure, coronary artery disease, stroke, cancer, dementia, chronic kidney disease, chronic obstructive pulmonary disease, and bedridden status. Three models were developed: Logistic Regression (LR), Random Forest (RF), and LightGBM (LGBM), with hyperparameter tuning via Bayesian optimization. Model performance was assessed using calibration, discrimination (C-statistics), and decision curve analysis. Internal validation was performed using 250 bootstrap resamples.

Results: The readmission rate was 11% (n = 394). RF outperformed LR and LGBM in discrimination and clinical utility within predictive probability thresholds of 0.05-0.25. Optimism-corrected C-statistics were 0.60 (LR, LGBM) and 0.64 (RF); calibration slopes were 0.75 (LR), 1.13 (RF), and 1.76 (LGBM). Machine learning models, particularly RF, can improve readmission risk prediction and inform targeted healthcare interventions.

Availability and implementation: The dataset and code used to develop and validate the predictive models are available from the corresponding author upon reasonable request. The implementation was conducted in R using the mlr3verse, pminternal, rms, dcurves, data.table, tidyverse, ranger and lightgbm packages, with Bayesian hyperparameter optimization via mlr3mbo.

Journal Article, Bioinformatics advances 5(1), vbaf121. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12158157/pdf/
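
The C-statistic used above to report discrimination is the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted patient, with ties counted as one half. The study itself was implemented in R; the Python sketch below only illustrates the definition, and the function name and toy values are assumptions.

```python
def c_statistic(y_true: list[int], y_prob: list[float]) -> float:
    """Fraction of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    return concordant / (len(events) * len(non_events))


# Toy example: two readmitted (1) and two non-readmitted (0) patients.
print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The optimism correction reported in the abstract additionally subtracts the average gap between each bootstrap model's apparent performance and its performance on the original data, which this sketch does not attempt.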

Citations: 0