Bioinformatics (Oxford, England): Latest Articles

Pool PaRTI: A PageRank-Based Pooling Method for Identifying Critical Residues and Enhancing Protein Sequence Representations.
Bioinformatics (Oxford, England) Pub Date : 2025-06-02 DOI: 10.1093/bioinformatics/btaf330
Alp Tartici, Gowri Nayar, Russ B Altman
Motivation: Protein language models produce token-level embeddings for each residue, resulting in an output matrix whose dimensions vary with sequence length. However, downstream machine learning models typically require fixed-length input vectors, necessitating a pooling method to compress the output matrix into a single vector representation of the entire protein. Traditional pooling methods often result in substantial information loss, impacting downstream task performance. We aim to develop a pooling method that produces more expressive general-purpose protein embedding vectors while offering biological interpretability.
Results: We introduce Pool PaRTI, a novel pooling method that leverages internal transformer attention matrices and PageRank to assign token importance weights. Our unsupervised and parameter-free approach consistently prioritizes residues experimentally annotated as critical for function, assigning them higher importance scores. Across four diverse protein machine learning tasks, Pool PaRTI enables significant gains in predictive performance. Additionally, it enhances interpretability by identifying biologically relevant regions without relying on explicit structural data or annotated training. To assess generalizability, we evaluated Pool PaRTI with two encoder-only protein language models, confirming its robustness across different models.
Availability and implementation: Pool PaRTI is implemented in Python with PyTorch and is available at github.com/Helix-Research-Lab/Pool_PaRTI.git.
Contact and supplementary information: The Pool PaRTI sequence embeddings and residue importance values for all human proteins on UniProt are available at zenodo.org/records/15036725 for ESM2 and protBERT. You can contact the lead author for further questions.
Citations: 0
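The Pool PaRTI abstract above describes weighting tokens by PageRank over transformer attention and then pooling. The following is a minimal sketch of that general idea in plain Python, not the authors' implementation: the function names `pagerank` and `pagerank_pool`, the damping factor 0.85, and the use of a single pre-aggregated attention matrix are all assumptions made for illustration. How the published method combines attention across heads and layers is not stated in the abstract.

```python
from typing import List

Matrix = List[List[float]]


def pagerank(attention: Matrix, damping: float = 0.85,
             iters: int = 100, tol: float = 1e-10) -> List[float]:
    """Power-iteration PageRank on a non-negative matrix.

    attention[i][j] is treated as the weight of an edge i -> j, so tokens
    that receive a lot of attention accumulate a high rank.
    """
    n = len(attention)
    # Row-normalize so every token distributes its outgoing weight as probabilities.
    trans = []
    for row in attention:
        total = sum(row)
        trans.append([w / total for w in row] if total > 0 else [1.0 / n] * n)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n
               + damping * sum(rank[i] * trans[i][j] for i in range(n))
               for j in range(n)]
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


def pagerank_pool(embeddings: Matrix, attention: Matrix) -> List[float]:
    """Pool per-token embeddings into one vector, weighted by PageRank scores."""
    weights = pagerank(attention)
    total = sum(weights)
    dim = len(embeddings[0])
    return [sum(weights[t] * embeddings[t][d] for t in range(len(embeddings))) / total
            for d in range(dim)]
```

Calling `pagerank_pool(embeddings, attention)` with an L x D list of token embeddings and an L x L attention matrix returns a single length-D vector regardless of L, which is the fixed-length property the abstract asks of a pooling method.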
DiSC: a Statistical Tool for Fast Differential Expression Analysis of Individual-level Single-cell RNA-seq Data.
Bioinformatics (Oxford, England) Pub Date : 2025-05-30 DOI: 10.1093/bioinformatics/btaf327
Lujun Zhang, Lu Yang, Yingxue Ren, Shuwen Zhang, Weihua Guan, Jun Chen
Motivation: Single-cell RNA sequencing (scRNA-seq) has become an important method for characterizing cellular heterogeneity, revealing more biological insights than bulk RNA-seq. The surge in scRNA-seq data across multiple individuals calls for efficient and statistically powerful methods for differential expression (DE) analysis that address individual-level biological variability.
Results: We introduced DiSC, a method for conducting individual-level DE analysis by extracting multiple distributional characteristics, jointly testing their association with a variable of interest, and using a flexible permutation testing framework to control the false discovery rate (FDR). Our simulation studies demonstrated that DiSC effectively controlled the FDR across various settings and exhibited high statistical power in detecting different types of gene expression changes. Moreover, DiSC is computationally efficient and scalable to the rapidly increasing sample sizes in scRNA-seq studies. When applying DiSC to identify DE genes potentially associated with COVID-19 severity and Alzheimer's disease across various types of peripheral blood mononuclear cells and neural cells, we found that our method was approximately 100 times faster than other state-of-the-art methods and that the results were consistent and supported by existing literature. While DiSC was developed for scRNA-seq data, its robust testing framework can also be applied to other types of single-cell data. When we applied DiSC to cytometry by time-of-flight data, it identified significantly more DE markers than traditional methods.
Availability: The R software package "SingleCellStat" is freely available on CRAN (https://cran.r-project.org/web/packages/SingleCellStat/index.html) and GitHub (https://github.com/Lujun995/DiSC). The replication code for reproducing the analyses in this study is publicly accessible at https://github.com/Lujun995/DiSC_Replication_Code.
Supplementary information: The scRNA-seq expression matrix and metadata utilized in our simulations and analyses can be retrieved from https://cells.ucsc.edu/autism/rawMatrix.zip, https://cellxgene.cziscience.com/collections/1ca90a2d-2943-483d-b678-b809bf464c30, and https://covid19.cog.sanger.ac.uk/submissions/release1/haniffa21.processed.h5ad. Supplementary data are available at Bioinformatics online.
Citations: 0
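The DiSC abstract above combines three steps: summarizing each individual's cells by distributional characteristics, testing those characteristics jointly against a variable of interest, and calibrating by permutation. A rough single-gene sketch of that pattern is shown below. It is not the SingleCellStat API or the authors' statistic: the choice of mean and variance as the characteristics, the summed squared group difference as the joint statistic, and every function name are hypothetical, and the FDR control across genes mentioned in the abstract is not shown.

```python
import random
from statistics import mean, pvariance
from typing import List, Sequence, Tuple


def individual_features(cells: Sequence[float]) -> Tuple[float, float]:
    """Summarize one individual's cells for one gene (assumed: mean and variance)."""
    return (mean(cells), pvariance(cells))


def joint_statistic(features: List[Tuple[float, ...]], labels: Sequence[int]) -> float:
    """Sum, over features, of the squared group-mean difference scaled by overall spread."""
    stat = 0.0
    for k in range(len(features[0])):
        values = [f[k] for f in features]
        group1 = [v for v, y in zip(values, labels) if y == 1]
        group0 = [v for v, y in zip(values, labels) if y == 0]
        spread = pvariance(values) + 1e-12  # guard against division by zero
        stat += (mean(group1) - mean(group0)) ** 2 / spread
    return stat


def permutation_pvalue(per_individual_cells: List[Sequence[float]],
                       labels: Sequence[int],
                       n_perm: int = 999, seed: int = 0) -> float:
    """Permute individual-level labels (not cells) to get a p-value for one gene."""
    rng = random.Random(seed)
    feats = [individual_features(c) for c in per_individual_cells]
    observed = joint_statistic(feats, labels)
    perm = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if joint_statistic(feats, perm) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Permuting labels at the level of individuals rather than cells is the design point worth noting: cells from the same person are not independent, so shuffling cells would overstate the evidence.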
Brownian motion data augmentation: a method to push neural network performance on nanopore sensors.
Bioinformatics (Oxford, England) Pub Date : 2025-05-29 DOI: 10.1093/bioinformatics/btaf323
Javier Kipen, Joakim Jaldén
Motivation: Nanopores are highly sensitive sensors that have achieved commercial success in DNA/RNA sequencing, with potential applications in protein sequencing and biomarker identification. Solid-state nanopores, in particular, face challenges such as instability and low signal-to-noise ratios (SNRs), which lead scientists to adopt data-driven methods for nanopore signal analysis, although data acquisition remains restrictive.
Results: We address this data scarcity by augmenting the training samples with traces that emulate Brownian motion effects, based on dynamic models in the literature. We apply this method to a publicly available dataset of a classification task containing nanopore reads of DNA with encoded barcodes. A neural network named QuipuNet was previously published for this dataset, and we demonstrate that our augmentation method produces a noticeable increase in QuipuNet's accuracy. Furthermore, we introduce a novel neural network named YupanaNet, which achieves greater accuracy (95.8%) than QuipuNet (94.6%) on the same dataset. YupanaNet benefits from both the enhanced generalization provided by Brownian motion data augmentation and the incorporation of novel architectures, including skip connections and a soft attention mask.
Availability and implementation: The source code and data are available at: https://github.com/JavierKipen/browDataAug.
Supplementary information: Supplementary information is available at Bioinformatics online.
Citations: 0
Nearl: Extracting dynamic features from molecular dynamics trajectories for machine learning tasks.
Bioinformatics (Oxford, England) Pub Date : 2025-05-29 DOI: 10.1093/bioinformatics/btaf321
Yang Zhang, Andreas Vitalis
Summary: Despite the rapid growth of machine learning in biomolecular applications, information about protein dynamics is underutilized. Here, we introduce Nearl, an automated pipeline designed to extract dynamic features from large ensembles of molecular dynamics (MD) trajectories. Nearl aims to identify intrinsic patterns of molecular motion and to provide informative features for predictive modelling tasks. We implement two classes of dynamic features, termed marching observers and property-density flow, to capture local atomic motions while maintaining a view of the global configuration. Complemented by standard voxelization techniques, Nearl transforms substructures of proteins into 3D grids, suitable for contemporary 3D convolutional neural networks (3D-CNNs). The pipeline leverages GPU acceleration, adheres to the FAIR principles for research software, and prioritizes flexibility and user-friendliness, allowing customization of input formats and feature extraction.
Availability and implementation: The source code of Nearl is hosted at https://github.com/miemiemmmm/Nearl and archived at https://doi.org/10.5281/zenodo.15320286. The documentation is hosted on ReadTheDocs at https://nearl.readthedocs.io/en/latest/. All pre-built models are implemented in PyTorch and available on GitHub.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
HSDSnake: a user-friendly SnakeMake pipeline for analysis of duplicate genes in eukaryotic genomes.
Bioinformatics (Oxford, England) Pub Date : 2025-05-28 DOI: 10.1093/bioinformatics/btaf325
Xi Zhang, Yining Hu, David Roy Smith, Zhenyu Cheng, John M Archibald
Summary: Gene duplication is a well-known driver of molecular evolution: it acts as a source of genetic novelty, thereby providing the raw substrate for organismal adaptation. However, detecting different types of gene duplicates and comparing them in sequence datasets can be difficult. Existing tools can identify and classify gene duplicates that have arisen by various processes, but they have limitations; for example, some lack a user-friendly workflow, include many intermediate steps that require manual adjustment of parameters, or are no longer maintained for the research community. Here, we have developed HSDSnake, a user-friendly SnakeMake pipeline that can detect and classify gene duplications into five categories: dispersed, proximal, tandem, transposed, and whole genome. It also curates and evaluates the highly similar gene duplicates (HSDs) in each gene duplication category, relying on both sequence similarity and conserved domains. Lastly, the detected gene duplicates can be visualized within a KEGG functional pathway framework, and the substitution rates (Ka, Ks, and their Ka/Ks ratio) can be analyzed for all the duplicate gene pairs. We demonstrate HSDSnake's capabilities by analyzing two reference genomes downloaded directly from NCBI and provide detailed instructions for each step.
Availability and implementation: The HSDSnake pipeline uses SnakeMake to run and Conda to install dependencies. The distribution version is available online at GitHub: https://github.com/zx0223winner/HSDSnake and the archived version at Zenodo is https://doi.org/10.5281/zenodo.15521945.
Supplementary information: Supplementary data are available at Bioinformatics online and at https://github.com/zx0223winner/HSDSnake/blob/main/docs/Usage.md.
Citations: 0
Residue conservation and solvent accessibility are (almost) all you need for predicting mutational effects in proteins.
Bioinformatics (Oxford, England) Pub Date : 2025-05-28 DOI: 10.1093/bioinformatics/btaf322
Matsvei Tsishyn, Pauline Hermans, Fabrizio Pucci, Marianne Rooman
Motivation: Predicting how mutations impact protein biophysical properties remains a significant challenge in computational biology. In recent years, numerous predictors, primarily deep learning models, have been developed to address this problem; however, issues such as their lack of interpretability and limited accuracy persist.
Results: We showed that a simple evolutionary score, based on the log-odds ratio (LOR) of wild-type and mutated residue frequencies in evolutionarily related proteins, when scaled by the residue's relative solvent accessibility (RSA), performs on par with or slightly outperforms most of the benchmarked predictors, many of which are considerably more complex. The evaluation is performed on mutations from the ProteinGym deep mutational scanning dataset collection, which measures various properties such as stability, activity or fitness. This raises further questions about what these complex models actually learn and highlights their limitations in predicting the mutational landscape.
Availability: The RSALOR model is available as a user-friendly Python package that can be installed from the PyPI repository. The code is freely available at https://github.com/3BioCompBio/RSALOR.
Citations: 0
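The score described in the abstract above, a log-odds ratio of residue frequencies scaled by solvent accessibility, can be illustrated with a short sketch. This is not the RSALOR package API. The abstract does not give the exact formula, so the `(1 - RSA)` scaling, the clipping constant `eps`, and the function names are assumptions chosen only to show the shape of such a score.

```python
import math


def log_odds_ratio(f_wt: float, f_mut: float, eps: float = 1e-6) -> float:
    """Log-odds ratio of the mutated vs. wild-type residue frequency at one position.

    Frequencies are clipped to (eps, 1 - eps) so the logarithms stay finite.
    """
    f_wt = min(max(f_wt, eps), 1.0 - eps)
    f_mut = min(max(f_mut, eps), 1.0 - eps)
    return math.log(f_mut / (1.0 - f_mut)) - math.log(f_wt / (1.0 - f_wt))


def rsa_scaled_score(f_wt: float, f_mut: float, rsa: float) -> float:
    """Scale the log-odds ratio by burial (assumed form: 1 - RSA, with RSA clipped to [0, 1])."""
    rsa = min(max(rsa, 0.0), 1.0)
    return (1.0 - rsa) * log_odds_ratio(f_wt, f_mut)
```

Under these assumptions, replacing a conserved residue with a rare one gives a negative score, and the same substitution is penalized more strongly at a buried position (low RSA) than at an exposed one.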
A Deep Learning-based Method for Predicting the Frequency Classes of Drug Side Effects Based on Multi-Source Similarity Fusion.
Bioinformatics (Oxford, England) Pub Date : 2025-05-27 DOI: 10.1093/bioinformatics/btaf319
Haochen Zhao, Dingxi Li, Jian Zhong, Xiao Liang, Guihua Duan, Jianxin Wang
Motivation: Drug side effects refer to harmful or adverse reactions that occur during drug use, unrelated to the therapeutic purpose. A core issue in drug side effect prediction is determining the frequency of these drug side effects in the population, which can guide patient medication use and drug development. Many computational methods have been developed to predict the frequency of drug side effects as an alternative to clinical trials. However, existing methods typically build regression models on five frequency classes of drug side effects, which leads to boundary-handling issues and a tendency to overfit the training set.
Results: To address this problem, we develop a multi-source similarity fusion-based model, named MSSF, for predicting five frequency classes of drug side effects. Compared to existing methods, our model utilizes a multi-source feature fusion module and the self-attention mechanism to explore the relationships between drugs and side effects in depth, and employs Bayesian variational inference to more accurately predict the frequency classes of drug side effects. The experimental results indicate that MSSF consistently achieves superior performance compared to existing models across multiple evaluation settings, including cross-validation, cold-start experiments, and independent testing. The visual analysis and case studies further demonstrate MSSF's reliable feature extraction capability and promise in predicting the frequency classes of drug side effects.
Availability: The source code of MSSF is available on GitHub (https://github.com/dingxlcse/MSSF.git) and archived on Zenodo (DOI: 10.5281/zenodo.15462041).
Supplementary information: Additional files are available at Bioinformatics online.
Citations: 0
scNucMap: mapping the nucleosome landscapes at single-cell resolution.
Bioinformatics (Oxford, England) Pub Date : 2025-05-27 DOI: 10.1093/bioinformatics/btaf324
Qianming Xiang, Binbin Lai
Motivation: Nucleosome depletion around cis-regulatory elements (CREs) is associated with CRE activity and implies the underlying gene regulatory network. Single-cell micrococcal nuclease sequencing (scMNase-seq) enables the simultaneous measurement of nucleosome positioning and chromatin accessibility at single-cell resolution, thereby capturing cellular heterogeneity in epigenetic regulation. However, there is currently no computational tool specifically designed to decode scMNase-seq data, impeding the generation of more precise and context-dependent insights into chromatin dynamics and gene regulation.
Results: Here, we present scNucMap, a tool designed to leverage the unique characteristics of scMNase-seq data to map the landscapes of candidate nucleosome-free regions (NFRs). scNucMap demonstrated superior performance and robustness in cell clustering on scMNase-seq data compared to Signac and chromVAR across diverse sample compositions and data complexities, achieving higher overall accuracy and Kappa coefficients. Additionally, scNucMap identified significant TFs associated with nucleosome depletion at CREs at both single-cell and cell-cluster levels, thereby facilitating cell-type annotation and regulatory network inference. When applied to scATAC-seq, scNucMap enriched standard analyses with complementary insights into nucleosome architecture, underscoring its cross-modality versatility. Overall, scNucMap exhibits both high reliability and adaptability, making it an effective tool for analyzing scMNase-seq data and supporting multimodal studies, thereby illuminating the intricate relationship between regulatory networks and nucleosome positioning at single-cell resolution.
Availability and implementation: scNucMap is available at https://github.com/qianming-bioinfo/scNucMap.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Simple controls exceed best deep learning algorithms and reveal foundation model effectiveness for predicting genetic perturbations.
Bioinformatics (Oxford, England) Pub Date : 2025-05-23 DOI: 10.1093/bioinformatics/btaf317
Daniel R Wong, Abby S Hill, Rob Moccia
Motivation: Modeling genetic perturbations and their effect on the transcriptome is a key area of pharmaceutical research. Due to the complexity of the transcriptome, there has been much excitement and development in deep learning (DL) because of its ability to model complex relationships. In particular, the transformer-based foundation model paradigm emerged as the gold standard for predicting post-perturbation responses. However, an understanding of these increasingly complex models and an evaluation of their practical utility are lacking, as are simple but appropriate benchmarks for comparing predictive methods.
Results: Here, we present a simple baseline method that outperforms both the state of the art (SOTA) in DL and other proposed simpler neural architectures, setting a necessary benchmark for evaluation in the field of post-perturbation prediction. We also elucidate the utility of foundation models for the task of post-perturbation prediction via generalizable fine-tuning experiments that can be translated to different applications of transformer-based foundation models to tasks of interest. Furthermore, we provide a corrected version of a popular dataset used for benchmarking perturbation prediction models. Our hope is that this work will properly contextualize further development of DL models in the perturbation space with necessary control procedures.
Availability and implementation: All source code is available at: https://github.com/pfizer-opensource/perturb_seq. The DOI is 10.5281/zenodo.15352937.
Contact: daniel.wong@pfizer.com.
Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
TRENDY: Gene Regulatory Network Inference Enhanced by Transformer.
Bioinformatics (Oxford, England) Pub Date : 2025-05-23 DOI: 10.1093/bioinformatics/btaf314
Xueying Tian, Yash Patel, Yue Wang
Motivation: Gene regulatory networks (GRNs) play a crucial role in the control of cellular functions. Numerous methods have been developed to infer GRNs from gene expression data, including mechanism-based approaches, information-based approaches, and more recent deep learning techniques, the last of which often overlook the underlying gene expression mechanisms.
Results: In this work, we introduce TRENDY, a novel GRN inference method that integrates transformer models to enhance the mechanism-based WENDY approach. Through testing on both simulated and experimental datasets, TRENDY demonstrates superior performance compared to existing methods. Furthermore, we apply this transformer-based approach to three additional inference methods, showcasing its broad potential to enhance GRN inference.
Availability and implementation: Code and data files are available at https://github.com/YueWangMathbio/TRENDY, with DOI: 10.6084/m9.figshare.28236074.
Supplementary information: Supplementary material is available at Bioinformatics online.
Citations: 0