Bioinformatics (Oxford, England): latest articles

OLTA: Optimizing bait seLection for TArgeted sequencing.
Bioinformatics (Oxford, England) Pub Date : 2025-04-02 DOI: 10.1093/bioinformatics/btaf146
Mete Orhun Minbay, Richard Sun, Vijay Ramachandran, Ahmet Ay, Tamer Kahveci
{"title":"OLTA: Optimizing bait seLection for TArgeted sequencing.","authors":"Mete Orhun Minbay, Richard Sun, Vijay Ramachandran, Ahmet Ay, Tamer Kahveci","doi":"10.1093/bioinformatics/btaf146","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf146","url":null,"abstract":"<p><strong>Motivation: </strong>Targeted enrichment via capture probes, also known as baits, is a promising complementary procedure for next-generation sequencing methods. This technique employs short biotinylated oligonucleotide probes that hybridize with complementary genetic material in a sample. Following hybridization, the target fragments can be easily isolated and processed with minimal contamination from irrelevant material. Designing an efficient set of baits for a set of target sequences, however, is an NP-hard problem.</p><p><strong>Results: </strong>We develop a novel heuristic algorithm that leverages the similarities between the characteristics of the Minimum Bait Cover and the Closest String problems to reduce the number of baits needed to cover a given target sequence. Our results on real and synthetic datasets demonstrate that our algorithm, OLTA, produces the fewest baits for nearly all experimental settings and datasets. On average, it produces 6% and 11% fewer baits than the next-best state-of-the-art methods for two major real datasets, AIV and MEGARES, respectively. Also, its bait set has the highest utilization and the minimum redundancy.</p><p><strong>Availability: </strong>Our algorithm is available at github.com/FuelTheBurn/OLTA-Optimizing-bait-seLection-for-TArgeted-sequencing. 
Test data and other software are archived at doi.org/10.5281/zenodo.15086636.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143775144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
uHAF: a unified hierarchical annotation framework for cell type standardization and harmonization.
Bioinformatics (Oxford, England) Pub Date : 2025-04-02 DOI: 10.1093/bioinformatics/btaf149
Haiyang Bian, Yinxin Chen, Lei Wei, Xuegong Zhang
{"title":"uHAF: a unified hierarchical annotation framework for cell type standardization and harmonization.","authors":"Haiyang Bian, Yinxin Chen, Lei Wei, Xuegong Zhang","doi":"10.1093/bioinformatics/btaf149","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf149","url":null,"abstract":"<p><strong>Summary: </strong>In single-cell transcriptomics, inconsistent cell type annotations due to varied naming conventions and hierarchical granularity impede data integration, machine learning applications, and meaningful evaluations. To address this challenge, we developed the unified Hierarchical Annotation Framework (uHAF), which includes organ-specific hierarchical cell type trees (uHAF-T) and a mapping tool (uHAF-Agent) based on large language models. uHAF-T provides standardized hierarchical references for 38 organs, allowing for consistent label unification and analysis at different levels of granularity. uHAF-Agent leverages GPT-4 to accurately map diverse and informal cell type labels onto uHAF-T nodes, streamlining the harmonization process. By simplifying label unification, uHAF enhances data integration, supports machine learning applications, and enables biologically meaningful evaluations of annotation methods. 
Our framework serves as an essential resource for standardizing cell type annotations and fostering collaborative refinement in the single-cell research community.</p><p><strong>Availability and implementation: </strong>uHAF is publicly available at: https://uhaf.unifiedcellatlas.org and https://github.com/SuperBianC/uhaf.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143766079","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
XPRS: A Tool for Interpretable and Explainable Polygenic Risk Score.
Bioinformatics (Oxford, England) Pub Date : 2025-03-31 DOI: 10.1093/bioinformatics/btaf143
Na Yeon Kim, Seunggeun Lee
{"title":"XPRS: A Tool for Interpretable and Explainable Polygenic Risk Score.","authors":"Na Yeon Kim, Seunggeun Lee","doi":"10.1093/bioinformatics/btaf143","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf143","url":null,"abstract":"<p><strong>Summary: </strong>The polygenic risk score (PRS) is an important method for assessing genetic susceptibility to diseases; however, its clinical utility is limited by a lack of interpretability tools. To address this problem, we introduce eXplainable PRS (XPRS), an interpretation and visualization tool that decomposes PRSs into gene/region and single nucleotide polymorphism (SNP) contribution scores via Shapley additive explanations (SHAPs), which provide insights into specific genes and SNPs that significantly contribute to the PRS of an individual. This software features a multilevel visualization approach, including Manhattan plots, LocusZoom-like plots and tables at the population and individual levels, to highlight important genes and SNPs. Implemented with a user-friendly web interface, XPRS allows for straightforward data input and interpretation. 
By bridging the gap between complex genetic data and actionable clinical insights, XPRS can improve communication between clinicians and patients.</p><p><strong>Availability and implementation: </strong>The XPRS software is publicly available on GitHub at https://github.com/nayeonkim93/XPRS, and a demo is available through our cloud-based web service at https://xprs.leelabsg.org/.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143756448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HISSTA: a human in situ single-cell transcriptome atlas.
Bioinformatics (Oxford, England) Pub Date : 2025-03-31 DOI: 10.1093/bioinformatics/btaf142
Jiwon Yu, Jiwoo Moon, Minseo Kim, Gyeol Han, Insu Jang, Jinyoung Lim, Seungmook Lee, Seok-Hwan Yoon, Woong-Yang Park, Byungwook Lee, Sanghyuk Lee
{"title":"HISSTA: a human in situ single-cell transcriptome atlas.","authors":"Jiwon Yu, Jiwoo Moon, Minseo Kim, Gyeol Han, Insu Jang, Jinyoung Lim, Seungmook Lee, Seok-Hwan Yoon, Woong-Yang Park, Byungwook Lee, Sanghyuk Lee","doi":"10.1093/bioinformatics/btaf142","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf142","url":null,"abstract":"<p><strong>Motivation: </strong>Spatial transcriptomics holds great promise for revolutionizing biology and medicine by providing gene expression profiles with spatial information. Until recently, spatial resolution has been limited, but advances in high-throughput in situ imaging technologies now offer new opportunities by covering thousands of genes at a single-cell or even subcellular resolution, necessitating databases dedicated to comprehensive coverage and analysis with user-friendly interfaces.</p><p><strong>Results: </strong>We introduce the HISSTA database, which facilitates the archival and analysis of in situ transcriptome data at single-cell resolution from various human tissues. We have collected and annotated spatial transcriptome data generated by MERFISH, CosMx SMI, and Xenium techniques, encompassing 112 samples and 28 million cells across 16 tissue types from 63 studies. To decipher spatial contexts, we have implemented advanced tools for cell type annotation, spatial colocalization, spatial cellular communication, and niche analyses. Notably, all datasets and annotations are interactively accessible through Vitessce, allowing users to focus on regions of interest and examine gene expression in detail. HISSTA is a unique database designed to manage the rapidly growing dataset of in situ transcriptomes at single-cell resolution. 
Given its comprehensive data content and advanced analysis tools with interactive visualizations, HISSTA is poised to significantly impact cancer diagnosis, precision medicine, and digital pathology.</p><p><strong>Availability and implementation: </strong>HISSTA is freely accessible at https://kbds.re.kr/hissta/. The source code is available at https://doi.org/10.5281/zenodo.14904523.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143756505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Marker selection strategies for circulating tumor DNA guided by phylogenetic inference.
Bioinformatics (Oxford, England) Pub Date : 2025-03-31 DOI: 10.1093/bioinformatics/btaf145
Xuecong Fu, Zhicheng Luo, Yueqian Deng, William LaFramboise, David Bartlett, Russell Schwartz
{"title":"Marker selection strategies for circulating tumor DNA guided by phylogenetic inference.","authors":"Xuecong Fu, Zhicheng Luo, Yueqian Deng, William LaFramboise, David Bartlett, Russell Schwartz","doi":"10.1093/bioinformatics/btaf145","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf145","url":null,"abstract":"<p><strong>Motivation: </strong>Blood-based profiling of tumor DNA (\"liquid biopsy\") offers great prospects for non-invasive early cancer diagnosis and clinical guidance, but requires further computational advances to become a robust quantitative assay of tumor clonal evolution. We propose new methods to better characterize tumor clonal dynamics from circulating tumor DNA (ctDNA), through application to two specific tasks: 1) applying longitudinal ctDNA data to refine phylogeny models of clonal evolution, and 2) quantifying changes in clonal frequencies that may be indicative of treatment response or tumor progression. We pose these through a probabilistic framework for optimally identifying markers and using them to characterize clonal evolution.</p><p><strong>Results: </strong>We first estimate a density over clonal tree models using bootstrap samples over pre-treatment tissue-based sequence data. We then refine these models over successive longitudinal samples. We use the resulting framework for modeling and refining tree densities to pose a set of optimization problems for selecting ctDNA markers to maximize measures of utility for reducing uncertainty in phylogeny models and quantifying clonal frequencies given the models. We tested our methods on synthetic data and showed them to be effective at refining tree densities and inferring clonal frequencies. Application to real tumor data further demonstrated the methods' effectiveness in refining a lineage model and assessing its clonal frequencies. 
The work shows the power of computational methods to improve marker selection, clonal lineage reconstruction, and clonal dynamics profiling for more precise and quantitative assays of somatic evolution and tumor progression.</p><p><strong>Availability: </strong>https://github.com/CMUSchwartzLab/Mase-phi.git. (DOI: 10.5281/zenodo.14776163).</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143756506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Seed2LP: seed inference in metabolic networks for reverse ecology applications.
Bioinformatics (Oxford, England) Pub Date : 2025-03-31 DOI: 10.1093/bioinformatics/btaf140
Chabname Ghassemi Nedjad, Mathieu Bolteau, Lucas Bourneuf, Loïc Paulevé, Clémence Frioux
{"title":"Seed2LP: seed inference in metabolic networks for reverse ecology applications.","authors":"Chabname Ghassemi Nedjad, Mathieu Bolteau, Lucas Bourneuf, Loïc Paulevé, Clémence Frioux","doi":"10.1093/bioinformatics/btaf140","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf140","url":null,"abstract":"<p><strong>Motivation: </strong>A challenging problem in microbiology is to determine nutritional requirements of microorganisms and culture them, especially for the microbial dark matter detected solely with culture-independent methods. The latter foster an increasing amount of genomic sequences that can be explored with reverse ecology approaches to raise hypotheses on the corresponding populations. Building upon genome scale metabolic networks (GSMNs) obtained from genome annotations, metabolic models predict contextualised phenotypes using nutrient information.</p><p><strong>Results: </strong>We developed the tool Seed2LP, addressing the inverse problem of predicting source nutrients, or seeds, from a GSMN and a metabolic objective. The originality of Seed2LP is its hybrid model, combining a scalable and discrete Boolean approximation of metabolic activity, with the numerically accurate flux balance analysis (FBA). Seed inference is highly customisable, with multiple search and solving modes, exploring the search space of external and internal metabolites combinations. Application to a benchmark of 107 curated GSMNs highlights the usefulness of a logic modelling method over a graph-based approach to predict seeds, and the relevance of hybrid solving to satisfy FBA constraints. 
Focusing on the dependency between metabolism and environment, Seed2LP is a computational support contributing to address the multifactorial challenge of culturing possibly uncultured microorganisms.</p><p><strong>Availability: </strong>Seed2LP is available on https://github.com/bioasp/seed2lp.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143756507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
CancerTrialMatch: a computational resource for the management of biomarker-based clinical trials at a community cancer center.
Bioinformatics (Oxford, England) Pub Date : 2025-03-31 DOI: 10.1093/bioinformatics/btaf144
Padmapriya Swaminathan, Anu Amallraja, Shivani Kapadia, Casey B Williams, Tobias Meißner
{"title":"CancerTrialMatch: a computational resource for the management of biomarker-based clinical trials at a community cancer center.","authors":"Padmapriya Swaminathan, Anu Amallraja, Shivani Kapadia, Casey B Williams, Tobias Meißner","doi":"10.1093/bioinformatics/btaf144","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf144","url":null,"abstract":"<p><strong>Motivation: </strong>The widespread implementation of next-generation sequencing in cancer care has enabled routine use of molecular and biomarker profiling. At our cancer center, as with many others, biomarker-based clinical trials are increasingly available to oncologists as potential treatment options via molecular tumor boards. To better support this effort, we developed CancerTrialMatch, a systematic approach to capture structured clinical trial data and match patients to trials based on their disease characteristics and sequencing profiles.</p><p><strong>Results: </strong>CancerTrialMatch is an open-source application designed to streamline clinical trial curation and patient trial matching, while also enabling an institution's curated trial portfolio to be distributed across the institution for easy access to providers, care teams and researchers. It facilitates curating, updating, and searching for trials through a semi-automated interface built using R Shiny, MongoDB, and Docker. While much of the trial data is retrieved via the clinicaltrials.gov Application Programming Interface (API), certain items like biomarkers and disease subtypes are entered manually. The user inputs disease type using the OncoTree classification, and provides relevant biomarker details, such as mutations, copy numbers, fusions, and other disease-specific markers. 
This resource reduces the time required for institutional trial management and helps to identify potential clinical trials for patients, ultimately supporting larger clinical trial enrollment and enhancing the clinical application of precision oncology.</p><p><strong>Availability and implementation: </strong>CancerTrialMatch was implemented and tested on Windows 11 (64-bit, 32 GB RAM) using WSL2 with Ubuntu 22.04. Docker 27.0.3 and Docker Compose 2.28.1 were used to build images and containers. Users can build it by cloning the repo and following the README instructions. The source code and example data are available in GitHub and Figshare at https://github.com/AveraSD/CancerTrialMatch and 10.6084/m9.figshare.28447367 respectively.</p><p><strong>Supplementary information: </strong>Instructions on how to build docker images are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143756504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Exact model-free function inference using uniform marginal counts for null population.
Bioinformatics (Oxford, England) Pub Date : 2025-03-29 DOI: 10.1093/bioinformatics/btaf121
Yiyi Li, Mingzhou Song
{"title":"Exact model-free function inference using uniform marginal counts for null population.","authors":"Yiyi Li, Mingzhou Song","doi":"10.1093/bioinformatics/btaf121","DOIUrl":"10.1093/bioinformatics/btaf121","url":null,"abstract":"<p><strong>Motivation: </strong>Recognizing cause-effect relationships is a fundamental inquiry in science. However, current causal inference methods often focus on directionality but not statistical significance. A ramification is chance patterns of uneven marginal distributions achieving a perfect directionality score.</p><p><strong>Results: </strong>To overcome such issues, we design the uniform exact function test with continuity correction (UEFTC) to detect functional dependency between two discrete random variables. The null hypothesis is two variables being statistically independent. Unique from related tests whose null populations use observed marginals, we define the null population by an embedded uniform square. We also present a fast algorithm to accomplish the test. On datasets with ground truth, the UEFTC exhibits accurate directionality, low biases, and robust statistical behavior over alternatives. We found nonmonotonic response by gene TCB2 to beta-estradiol dosage in engineered yeast strains. In the human duodenum with environmental enteric dysfunction, we discovered pathology-dependent anti-co-methylated CpG sites in the vicinity of genes POU2AF1 and LSP1; such activity represents orchestrated methylation and demethylation along the same gene, unreported previously. The UEFTC has much improved effectiveness in exact model-free function inference for data-driven knowledge discovery.</p><p><strong>Availability and implementation: </strong>An open-source R package \"UniExactFunTest\" implementing the presented uniform exact function tests is available via CRAN at doi: 10.32614/CRAN.package.UniExactFunTest. 
Computer code to reproduce figures can be found in supplementary file \"UEFTC-main.zip.\"</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11972114/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143671940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Clustering individuals using INMTD: a novel versatile multi-view embedding framework integrating omics and imaging data.
Bioinformatics (Oxford, England) Pub Date : 2025-03-29 DOI: 10.1093/bioinformatics/btaf122
Zuqi Li, Sam F L Windels, Noël Malod-Dognin, Seth M Weinberg, Mary L Marazita, Susan Walsh, Mark D Shriver, David W Fardo, Peter Claes, Nataša Pržulj, Kristel Van Steen
{"title":"Clustering individuals using INMTD: a novel versatile multi-view embedding framework integrating omics and imaging data.","authors":"Zuqi Li, Sam F L Windels, Noël Malod-Dognin, Seth M Weinberg, Mary L Marazita, Susan Walsh, Mark D Shriver, David W Fardo, Peter Claes, Nataša Pržulj, Kristel Van Steen","doi":"10.1093/bioinformatics/btaf122","DOIUrl":"10.1093/bioinformatics/btaf122","url":null,"abstract":"<p><strong>Motivation: </strong>Combining omics and images can lead to a more comprehensive clustering of individuals than classic single-view approaches. Among the various approaches for multi-view clustering, nonnegative matrix tri-factorization (NMTF) and nonnegative Tucker decomposition (NTD) are advantageous in learning low-rank embeddings with promising interpretability. Besides, there is a need to handle unwanted drivers of clusterings (i.e. confounders).</p><p><strong>Results: </strong>In this work, we introduce a novel multi-view clustering method based on NMTF and NTD, named INMTD, which integrates omics and 3D imaging data to derive unconfounded subgroups of individuals. According to the adjusted Rand index, INMTD outperformed other clustering methods on a synthetic dataset with known clusters. In the application to real-life facial-genomic data, INMTD generated biologically relevant embeddings for individuals, genetics, and facial morphology. By removing confounded embedding vectors, we derived an unconfounded clustering with better internal and external quality; the genetic and facial annotations of each derived subgroup highlighted distinctive characteristics. 
In conclusion, INMTD can effectively integrate omics data and 3D images for unconfounded clustering with biologically meaningful interpretation.</p><p><strong>Availability and implementation: </strong>INMTD is freely available at https://github.com/ZuqiLi/INMTD.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143694775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AVPpred-BWR: antiviral peptides prediction via biological words representation.
Bioinformatics (Oxford, England) Pub Date : 2025-03-29 DOI: 10.1093/bioinformatics/btaf126
Zhuoyu Wei, Yongqi Shen, Xiang Tang, Jian Wen, Youyi Song, Mingqiang Wei, Jing Cheng, Xiaolei Zhu
{"title":"AVPpred-BWR: antiviral peptides prediction via biological words representation.","authors":"Zhuoyu Wei, Yongqi Shen, Xiang Tang, Jian Wen, Youyi Song, Mingqiang Wei, Jing Cheng, Xiaolei Zhu","doi":"10.1093/bioinformatics/btaf126","DOIUrl":"10.1093/bioinformatics/btaf126","url":null,"abstract":"<p><strong>Motivation: </strong>Antiviral peptides (AVPs) are short chains of amino acids, showing great potential as antiviral drugs. The traditional wisdom (e.g. wet experiments) for identifying the AVPs is time-consuming and laborious, while cutting-edge computational methods are less accurate to predict them.</p><p><strong>Results: </strong>In this article, we propose an AVPs prediction model via biological words representation, dubbed AVPpred-BWR. Based on the fact that the secondary structures of AVPs mainly consist of α-helix and loop, we explore the biological words of 1mer (corresponding to loops) and 4mer (4 continuous residues, corresponding to α-helix). That is, the peptides sequences are decomposed into biological words, and then the concealed sequential information is represented by training the Word2Vec models. Moreover, in order to extract multi-scale features, we leverage a CNN-Transformer framework to process the embeddings of 1mer and 4mer generated by Word2Vec models. To the best of our knowledge, this is the first time to realize the word segmentation of protein primary structure sequences based on the regularity of protein secondary structure. AVPpred-BWR illustrates clear improvements over its competitors on the independent test set (e.g. 
improvements of 4.6% and 11.0% for AUROC and MCC, respectively, compared to UniDL4BioPep).</p><p><strong>Availability and implementation: </strong>AVPpred-BWR is publicly available at: https://github.com/zyweizm/AVPpred-BWR or https://zenodo.org/records/14880447 (doi: 10.5281/zenodo.14880447).</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11968319/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143733674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0