Frontiers in bioinformatics: latest articles (page 3)

Protein cleaver: an interactive web interface for in silico prediction and systematic annotation of protein digestion-derived peptides.

IF 3.9

Frontiers in bioinformatics Pub Date: 2025-09-04 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1576317

Grigorios Koulouras, Yingrong Xu

Abstract: Proteolytic digestion is an essential process in mass spectrometry-based proteomics for converting proteins into peptides, and is therefore crucial for protein identification and quantification. In a typical proteomics experiment, digestion reagents are selected without prior evaluation of their optimality for detecting the proteins or peptides of interest, partly due to the lack of comprehensive and user-friendly predictive tools. In this work, we introduce Protein Cleaver, a web-based application that systematically assesses which regions of proteins are likely or unlikely to be identified, and provides extensive sequence and structure annotation and visualization features. We showcase practical examples of Protein Cleaver's usability in drug discovery and highlight proteins that are typically difficult to detect using the most common proteolytic enzymes. We evaluate trypsin and chymotrypsin for identifying G-protein-coupled receptors and find that chymotrypsin produces significantly more identifiable peptides than trypsin. We perform a bulk digestion analysis and assess 36 proteolytic enzymes for their ability to detect most cysteine-containing peptides in the human proteome. We anticipate that Protein Cleaver will be a valuable auxiliary tool for proteomics scientists.

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12445168/pdf/

Citations: 0
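The Protein Cleaver abstract describes predicting which protein regions yield identifiable peptides after enzymatic digestion. As a rough illustration of that idea, and not Protein Cleaver's implementation, the sketch below applies simplified, assumed cleavage rules (trypsin after K or R, chymotrypsin after F, Y or W, neither before P) with no missed cleavages, and treats peptides of 7 to 30 residues as detectable. The length window and the helper names `digest` and `identifiable_coverage` are hypothetical choices for illustration.

```python
import re

# Simplified, assumed cleavage rules (not Protein Cleaver's rule set):
# cut C-terminal to the listed residues unless the next residue is proline.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",
    "chymotrypsin": r"(?<=[FYW])(?!P)",
}


def digest(sequence, enzyme, min_len=7, max_len=30):
    """Return fully cleaved peptides (no missed cleavages) within a length window.

    The default 7-30 residue window is a hypothetical stand-in for 'detectable'.
    """
    # Zero-width split points; empty strings (e.g. after a C-terminal K) are dropped.
    pieces = [p for p in re.split(CLEAVAGE_RULES[enzyme], sequence) if p]
    return [p for p in pieces if min_len <= len(p) <= max_len]


def identifiable_coverage(sequence, enzyme, min_len=7, max_len=30):
    """Fraction of residues that fall inside peptides of detectable length."""
    covered = sum(len(p) for p in digest(sequence, enzyme, min_len, max_len))
    return covered / len(sequence)


if __name__ == "__main__":
    seq = "AAKPGGRLLKFWYPEE"  # toy sequence, not a real protein
    for enzyme in CLEAVAGE_RULES:
        print(enzyme, digest(seq, enzyme, min_len=1), identifiable_coverage(seq, enzyme))
```

Comparing `identifiable_coverage` across enzymes for the same protein is the kind of per-enzyme comparison the abstract reports for GPCRs, although a realistic predictor would also need to model missed cleavages, modifications, and peptide detectability.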

Adaptive sampling methods facilitate the determination of reliable dataset sizes for evidence-based modeling.

IF 3.9

Frontiers in bioinformatics Pub Date: 2025-09-04 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1528515

Tim Breitenbach, Thomas Dandekar

Abstract: How can we be sure that there is sufficient data for our model, such that predictions remain reliable on unseen data and the conclusions drawn from the fitted model would not vary significantly when using a different sample of the same size? We answer these and related questions through a systematic approach that examines the data size and the corresponding gains in accuracy. Assuming the sample data are drawn from a data pool with no data drift, the law of large numbers ensures that a model converges to its ground-truth accuracy. Our approach provides a heuristic method for investigating the speed of convergence with respect to the size of the data sample. This relationship is estimated using sampling methods, which introduce variation in the convergence-speed results across different runs. To stabilize the results (so that conclusions do not depend on the run) and to extract the most reliable information about convergence speed encoded in the available data, the presented method automatically determines a sufficient number of repetitions to reduce sampling deviations below a predefined threshold, thereby ensuring the reliability of conclusions about the required amount of data.

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444090/pdf/

Citations: 0
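The adaptive-sampling abstract describes repeating subsampling at each dataset size until sampling deviations fall below a predefined threshold. A minimal sketch of that loop follows, assuming the deviation measure is the standard error of the mean score (the paper may use a different criterion). The callable `score_fn` is a hypothetical stand-in for "fit a model on this sample and return its held-out accuracy", and the function names are illustrative only.

```python
import random
import statistics


def repetitions_until_stable(pool, sample_size, score_fn, threshold=0.01,
                             min_reps=5, max_reps=1000, seed=0):
    """Draw repeated subsamples of `sample_size` until the standard error of the
    mean score drops below `threshold` (an assumed deviation criterion).

    Returns (mean score, number of repetitions used).
    """
    rng = random.Random(seed)
    scores = []
    while len(scores) < max_reps:
        sample = rng.sample(pool, sample_size)  # subsample without replacement
        scores.append(score_fn(sample))
        if len(scores) >= max(min_reps, 2):  # stdev needs at least two values
            sem = statistics.stdev(scores) / len(scores) ** 0.5
            if sem < threshold:
                break
    return statistics.mean(scores), len(scores)


def convergence_curve(pool, sizes, score_fn, threshold=0.01):
    """Mean score and repetitions needed for each candidate dataset size."""
    return {n: repetitions_until_stable(pool, n, score_fn, threshold) for n in sizes}
```

For example, with `pool = [1] * 700 + [0] * 300` and `score_fn = lambda s: sum(s) / len(s)`, `convergence_curve(pool, [50, 100, 200], score_fn)` shows how many repetitions each size needs before its mean score is stable; in a real study the curve of mean scores against size is what indicates whether more data would still pay off.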

A novel linear indexing method for strings under all internal nodes in a suffix tree.

IF 3.9

Frontiers in bioinformatics Pub Date: 2025-09-04 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1577324

Anas Al-Okaily, Abdelghani Tbakhi

Citations: 0

Editorial: Networks and graphs in biological data: current methods, opportunities and challenges.

IF 3.9

Frontiers in bioinformatics Pub Date: 2025-09-02 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1685992

Derek L Thompson, Hsiang-Yun Wu, Christopher W Bartlett, William C Ray

Citations: 0

Germline mutation profiling of breast cancer patients using a non-BRCA sequencing panel.

IF 3.9

Frontiers in bioinformatics Pub Date: 2025-09-02 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1620025

Sonar Soni Panigoro, Rafika Indah Paramita, Fadilah Fadilah, Septelia Inawati Wanandi, Aisyah Fitriannisa Prawiningrum, Linda Erlina, Wahyu Dian Utari, Ajeng Megawati Fajrin

Citations: 0

COCαDA: a fast and scalable algorithm for interatomic contact detection in proteins using Cα distance matrices.

IF 3.9

Frontiers in bioinformatics Pub Date: 2025-09-01 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1630078

Rafael Pereira Lemos, Diego Mariano, Sabrina De Azevedo Silveira, Raquel C de Melo-Minardi

Abstract: Protein interatomic contacts, defined by spatial proximity and physicochemical complementarity at atomic resolution, are fundamental to characterizing molecular interactions and bonding. Methods for calculating contacts are generally categorized as cutoff-dependent, which rely on Euclidean distances, or cutoff-independent, which use Delaunay and Voronoi tessellations. While cutoff-dependent methods are recognized for their simplicity, completeness, and reliability, traditional implementations remain computationally expensive, posing significant scalability challenges in the current Big Data era of bioinformatics. Here, we introduce COCαDA (COntact search pruning by Cα Distance Analysis), a Python-based command-line tool that improves search pruning in large-scale interatomic protein contact analysis using alpha-carbon (Cα) distance matrices. COCαDA detects intra- and inter-chain contacts and classifies them into seven types: hydrogen and disulfide bonds; hydrophobic effects; attractive, repulsive, and salt-bridge interactions; and aromatic stacking. To evaluate the tool, we compared it with three traditional approaches from the literature: all-against-all atom distance calculation ("brute force"), a static Cα distance cutoff (SC), and Biopython's NeighborSearch class (NS). COCαDA outperformed the other methods, achieving on average 6x faster computation than advanced data structures such as the k-d trees used by NS, while being simpler to implement and fully customizable. The tool facilitates exploratory and large-scale analyses of interatomic contacts in proteins in a simple and efficient manner, and enables integration of the results with other tools and pipelines. COCαDA is freely available at https://github.com/LBS-UFMG/COCaDA.

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12433948/pdf/

Citations: 0
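The COCαDA abstract describes pruning an atom-level contact search using Cα distances. The sketch below illustrates the general idea only; it is not COCαDA's actual pruning rule and does not classify contact types. It assumes a single contact cutoff (4.5 Å by default, an illustrative value) and a per-residue "reach", the largest Cα-to-atom distance within that residue. By the triangle inequality, if two Cα atoms are farther apart than the cutoff plus both reaches, no atom pair between those residues can be in contact, so the pair is skipped without computing any atom-atom distances. The residue dictionary layout and function names are hypothetical.

```python
import math
from itertools import combinations


def reach(residue):
    """Largest distance from the residue's alpha carbon to any of its atoms."""
    return max(math.dist(residue["ca"], xyz) for _, xyz in residue["atoms"])


def find_contacts(residues, contact_cutoff=4.5, prune=True):
    """Return (res1, atom1, res2, atom2, distance) for atom pairs within the cutoff.

    residues: list of {"id": str, "ca": (x, y, z), "atoms": [(name, (x, y, z)), ...]}
    With prune=True, residue pairs whose Calpha-Calpha distance exceeds
    contact_cutoff + reach1 + reach2 are skipped; by the triangle inequality
    none of their atom pairs can lie within contact_cutoff.
    """
    reaches = {r["id"]: reach(r) for r in residues}
    contacts = []
    for r1, r2 in combinations(residues, 2):
        bound = contact_cutoff + reaches[r1["id"]] + reaches[r2["id"]]
        if prune and math.dist(r1["ca"], r2["ca"]) > bound:
            continue  # one cheap Calpha test rules out the whole residue pair
        for name1, xyz1 in r1["atoms"]:
            for name2, xyz2 in r2["atoms"]:
                d = math.dist(xyz1, xyz2)
                if d <= contact_cutoff:
                    contacts.append((r1["id"], name1, r2["id"], name2, round(d, 2)))
    return contacts
```

Calling `find_contacts(residues, prune=False)` gives the brute-force baseline; because the bound above is lossless, both calls should return the same contacts, with pruning only reducing the number of atom-atom distances computed.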

Advancing bioinformatics capacity through Nextflow and nf-core: lessons from an early- to mid-career researcher-focused program at The Kids Research Institute Australia.

IF 3.9

Frontiers in bioinformatics Pub Date: 2025-08-29 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1610015

Patricia Agudelo-Romero, Talya Conradie, Jose Antonio Caparros-Martin, David Jimmy Martino, Anthony Kicic, Stephen Michael Stick, Christopher Hakkaart, Abhinav Sharma

Citations: 0

Identifying novel therapeutic targets for non-alcoholic fatty liver disease using bioinformatics approaches: from drug repositioning to traditional Chinese medicine.

IF 3.9

Frontiers in bioinformatics Pub Date: 2025-08-26 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1613985

Jingmin Zhang, Tianwei Meng, Weiqi Gao, Xinghua Li, Juan Xu

Abstract:
Background: Non-alcoholic fatty liver disease (NAFLD) is a prevalent condition with limited effective treatments, necessitating novel therapeutic strategies. Bioinformatics offers a promising approach to identifying new targets by analyzing gene expression and drug interactions.
Objective: This study aims to identify novel therapeutic targets for NAFLD through bioinformatics, focusing on drug repositioning and traditional Chinese medicine (TCM) components.
Methods: Three NAFLD-related gene expression datasets (GSE260666, GSE126848, GSE135251) were analyzed to identify differentially expressed genes. Protein-protein interaction networks were constructed using STRING and visualized with Cytoscape. Pathway enrichment analysis was performed, and drug-gene interactions were explored using the DGIdb database. TCM components were screened via the HERB database, with molecular docking conducted to assess binding affinities.
Results: Key hub genes (CXCL2, CDKN1A, TNFRSF12A, HGFAC) were identified, with significant enrichment in cell proliferation and PI3K-Akt signaling pathways. Cyclosporine emerged as a potential repurposed drug, while TCM components (curcumin, resveratrol, berberine) showed strong binding affinities to NAFLD targets.
Conclusion: Cyclosporine and TCM compounds are promising candidates for NAFLD treatment, warranting further experimental validation to confirm their therapeutic potential.

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12417881/pdf/

Citations: 0
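The NAFLD abstract identifies hub genes from a STRING protein-protein interaction network but does not state the ranking criterion. Degree centrality is one common choice (Cytoscape's cytoHubba plugin offers it alongside other scores such as MCC), so the minimal sketch below ranks nodes of an undirected edge list by degree. It is an assumed illustration, not the authors' pipeline, and the gene names in the usage note are hypothetical placeholders rather than real interactions.

```python
from collections import Counter


def hub_genes(edges, top_n=10):
    """Rank nodes of an undirected interaction list by degree.

    Duplicate edges (in either orientation) and self-loops are ignored;
    ties are broken alphabetically so the ranking is deterministic.
    """
    unique_edges = {frozenset((a, b)) for a, b in edges if a != b}
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]
```

For instance, `hub_genes([("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G2", "G1")], top_n=3)` ranks G1 first with degree 3, since the reversed duplicate edge is counted once.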

Using reinforcement learning in genome assembly: in-depth analysis of a Q-learning assembler.

IF 3.9

Frontiers in bioinformatics Pub Date: 2025-08-20 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1633623

Kleber Padovani, Rafael Cabral Borges, Roberto Xavier, André Carlos Carvalho, Anna Reali, Annie Chateau, Ronnie Alves

Abstract: Genome assembly remains an unsolved problem, and de novo strategies (i.e., those run without a reference) are relevant but computationally complex tasks in genomics. Although de novo assemblers have previously been applied successfully in genomic projects, there is still no "best assembler", and the choice and setup of assemblers still rely on bioinformatics experts. Thus, as with other computationally complex problems, machine learning has emerged as an alternative (or complementary) way to develop accurate, fast, and autonomous assemblers. Reinforcement learning has proven promising for solving complex tasks without supervision, such as games, and there is a pressing need to understand the limits of this approach for "real-life" problems such as the DNA fragment assembly problem. In this study, we analyze the boundaries of applying machine learning via reinforcement learning (RL) to genome assembly. We expand upon a previous approach from the literature by carefully exploring the learning aspects of the proposed intelligent agent, which uses the Q-learning algorithm. We improved the reward system and optimized the exploration of the state space through pruning and in collaboration with evolutionary computing (>300% improvement). We tested the new approaches in 23 environments. Our results indicate unsatisfactory performance of the approaches, in terms of both assembly quality and execution time, providing strong evidence for the poor scalability of the studied reinforcement learning approaches to the genome assembly problem. Finally, we discuss the existing proposal, complemented by attempts at improvement that also proved insufficient. In doing so, we contribute to the scientific community a clear mapping of the limitations and challenges that should be taken into account in future attempts to apply reinforcement learning to genome assembly.

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12405310/pdf/

Citations: 0
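The abstract does not give the agent's state, action, or reward definitions, so the following is a hypothetical toy formulation of tabular Q-learning for ordering reads, not the authors' assembler or their improved reward system. A state is (last placed read, set of placed reads), an action is the next read to place, and the reward is the suffix-prefix overlap between consecutive reads. The values of `alpha`, `gamma`, `epsilon`, and `episodes` are illustrative assumptions. The number of states in this formulation grows roughly as n·2^n for n reads, which gives one concrete sense of the scalability limits the authors report.

```python
import random


def overlap(a, b):
    """Length of the longest suffix of read a that is also a prefix of read b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def q_learning_order(reads, episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a read order with tabular Q-learning (toy formulation)."""
    rng = random.Random(seed)
    n = len(reads)
    q = {}  # (state, action) -> estimated return; missing entries count as 0.0

    def actions(state):
        return [i for i in range(n) if i not in state[1]]

    def best(state, acts):
        return max(acts, key=lambda a: q.get((state, a), 0.0))

    for _ in range(episodes):
        state = (None, frozenset())
        while len(state[1]) < n:
            acts = actions(state)
            # Epsilon-greedy exploration.
            a = rng.choice(acts) if rng.random() < epsilon else best(state, acts)
            reward = 0 if state[0] is None else overlap(reads[state[0]], reads[a])
            nxt = (a, state[1] | {a})
            future = max((q.get((nxt, b), 0.0) for b in actions(nxt)), default=0.0)
            old = q.get((state, a), 0.0)
            # Q-learning update: move toward reward + discounted best next value.
            q[(state, a)] = old + alpha * (reward + gamma * future - old)
            state = nxt

    # Follow the learned greedy policy to produce the final order.
    state, order = (None, frozenset()), []
    while len(state[1]) < n:
        a = best(state, actions(state))
        order.append(a)
        state = (a, state[1] | {a})
    return order


def assemble(reads, order):
    """Merge reads in the given order, collapsing each pairwise overlap."""
    contig = reads[order[0]]
    for prev, cur in zip(order, order[1:]):
        contig += reads[cur][overlap(reads[prev], reads[cur]):]
    return contig
```

On three toy reads such as `["ATTAGACC", "GACCTGCA", "TGCAATCG"]`, the learned order chains the two 4-base overlaps; the same code quickly becomes impractical as the number of reads grows, because the Q-table is indexed by subsets of reads.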

Novel deep learning for multi-class classification of Alzheimer's in disability using MRI datasets.

IF 3.9

Frontiers in bioinformatics Pub Date: 2025-08-20 eCollection Date: 2025-01-01 DOI: 10.3389/fbinf.2025.1567219

Sumaiya Binte Shahid, Maleeha Kaikaus, Md Hasanul Kabir, Mohammad Abu Yousuf, A K M Azad, A S Al-Moisheer, Naif Alotaibi, Salem A Alyami, Touhid Bhuiyan, Mohammad Ali Moni

Abstract:
Introduction: Alzheimer's disease (AD) is one of the most common neurodegenerative disabilities and often leads to memory loss, confusion, language difficulties, and trouble with motor coordination. Although several machine learning (ML) and deep learning (DL) algorithms have been used to identify AD from MRI scans, precise classification of AD categories remains challenging because neighbouring categories share common features.
Methods: This study proposes transfer learning-based methods for extracting features from MRI scans for multi-class classification of AD categories. Four transfer learning-based feature extractors, namely ResNet152V2, VGG16, InceptionV3, and MobileNet, were applied to two publicly available datasets (ADNI and OASIS) and to a Merged dataset combining ADNI and OASIS, each with four categories: Moderate Demented (MoD), Mild Demented (MD), Very Mild Demented (VMD), and Non Demented (ND).
Results: The results suggest that the Modified ResNet152V2 is the optimal feature extractor among the four transfer learning methods. Using the modified ResNet152V2 as a feature extractor, a convolutional neural network-based model named 'IncepRes' is then proposed, fusing the Inception and ResNet architectures for multi-class classification of AD categories. The proposed model achieved a standard accuracy of 96.96%, 98.35%, and 97.13% on the ADNI, OASIS, and Merged datasets, respectively, outperforming other competing DL architectures.
Discussion: We hope that the proposed framework can automate the precise classification of various AD categories and thereby support prompt management and treatment of the cognitive and functional impairments associated with AD.

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12405159/pdf/

Citations: 0