{"title":"Protein cleaver: an interactive web interface for <i>in silico</i> prediction and systematic annotation of protein digestion-derived peptides.","authors":"Grigorios Koulouras, Yingrong Xu","doi":"10.3389/fbinf.2025.1576317","DOIUrl":"10.3389/fbinf.2025.1576317","url":null,"abstract":"<p><p>Proteolytic digestion is an essential process in mass spectrometry-based proteomics for converting proteins into peptides, hence crucial for protein identification and quantification. In a typical proteomics experiment, digestion reagents are selected without prior evaluation of their optimality for detecting proteins or peptides of interest, partly due to the lack of comprehensive and user-friendly predictive tools. In this work, we introduce Protein Cleaver, a web-based application that systematically assesses regions of proteins that are likely or unlikely to be identified, along with extensive sequence and structure annotation and visualization features. We showcase practical examples of Protein Cleaver's usability in drug discovery and highlight proteins that are typically difficult to detect using the most common proteolytic enzymes. We evaluate trypsin and chymotrypsin for identifying G-protein-coupled receptors and discover that chymotrypsin produces significantly more identifiable peptides than trypsin. We perform a bulk digestion analysis and assess 36 proteolytic enzymes for their ability to detect most cysteine-containing peptides in the human proteome. 
We anticipate Protein Cleaver to be a valuable auxiliary tool for proteomics scientists.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1576317"},"PeriodicalIF":3.9,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12445168/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145115195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive sampling methods facilitate the determination of reliable dataset sizes for evidence-based modeling.","authors":"Tim Breitenbach, Thomas Dandekar","doi":"10.3389/fbinf.2025.1528515","DOIUrl":"10.3389/fbinf.2025.1528515","url":null,"abstract":"<p><p>How can we be sure that there is sufficient data for our model, such that the predictions remain reliable on unseen data and the conclusions drawn from the fitted model would not vary significantly when using a different sample of the same size? We answer these and related questions through a systematic approach that examines the data size and the corresponding gains in accuracy. Assuming the sample data are drawn from a data pool with no data drift, the law of large numbers ensures that a model converges to its ground truth accuracy. Our approach provides a heuristic method for investigating the speed of convergence with respect to the size of the data sample. This relationship is estimated using sampling methods, which introduces a variation in the convergence speed results across different runs. 
To stabilize results (so that conclusions do not depend on the run) and extract the most reliable information encoded in the available data regarding convergence speed, the presented method automatically determines a sufficient number of repetitions to reduce sampling deviations below a predefined threshold, thereby ensuring the reliability of conclusions about the required amount of data.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1528515"},"PeriodicalIF":3.9,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12444090/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145115182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel linear indexing method for strings under all internal nodes in a suffix tree.","authors":"Anas Al-Okaily, Abdelghani Tbakhi","doi":"10.3389/fbinf.2025.1577324","DOIUrl":"10.3389/fbinf.2025.1577324","url":null,"abstract":"<p><p>Suffix trees are fundamental data structures in stringology and have wide applications across various domains. In this work, we propose two linear-time algorithms for indexing strings under each internal node in a suffix tree while preserving the ability to track similarities and redundancies across different internal nodes. This is achieved through a novel tree structure derived from the suffix tree, along with new indexing concepts. The resulting indexes offer practical solutions in several areas, including DNA sequence analysis and approximate pattern matching.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1577324"},"PeriodicalIF":3.9,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12443692/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145115160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editorial: Networks and graphs in biological data: current methods, opportunities and challenges.","authors":"Derek L Thompson, Hsiang-Yun Wu, Christopher W Bartlett, William C Ray","doi":"10.3389/fbinf.2025.1685992","DOIUrl":"10.3389/fbinf.2025.1685992","url":null,"abstract":"","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1685992"},"PeriodicalIF":3.9,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12437696/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145082633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Germline mutation profiling of breast cancer patients using a non-BRCA sequencing panel.","authors":"Sonar Soni Panigoro, Rafika Indah Paramita, Fadilah Fadilah, Septelia Inawati Wanandi, Aisyah Fitriannisa Prawiningrum, Linda Erlina, Wahyu Dian Utari, Ajeng Megawati Fajrin","doi":"10.3389/fbinf.2025.1620025","DOIUrl":"10.3389/fbinf.2025.1620025","url":null,"abstract":"","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1620025"},"PeriodicalIF":3.9,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12436446/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145082588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"COCαDA - a fast and scalable algorithm for interatomic contact detection in proteins using Cα distance matrices.","authors":"Rafael Pereira Lemos, Diego Mariano, Sabrina De Azevedo Silveira, Raquel C de Melo-Minardi","doi":"10.3389/fbinf.2025.1630078","DOIUrl":"10.3389/fbinf.2025.1630078","url":null,"abstract":"<p><p>Protein interatomic contacts, defined by spatial proximity and physicochemical complementarity at atomic resolution, are fundamental to characterizing molecular interactions and bonding. Methods for calculating contacts are generally categorized as cutoff-dependent, which rely on Euclidean distances, or cutoff-independent, which utilize Delaunay and Voronoi tessellations. While cutoff-dependent methods are recognized for their simplicity, completeness, and reliability, traditional implementations remain computationally expensive, posing significant scalability challenges in the current Big Data era of bioinformatics. Here, we introduce COCαDA (COntact search pruning by Cα Distance Analysis), a Python-based command-line tool for improving search pruning in large-scale interatomic protein contact analysis using alpha-carbon (Cα) distance matrices. COCαDA detects intra- and inter-chain contacts, and classifies them into seven different types: hydrogen and disulfide bonds; hydrophobic effects; attractive, repulsive, and salt-bridge interactions; and aromatic stackings. 
To evaluate our tool, we compared it with three traditional approaches in the literature: all-against-all atom distance calculation (\"brute-force\"), static Cα distance cutoff (SC), and Biopython's NeighborSearch class (NS). COCαDA demonstrated superior performance compared to the other methods, achieving on average 6x faster computation times than advanced data structures like <i>k</i>-d trees from NS, in addition to being simpler to implement and fully customizable. The presented tool facilitates exploratory and large-scale analyses of interatomic contacts in proteins in a simple and efficient manner, also enabling the integration of results with other tools and pipelines. The COCαDA tool is freely available at https://github.com/LBS-UFMG/COCaDA.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1630078"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12433948/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145076621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancing bioinformatics capacity through Nextflow and nf-core: lessons from an early-to mid-career researchers-focused program at The Kids Research Institute Australia.","authors":"Patricia Agudelo-Romero, Talya Conradie, Jose Antonio Caparros-Martin, David Jimmy Martino, Anthony Kicic, Stephen Michael Stick, Christopher Hakkaart, Abhinav Sharma","doi":"10.3389/fbinf.2025.1610015","DOIUrl":"10.3389/fbinf.2025.1610015","url":null,"abstract":"<p><p>The increasing adoption of high-throughput \"omics\" technologies has heightened the demand for standardized, scalable, and reproducible bioinformatics workflows. Nextflow and nf-core provide a robust framework for researchers, particularly early- and mid-career researchers (EMCRs), to navigate complex data analysis. At The Kids Research Institute Australia, we implemented a structured approach to bioinformatics capacity building using these tools. This perspective presents nine practical rules derived from lessons learnt, which facilitated the successful adoption of Nextflow and nf-core, addressing implementation challenges, knowledge gaps, resource allocation, and community support. Our experience serves as a guide for institutions aiming to establish sustainable bioinformatics capabilities and empower EMCRs.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1610015"},"PeriodicalIF":3.9,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12425987/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145066651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying novel therapeutic targets for non-alcoholic fatty liver disease using bioinformatics approaches: from drug repositioning to traditional Chinese medicine.","authors":"Jingmin Zhang, Tianwei Meng, Weiqi Gao, Xinghua Li, Juan Xu","doi":"10.3389/fbinf.2025.1613985","DOIUrl":"10.3389/fbinf.2025.1613985","url":null,"abstract":"<p><strong>Background: </strong>Non-alcoholic fatty liver disease (NAFLD) is a prevalent condition with limited effective treatments, necessitating novel therapeutic strategies. Bioinformatics offers a promising approach to identify new targets by analyzing gene expression and drug interactions.</p><p><strong>Objective: </strong>This study aims to identify novel therapeutic targets for NAFLD through bioinformatics, focusing on drug repositioning and traditional Chinese medicine (TCM) components.</p><p><strong>Methods: </strong>Three NAFLD-related gene expression datasets (GSE260666, GSE126848, GSE135251) were analyzed to identify differentially expressed genes. Protein-protein interaction networks were constructed using STRING and visualized with Cytoscape. Pathway enrichment analysis was performed, and drug-gene interactions were explored using the DGIdb database. TCM components were screened via the HERB database, with molecular docking conducted to assess binding affinities.</p><p><strong>Results: </strong>Key hub genes (CXCL2, CDKN1A, TNFRSF12A, HGFAC) were identified, with significant enrichment in cell proliferation and PI3K-Akt signaling pathways. 
Cyclosporine emerged as a potential repurposed drug, while TCM components (curcumin, resveratrol, berberine) showed strong binding affinities to NAFLD targets.</p><p><strong>Conclusion: </strong>Cyclosporine and TCM compounds are promising candidates for NAFLD treatment, warranting further experimental validation to confirm their therapeutic potential.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1613985"},"PeriodicalIF":3.9,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12417881/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145042432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using reinforcement learning in genome assembly: in-depth analysis of a Q-learning assembler.","authors":"Kleber Padovani, Rafael Cabral Borges, Roberto Xavier, André Carlos Carvalho, Anna Reali, Annie Chateau, Ronnie Alves","doi":"10.3389/fbinf.2025.1633623","DOIUrl":"10.3389/fbinf.2025.1633623","url":null,"abstract":"<p><p>Genome assembly remains an unsolved problem, and de novo strategies (i.e., those run without a reference) are relevant but computationally complex tasks in genomics. Although de novo assemblers have been previously successfully applied in genomic projects, there is still no \"best assembler\", and the choice and setup of assemblers still rely on bioinformatics experts. Thus, as with other computationally complex problems, machine learning has emerged as an alternative (or complementary) way to develop accurate, fast and autonomous assemblers. Reinforcement learning has proven promising for solving complex activities without supervision, such as games, and there is a pressing need to understand the limits of this approach to \"real-life\" problems, such as the DNA fragment assembly problem. In this study, we analyze the boundaries of applying machine learning via reinforcement learning (RL) for genome assembly. We expand upon the previous approach found in the literature to solve this problem by carefully exploring the learning aspects of the proposed intelligent agent, which uses the Q-learning algorithm. We improved the reward system and optimized the exploration of the state space based on pruning and in collaboration with evolutionary computing (>300% improvement). We tested the new approaches on 23 environments. Our results suggest the unsatisfactory performance of the approaches, both in terms of assembly quality and execution time, providing strong evidence for the poor scalability of the studied reinforcement learning approaches to the genome assembly problem. 
Finally, we discuss the existing proposal, complemented by attempts at improvement that also proved insufficient. In doing so, we contribute to the scientific community by offering a clear mapping of the limitations and challenges that should be taken into account in future attempts to apply reinforcement learning to genome assembly.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1633623"},"PeriodicalIF":3.9,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12405310/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145001993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel deep learning for multi-class classification of Alzheimer's in disability using MRI datasets.","authors":"Sumaiya Binte Shahid, Maleeha Kaikaus, Md Hasanul Kabir, Mohammad Abu Yousuf, A K M Azad, A S Al-Moisheer, Naif Alotaibi, Salem A Alyami, Touhid Bhuiyan, Mohammad Ali Moni","doi":"10.3389/fbinf.2025.1567219","DOIUrl":"10.3389/fbinf.2025.1567219","url":null,"abstract":"<p><strong>Introduction: </strong>Alzheimer's disease (AD) is one of the most common neurodegenerative disabilities that often leads to memory loss, confusion, difficulty in language and trouble with motor coordination. Although several machine learning (ML) and deep learning (DL) algorithms have been utilized to identify Alzheimer's disease (AD) from MRI scans, precise classification of AD categories remains challenging as neighbouring categories share common features.</p><p><strong>Methods: </strong>This study proposes transfer learning-based methods for extracting features from MRI scans for multi-class classification of different AD categories. Four transfer learning-based feature extractors, namely, ResNet152V2, VGG16, InceptionV3, and MobileNet have been employed on two publicly available datasets (i.e., ADNI and OASIS) and a Merged dataset combining ADNI and OASIS, each having four categories: Moderate Demented (MoD), Mild Demented (MD), Very Mild Demented (VMD), and Non Demented (ND).</p><p><strong>Results: </strong>Results suggest the Modified ResNet152V2 as the optimal feature extractor among the four transfer learning methods. Next, by utilizing the modified ResNet152V2 as a feature extractor, a Convolutional Neural Network based model, namely, the 'IncepRes', is proposed by fusing the Inception and ResNet architectures for multiclass classification of AD categories. 
The results indicate that our proposed model achieved a standard accuracy of 96.96%, 98.35% and 97.13% for ADNI, OASIS, and Merged datasets, respectively, outperforming other competing DL structures.</p><p><strong>Discussion: </strong>We hope that our proposed framework may automate the precise classifications of various AD categories, and thereby can offer the prompt management and treatment of cognitive and functional impairments associated with AD.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5 ","pages":"1567219"},"PeriodicalIF":3.9,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12405159/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145002021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}