{"title":"Exploiting subtractive genomics to identify novel drug targets and new immunogenic candidates against <i>Bordetella pertussis</i>: an <i>in silico</i> study.","authors":"Mahshid Khazani Asforooshani, Narjes Noori Goodarzi, Behzad Shahbazi, Nayereh Rezaie Rahimi, Kimia Mahdavian, Mahdi Rohani, Farzad Badmasti","doi":"10.3389/fbinf.2025.1570054","DOIUrl":"https://doi.org/10.3389/fbinf.2025.1570054","url":null,"abstract":"<p><strong>Background: </strong><i>Bordetella pertussis</i>, the causative agent of whooping cough, remains a significant global health concern despite the widespread availability of vaccines. The persistent reemergence of pertussis is driven by the bacterium's ongoing genomic evolution, shifting epidemiological patterns, and limitations in current vaccine strategies. These challenges highlight the urgent need to identify novel drug targets and immunogenic candidates to enhance therapeutic and preventive measures against <i>B. pertussis</i>.</p><p><strong>Methods: </strong>Novel drug targets were identified, and immunogenic factors were screened as potential vaccine candidates. Cytoplasmic proteins were evaluated for their similarity to the human proteome, metabolic pathways, and gut microbiota. Surface-exposed proteins, in turn, were evaluated as immunogenic targets using a reverse vaccinology approach. A multi-epitope vaccine (MEV) was designed based on the immunogenic linear B-cell epitopes of three autotransporters, with the beta domain of SphB2 as a scaffold for the MEV. Molecular docking, immune simulation, and molecular dynamics simulations were performed to evaluate the binding affinity and feasibility of interaction between chimeric MEVs and immune receptors.</p><p><strong>Results: </strong>Six proteins were identified as excellent potential drug targets: elongation factor P (WP_003810194.1), aspartate kinase (WP_010930633.1), 50S ribosomal protein L21 (WP_003807462.1), homoserine dehydrogenase (WP_003813074.1), carboxynorspermidine decarboxylase (WP_003814461.1), and PTS sugar transporter subunit IIA (WP_010929966.1). Reverse vaccinology identified nine immunogenic proteins: BapA (WP_010930805.1), BrkA (WP_010931506.1), SphB2 (WP_041166323.1), TcfA (WP_010930243.1), FliK (WP_041166144.1), fimbrial protein (WP_010930199.1), TolA (WP_010931418.1), DD-metalloendopeptidase (WP_003811022.1), and an I78 family peptidase inhibitor protein (WP_003812179.1). The SphB2-based MEV was designed using six linear B-cell epitopes from the extracellular loops of the autotransporters. The binding affinity and feasibility of the interaction between the MEV and TLR2, TLR4, and HLA-DR-B were computationally confirmed by molecular dynamics.</p><p><strong>Conclusion: </strong>Proteins involved in translation and metabolism appear to be viable novel drug targets. Furthermore, this study highlights autotransporter proteins as promising immune targets. Experimental work will be needed to confirm these results.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5","pages":"1570054"},"PeriodicalIF":2.8,"publicationDate":"2025-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12106433/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144164234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Structural biology meets typography: using protein structures to inspire creative expression and connect diverse audiences.","authors":"Leonora Martínez-Núñez","doi":"10.3389/fbinf.2025.1589122","DOIUrl":"https://doi.org/10.3389/fbinf.2025.1589122","url":null,"abstract":"<p>Proteins are complex molecular machines with specific structures that determine their function. Advances in structural bioinformatics and visualization have expanded access to molecular data, most notably through the Protein Data Bank (PDB). This perspective explores the intersection between structural biology and typography, integrating a protein alphabet with the 36 Days of Type design project. 3D molecular models were processed and stylized with ChimeraX, Blender, and Molecular Nodes, then shared on social media under the #36daysoftype hashtag, where they engaged a diverse audience. This work was also presented at the VIZBI 2024 conference and influenced the design of the VIZBI 2025 conference logo. The project illustrates the role of scientific illustration and visual arts in connecting disciplines, boosting public engagement, and encouraging interdisciplinary collaboration, and it points to future applications such as biology-inspired typography to enhance scientific literacy.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5","pages":"1589122"},"PeriodicalIF":2.8,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12094914/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144129640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"micRoclean: an R package for decontaminating low-biomass 16S-rRNA microbiome data.","authors":"Rachel Griffard-Smith, Emily Schueddig, Diane E Mahoney, Prabhakar Chalise, Devin C Koestler, Dong Pei","doi":"10.3389/fbinf.2025.1556361","DOIUrl":"https://doi.org/10.3389/fbinf.2025.1556361","url":null,"abstract":"<p>In 16S-rRNA microbiome studies, cross-contamination and environmental contamination can obscure true biological signal. This contamination is particularly problematic in low-biomass studies, which are characterized by samples with a small amount of microbial DNA. Although multiple methods and packages for decontaminating microbiome data exist, there is no consensus on which decontamination tool is most appropriate for a given study design, or on how to quantify the impact of removing identified contaminants to avoid over-filtering. To address these gaps, we introduce micRoclean, an open-source R package that contains two distinct microbiome decontamination pipelines, with guidance on which to select based on the study design and the downstream goals of the research. This package integrates and expands on existing packages for microbiome decontamination and analysis for the convenience of users. Furthermore, micRoclean implements a filtering loss statistic to quantify the impact of decontamination on the overall covariance structure of the data. In this paper, we demonstrate the utility of micRoclean on example data, illustrating that it effectively and intuitively decontaminates microbiome data. Further, using a multi-batch simulated microbiome sample, we show that micRoclean matches or outperforms tools with similar objectives. This package is freely available from the GitHub repository rachelgriffard/micRoclean.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5","pages":"1556361"},"PeriodicalIF":2.8,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12095030/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144129638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tumor tissue-of-origin classification using miRNA-mRNA-lncRNA interaction networks and machine learning methods.","authors":"Ankita Lawarde, Masuma Khatun, Prakash Lingasamy, Andres Salumets, Vijayachitra Modhukur","doi":"10.3389/fbinf.2025.1571476","DOIUrl":"https://doi.org/10.3389/fbinf.2025.1571476","url":null,"abstract":"<p><strong>Introduction: </strong>MicroRNAs (miRNAs) regulate gene expression and play an important role in carcinogenesis through complex interactions with messenger RNAs (mRNAs) and long non-coding RNAs (lncRNAs). Despite their established influence on tumor progression and therapeutic resistance, the application of miRNA interaction networks to tumor tissue-of-origin (TOO) classification remains underexplored.</p><p><strong>Methods: </strong>We developed a machine learning (ML) framework that integrates miRNA-mRNA-lncRNA interaction networks to classify tumors by their tissue of origin. Using transcriptomic profiles from 14 cancer types in The Cancer Genome Atlas (TCGA), we constructed co-expression networks and applied multiple feature selection techniques, including recursive feature elimination (RFE), random forest (RF), Boruta, and linear discriminant analysis (LDA), to identify a minimal yet informative subset of miRNA features. Ensemble ML algorithms were trained and validated with stratified five-fold cross-validation for robust performance assessment across class distributions.</p><p><strong>Results: </strong>Our models achieved an overall classification accuracy of 99%, distinguishing 14 cancer types with high robustness and generalizability. A minimal set of 150 miRNAs selected via RFE yielded optimal performance across all classifiers. Furthermore, in silico validation revealed that many of the top miRNAs, including <i>miR-21-5p, miR-93-5p,</i> and <i>miR-10b-5p</i>, were not only highly central in the network but also correlated with patient survival and drug response. In addition, functional enrichment analyses indicated significant involvement of miRNAs in pathways such as <i>TGF</i>-beta signaling, epithelial-mesenchymal transition, and immune modulation. Our comparative analysis demonstrated that miRNA-based models outperformed mRNA- and lncRNA-based classifiers.</p><p><strong>Discussion: </strong>Our integrated framework provides a biologically grounded, interpretable, and highly accurate approach for tumor tissue-of-origin classification. The identified miRNA biomarkers demonstrate strong translational potential, supported by clinical trial overlap, drug sensitivity data, and survival analyses. This work highlights the power of combining miRNA network biology with ML to improve precision oncology diagnostics and supports future development of liquid biopsy-based cancer classification.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5","pages":"1571476"},"PeriodicalIF":2.8,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12088952/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144113002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"pubCounteR: an R package for interrogating published literature for experimentally-derived gene lists within a user-defined biological context.","authors":"Marina Leer, George A Soultoukis, Markus Jähnert, Masoome Oveisi, Dirk Walther, Tim J Schulz","doi":"10.3389/fbinf.2025.1523184","DOIUrl":"https://doi.org/10.3389/fbinf.2025.1523184","url":null,"abstract":"<p>Basic and clinical biomedical research relies heavily on modern large-scale datasets that include genomics, transcriptomics, epigenomics, metabolomics, and proteomics, among other \"Omics\". These research tools very often generate lists of candidate genes that are hypothesized or shown to be responsible for the biological effect in question. To aid the biological interpretation of experimentally obtained gene lists, we developed pubCounteR, an R package and web-based interface that screens publications for experimentally derived gene lists within a specific biological context defined by a user-supplied set of keywords.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5","pages":"1523184"},"PeriodicalIF":2.8,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12118352/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144175990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational drug discovery of potential 5α-reductase phytochemical inhibitors and hair growth promotion using <i>in silico</i> techniques.","authors":"Behnam Hasannejad-Asl, Farkhondeh Pooresmaeil, Sareh Azadi, Ali Najafi, Ali Esmaeili, Saeid Bagheri-Mohammadi, Bahram Kazemi","doi":"10.3389/fbinf.2025.1570101","DOIUrl":"https://doi.org/10.3389/fbinf.2025.1570101","url":null,"abstract":"<p><strong>Introduction: </strong>Male pattern hair loss (MPHL), also known as androgenetic alopecia (AGA), is a common disorder primarily caused by dihydrotestosterone (DHT). The Food and Drug Administration (FDA) has approved two 5-alpha reductase (5-AR) inhibitors, finasteride and dutasteride, for treating this condition. However, recent studies have reported adverse sexual side effects and issues with sperm production in young men using these medications. Natural remedies have also been recommended for treating hair loss, such as <i>Urtica dioica (nettle)</i>, <i>Serenoa repens (saw palmetto)</i>, and <i>Trigonella foenum-graecum (fenugreek)</i>, which are mainly used to reduce hair loss in traditional medicine. Research shows that these herbal formulations and plant extracts may help reduce hair loss. However, the concentration of active compounds in these herbal extracts is often low, necessitating a large extract volume to achieve noticeable effects on hair growth. Although many studies have investigated the effects of these herbal extracts on hair growth, fewer studies focus on the specific compounds influencing the molecular mechanisms of hair loss, particularly the inhibition of 5-AR.</p><p><strong>Methods: </strong>For the first time, we conducted a computational study of the phytochemicals extracted from these herbs to identify compounds that can effectively bind to and inhibit 5-AR. Additionally, we assessed the stability of the ligands encapsulated in lipid nanoparticles (LNP) by conducting molecular dynamics (MD) simulations of the LNP-encapsulated ligands. We used an online database to identify compounds from the extracts of nettle, saw palmetto, and fenugreek, and then analyzed their binding affinity to 5-AR using computational techniques.</p><p><strong>Results: </strong>We found that six molecules (Jamogenin, Neodiosgenin, Chlorogenic acid, Rutin, Riboflavin, and Ursolic acid) bind effectively to 5-AR. Additionally, our <i>in silico</i> studies revealed that vesicle-entrapped Jamogenin, which binds 5-AR more strongly, is more stable than its unencapsulated form.</p><p><strong>Discussion: </strong>Therefore, these six molecules, particularly Jamogenin, should be considered for experimental analysis in both their unencapsulated and nanocarrier-encapsulated states to promote hair follicle growth.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5","pages":"1570101"},"PeriodicalIF":2.8,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12089051/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144112973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating phylogenies with chronology to assemble the tree of life.","authors":"Jose Barba-Montoya, Jack M Craig, Sudhir Kumar","doi":"10.3389/fbinf.2025.1571568","DOIUrl":"https://doi.org/10.3389/fbinf.2025.1571568","url":null,"abstract":"<p>Reconstructing the global Tree of Life requires computational approaches that integrate numerous molecular phylogenies with limited species overlap into a comprehensive supertree. Our survey of published literature shows that individual phylogenies are frequently restricted to specific taxonomic groups owing to investigators' expertise and molecular evolutionary considerations, so that any given species is present in only a minuscule fraction of phylogenies. We present a novel approach, the chronological supertree algorithm (Chrono-STA), that builds a supertree of species from such data by using node ages in published molecular phylogenies scaled to time, thereby integrating chronological data from molecular timetrees. It fundamentally differs from existing approaches that generate consensus phylogenies from gene trees with missing taxa, as Chrono-STA does not impute nodal distances, use a guide tree as a backbone, or reduce phylogenies to quartets. Analyses of simulated and empirical datasets show that Chrono-STA can combine taxonomically restricted timetrees with extremely limited species overlap. For such data, approaches that impute missing distances or assemble phylogenetic quartets did not perform well. We conclude that integrating phylogenies via the temporal dimension enhances the accuracy of reconstructed supertrees, which are also scaled to time.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5","pages":"1571568"},"PeriodicalIF":2.8,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12075222/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144082568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artificial intelligence in variant calling: a review.","authors":"Omar Abdelwahab, Davoud Torkamaneh","doi":"10.3389/fbinf.2025.1574359","DOIUrl":"https://doi.org/10.3389/fbinf.2025.1574359","url":null,"abstract":"<p>Artificial intelligence (AI) has revolutionized numerous fields, including genomics, where it has significantly impacted variant calling, a crucial process in genomic analysis. Variant calling involves the detection of genetic variants such as single nucleotide polymorphisms (SNPs), insertions/deletions (InDels), and structural variants from high-throughput sequencing data. Traditionally, statistical approaches have dominated this task, but the advent of AI led to the development of sophisticated tools that promise higher accuracy, efficiency, and scalability. This review explores the state-of-the-art AI-based variant calling tools, including DeepVariant, DNAscope, DeepTrio, Clair, Clairvoyante, Medaka, and HELLO. We discuss their underlying methodologies, strengths, limitations, and performance metrics across different sequencing technologies, alongside their computational requirements, focusing primarily on SNP and InDel detection. By comparing these AI-driven techniques with conventional methods, we highlight the transformative advancements AI has introduced and its potential to further enhance genomic research.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5","pages":"1574359"},"PeriodicalIF":2.8,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12055765/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143999521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Biomarker-driven drug repurposing for NAFLD-associated hepatocellular carcinoma using machine learning integrated ensemble feature selection.","authors":"Subhajit Ghosh, Sukhen Das Mandal, Subarna Thakur","doi":"10.3389/fbinf.2025.1522401","DOIUrl":"https://doi.org/10.3389/fbinf.2025.1522401","url":null,"abstract":"<p>The incidence of non-alcoholic fatty liver disease (NAFLD), encompassing the more severe non-alcoholic steatohepatitis (NASH), is rising alongside the surges in diabetes and obesity. Increasing evidence indicates that NASH is responsible for a significant share of cases of idiopathic hepatocellular carcinoma (HCC), a fatal cancer with a 5-year survival rate below 22%. Biomarkers can facilitate early screening and monitoring of at-risk NAFLD/NASH patients and assist in identifying potential drug candidates for treatment. This study used an ensemble feature selection framework to analyze transcriptomic data, identifying biomarker genes associated with the stage-wise progression of NAFLD-related HCC. Seven machine learning algorithms were assessed for disease stage classification. Twelve feature selection methods, including correlation-based techniques, mutual information-based methods, and embedded techniques, were used to rank the top genes as features; combining multiple feature selection methods in this way yields more robust features relevant to disease progression. Cox regression-based survival analysis was carried out to evaluate the biomarker potential of these genes. Furthermore, a multiphase drug repurposing strategy and molecular docking were employed to identify potential drug candidates against these biomarkers. Among the seven machine learning models initially evaluated, DISCR was the most accurate disease stage classifier. Ensemble feature selection identified ten top genes, eight of which were recognized as potential biomarkers based on survival analysis. These include ABAT, ABCB11, MBTPS1, and ZFP1, which are mostly involved in alanine and glutamate metabolism, butanoate metabolism, and ER protein processing. Through drug repurposing, 81 candidate drugs were found to be effective against these marker genes, with Diosmin, Esculin, Lapatinib, and Phenelzine emerging as the best candidates after screening by molecular docking and MMGBSA. The consensus derived from multiple methods enhances the accuracy of identifying robust, relevant biomarkers for NAFLD-associated HCC. The use of these biomarkers in a multiphase drug repurposing strategy highlights potential therapeutic options for early intervention, which is essential to stop disease progression and improve outcomes.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5","pages":"1522401"},"PeriodicalIF":2.8,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12043677/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144051926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of collagen remodeling in asthma using second-harmonic generation imaging, supervised machine learning and texture-based analysis.","authors":"Natasha N Kunchur, Joshua J A Poole, Jesse Levine, Tillie-Louise Hackett, Rebecca Thornhill, Leila B Mostaço-Guidolin","doi":"10.3389/fbinf.2025.1539936","DOIUrl":"https://doi.org/10.3389/fbinf.2025.1539936","url":null,"abstract":"<p>Airway remodeling is present in all stages of asthma severity and has been linked to reduced lung function, airway hyperresponsiveness, and increased deposition of fibrillar collagens. Traditional histological staining methods used to visualize the fibrotic response are poorly suited to capturing the morphological traits of extracellular matrix (ECM) proteins in their native state, hindering our understanding of disease pathology. Conversely, second harmonic generation (SHG) provides label-free, high-resolution visualization of fibrillar collagen, a primary ECM protein contributing to the loss of asthmatic lung elasticity. In a cohort of 13 human lung donors, SHG images of collagen from non-asthmatic (control) and asthmatic donors were evaluated through a custom textural classification pipeline. Integrated with supervised machine learning, the pipeline enables precise quantification and characterization of collagen, distinguishing between control and remodeled airways. Collagen distribution is quantified and characterized using 80 textural features belonging to the Gray Level Co-occurrence Matrix (GLCM), Gray Level Size Zone Matrix (GLSZM), Gray Level Run Length Matrix (GLRLM), Gray Level Dependence Matrix (GLDM), and Neighboring Gray Tone Difference Matrix (NGTDM). To identify an accurate subset of features reflective of fibrillar collagen formation, filter, wrapper, embedded, and novel statistical methods were applied for feature refinement. Textural feature subsets of high predictor importance were used to train a support vector machine model, which achieved an AUC-ROC of 94% ± 0.0001 in classifying remodeled airway collagen vs. control lung tissue. By combining detailed texture analysis with supervised ML, we demonstrate that morphological variation amongst remodeled SHG-imaged collagen in lung tissue can be successfully characterized.</p>","PeriodicalId":73066,"journal":{"name":"Frontiers in bioinformatics","volume":"5","pages":"1539936"},"PeriodicalIF":2.8,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12043662/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144029957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}