{"title":"Literature-driven extraction and computational prediction of causal statements linking genetic variants to biological processes, pathways and phenotypes.","authors":"Jici Jiang, Predrag Radivojac, Benjamin M Gyori","doi":"10.1142/9789819824755_0053","DOIUrl":"10.1142/9789819824755_0053","url":null,"abstract":"<p><p>Understanding the mechanistic basis of pathogenic genetic variants requires reconstructing the molecular pathways connecting the variant, via a chain of molecular intermediates, to a disease-causing biological process and phenotype. However, a literature-wide assembly of causal networks connecting variants, molecular pathways, biological processes and phenotypes has not been previously available. To create such a resource, we developed an automated pathway reconstruction approach building on the Integrated Network and Dynamical Reasoning Assembler (INDRA) system which extracts causal mechanistic statements (positive regulation, phosphorylation, complex formation, etc.) by combining structured databases and literature mining. We traversed INDRA statements extracted from publications to identify those describing a genetic variant resulting in a protein point mutation. We then reconstructed directed paths (consisting of one or more linked INDRA statements) connecting this variant to a term representing a biological process, phenotype or disease within the same publication. This resulted in a directed multigraph obtained from 25,862 paths for variants in 2,561 proteins. Each node in this graph corresponds to an ontology-grounded molecular or process term and each edge is explicitly linked to supporting literature evidence, enabling full auditability of inferred mechanisms. To leverage the assembled networks, we trained a classification model to predict likely downstream biological processes or specific disease associations for protein variants. 
As features to the model, we integrated molecular annotations (including protein sequence features, ClinVar pathogenicity labels, and UniProt domain mappings) in combination with representations from the ESM2 transformer-based protein language model. The performance achieved by this model shows promise for reconstructing causal mechanistic statements associated with function of genetic variants, a framing of the variant effect prediction task that goes significantly beyond simple assessment of pathogenicity. This integrative framework enables the mechanistic interpretation of known variants and prediction of functional relevance for variants lacking prior phenotypic annotation.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"31 ","pages":"738-751"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12952665/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147310862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Generative AI for Interpretable Clinical Decision Making Through Causal Graphs.","authors":"Mehmet Eren Ahsen, Rand Kittani, Travis Gerke, Laya Krishnan, Sean Rogan, Erick R Scott","doi":"10.1142/9789819824755_0022","DOIUrl":"10.1142/9789819824755_0022","url":null,"abstract":"<p><p>Clinical AI systems' lack of interpretability limits their adoption in evidence-based medicine. To address this challenge, we propose a computational framework that harnesses generative AI's medical knowledge to create interpretable structural causal models (SCMs) for clinical decision support, quality improvement evaluation, and population health management. We evaluated our approach through a case study using data from the Midwest Healthcare Conference Causal Diagram Challenge, where we compared transformer-based large language models against human performance on a complex causal reasoning task: estimating COVID-19 treatment effects through target trial emulation. Both groups designed SCMs to evaluate glucocorticoid treatment effects on 28-day mortality using real-world data from more than 2,000 hospitalized patients, benchmarked against published RECOVERY randomized controlled trial results. The best performing SCMs achieved bootstrap coverage rates exceeding 90% for two of three severity strata. Both human and AI models demonstrated equivalent clinical plausibility (n=3 expert reviewers) and similar statistical performance, though both struggled with critical disease severity. Ablation experiments comparing SCM-based approaches against traditional potential outcomes methods revealed SCMs achieved 76-98% coverage versus 1-37% for traditional methods. 
These results suggest that structural causal models can effectively bridge the interpretability gap in clinical AI by providing essential scaffolding for reliable causal inference and enabling meaningful human-AI collaboration while preserving methodological rigor essential for evidence-based medicine.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"31 ","pages":"309-323"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147310814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Patch-level phenotype identification via weakly supervised neuron selection in sparse autoencoders for CLIP-derived pathology embeddings.","authors":"Keita Tamura, Yao-Zhong Zhang, Yohei Okubo, Seiya Imoto","doi":"10.1142/9789819824755_0051","DOIUrl":"10.1142/9789819824755_0051","url":null,"abstract":"<p><p>Computer-aided analysis of whole slide images (WSIs) has advanced rapidly with the emergence of multi-modal pathology foundation models. In this study, we propose a weakly supervised neuron selection approach to extract disentangled representations from CLIP-derived pathology foundation models, leveraging the interpretability of sparse autoencoders. Specifically, neurons are ordered and selected using whole-slide level labels within a multiple instance learning (MIL) framework. We investigate the impact of different pre-trained image embeddings derived from general and pathology images and demonstrate that a selected single neuron can effectively enable patch-level phenotype identification. Experiments on the Camelyon16 and PANDA datasets demonstrate both the effectiveness and explainability of the proposed method, as well as its generalization ability for tumor patch identification.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"31 ","pages":"708-721"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147310826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applications of AI & ML in Biomanufacturing of Cell and Gene Therapies.","authors":"Eric Neumann, Karen Weisinger, Tom Londo","doi":"10.1142/9789819824755_0063","DOIUrl":"10.1142/9789819824755_0063","url":null,"abstract":"<p><p>This workshop highlights how AI/ML technologies are beginning to be applied to biomanufacturing and bioengineering of cell and gene therapies (CGT). AI/ML have demonstrated their utility in biocomputing and biomedical research applications, and are poised to become central to design, scaling, and optimization of bioengineering processes such as CAR-T cells, iPSC, and biomolecule production. Invited speakers from academia and industry will speak of their experience in leveraging these new intelligent technologies.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"31 ","pages":"855-858"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147310517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Systems Biology and Network Analysis: From Multi-omics Integration to Biological Mechanisms.","authors":"Gurkan Bebek, Onur Mutlu, Iman Hajirasouliha, Joshua Welch, Serguei Pakhomov","doi":"10.1142/9789819824755_0054","DOIUrl":"10.1142/9789819824755_0054","url":null,"abstract":"<p><p>The following sections are included: Session Introduction: Systems Biology and Network Analysis; Session Summary; Acknowledgments.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"31 ","pages":"752-754"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147310778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Large Language Models to Derive Multiple Sclerosis Progression Assessments from Clinical Notes: A Feasibility Study.","authors":"Sy Hwang, Sunil Thomas, Heather Williams, Tom Hutchinson, Emily Schriver, Ashley Batugo, Amit Bar-Or, Vishakha Sharma, Frederik Buijs, Christopher Perrone, Danielle Mowery","doi":"10.1142/9789819824755_0023","DOIUrl":"10.1142/9789819824755_0023","url":null,"abstract":"<p><p>Ascertainment of multiple sclerosis (MS) progression is important for informing clinical care decisions and supporting biomedical research. However, the details needed to infer a patient's MS progression status are locked within clinical notes. In this study, we assessed the feasibility of developing and validating a large language model (LLM)-based EDSS and FS classifier for ascertaining MS progression from clinical notes.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"31 ","pages":"324-337"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147310830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Automated Analysis of Gaze Behavior from Consumer VR Devices for Neurological Diagnosis.","authors":"Lio Schmitz, Markus Plack, Berkan Koyak, Muhammad Ehsan Ullah, Ahmad Aziz, Reinhard Klein, Zorah Lähner, Hannah Dröge","doi":"10.1142/9789819824755_0016","DOIUrl":"10.1142/9789819824755_0016","url":null,"abstract":"<p><p>Recent studies have demonstrated that eye tracking is a valuable tool in the detection, classification and staging of neurodegenerative diseases such as Parkinson's Disease (PD). However, traditional methods for capturing gaze data often rely on expensive and non-engaging clinical equipment such as video-oculography, limiting their accessibility and scalability. In this work, we investigate the feasibility of using eye tracking data collected via consumer-grade virtual reality (VR) headsets to support neurological diagnostics in a more accessible and user-friendly manner. This approach enables large-scale, low-cost, and remote assessments, which are particularly valuable in early detection and monitoring of neurodegenerative conditions. We show that relevant oculomotor features extracted from VR-based eye tracking can be used for predictive assessment. Despite the inherent noise and lower precision of consumer devices, careful preprocessing and robust feature engineering, including deep learning embeddings, mitigate these limitations. Our results demonstrate that both handcrafted and learned features from gaze behavior enable promising levels of classification performance. This research represents an important step towards scalable, automated, and accessible diagnostic tools for neurodegenerative diseases using ubiquitous VR technology.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. 
Pacific Symposium on Biocomputing","volume":"31 ","pages":"219-235"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147310761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session Introduction: Biological molecular function: methods and benchmarks for finding function in biological dark matter.","authors":"Jason E McDermott, Yana Bromberg, Hannah Carter, Travis Wheeler","doi":"10.1142/9789819824755_0029","DOIUrl":"10.1142/9789819824755_0029","url":null,"abstract":"<p><p>The accurate determination of biological molecular function remains one of the most significant challenges in computational biology, with vast areas of biological \"dark matter\" persisting in microbiomes, viruses, and unexplored sequence space. To meet this challenge, we developed a PSB session that addresses the limitations of traditional sequence similarity-based functional annotation methods and explores how recent advances in AI/ML and high-throughput data generation are transforming the field. We highlight four innovative contributions presented in this session: a geometric framework using signed distance functions for modeling protein surfaces; a reinforcement learning-based approach for steering protein generative models to design functional sequences; an ensemble framework combining sequence, structural, and network features for subcellular localization prediction; and a scalable factorization method integrating gene-gene interaction data for analyzing high-dimensional genetic perturbation profiles. Together, these methodologies showcase the potential for computational and AI-driven tools to address the complex and multiscale nature of molecular function prediction, paving the way for new discoveries in understanding and engineering biological systems.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. 
Pacific Symposium on Biocomputing","volume":"31 ","pages":"417-424"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147309921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating Polygenic Scores with Clinical, Lifestyle, and Social Risk Factors to Improve Heart Failure Risk Prediction.","authors":"Katie M Cardone, Dokyoon Kim, Marylyn D Ritchie","doi":"10.1142/9789819824755_0046","DOIUrl":"10.1142/9789819824755_0046","url":null,"abstract":"<p><p>Heart failure (HF) is a highly prevalent, high-burden disorder whose prevalence is expected to increase. Early detection of HF can reduce morbidity and mortality; therefore, novel early detection methods are needed. Polygenic scores (PGS) can combine common variants across the genome and provide phenotype-specific risk scores. However, there are also many well-known, non-genomic risk factors of HF, in the clinical, lifestyle, and social determinant of health (SDOH) domains, and it is not clear how genetic and non-genetic risk factors collectively contribute to HF risk. To address this question, we assessed whether combining HF PGS with clinical, lifestyle, and SDOH risk factors improves risk prediction. Leveraging data from the All of Us Research Program (n = 22,275), clinical risk factors were aggregated into a clinical risk score (CRS) while lifestyle and SDOH risk factors were aggregated into a polyexposure score (PXS). Feature selection was conducted with LASSO regression and statistical significance thresholding from logistic regression models (p < 0.05). Features were included in the model if they were statistically significant and important in ≥ 95% of 1000 iterations. To assess model performance, logistic regressions with HF case/control status were conducted with each risk score individually, as well as integrated models. The integrated model (PGS + CRS + PXS) performed better than individual risk scores (AUROC = 0.763, AUPRC = 0.047, F1 score = 0.062, balanced accuracy = 0.683). To assess the validity of the CRS and PXS, an integrated model with the PGS along with clinical and exposure risk factors as independent features was also evaluated. 
Based on AUPRC and F1 score, this integrated risk model (PGS + CRS risk factors + PXS risk factors) performed better than combining the PGS with the CRS and PXS (AUROC = 0.738, AUPRC = 0.047, F1 score = 0.066, balanced accuracy = 0.657). These findings demonstrate that integration of risk factors across multiple domains can improve HF prediction. Knowing that PGS combined with clinical, lifestyle, and SDOH risk factors is predictive of HF risk provides greater opportunity for the identification of individuals at risk of HF prior to disease onset with the goal of prevention or early intervention.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"31 ","pages":"629-643"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12952681/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147310790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Clinician-Guided Framework for Endoscopic AI: Developing PanEndoAtlas and Benchmarking Foundation Models Across the Full GI Spectrum.","authors":"Shreya Johri, Luyang Luo, Hong-Yu Zhou, Todd Brenner, Sami Elamin, Mark Enrik Geissler, Tyler M Berzin, Pranav Rajpurkar","doi":"10.1142/9789819824755_0004","DOIUrl":"10.1142/9789819824755_0004","url":null,"abstract":"<p><p>Endoscopic procedures play a central role in the diagnosis and management of gastrointestinal (GI) diseases, yet the field lacks large-scale, clinically diverse benchmarks and unified datasets to evaluate vision foundation models. We introduce PanEndoSuite, the first unified ecosystem for endoscopic AI, developed through systematic collaboration between AI researchers and practicing gastroenterologists. PanEndoSuite consists of three complementary components: PanEndoAtlas, PanEndoX, and PanEndoFM. PanEndoAtlas is a harmonized dataset of over 420,000 labeled images from 30 public endoscopy datasets across 13 countries and 26 hospitals, creating a clinically-grounded hierarchical taxonomy that mirrors diagnostic reasoning patterns across 111 GI diseases. PanEndoX is a benchmark of 10 clinically grounded tasks, including hierarchical GI-tree classification, Barrett's esophagus grading, ulcerative colitis scoring, polyp subtyping, Boston Bowel Preparation Scale assessment, multi-organ disease classification, and anatomical landmark identification, designed to probe generalization across anatomical regions, disease presentations, and annotation granularities. PanEndoFM is a foundation model pretrained on a 10 million-image corpus curated from public data sources, spanning the entire GI tract. We benchmark PanEndoFM against two endoscopy-specific foundation models (EndoFM-LV, EndoSSL) and two general-purpose vision models (ViT-B/16, ResNet-50). 
PanEndoFM achieves the highest macro-AUC on 6 of 10 tasks, demonstrating broad clinical generalization; EndoFM-LV performs best on colon-focused tasks, EndoSSL excels in polyp subtyping, and ViT-B/16 shows strengths on small-intestine conditions. Together, PanEndoSuite establishes a foundation for building robust, generalist AI systems in gastrointestinal endoscopy that bridge current AI capabilities and clinical practice.</p>","PeriodicalId":34954,"journal":{"name":"Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing","volume":"31 ","pages":"42-56"},"PeriodicalIF":0.0,"publicationDate":"2026-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147311062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}