{"title":"Corrigendum to \"Theory of trust and acceptance of artificial intelligence technology (TrAAIT): An instrument to assess clinician trust and acceptance of artificial intelligence\" [J. Biomed. Inform. 148 (2023) 104550].","authors":"Alexander F Stevens, Pete Stetson","doi":"10.1016/j.jbi.2025.104863","DOIUrl":"https://doi.org/10.1016/j.jbi.2025.104863","url":null,"abstract":"","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":" ","pages":"104863"},"PeriodicalIF":4.0,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144317028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carlos Fernández-Llatas , Begoña Martínez-Salvador , Mar Marcos
{"title":"A declarative approach for interactive process discovery in the clinical domain","authors":"Carlos Fernández-Llatas , Begoña Martínez-Salvador , Mar Marcos","doi":"10.1016/j.jbi.2025.104862","DOIUrl":"10.1016/j.jbi.2025.104862","url":null,"abstract":"<div><h3>Objective:</h3><div>Process Mining (PM) is an established discipline with increasing adoption in the clinical domain. In this context, PM seeks to infer clinical processes from healthcare data collected in the Electronic Health Record. However, because of the particularities of clinical practice, the processes obtained are in most cases intricate networks that hardly correspond to clinical algorithms and are thus difficult for clinical and IT personnel to understand. To address these problems, our aim is to incorporate specialized clinical knowledge into the PM discovery algorithm.</div></div><div><h3>Methods:</h3><div>We propose a declarative approach to interactive process discovery in the clinical domain. Concretely, we present a set of declarative techniques, based on the Declare formalism, that allow clinicians to incorporate their knowledge into the process.</div></div><div><h3>Results:</h3><div>The results of this work encompass both the declarative interactive approach and its implementation in the I-PALIA PM discovery algorithm, as well as an application to a use case for the treatment of prostate cancer. This application demonstrates that the implemented techniques are useful in managing typical problems that arise when applying PM methods to the clinical domain.</div></div><div><h3>Conclusion:</h3><div>This work proposes a novel approach with techniques for interactive process discovery in the clinical domain. 
This approach not only allows the clinical expert to interactively incorporate specialized knowledge into the PM algorithm, but also serves to obtain process models that are more comprehensible and better resemble treatment procedures.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104862"},"PeriodicalIF":4.0,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144309987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhengqiu Yu , Lexin Fang , Yueping Ding , Yan Shen , Lei Xu , Yaozheng Cai , Xiangrong Liu
{"title":"Evaluating large language models for information extraction from gastroscopy and colonoscopy reports through multi-strategy prompting","authors":"Zhengqiu Yu , Lexin Fang , Yueping Ding , Yan Shen , Lei Xu , Yaozheng Cai , Xiangrong Liu","doi":"10.1016/j.jbi.2025.104844","DOIUrl":"10.1016/j.jbi.2025.104844","url":null,"abstract":"<div><h3>Objective:</h3><div>To systematically evaluate large language models (LLMs) for automated information extraction from gastroscopy and colonoscopy reports through prompt engineering, addressing their ability to extract structured information, recognize complex patterns, and support diagnostic reasoning in clinical contexts.</div></div><div><h3>Methods:</h3><div>We developed an evaluation framework incorporating three hierarchical tasks: basic entity extraction, pattern recognition, and diagnostic assessment. The study utilized a dataset of 162 endoscopic reports with structured annotations from clinical experts. Various language models, including proprietary, emerging, and open-source alternatives, were evaluated under both zero-shot and few-shot learning paradigms. For each task, multiple prompting strategies were implemented, including direct prompting and five Chain-of-Thought (CoT) prompting variants.</div></div><div><h3>Results:</h3><div>Larger models with specialized architectures achieved better performance in entity extraction tasks but faced notable challenges in capturing spatial relationships and integrating clinical findings. 
The effectiveness of few-shot learning varied across models and tasks, with larger models showing more consistent improvement patterns.</div></div><div><h3>Conclusion:</h3><div>These findings provide important insights into the current capabilities and limitations of language models in specialized medical domains, contributing to the development of more effective clinical documentation analysis systems.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104844"},"PeriodicalIF":4.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144284443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tailoring task arithmetic to address bias in models trained on multi-institutional datasets","authors":"Xiruo Ding , Zhecheng Sheng , Brian Hur , Justin Tauscher , Dror Ben-Zeev , Meliha Yetişgen , Serguei Pakhomov , Trevor Cohen","doi":"10.1016/j.jbi.2025.104858","DOIUrl":"10.1016/j.jbi.2025.104858","url":null,"abstract":"<div><h3>Objective:</h3><div>Multi-institutional datasets are widely used for machine learning from clinical data, to increase dataset size and improve generalization. However, deep learning models in particular may learn to recognize the source of a data element, leading to biased predictions. For example, deep learning models for image recognition trained on chest radiographs with COVID-19 positive and negative examples drawn from different data sources can respond to indicators of provenance (e.g., radiological annotations outside the lung area per institution-specific practices) rather than pathology, generalizing poorly beyond their training data. Bias of this sort, called <em>confounding by provenance</em>, is of concern in natural language processing (NLP) because provenance indicators (e.g., institution-specific section headers, or region-specific dialects) are pervasive in language data. Prior work on addressing such bias has focused on statistical methods, without providing a solution for deep learning models for NLP.</div></div><div><h3>Methods:</h3><div>Recent work in representation learning has shown that representing the weights of a trained deep network as <em>task vectors</em> allows for their arithmetic composition to govern model capabilities towards desired behaviors. In this work, we evaluate the extent to which reducing a model’s ability to distinguish between contributing sites with such task arithmetic can mitigate confounding by provenance. 
To do so, we propose two model-agnostic methods, Task Arithmetic for Provenance Effect Reduction (TAPER) and Dominance-Aligned Polarized Provenance Effect Reduction (DAPPER), extending the task vector approach to a novel problem domain.</div></div><div><h3>Results:</h3><div>Evaluation on three datasets shows improved robustness to confounding by provenance for both RoBERTa and Llama-2 models with the task vector approach, with improved performance at the extremes of distribution shift.</div></div><div><h3>Conclusion:</h3><div>This work emphasizes the importance of adjusting for confounding by provenance, especially in extreme cases of distribution shift. When used with deep learning models, DAPPER and TAPER mitigate such bias efficiently. They provide a novel mitigation strategy for confounding by provenance, with broad applicability to address other sources of bias in composite clinical data sets. Source code is available within the DeconDTN toolkit: <span><span>https://github.com/LinguisticAnomalies/DeconDTN-toolkit</span></span></div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104858"},"PeriodicalIF":4.0,"publicationDate":"2025-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144266246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
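The task vector idea mentioned in the abstract above can be illustrated with a minimal sketch. This is the generic "subtract a task vector" operation, not the TAPER or DAPPER procedure, whose details the abstract does not give. The weight names, the toy values, the scaling factor `alpha`, and the helper functions are all hypothetical.

```python
# Generic task-vector negation on toy weights (illustrative only).
# Assumption: each "model" is a dict mapping a parameter name to a list of floats.

def task_vector(finetuned, pretrained):
    """Task vector = fine-tuned weights minus pretrained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }

def subtract_task_vector(weights, vector, alpha=1.0):
    """Move weights away from a task by subtracting alpha * task vector."""
    return {
        name: [w - alpha * v for w, v in zip(weights[name], vector[name])]
        for name in weights
    }

# Hypothetical toy weights: a shared pretrained model, a model fine-tuned on the
# clinical task, and a model fine-tuned to predict the contributing site.
pretrained = {"w": [0.0, 0.0, 0.0]}
task_model = {"w": [1.0, 0.5, 0.2]}
site_model = {"w": [0.0, 0.4, 0.0]}

site_vector = task_vector(site_model, pretrained)
edited = subtract_task_vector(task_model, site_vector, alpha=0.5)
```

In this toy case only the second coordinate changes, because it is the only direction in which the hypothetical site model moved away from the pretrained weights.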
Genlang Chen , Sixuan Sui , Jiajian Zhang , Xuan Liu , Ping Cai
{"title":"An attention-based framework for integrating WSI and genomic data in cancer survival prediction","authors":"Genlang Chen , Sixuan Sui , Jiajian Zhang , Xuan Liu , Ping Cai","doi":"10.1016/j.jbi.2025.104836","DOIUrl":"10.1016/j.jbi.2025.104836","url":null,"abstract":"<div><h3>Objective:</h3><div>Cancer survival prediction plays a vital role in enhancing medical decision-making and optimizing patient management. Accurate survival estimation enables healthcare providers to develop personalized treatment plans, improve treatment outcomes, and identify high-risk patients for timely intervention. However, existing methods often rely on single-modality data or suffer from excessive computational complexity, limiting their practical application and the full potential of multimodal integration.</div></div><div><h3>Methods:</h3><div>To address these challenges, we propose a novel multimodal survival prediction framework that integrates Whole Slide Image (WSI) and genomic data. The framework employs attention mechanisms to model intra-modal and inter-modal correlations, effectively capturing complex dependencies within and between modalities. Additionally, locality-sensitive hashing is applied to optimize the self-attention mechanism, significantly reducing computational costs while maintaining predictive performance, enabling the model to handle large-scale or high-resolution WSI datasets efficiently.</div></div><div><h3>Results:</h3><div>Extensive experiments on the TCGA-BLCA dataset validate the effectiveness of the proposed approach. The results demonstrate that integrating WSI and genomic data improves survival prediction accuracy compared to unimodal methods. 
The optimized self-attention mechanism further enhances model efficiency, allowing for practical implementation on large datasets.</div></div><div><h3>Conclusion:</h3><div>The proposed framework provides a robust and efficient solution for cancer survival prediction by leveraging multimodal data integration and optimized attention mechanisms. This study highlights the importance of multimodal learning in medical applications and offers a promising direction for future advancements in AI-driven clinical decision support systems.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104836"},"PeriodicalIF":4.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144248035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Grace Y.E. Kim , Conor K. Corbin , François Grolleau , Michael Baiocchi , Jonathan H. Chen
{"title":"Monitoring strategies for continuous evaluation of deployed clinical prediction models","authors":"Grace Y.E. Kim , Conor K. Corbin , François Grolleau , Michael Baiocchi , Jonathan H. Chen","doi":"10.1016/j.jbi.2025.104854","DOIUrl":"10.1016/j.jbi.2025.104854","url":null,"abstract":"<div><h3>Objective:</h3><div>As machine learning adoption in clinical practice continues to grow, deployed classifiers must be continuously monitored and updated (retrained) to protect against data drift that stems from inevitable changes, including evolving medical practices and shifting patient populations. However, successful clinical machine learning classifiers will lead to a change in care which may change the distribution of features, labels, and their relationship. For example, “high risk” cases that were correctly identified by the model may ultimately get labeled as “low risk” thanks to an intervention prompted by the model’s alert. Classifier surveillance systems naive to such deployment-induced feedback loops will estimate lower model performance and lead to degraded future classifier retrains. The objective of this study is to simulate the impact of these feedback loops, propose feedback aware monitoring strategies as a solution, and assess the performance of these alternative monitoring strategies through simulations.</div></div><div><h3>Methods:</h3><div>We propose Adherence Weighted and Sampling Weighted Monitoring as two feedback loop-aware surveillance strategies. 
Through simulation we evaluate their ability to accurately appraise post-deployment model performance and to initiate safe and accurate classifier retraining.</div></div><div><h3>Results:</h3><div>Measured across accuracy, area under the receiver operating characteristic curve, average precision, Brier score, expected calibration error, F1, precision, sensitivity, and specificity, in the presence of feedback loops, Adherence Weighted and Sampling Weighted strategies have the highest fidelity to the ground truth classifier performance while standard approaches yield the most inaccurate estimations. Furthermore, in simulations with true data drift, retraining using standard unweighted approaches results in an AUROC of 0.52 (a drop from 0.72). In contrast, retraining based on Adherence Weighted and Sampling Weighted strategies recovers performance to 0.67, which is comparable to what a new model trained from scratch on the existing and shifted data would obtain.</div></div><div><h3>Conclusion:</h3><div>Compared to standard approaches, Adherence Weighted and Sampling Weighted strategies yield more accurate classifier performance estimates, measured according to the no-treatment potential outcome. Retraining based on these strategies brings stronger performance recovery when tested against data drift and feedback loops than standard approaches do.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104854"},"PeriodicalIF":4.0,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144248036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ningtao Liu , Shuiping Gou , Ruoxi Gao , Binxiao Su , Wenbo Liu , Claire K.S. Park , Shuwei Xing , Jing Yuan , Aaron Fenster
{"title":"GRU-TV: Time- and Velocity-aware Gated Recurrent Unit for patient representation","authors":"Ningtao Liu , Shuiping Gou , Ruoxi Gao , Binxiao Su , Wenbo Liu , Claire K.S. Park , Shuwei Xing , Jing Yuan , Aaron Fenster","doi":"10.1016/j.jbi.2025.104855","DOIUrl":"10.1016/j.jbi.2025.104855","url":null,"abstract":"<div><h3>Objective:</h3><div>The multivariate clinical temporal series (MCTS) extracted from electronic health records (EHRs) can characterize dynamic physiological processes. Previous deep patient representation models were proposed to handle imputed values and irregular sampling in MCTS. However, the change in physiological status, particularly instantaneous velocity, has not received adequate attention.</div></div><div><h3>Methods:</h3><div>To address this gap, we propose a Time- and Velocity-aware Gated Recurrent Unit (GRU-TV) model for patient representation learning. In the GRU-TV model, we apply the neural ordinary differential equation to describe the instantaneous velocity of the patient’s physiological status. This instantaneous velocity is embedded in the hidden state updating process of the basic GRU model to make it aware of uneven time intervals. In addition, the forward propagation of the GRU-TV model incorporates this instantaneous velocity to enable the perception of non-uniform changes in the patient’s physiological status over time.</div></div><div><h3>Results:</h3><div>The performance of the GRU-TV model is evaluated on multiple clinical concerns across two real-world datasets. The average AUCs for the sub-tasks on the complete, 70% sampled, and 50% sampled PhysioNet2012 datasets are 0.89, 0.84, and 0.83, respectively. The average AUCs for the acute care phenotype classification on the complete, 20% sampled, and 10% sampled MIMIC-III datasets are 0.84, 0.82, and 0.80, respectively. 
The mean absolute deviation of the length-of-stay regression task is 1.84 days.</div></div><div><h3>Conclusion:</h3><div>The superior performance underscores the importance of instantaneous physiological changes in patient representation and clinical decision-making, particularly under challenging data conditions.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104855"},"PeriodicalIF":4.0,"publicationDate":"2025-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144239851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuxi Liu , Zhenhao Zhang , Jiacong Mi , Shirui Pan , Tianlong Chen , Yi Guo , Xing He , Jiang Bian
{"title":"GatorCLR: Personalized predictions of patient outcomes on electronic health records using self-supervised contrastive graph representation","authors":"Yuxi Liu , Zhenhao Zhang , Jiacong Mi , Shirui Pan , Tianlong Chen , Yi Guo , Xing He , Jiang Bian","doi":"10.1016/j.jbi.2025.104851","DOIUrl":"10.1016/j.jbi.2025.104851","url":null,"abstract":"<div><h3>Objective:</h3><div>Recently, there has been growing interest in analyzing large amounts of Electronic Health Record (EHR) data. Patient outcome prediction is a major area of interest in EHR analysis that focuses on predicting the future health status of patients using structured data types, such as diagnoses, medications, and procedures collected from longitudinal EHR data. We investigate and design self-supervised learning (SSL) paradigms to learn high-quality representations from longitudinal EHR data, aiming to effectively capture longitudinal relationships and patterns for improved patient outcome predictions.</div></div><div><h3>Methods:</h3><div>We propose an end-to-end, novel, and robust model called GatorCLR that aligns with the contrastive SSL paradigm. GatorCLR incorporates graph analysis-based patient modeling into longitudinal EHR data, generating graph representations of nodes and edges representing patients, their relationships, and similarities. A two-layer augmentation technique is further incorporated in our GatorCLR that generates consistent, identity-preserving augmentations from graph representations.</div></div><div><h3>Results:</h3><div>We evaluate our approach using real-world EHR datasets. 
Experimental results indicate that our GatorCLR delivers meaningful and robust performance across multiple clinical tasks and datasets and provides transparency of the model decisions.</div></div><div><h3>Conclusion:</h3><div>The proposed approach presents a significant step toward developing a foundation model with longitudinal EHR data, capable of making informed predictions and adaptable to various downstream use cases and tasks. This study should, therefore, be of value to practitioners wishing to leverage longitudinal EHR data for predictive analytics.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104851"},"PeriodicalIF":4.0,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144212009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-view based heterogeneous graph contrastive learning for drug–target interaction prediction","authors":"Chao Li , Lichao Zhang , Guoyi Sun , Lingtao Su","doi":"10.1016/j.jbi.2025.104852","DOIUrl":"10.1016/j.jbi.2025.104852","url":null,"abstract":"<div><div>Drug–Target Interaction (DTI) prediction plays a pivotal role in accelerating drug discovery and development by identifying novel interactions between drugs and targets. Most previous studies on Drug–Protein Pair (DPP) networks have primarily focused on learning their topological structures. However, two key challenges remain: the integration of topological and semantic information is often insufficient, and the representation diversity may be diminished during graph convolution operations, affecting the expressiveness of learned features. To address the above challenges, we propose a novel paradigm named Multi-view Based Heterogeneous Graph Contrastive Learning for Drug–Target Interaction Prediction (HGCML-DTI). Specifically, we initially establish a drug–protein heterogeneous graph, followed by employing a weighted Graph Convolutional Network (GCN) to derive vector representations for both drug and protein nodes. Subsequently, we individually construct the topology and semantic graphs for DPP and integrate them to form a unified public graph. A multi-channel graph neural network is employed to learn DPP representations. To preserve representation diversity and enhance discriminative ability, a multi-view contrastive learning strategy is introduced. Then, a Multilayer Perceptron (MLP) neural network is used to recognize DTI. To prove the effectiveness of this work, extensive experiments are conducted on six real-world datasets, and comparisons are made with seven competitive baselines. The results demonstrate that the proposed HGCML-DTI significantly outperforms state-of-the-art methods. 
This work highlights the importance of combining multi-view learning and contrastive strategies to advance the field of DTI prediction. Source code is available at <span><span>https://github.com/7A13/HGCML-DTI</span></span>.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104852"},"PeriodicalIF":4.0,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144225569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ziqi Guo , Jack Felag , Jordan C. Rozum , Rion Brattig Correia , Xuan Wang , Luis M. Rocha
{"title":"Focused digital cohort selection from social media using the metric backbone of biomedical knowledge graphs","authors":"Ziqi Guo , Jack Felag , Jordan C. Rozum , Rion Brattig Correia , Xuan Wang , Luis M. Rocha","doi":"10.1016/j.jbi.2025.104847","DOIUrl":"10.1016/j.jbi.2025.104847","url":null,"abstract":"<div><div>Social media data allows researchers to construct large <em>digital cohorts</em> — groups of users who post health-related content — to study the interplay between human behavior and medical treatment. Identifying the users most relevant to a specific health problem is, however, a challenge in that social media sites vary in the generality of their discourse. While X (formerly Twitter), Instagram, and Facebook cater to wide ranging topics, Reddit subgroups and dedicated patient advocacy forums trade in much more specific, biomedically-relevant discourse.</div><div>To filter relevant users on any social media, we have developed a general method and tested it on epilepsy discourse. We analyzed the text from posts by users who mention epilepsy drugs at least once in the general-purpose social media sites X and Instagram, the epilepsy-focused Reddit subgroup (r/Epilepsy), and the Epilepsy Foundation of America (EFA) forums. We used a curated medical terminology dictionary to generate a knowledge graph (KG) from each social media site, whereby nodes represent terms, and edge weights denote the strength of association between pairs of terms in the collected text.</div><div>Our method is based on computing the metric backbone of each KG, which yields the (sparsified) subgraph of edges that participate in shortest paths. By comparing the subset of users who contribute to the backbone to the subset who do not, we show that epilepsy-focused social media users contribute to the KG backbone in much higher proportion than do general-purpose social media users. 
Furthermore, using human annotation of Instagram posts, we demonstrate that users who do not contribute to the backbone are much more likely to use dictionary terms in a manner inconsistent with their biomedical meaning and are rightly excluded from the cohort of interest.</div><div>Our metric backbone approach thus has several benefits: it yields focused user cohorts who engage in discourse relevant to a targeted biomedical problem; unlike engagement-based approaches, it can retain low-engagement users who nonetheless contribute meaningful biomedical insights and filter out very vocal users who contribute no relevant content; it is parameter-free and algebraically principled; it does not require classifiers or human curation; and it is simple to compute with the open-source code we provide.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"168 ","pages":"Article 104847"},"PeriodicalIF":4.0,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144215947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
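The metric backbone described in the abstract above (the subgraph of edges that participate in shortest paths) can be sketched as follows. This is a minimal illustration, not the authors' released code. It assumes edge weights have already been converted from association strengths to distances, a step the abstract does not specify. The function name and the toy terms are hypothetical.

```python
import math

def metric_backbone(nodes, edges):
    """Keep the edges whose distance equals the shortest-path distance
    between their endpoints. `edges` maps (u, v) -> distance, undirected."""
    # Initialise the all-pairs distance table.
    dist = {u: {v: (0.0 if u == v else math.inf) for v in nodes} for u in nodes}
    for (u, v), d in edges.items():
        dist[u][v] = min(dist[u][v], d)
        dist[v][u] = min(dist[v][u], d)
    # Floyd-Warshall all-pairs shortest paths.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    # An edge survives only if no indirect path is strictly shorter.
    return {e: d for e, d in edges.items() if d <= dist[e[0]][e[1]]}

# Hypothetical toy graph of dictionary terms with distance weights.
nodes = ["seizure", "aura", "fatigue"]
edges = {
    ("seizure", "aura"): 1.0,
    ("aura", "fatigue"): 1.0,
    ("seizure", "fatigue"): 3.0,  # longer than the path through "aura"
}
backbone = metric_backbone(nodes, edges)
```

Here the direct `seizure`-`fatigue` edge is dropped because the indirect path through `aura` has total distance 2.0, while the other two edges remain in the backbone.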