Journal of Biomedical Informatics: Latest Articles

Interpretable deep neural networks for advancing early neonatal birth weight prediction using multimodal maternal factors
IF 4, Zone 2 (Medicine)
Journal of Biomedical Informatics, Vol. 166, Article 104838; Pub Date: 2025-05-06; DOI: 10.1016/j.jbi.2025.104838
Muhammad Mursil, Hatem A. Rashwan, Adnan Khalid, Pere Cavallé-Busquets, Luis Santos-Calderon, Michelle M. Murphy, Domenec Puig
Background: Neonatal low birth weight (LBW) is a significant predictor of increased morbidity and mortality among newborns. Traditional prediction methods depend heavily on ultrasonography, which does not consider the risk factors affecting birth weight (BW).
Objective: This study introduces a robust deep neural network for a clinical decision-support system. The system is designed to predict neonatal BW early and with enhanced precision, using data available during early pregnancy. It incorporates a comprehensive array of maternal factors, with particular emphasis on nutritional elements alongside physiological and lifestyle variables.
Methods: We employed and validated various traditional machine learning models. We also used an interpretable deep learning model based on the TabNet architecture, which is noted for its proficient handling of tabular data and its high level of interpretability. The efficacy of these models was evaluated on extensive datasets encompassing a broad spectrum of maternal health indicators.
Results: The TabNet model exhibited outstanding predictive capability, achieving an accuracy of 96% and an area under the curve (AUC) of 0.96. Notably, maternal vitamin B12 and folate status emerged as pivotal predictors of BW, emphasizing the crucial role of nutritional factors in neonatal health outcomes.
Conclusions: Our results demonstrate the substantial benefits of integrating multimodal maternal factors into predictive models for neonatal BW, markedly improving precision over traditional AI methods. The developed decision-support system has potential applications in prenatal care. It also provides actionable insights that can be leveraged to mitigate the risks associated with LBW, thereby improving clinical decision-making and outcomes.
Citations: 0
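For readers comparing the two metrics reported for TabNet: the AUC equals the probability that a randomly chosen positive case (here, an LBW newborn) receives a higher predicted score than a randomly chosen negative case. The minimal sketch below computes that pairwise definition directly. The scores and labels are made up for illustration and are not data from the study.

```python
from itertools import product


def auc_pairwise(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5.

    O(n_pos * n_neg) pairwise version, fine for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted LBW probabilities and labels (1 = LBW); not study data.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc_pairwise(scores, labels))
```

Unlike accuracy, which depends on the decision threshold chosen, the AUC summarizes how well the model ranks cases across all thresholds.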
Multimodal fusion architectures for Alzheimer’s disease diagnosis: An experimental study
IF 4, Zone 2 (Medicine)
Journal of Biomedical Informatics, Vol. 166, Article 104834; Pub Date: 2025-05-06; DOI: 10.1016/j.jbi.2025.104834
Florence Leony, Chen-ju Lin, Alzheimer’s Disease Neuroimaging Initiative
Objective: In attempts at early diagnosis of Alzheimer’s disease, medical records of multiple modalities are gathered to capture the interaction of multiple factors. However, the heterogeneity of multimodal data poses a challenge, and this is where artificial intelligence can assist medical practitioners in diagnosis and prognosis. To be adopted as a clinical decision support system, a model must be interpretable or explainable so that healthcare professionals can trust its results. This study assessed various popular machine learning models under two multimodal fusion architectures to find the best combination in terms of both predictive performance and interpretability.
Methods: Two architectures were chosen for a multinomial classification task: early and late fusion, also known as feature-level and decision-level fusion. In addition to the commonly used simple concatenation, this study employed weighted and hybrid weighted concatenation to fuse features within and across modalities under the two fusion structures. Each model pipeline was assessed according to the distinct foundations on which its models were built, and the advantages of each were identified. Classification metrics were unified and visualized as a pentagon to compare the overall performance of each pipeline. In addition, an interpretability analysis quantified the importance of each modality and feature recognized by each model.
Results: The analysis uncovered the characteristics of each type of pipeline in terms of prediction accuracy and ability to capture the relevant markers of each cognitive state. In this healthcare application, tree-based and linear models were the top two choices. Coupled with early and late fusion structures with weighted concatenation, they reached balanced accuracies of 0.920 and 0.912, respectively. The top five most important features belonged to the Cognitive Test Scores and Neuropsychological Battery of Test modalities.
Conclusion: This work contributes an evaluation of artificial intelligence for medical applications. It helps practitioners understand the capabilities of different fusion architectures combined with different classifiers when applying machine learning in clinical settings. With accurate classification, early detection of Mild Cognitive Impairment and Alzheimer’s Disease can be achieved.
Citations: 0
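The two fusion architectures compared above can be contrasted with a toy sketch. It assumes one plausible reading of "weighted concatenation": in early fusion, each modality's features are scaled by a weight and concatenated before a single classifier; in late fusion, per-modality class probabilities are combined by a weighted average. This is not the authors' pipeline. The modality names, weights, feature values and probabilities are all hypothetical.

```python
def early_fusion(features_by_modality, weights):
    """Feature-level fusion: scale each modality's features, then concatenate."""
    fused = []
    for name, feats in features_by_modality.items():
        fused.extend(weights[name] * x for x in feats)
    return fused


def late_fusion(probs_by_modality, weights):
    """Decision-level fusion: weighted average of per-modality class probabilities."""
    total = sum(weights[name] for name in probs_by_modality)
    n_classes = len(next(iter(probs_by_modality.values())))
    fused = [0.0] * n_classes
    for name, probs in probs_by_modality.items():
        for k, p in enumerate(probs):
            fused[k] += weights[name] * p / total
    return fused


CLASSES = ["CN", "MCI", "AD"]  # cognitively normal, MCI, Alzheimer's disease

# Early fusion: one fused feature vector feeds a single downstream classifier.
early = early_fusion(
    {"cognitive_scores": [28.0, 1.5], "mri": [0.7]},  # hypothetical features
    {"cognitive_scores": 0.5, "mri": 2.0},            # hypothetical weights
)
print(early)

# Late fusion: each modality has its own classifier; only outputs are combined.
late = late_fusion(
    {"cognitive_scores": [0.2, 0.5, 0.3], "mri": [0.1, 0.3, 0.6]},
    {"cognitive_scores": 3.0, "mri": 1.0},
)
print(CLASSES[max(range(len(late)), key=late.__getitem__)], late)
```

The design difference is where the modalities meet. Early fusion lets one model learn cross-modality interactions. Late fusion keeps per-modality models separate, which makes each modality's contribution easier to read off.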
A transformer-based framework for temporal health event prediction with graph-enhanced representations
IF 4, Zone 2 (Medicine)
Journal of Biomedical Informatics, Vol. 166, Article 104826; Pub Date: 2025-05-03; DOI: 10.1016/j.jbi.2025.104826
Tianci Liu, Lizhong Liang, Chao Che, Yunjiong Liu, Bo Jin
Objective: Deep learning approaches have demonstrated significant potential in predicting temporal health events in recent years. However, existing methods have not fully leveraged the complex interactions among comorbidities. They have also overlooked imbalances and temporal irregularities in admission records.
Methods: This study proposes GLT-Net, a deep learning approach that combines Graph Learning with a Transformer framework to tackle these challenges. GLT-Net first constructs a patient association graph to generate a unique representation for each individual. At the same time, the hierarchical structure of diagnosis codes is used to pre-train the diagnosis code embeddings. Next, a comorbidity association matrix is created to capture the relationships between comorbidities, and graph neural networks are employed to enhance the feature representations of diagnosis codes. Finally, a Transformer encoder captures the dependencies in historical admission records by incorporating time information.
Results: We demonstrate our approach on two temporal health event prediction tasks. Experimental results on real-world datasets show that GLT-Net outperforms baseline models in forecasting temporal health events. Additionally, a case study demonstrates the effectiveness of GLT-Net in predicting health events.
Conclusion: Understanding progression patterns over time, comorbidity associations, and patient characterization is essential for predicting temporal health events. Our study provides new insights and methods for a deeper understanding of patient health status and disease trends. Moreover, our model can be extended to other data sources, enhancing its versatility.
Citations: 0
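The abstract says GLT-Net's Transformer encoder incorporates time information from irregularly spaced admissions, but it does not say how. One common technique, assumed here purely for illustration, is to feed the elapsed time (for example, days since the first admission) into the sinusoidal positional encoding in place of the integer position. Irregular gaps between visits then show up in the encoding. The function names and example values below are hypothetical.

```python
import math


def time_encoding(t, d_model):
    """Sinusoidal encoding of a continuous timestamp t.

    Follows the sin/cos scheme of the original Transformer positional
    encoding, but with elapsed time t in place of the integer position.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    enc = []
    for i in range(0, d_model, 2):
        freq = 1.0 / (10000 ** (i / d_model))
        enc.append(math.sin(t * freq))  # even dimension
        enc.append(math.cos(t * freq))  # odd dimension
    return enc


def add_time_to_visits(visit_embeddings, days_since_first):
    """Add each visit's time encoding to its embedding, element-wise."""
    out = []
    for emb, t in zip(visit_embeddings, days_since_first):
        te = time_encoding(t, len(emb))
        out.append([e + p for e, p in zip(emb, te)])
    return out


# Hypothetical: three admissions on days 0, 3 and 120, each embedded in 8 dims.
visits = [[0.1] * 8 for _ in range(3)]
encoded = add_time_to_visits(visits, [0, 3, 120])
print([round(v, 3) for v in encoded[2]])
```

Because the encoding is a function of actual elapsed time, two admissions three days apart get similar encodings, while an admission months later gets a noticeably different one.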
Bioinformatic challenges in metagenomic next generation sequencing data analysis while unravelling a case of uncommon campylobacteriosis
IF 4, Zone 2 (Medicine)
Journal of Biomedical Informatics, Vol. 166, Article 104841; Pub Date: 2025-05-02; DOI: 10.1016/j.jbi.2025.104841
Rok Kogoj, Martin Bosilj, Andraž Celar Šturm, Misa Korva, Katja Strašek Smrdel, Eva Kvas, Mateja Pirš, Lidija Lepen, Tina Triglav
Objective: This study aimed to use advanced bioinformatics and modern sequencing approaches to solve a diagnostic problem. The patient was previously healthy, with newly diagnosed selective IgA deficiency and prolonged diarrhoea. Campylobacter spp. were persistently detected by molecular methods in four consecutive stool samples, yet culture results were negative.
Methods: Metagenomic next-generation sequencing (mNGS) based on short paired-end reads, with basic bioinformatic read classification, was used first. Because the results were ambiguous, advanced bioinformatics were employed to further elucidate them: contig construction and classification, reference genome mapping, and read filtering with BBSplit. These were coupled with metagenomic long-read sequencing and full-length 16S rRNA metabarcoding. Virulence factors were analysed using the Prokka genome annotation tool. Finally, modified classical bacteriology methods were used for further clarification.
Results: Short paired-end read analysis identified several Campylobacter species in all four samples. After advanced bioinformatic approaches were applied, Candidatus C. infans was suspected as the putative pathogen. This result was further supported by metagenomic long-read sequencing and full-length 16S rRNA metabarcoding. Nevertheless, after the culture conditions were modified based on the mNGS results, a mixed culture of Candidatus C. infans and C. ureolyticus was obtained. Sequencing of the mixed culture yielded 87.48% and 73.47% genome coverage of Candidatus C. infans and C. ureolyticus, respectively. More virulence factor hits were found in the Candidatus C. infans genome than in the C. ureolyticus genome, supporting the former as the most probable cause of the symptoms.
Conclusion: This study shows the pivotal role and strengths of mNGS in unravelling an unusual case of diarrhoea. It also demonstrates how mNGS can guide established microbiological methods to overcome their current limitations. However, it emphasises the need for careful interpretation of sequencing data. This applies particularly to closely related bacterial species in clinical samples known to harbour complex microbial communities.
Citations: 0
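The genome coverage figures reported above (87.48% and 73.47%) are most likely breadth-of-coverage values: the percentage of reference positions covered by at least one aligned read. The minimal sketch below computes that from aligned read intervals. It assumes 0-based, end-exclusive coordinates and a toy 100 bp reference. Real pipelines derive this from alignment files with tools such as samtools, and the intervals here are invented.

```python
def breadth_of_coverage(intervals, genome_length, min_depth=1):
    """Percent of reference positions covered by >= min_depth reads.

    intervals: (start, end) read alignments, 0-based, end-exclusive.
    Uses a difference array: O(genome_length + number of reads).
    """
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        start, end = max(0, start), min(genome_length, end)  # clip to reference
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    covered = depth = 0
    for pos in range(genome_length):
        depth += diff[pos]  # running sum gives read depth at this position
        if depth >= min_depth:
            covered += 1
    return 100.0 * covered / genome_length


# Hypothetical alignments on a 100 bp reference.
reads = [(0, 30), (20, 50), (70, 90)]
print(breadth_of_coverage(reads, 100))               # depth >= 1
print(breadth_of_coverage(reads, 100, min_depth=2))  # only the 20-30 overlap
```

Raising `min_depth` shows why coverage percentages should always be read together with the depth threshold used.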
CD-Tron: Leveraging large clinical language model for early detection of cognitive decline from electronic health records
IF 4, Zone 2 (Medicine)
Journal of Biomedical Informatics, Vol. 166, Article 104830; Pub Date: 2025-05-02; DOI: 10.1016/j.jbi.2025.104830
Hao Guan, John Novoa-Laurentiev, Li Zhou
Background: Early detection of cognitive decline during the preclinical stage of Alzheimer’s disease and related dementias (AD/ADRD) is crucial for timely intervention and treatment. Clinical notes in the electronic health record contain valuable information that can aid in the early identification of cognitive decline. In this study, we use large clinical language models, fine-tuned on clinical notes, to improve the early detection of cognitive decline.
Methods: We collected clinical notes from 2,166 patients from the Enterprise Data Warehouse of Mass General Brigham. The notes span the 4 years preceding each patient's initial mild cognitive impairment (MCI) diagnosis. We developed CD-Tron, built upon a large clinical language model that was fine-tuned on 4,949 expert-labeled note sections. For evaluation, the trained model was applied to 1,996 independent note sections to assess its performance on real-world unstructured clinical data. We also used explainable AI techniques, specifically SHAP (SHapley Additive exPlanations) values, to interpret the model's predictions and identify the most influential features. An error analysis was also conducted to further examine the model's predictions.
Results: CD-Tron significantly outperforms baseline models, achieving notable improvements in precision, recall, and AUC for detecting cognitive decline (CD). Tested on real-world clinical notes, CD-Tron demonstrated high sensitivity with only one false negative, which is crucial for clinical applications that prioritize early and accurate CD detection. SHAP-based interpretability analysis highlighted key textual features contributing to model predictions, supporting transparency and clinician understanding.
Conclusion: CD-Tron offers a novel approach to early cognitive decline detection by applying large clinical language models to free-text EHR data. Pretrained on real-world clinical notes, it accurately identifies early cognitive decline and integrates SHAP for interpretability, enhancing the transparency of its predictions.
Citations: 0
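SHAP is based on Shapley values: each feature's attribution is its marginal contribution to the model output, averaged over all coalitions of the other features. For a model with only a few features, these values can be computed exactly by enumerating every coalition. SHAP approximates the same quantity for large models such as CD-Tron. In the minimal sketch below, the value function is a toy additive score over hypothetical keyword features with one interaction term. It is an assumption for illustration, not the paper's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values; value(frozenset_of_features) -> float.

    Cost grows as 2**len(features), so this only suits tiny examples.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


FEATURES = ["memory_loss", "forgetful", "word_finding"]  # hypothetical keywords


def toy_value(present):
    """Hypothetical 'cognitive decline' score for a set of present keywords."""
    score = 0.1  # baseline with no keywords present
    if "memory_loss" in present:
        score += 0.4
    if "forgetful" in present:
        score += 0.2
    if {"memory_loss", "forgetful"} <= present:
        score += 0.2  # interaction term, shared equally between the two keywords
    return score


phi = shapley_values(FEATURES, toy_value)
print({k: round(v, 3) for k, v in phi.items()})
```

The attributions sum to the difference between the full-feature score and the baseline score (the efficiency property). This is what lets SHAP explanations be read as a decomposition of a single prediction.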
Leveraging undecided cases in chart-reviewed phenotypes to enhance EHR-based association studies
IF 4, Zone 2 (Medicine)
Journal of Biomedical Informatics, Vol. 166, Article 104839; Pub Date: 2025-04-30; DOI: 10.1016/j.jbi.2025.104839
Xinyao Jian, Dazheng Zhang, Zehao Yu, Hua Xu, Jiang Bian, Yonghui Wu, Jiayi Tong, Yong Chen
Objectives: In electronic health record (EHR)-based association studies, phenotyping algorithms efficiently classify patients' clinical outcomes into binary categories but are susceptible to misclassification errors. The gold standard, manual chart review, involves clinicians determining the true disease status based on their assessment of health records. These clinician-labeled phenotypes are labor-intensive and typically limited to a small subset of patients. They may also introduce a third "undecided" category when phenotypes are indeterminate. We aim to effectively integrate algorithm-derived and chart-reviewed outcomes when both are available in EHR-based association studies.
Materials and Methods: We propose an augmented estimation method that combines binary algorithm-derived phenotypes for the entire cohort with trinary chart-reviewed phenotypes for a small, selected subset. Additionally, a cost-effective outcome-dependent sampling strategy is used to address rare disease scenarios. The proposed method is called trinary chart-reviewed phenotype integrated cost-effective augmented estimation (TriCA). It was evaluated across a wide range of simulation settings and in two real-world applications. The first used EHR data on Alzheimer's disease and related dementias (ADRD) from the OneFlorida+ Clinical Research Network. The second used cohort data on second breast cancer events (SBCE) from Kaiser Permanente Washington.
Results: Compared with estimation based on random sampling, our augmented method reduced mean squared error by up to 28.3% in simulation studies. Compared with estimation using only trinary chart-reviewed phenotypes, it improved efficiency by up to 33.3% in the ADRD data and 50.8% in the SBCE data.
Discussion: Our simulation studies and real-world applications demonstrate that, compared with existing methods, the proposed method provides unbiased estimates with higher statistical efficiency.
Conclusion: The proposed method effectively combines binary algorithm-derived phenotypes for the whole cohort with trinary chart-reviewed outcomes for a limited validation set. This makes it applicable to a broader range of applications and enhances risk factor identification in EHR-based association studies.
Citations: 0
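Outcome-dependent sampling sends algorithm-positive patients to chart review far more often than algorithm-negative ones, so that a rare disease still yields enough validated cases. Any estimate from the reviewed subset must then be reweighted by the inverse of each patient's sampling probability. The sketch below shows only that generic weighting step: a Hájek-style inverse-probability-weighted prevalence on simulated data. It is not the TriCA estimator. The sensitivity, specificity, sampling probabilities and undecided rate are all hypothetical. Undecided chart reviews are simply dropped here (assumed to occur at random), which discards exactly the information the paper's method aims to use.

```python
import random


def ipw_prevalence(reviewed):
    """Weighted prevalence among reviewed patients (Hajek ratio form).

    reviewed: list of (chart_label, sampling_prob); chart_label is
    1 (case), 0 (control) or None (undecided, excluded in this sketch).
    """
    num = den = 0.0
    for label, prob in reviewed:
        if label is None:
            continue
        w = 1.0 / prob  # inverse of the probability of being chart-reviewed
        num += w * label
        den += w
    if den == 0:
        raise ValueError("no decided chart reviews")
    return num / den


def simulate(n=50_000, prevalence=0.05, sens=0.9, spec=0.95,
             p_pos=0.5, p_neg=0.02, p_undecided=0.1, seed=7):
    """Hypothetical cohort: true status, algorithm phenotype, chart review."""
    rng = random.Random(seed)
    truth, reviewed = [], []
    for _ in range(n):
        y = 1 if rng.random() < prevalence else 0
        if y == 1:
            algo = 1 if rng.random() < sens else 0
        else:
            algo = 0 if rng.random() < spec else 1
        prob = p_pos if algo == 1 else p_neg  # outcome-dependent sampling
        truth.append(y)
        if rng.random() < prob:
            label = None if rng.random() < p_undecided else y
            reviewed.append((label, prob))
    return truth, reviewed


truth, reviewed = simulate()
true_prev = sum(truth) / len(truth)
decided = [lab for lab, _ in reviewed if lab is not None]
naive_prev = sum(decided) / len(decided)  # ignores the sampling design
weighted_prev = ipw_prevalence(reviewed)
print(f"true {true_prev:.3f}  naive {naive_prev:.3f}  weighted {weighted_prev:.3f}")
```

The unweighted estimate is badly inflated because algorithm-positive patients are oversampled, while the weighted estimate recovers the cohort prevalence.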
PregAN-NET: Addressing Class Imbalance with GANs in Interpretable Computational Framework for Predicting Safety Profile of Drugs Considering Adverse Reactions During Pregnancy
IF 4, Zone 2 (Medicine)
Journal of Biomedical Informatics, Vol. 166, Article 104832; Pub Date: 2025-04-28; DOI: 10.1016/j.jbi.2025.104832
Anushka Chaurasia, Deepak Kumar, Yogita
Adverse drug reactions (ADRs) during pregnancy pose significant risks to both mother and fetus. Conventional approaches to ADR prediction are inadequate because ethical restrictions prevent medication studies in pregnant women, which limits the available data. Computational techniques have therefore shown promise for ADR prediction. However, most of these techniques focus on the general population and face the challenges of class imbalance and a lack of model interpretability. In the present work, an ensemble learning-based framework, PregAN-NET, is proposed. It addresses class imbalance by generating synthetic data with a Conditional Tabular Generative Adversarial Network (CTGAN). It integrates a neural network and gradient boosting into a Boosted Neural Ensemble (BNE) architecture that predicts safe and unsafe drugs with respect to their adverse reactions during pregnancy. Furthermore, the SHAP method is employed to enhance the post-hoc interpretability of the BNE architecture by analyzing the contribution of different features to the prediction. The framework was applied to chemical and biological properties from PubChem and DrugBank, with class labels from the ADReCS database. CTGAN was evaluated for data balancing and showed a 2% to 5% performance improvement over SMOTE. The BNE architecture outperformed six state-of-the-art methods, achieving mean ROC-AUC scores of 77.00% to 90.00% for chemical data, 66.00% to 74.00% for biological data, and 70.00% to 75.00% for the combined datasets. Further, the top 20 features contributing to prediction for the different drug properties were identified.
Citations: 0
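SMOTE is the baseline that CTGAN is compared against above. It creates a synthetic minority-class sample by interpolating between a minority sample and one of its k nearest minority-class neighbours. The minimal sketch below assumes Euclidean distance on purely numeric feature vectors. In practice one would typically use the `SMOTE` class from the imbalanced-learn library; the four-point minority set is invented for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by SMOTE-style interpolation.

    For each new point: pick a random minority sample x, pick one of its
    k nearest minority neighbours nn, and emit x + u * (nn - x), u ~ U(0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Hypothetical two-feature minority class ("unsafe" drugs).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synth = smote(minority, 20, k=2)
print(synth[:3])
```

Every SMOTE sample lies on a line segment between existing minority points. CTGAN instead learns a generative model of the whole table, which is one reason the two can behave differently on mixed chemical and biological features.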
Unsupervised discovery of clinical disease signatures using probabilistic independence
IF 4, Zone 2 (Medicine)
Journal of Biomedical Informatics, Vol. 166, Article 104837; Pub Date: 2025-04-23; DOI: 10.1016/j.jbi.2025.104837
Thomas A. Lasko, William W. Stead, John M. Still, Thomas Z. Li, Michael Kammer, Marco Barbero-Mota, Eric V. Strobl, Bennett A. Landman, Fabien Maldonado
Objective: This study uses probabilistic independence to disentangle patient-specific sources of disease and their signatures in Electronic Health Record (EHR) data.
Materials and Methods: We model a disease source as an unobserved root node in the causal graph of observed EHR variables (laboratory test results, medication exposures, billing codes, and demographics). We model a signature as the set of downstream effects that a given source has on those observed variables. We used probabilistic independence to infer 2000 sources and their signatures from 9195 variables in 630,000 cross-sectional training instances sampled at random times from 269,099 longitudinal patient records. We evaluated the learned sources by using them to infer and explain the causes of benign vs. malignant pulmonary nodules in 13,252 records. The inferred causes were compared with an external reference list and other medical literature. We compared models trained by three different algorithms and used corresponding models trained directly on the observed variables as baselines.
Results: The model recovered 92% of malignant and 30% of benign causes in the reference standard. Of the top 20 inferred causes of malignancy, 14 were not listed in the reference standard but had supporting evidence in the literature, as did 11 of the top 20 inferred causes of benign nodules. The model decomposed listed malignant causes by an average factor of 5.5 and benign causes by 4.1, mostly stratifying them by disease course or treatment regimen. The predictive accuracy of causal predictive models trained on source expressions (Random Forest AUC 0.788) was similar (p = 0.058) to that of their associational baselines (0.738).
Discussion: Most of the unrecovered causes were due to the rarity of the condition or a lack of sufficient detail in the input data. Surprisingly, the causal model found many patients with apparently undiagnosed cancer as the source of the malignant nodules. The causal model's AUC also suggests that some sources remained undiscovered in this cohort.
Conclusion: These promising results demonstrate the potential of using probabilistic independence to disentangle complex clinical signatures from noisy, asynchronous, and incomplete EHR data that represent the confluence of multiple simultaneous conditions. They also show its potential to identify patient-specific causes that support precise treatment decisions.
Citations: 0
Leveraging natural language processing to elucidate real-world clinical decision-making paradigms: A proof of concept study
IF 4, Zone 2 (Medicine)
Journal of Biomedical Informatics, Vol. 166, Article 104829; Pub Date: 2025-04-22; DOI: 10.1016/j.jbi.2025.104829
Yaniv Alon, Etti Naimi, Chedva Levin, Hila Videl, Mor Saban
Background: Understanding how clinicians arrive at decisions in actual practice settings is vital for advancing personalized, evidence-based care. However, systematic analysis of qualitative decision data poses challenges.
Methods: We used natural language processing (NLP) to analyze transcribed interviews with Hebrew-speaking clinicians about their decision processes. Word frequency analysis characterized terminology use, while large language models (ChatGPT from OpenAI and Gemini from Google) identified potential cognitive paradigms.
Results: Word frequency analysis of the clinician interviews identified experience and knowledge as the most influential factors in decision-making. NLP tentatively recognized heuristics-based reasoning grounded in past cases and intuition as the dominant cognitive paradigms. Elements of shared decision-making, through individualizing care with patients and families, were also observed. Limited Hebrew clinical language resources required developing preliminary lexicons and dynamically adjusting stopwords. The findings also provided preliminary support for heuristics guiding clinical judgment while highlighting the need for broader sampling and enhanced analytical frameworks.
Conclusions: This study represents the first use of integrated qualitative and computational methods to systematically elucidate clinical decision-making. The findings supported experience-based heuristics guiding cognition. With methodological enhancements, similar analyses could transform global understanding of tailored care delivery. Standardizing interdisciplinary collaboration on developing NLP tools and analytical frameworks may advance equitable, evidence-based healthcare by elucidating real-world clinical reasoning processes across diverse populations and settings.
Citations: 0
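The word-frequency step described above can be sketched with Python's standard library. The stopword list and sample sentences are hypothetical English stand-ins (the study worked with Hebrew transcripts and built its own preliminary lexicon). The "dynamic" stopword adjustment is modelled simply as adding corpus-specific filler words before a second run. Python's `\w` matches Unicode word characters by default, so the same tokenizer also splits Hebrew text.

```python
import re
from collections import Counter

# Hypothetical base stopword list; a real study would use a curated lexicon.
BASE_STOPWORDS = {"the", "a", "and", "i", "to", "of", "my", "on", "is", "it"}


def word_frequencies(transcripts, stopwords, top_n=5):
    """Count lowercase word tokens across transcripts, skipping stopwords."""
    counts = Counter()
    for text in transcripts:
        tokens = re.findall(r"\w+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(top_n)


transcripts = [  # invented interview snippets
    "I rely on my experience and my knowledge of the patient.",
    "Experience tells me what to do; knowledge of guidelines helps.",
    "Honestly it is experience, and sometimes intuition.",
]
top = word_frequencies(transcripts, BASE_STOPWORDS)
print(top)

# Dynamic adjustment: add filler words noticed after inspecting the first run.
adjusted = word_frequencies(transcripts, BASE_STOPWORDS | {"honestly", "me", "what", "do"})
print(adjusted)
```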
Causal inference for time series datasets with partially overlapping variables
IF 4, Zone 2 (Medicine)
Journal of Biomedical Informatics, Vol. 166, Article 104828; Pub Date: 2025-04-22; DOI: 10.1016/j.jbi.2025.104828
Louis Adedapo Gomez, Jan Claassen, Samantha Kleinberg
Objective: Healthcare data provide a unique opportunity to learn causal relationships. However, the largest datasets, such as those from hospitals or intensive care units, are often observational and do not collect a standardized set of variables for all patients. Rather, the variables depend on a patient's health status, treatment plan, and differences between providers. This poses major challenges for causal inference. Analysis must either be restricted to patients with complete data (reducing power) or use patient-specific models (making generalization difficult). While missing variables can lead to confounding, variables absent for one individual are often measured in another.
Methods: We propose a novel method, Causal Model Combination for Time Series (CMC-TS), to learn causal relationships from time series with partially overlapping variable sets. CMC-TS specifically leverages the partial overlap between datasets (e.g., patients) to iteratively reconstruct missing variables. It corrects errors by reweighting inferences using information shared across datasets. We evaluated CMC-TS and compared it with the state of the art on simulated data. We also used real-world data from stroke patients admitted to a neurological intensive care unit.
Results: On simulated data, CMC-TS had the fewest false discoveries and the highest F1-score compared with baselines. On real data from stroke patients in a neurological intensive care unit, we found fewer implausible causes of a clinically important adverse event, and plausible causes were ranked more highly.
Conclusion: Our approach may lead to better use of observational healthcare data for causal inference by enabling causal inference from patient data with partially overlapping variable sets.
Citations: 0
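On simulated data, the true causal graph is known, so the false discoveries and F1-score reported above can be computed by comparing the set of inferred cause-to-effect edges with the ground-truth set. The minimal sketch below treats edges as directed, so a reversed edge counts as a false discovery. The variable names and edges are hypothetical and are not taken from the study.

```python
def edge_metrics(inferred, truth):
    """Precision, recall, F1 and false-discovery count for directed edges."""
    inferred, truth = set(inferred), set(truth)
    tp = len(inferred & truth)   # correctly inferred causal edges
    fp = len(inferred - truth)   # false discoveries
    fn = len(truth - inferred)   # missed true edges
    precision = tp / (tp + fp) if inferred else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return {"tp": tp, "false_discoveries": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical ground truth and inferred edges as (cause, effect) pairs.
truth = {("fever", "heart_rate"), ("sedation", "blood_pressure"),
         ("blood_pressure", "icp")}
inferred = {("fever", "heart_rate"), ("blood_pressure", "icp"),
            ("heart_rate", "fever")}  # the last edge has the wrong direction
print(edge_metrics(inferred, truth))
```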
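Contact: info@booksci.cn. Book学术 provides a free academic literature search service to help scholars in China and abroad find Chinese- and English-language publications, and aims to offer the most convenient, high-quality experience. Copyright © 2023 布克学术. All rights reserved.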