Journal of Biomedical Informatics: Latest Articles

A lightweight graph neural network to predict long-term mortality in coronary artery disease patients: an interpretable causality-aware approach
IF 4 · Zone 2 · Medicine
Journal of Biomedical Informatics · Pub Date: 2025-05-11 · DOI: 10.1016/j.jbi.2025.104846
Mohammad Yaseliani, Md. Noor-E-Alam, Osama Dasa, Xiaochen Xian, Carl J. Pepine, Md Mahmudul Hasan
Vol. 167, Article 104846
Background: Coronary artery disease (CAD) causes a substantial death toll in the United States and worldwide. Traditional methods for CAD mortality prediction are based on established risk factors, but they have significant limitations in accuracy, adaptability to diverse populations, performance for individual risk prediction compared with group data, and incorporation of socioeconomic and lifestyle variations. Machine learning (ML) models have demonstrated superior performance in CAD prediction; however, they often struggle to capture complex data interactions that can impact mortality.
Methods: We proposed lightweight, interpretable graph neural network (GNN) models that use data from a large trial of hypertensive patients with CAD to predict mortality from a concise set of critical features. This smaller feature set can improve efficiency and ease implementation in clinical settings, while the models' lightweight nature facilitates fast real-time applications. We used a hybrid approach that first applies logistic regression (LR) to identify statistically significant features and then propensity score matching (PSM) to identify potentially causal features. These causal features, alongside demographic variables, were used to build a graph of patients, with edges drawn between patients with similar causal features. Lightweight 5-layer graph convolutional network (GCN) and graph attention network (GAT) models were then designed for mortality prediction, followed by an interpretability method (GNNExplainer) to report feature importance.
Results: The proposed GCN achieved a recall of 93.02% and a negative predictive value (NPV) of 89.42%, higher than all other classifiers. Building on it, a web-based decision support system (DSS) called CAD-SS was developed; it can predict mortality and identify risk factors and similar patients, guiding clinicians toward reliable and informed decision-making.
Conclusions: Our proposed CAD-SS, which uses an interpretable, causality-aware lightweight GCN model, demonstrated reasonably high performance in predicting mortality due to CAD. This unique system can help identify the most vulnerable patients.
Citations: 0
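The patient-graph construction and graph convolution described in this abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: "similar causal features" is taken here to mean binary causal-feature vectors that differ in at most a hypothetical `max_diff` positions, and the layer uses the standard GCN propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). A 5-layer model would stack five such layers with learned weights.

```python
import math

def build_patient_graph(features, max_diff=0):
    """Link patients whose binary causal-feature vectors differ in at most
    `max_diff` positions (hypothetical similarity rule, for illustration only)."""
    n = len(features)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = sum(a != b for a, b in zip(features[i], features[j]))
            if diff <= max_diff:
                adj[i][j] = adj[j][i] = 1
    return adj

def gcn_layer(adj, h, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Self-loops let each patient keep its own features during aggregation.
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetrically normalized adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Neighbor aggregation: norm @ h.
    agg = [[sum(norm[i][k] * h[k][f] for k in range(n)) for f in range(len(h[0]))]
           for i in range(n)]
    # Linear transform (agg @ weight) followed by ReLU.
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(len(weight))))
             for o in range(len(weight[0]))]
            for i in range(n)]

# Toy example: patients 0 and 1 share all causal features, patient 2 does not.
adj = build_patient_graph([[1, 0, 1], [1, 0, 1], [0, 1, 0]])
print(gcn_layer(adj, [[1.0], [3.0], [5.0]], [[1.0]]))  # [[2.0], [2.0], [5.0]]
```

Linked patients end up with averaged representations, which is how a GCN exploits patient similarity; the isolated patient keeps its own features.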
Evaluating an information theoretic approach for selecting multimodal data fusion methods
IF 4 · Zone 2 · Medicine
Journal of Biomedical Informatics · Pub Date: 2025-05-10 · DOI: 10.1016/j.jbi.2025.104833
Tengyue Zhang, Ruiwen Ding, Kha-Dinh Luong, William Hsu
Vol. 167, Article 104833
Objective: Interest has grown in combining radiology, pathology, genomic, and clinical data to improve the accuracy of diagnostic and prognostic predictions toward precision health. However, most existing works choose their datasets and modeling approaches empirically and in an ad hoc manner. A prior study proposed four partial information decomposition (PID)-based metrics to provide a theoretical understanding of multimodal data interactions: redundancy, uniqueness of each modality, and synergy. However, these metrics have only been evaluated on a limited collection of biomedical data, and the existing work does not elucidate the effect of parameter selection when calculating the PID metrics. In this work, we evaluate PID metrics on a wider range of biomedical data, including clinical, radiology, pathology, and genomic data, and propose potential improvements to the PID metrics.
Methods: We apply the PID metrics to seven different modality pairs across four distinct cohorts (datasets). We compare and interpret trends in the resulting PID metrics and downstream model performance in these multimodal cohorts. The downstream tasks evaluated include predicting the prognosis (either overall survival or recurrence) of patients with non-small cell lung cancer, prostate cancer, and glioblastoma.
Results: We found that, while PID metrics are informative, relying solely on them to decide on a fusion approach does not always yield a machine learning model with optimal performance. Of the seven modality pairs, three had poor (0%), three had moderate (66%–89%), and only one had perfect (100%) consistency between the PID values and model performance. We propose two improvements to the PID metrics (determining the optimal parameters and uncertainty estimation) and identify areas where PID metrics could be further improved.
Conclusion: The current PID metrics are not accurate enough to estimate multimodal data interactions and need to be improved before they can serve as a reliable tool. We propose improvements and provide suggestions for future work. Code: https://github.com/zhtyolivia/pid-multimodal.
Citations: 0
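To make the four PID quantities concrete, the sketch below computes them for small discrete distributions using the minimum-mutual-information (MMI) redundancy, R = min(I(X1;Y), I(X2;Y)). This is a deliberately simple stand-in chosen for illustration; it is not the estimator used in this paper or the prior study, which must handle high-dimensional modalities. Inputs are lists of (x1, x2, y) samples treated as an empirical joint distribution.

```python
import math
from collections import Counter

def mutual_info(pairs):
    """Plug-in mutual information I(A;B) in bits from a list of (a, b) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi

def mmi_pid(samples):
    """Split I(X1,X2;Y) into redundancy R, unique terms U1 and U2, and synergy S,
    using MMI redundancy (an illustrative choice, not the paper's estimator)."""
    i1 = mutual_info([(x1, y) for x1, _, y in samples])
    i2 = mutual_info([(x2, y) for _, x2, y in samples])
    i12 = mutual_info([((x1, x2), y) for x1, x2, y in samples])
    redundancy = min(i1, i2)
    unique1 = i1 - redundancy
    unique2 = i2 - redundancy
    synergy = i12 - redundancy - unique1 - unique2
    return {"R": redundancy, "U1": unique1, "U2": unique2, "S": synergy}

xor = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # Y = X1 XOR X2
print(mmi_pid(xor))  # neither modality alone is informative: all 1 bit is synergy
```

A high-synergy pair would suggest a fusion approach that models cross-modal interactions, which is the kind of link between PID values and fusion choice that the study tests. With real imaging or genomic features, the plug-in counting used here is not viable and a dedicated estimator is needed.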
Knowledge-enhanced Parameter-efficient Transfer Learning with METER for medical vision-language tasks
IF 4 · Zone 2 · Medicine
Journal of Biomedical Informatics · Pub Date: 2025-05-08 · DOI: 10.1016/j.jbi.2025.104840
Xudong Liang, Jiang Xie, Jinzhu Wei, Mengfei Zhang, Haoyang Zhang
Vol. 166, Article 104840
Objective: Full fine-tuning becomes impractical when applying pre-trained models to downstream tasks because of its significant computational and storage costs. Parameter-efficient fine-tuning (PEFT) methods can alleviate this issue. However, applying PEFT methods alone leads to sub-optimal performance owing to the domain gap between pre-trained models and medical downstream tasks.
Methods: This study proposes Knowledge-enhanced Parameter-efficient Transfer Learning with METER (KPL-METER) for medical vision-language (VL) downstream tasks. KPL-METER combines PEFT methods, including an innovative PEFT module for multi-modal branches, with newly introduced external domain-specific knowledge to enhance model performance. First, a lightweight, plug-and-play module named Sharing Adapter (SAdapter) is developed and inserted into the multi-modal encoders. This allows the two modalities to maintain uni-modal features while encouraging cross-modal consistency. Second, a novel knowledge extraction method and a parameter-free knowledge modeling strategy are developed to incorporate domain-specific knowledge from the Unified Medical Language System (UMLS) into multi-modal features. To further enhance the modeling of uni-modal features, an Adapter is added to the image and text encoders.
Results: The effectiveness of the proposed model is evaluated on two medical VL tasks using three VL datasets. The results indicate that KPL-METER outperforms other PEFT methods while using fewer parameters. Furthermore, KPL-METER-MED, which incorporates medical-tailored encoders, is developed. Compared with previous models in the medical domain, KPL-METER-MED tunes fewer parameters while generally achieving higher performance.
Conclusion: The proposed KPL-METER architecture effectively adapts general VL models to medical VL tasks, and the designed knowledge extraction and fusion method notably enhances performance by integrating medical domain-specific knowledge. Code is available at https://github.com/Adam-lxd/KPL-METER.
Citations: 0
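The adapters mentioned in this abstract follow the general parameter-efficient idea of inserting a small trainable bottleneck into a frozen encoder. The sketch below shows a generic Houlsby-style bottleneck adapter (down-projection, ReLU, up-projection, residual connection) and its trainable-parameter count. It is an illustration of the general technique, not the paper's SAdapter; the hidden size 768 and bottleneck size 64 are assumed values.

```python
def adapter_forward(h, w_down, b_down, w_up, b_up):
    """Generic bottleneck adapter (illustrative, not the paper's SAdapter):
    output = h + W_up @ ReLU(W_down @ h + b_down) + b_up.
    w_down is r x d and w_up is d x r, given as nested lists."""
    # Down-project the d-dimensional hidden state to r dimensions, then ReLU.
    z = [max(0.0, sum(w * x for w, x in zip(row, h)) + b)
         for row, b in zip(w_down, b_down)]
    # Up-project back to d dimensions.
    delta = [sum(w * zi for w, zi in zip(row, z)) + b
             for row, b in zip(w_up, b_up)]
    # Residual connection: the frozen backbone's representation passes through.
    return [x + dx for x, dx in zip(h, delta)]

def adapter_param_count(d, r):
    """Trainable parameters in one adapter: both weight matrices and biases."""
    return d * r + r + r * d + d

# Assumed sizes: hidden size 768, bottleneck 64.
print(adapter_param_count(768, 64))  # 99136
```

Initializing the up-projection to zeros makes the adapter an identity map at the start of training, so tuning begins from the pre-trained model's behavior; only the adapter weights are updated while the backbone stays frozen.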
Implementation of a Digital Maturity Framework for Biobanking
IF 4 · Zone 2 · Medicine
Journal of Biomedical Informatics · Pub Date: 2025-05-07 · DOI: 10.1016/j.jbi.2025.104842
Federica Rossi, Davide Fragnito, Antonella Cruoglio, Ramona Palombo, Alice Massacci, Alessandro Sulis, Vittorio Meloni, Sara Casati, Antonella Mirabile, Andrea Manconi, Luciano Milanesi, Gennaro Ciliberto, Monica Forni, Valentina Adami, Massimiliano Borsani, Claudia Miele, Marialuisa Lavitrano, Matteo Pallocca
Vol. 166, Article 104842
Objective: Digitalization is a pillar of reproducible research and a mandatory requirement for Research Infrastructures. Biobanks must ensure a fully engineered and digitalized process toward data FAIRification. The first step toward this aim is to assess the current level of digitalization using quantitative metrics, which is particularly challenging given the multi-faceted regulatory and logistical nature of biobanking.
Methods: We developed a biobanking digital maturity assessment framework, BB4FAIR, comprising a survey divided into three macro-areas: IT infrastructure, personnel, and data annotation richness. We also implemented an automated R/Shiny system to analyse survey responses and generate visual data representations. We piloted the tool on 46 Italian biobanks that had signed the partner charter with BBMRI in 2023. A scoring table facilitated the tiering of digital maturity, highlighting areas requiring corrective action.
Results: The assessment revealed significant heterogeneity across the three macro-areas of digitalization: almost half of the biobanks have adequate IT infrastructure and personnel, and a smaller proportion have robust data annotation capabilities. Notably, most biobanks reported having a Biobank IT Management System (BIMS) or an alternative that serves their purposes, yet they still collect consent to biobanking for future purposes on paper; digitalization of informed consent is generally lacking. These findings highlight the need for targeted improvements in biobank digitalization to enhance overall data FAIRness.
Conclusion: The survey results underscore a pressing need for enhanced IT training and improved data annotation resources within BBMRI.it. Corrective actions on many lacking features and desiderata are ongoing in the context of the #NextGenerationEu "Strengthening BBMRI.it" project.
Citations: 0
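The scoring-table step in this abstract can be illustrated with a small sketch that normalizes item points per macro-area and maps each result to a maturity tier. The item names, point values, and tier cut-offs below are hypothetical placeholders, not BB4FAIR's actual scoring table, and Python is used here in place of the authors' R/Shiny system.

```python
# Hypothetical tier cut-offs on a 0-1 normalized score (not BB4FAIR's actual values).
TIERS = [(0.75, "advanced"), (0.5, "intermediate"), (0.0, "basic")]

def area_score(answers, max_points):
    """Normalized score for one macro-area: points earned / points available."""
    earned = sum(min(answers.get(item, 0), pts) for item, pts in max_points.items())
    return earned / sum(max_points.values())

def tier(score):
    """Return the label of the highest tier whose cut-off the score reaches."""
    for cutoff, label in TIERS:
        if score >= cutoff:
            return label
    return TIERS[-1][1]

def assess(answers_by_area, scoring_table):
    """Score and tier each macro-area for one biobank's survey answers."""
    report = {}
    for area, max_points in scoring_table.items():
        s = area_score(answers_by_area.get(area, {}), max_points)
        report[area] = (s, tier(s))
    return report

# Hypothetical scoring table: item -> maximum points.
scoring_table = {
    "IT infrastructure": {"has_bims": 2, "backups": 1, "sample_tracking": 1},
    "personnel": {"dedicated_it_staff": 2, "it_training": 2},
    "data annotation richness": {"digital_consent": 2, "clinical_annotation": 2},
}
# Hypothetical answers from one biobank (paper consent, so no digital_consent points).
example_biobank = {
    "IT infrastructure": {"has_bims": 2, "backups": 1},
    "personnel": {"dedicated_it_staff": 2},
    "data annotation richness": {"clinical_annotation": 1},
}
for area, (score, label) in assess(example_biobank, scoring_table).items():
    print(f"{area}: {score:.2f} ({label})")
```

Reporting a tier per macro-area, rather than one overall score, keeps visible which area (for example, consent digitalization) needs corrective action.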
Interpretable deep neural networks for advancing early neonatal birth weight prediction using multimodal maternal factors
IF 4 · Zone 2 · Medicine
Journal of Biomedical Informatics · Pub Date: 2025-05-06 · DOI: 10.1016/j.jbi.2025.104838
Muhammad Mursil, Hatem A. Rashwan, Adnan Khalid, Pere Cavallé-Busquets, Luis Santos-Calderon, Michelle M. Murphy, Domenec Puig
Vol. 166, Article 104838
Background: Neonatal low birth weight (LBW) is a significant predictor of increased morbidity and mortality among newborns. Traditional prediction methods depend heavily on ultrasonography, which does not consider risk factors affecting birth weight (BW).
Objective: This study introduces a robust deep neural network for a clinical decision-support system designed to predict neonatal BW with enhanced precision from data available in early pregnancy. The system incorporates a comprehensive array of maternal factors, with particular emphasis on nutritional elements alongside physiological and lifestyle variables.
Methods: We employed and validated various traditional machine learning models as well as an interpretable deep learning model based on the TabNet architecture, noted for its proficient handling of tabular data and high level of interpretability. The efficacy of these models was evaluated on extensive datasets that encompass a broad spectrum of maternal health indicators.
Results: The TabNet model exhibited outstanding predictive capability, achieving an accuracy of 96% and an area under the curve (AUC) of 0.96. Notably, maternal vitamin B12 and folate status emerged as pivotal predictors of BW, emphasizing the crucial role of nutritional factors in neonatal health outcomes.
Conclusions: Our results demonstrate the substantial benefits of integrating multimodal maternal factors into predictive models for neonatal BW, markedly improving precision over traditional AI methods. The developed decision-support system has a possible application in prenatal care and also provides actionable insights that can be leveraged to mitigate the risks associated with LBW, thereby improving clinical decision-making processes and outcomes.
Citations: 0
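Accuracy and AUC, the two figures reported in this abstract, measure different things: accuracy depends on a decision threshold, whereas AUC is threshold-free and equals the probability that a randomly chosen positive case (for example, an LBW birth) receives a higher score than a randomly chosen negative one. A minimal sketch of both metrics, with ties counted as one half, is below; the labels and scores are made-up toy data, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded score matches the 0/1 label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def roc_auc(labels, scores):
    """AUC as the Mann-Whitney probability P(score_pos > score_neg), ties = 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data (not from the study): 1 = low birth weight, 0 = normal.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(accuracy(labels, scores), roc_auc(labels, scores))  # 0.5 0.75
```

The toy case shows why the two numbers can diverge: the ranking is mostly right (AUC 0.75) even though the fixed 0.5 threshold misclassifies half the cases.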
Multimodal fusion architectures for Alzheimer’s disease diagnosis: An experimental study
IF 4 · Zone 2 · Medicine
Journal of Biomedical Informatics · Pub Date: 2025-05-06 · DOI: 10.1016/j.jbi.2025.104834
Florence Leony, Chen-ju Lin, Alzheimer’s Disease Neuroimaging Initiative
Vol. 166, Article 104834
Objective: In efforts toward early diagnosis of Alzheimer’s disease, medical records of multiple modalities are gathered to capture the interaction of multiple factors. However, the heterogeneity of multimodal data poses a challenge, and artificial intelligence can assist medical practitioners in making diagnoses and prognoses. For a model to be adopted as a clinical decision support system, it must be interpretable or explainable so that healthcare professionals can trust its results. This study assessed various popular machine learning models under two multimodal fusion architectures to find the best combination in terms of both predictive performance and interpretability.
Methods: Two architectures, early and late fusion (also known as feature-level and decision-level fusion), were chosen for a multinomial classification task. In addition to the commonly used simple concatenation, this study employed weighted and hybrid weighted concatenation to fuse features within and across modalities under the two fusion structures. To test the efficacy of each model pipeline, the pipelines were assessed according to the distinct foundations on which their models were built, and the advantages of each were identified. Classification metrics were unified and visualized as a pentagon to compare the overall performance of each pipeline. In addition, an interpretability analysis quantified the importance of each modality and feature recognized by each model.
Results: The analysis uncovered the characteristics of each type of pipeline in terms of prediction accuracy and the ability to capture the relevant markers of each cognitive state. In this particular healthcare application, tree-based and linear models were the top two choices; coupled with the early and late fusion structures with weighted concatenation, they reached balanced accuracies of 0.920 and 0.912, respectively. The five most important features belong to the Cognitive Test Scores and Neuropsychological Battery of Test modalities.
Conclusion: This work contributes an evaluation of medical applications of artificial intelligence, helping practitioners understand the capability of different fusion architectures combined with different classifiers and the use of machine learning in clinical settings. With accurate classification, early detection of Mild Cognitive Impairment and Alzheimer’s Disease can be achieved.
Citations: 0
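The difference between the two fusion architectures in this abstract can be sketched as follows: early (feature-level) fusion concatenates per-modality feature vectors, here each scaled by a modality weight, before a single classifier; late (decision-level) fusion combines per-modality class-probability vectors. The weights and toy numbers are illustrative assumptions, and only plain weighted concatenation is shown, not the paper's hybrid variant. Balanced accuracy, the metric reported above, is the mean of per-class recalls.

```python
def early_fusion(modalities, weights):
    """Feature-level fusion: weighted concatenation of per-modality feature vectors."""
    fused = []
    for feats, w in zip(modalities, weights):
        fused.extend(w * x for x in feats)
    return fused

def late_fusion(prob_vectors, weights):
    """Decision-level fusion: weighted average of per-modality class probabilities."""
    total = sum(weights)
    n_classes = len(prob_vectors[0])
    return [sum(w * p[c] for p, w in zip(prob_vectors, weights)) / total
            for c in range(n_classes)]

def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes present in y_true (e.g. three cognitive states)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Toy example with two modalities and hypothetical weights.
print(early_fusion([[1.0, 2.0], [3.0]], [0.5, 2.0]))  # [0.5, 1.0, 6.0]
print(balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1]))
```

Balanced accuracy is the natural choice when cognitive-state classes are imbalanced, because a classifier that ignores a minority class is penalized even if its plain accuracy is high.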
A transformer-based framework for temporal health event prediction with graph-enhanced representations
IF 4 · Zone 2 · Medicine
Journal of Biomedical Informatics · Pub Date: 2025-05-03 · DOI: 10.1016/j.jbi.2025.104826
Tianci Liu, Lizhong Liang, Chao Che, Yunjiong Liu, Bo Jin
Vol. 166, Article 104826
Objective: Deep learning approaches have demonstrated significant potential in predicting temporal health events in recent years. However, existing methods have not fully leveraged the complex interactions among comorbidities and have overlooked imbalances and temporal irregularities in admission records.
Methods: This study proposes GLT-Net, a deep learning approach that combines Graph Learning with a Transformer framework to tackle these challenges. GLT-Net begins by constructing a patient association graph to generate a unique representation for each individual. At the same time, the hierarchical structure of diagnosis codes is used to pre-train the diagnosis code embeddings. Next, a comorbidity association matrix is created to represent the relationships between comorbidities, and graph neural networks are employed to enhance the feature representations of diagnosis codes. Finally, a Transformer-Encoder framework captures the dependencies in historical admission records by incorporating time information.
Results: We demonstrate our approach on two temporal health event prediction tasks. Experimental results on real-world datasets show that GLT-Net outperforms baseline models in forecasting temporal health events. Additionally, a case study demonstrates the effectiveness of GLT-Net in predicting health events.
Conclusion: Understanding progression patterns over time, comorbidity associations, and patient characterization is essential for predicting temporal health events. Our study provides new insights and methods for a deeper understanding of patient health status and disease trends. Moreover, our model can be extended to other data sources, enhancing its versatility.
Citations: 0
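One common way to feed irregular admission timing into a Transformer encoder is to replace the integer position in the sinusoidal encoding with the actual elapsed time (for example, days since the first admission). The sketch below assumes that choice; the abstract does not specify how GLT-Net incorporates time, so this illustrates the general idea rather than the paper's method.

```python
import math

def time_encoding(t, dim, max_period=10000.0):
    """Sinusoidal encoding of a real-valued timestamp t (e.g. days since first visit).
    Pairs of (sin, cos) at geometrically spaced frequencies, as in the original
    Transformer positional encoding, but driven by elapsed time instead of index."""
    enc = []
    for i in range(0, dim, 2):
        freq = 1.0 / (max_period ** (i / dim))
        enc.append(math.sin(t * freq))
        if i + 1 < dim:
            enc.append(math.cos(t * freq))
    return enc

def add_time_to_visits(visit_embeddings, visit_days):
    """Add each visit's time encoding to its embedding before the encoder layers."""
    dim = len(visit_embeddings[0])
    return [[e + te for e, te in zip(emb, time_encoding(day, dim))]
            for emb, day in zip(visit_embeddings, visit_days)]

# Hypothetical admissions on days 0, 3 and 200: the uneven gaps are preserved,
# whereas index-based positions (0, 1, 2) would treat them as evenly spaced.
visits = [[0.1, 0.2, 0.3, 0.4]] * 3
print(add_time_to_visits(visits, [0.0, 3.0, 200.0]))
```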
Bioinformatic challenges in metagenomic next generation sequencing data analysis while unravelling a case of uncommon campylobacteriosis
IF 4 · Zone 2 · Medicine
Journal of Biomedical Informatics · Pub Date: 2025-05-02 · DOI: 10.1016/j.jbi.2025.104841
Rok Kogoj, Martin Bosilj, Andraž Celar Šturm, Misa Korva, Katja Strašek Smrdel, Eva Kvas, Mateja Pirš, Lidija Lepen, Tina Triglav
Vol. 166, Article 104841
Objective: This study aimed to use advanced bioinformatics and modern sequencing approaches to resolve a diagnostic problem: persistent molecular detection of Campylobacter spp. but negative culture results in four consecutive stool samples from a previously healthy patient with newly diagnosed selective IgA deficiency and prolonged diarrhoea.
Methods: Metagenomic next-generation sequencing (mNGS) based on short paired-end reads, with basic bioinformatic read classification, was used first. Because the results were ambiguous, advanced bioinformatics (contig construction and classification, reference genome mapping, and read filtering with BBSplit), coupled with metagenomic long-read sequencing and full-length 16S rRNA metabarcoding, was employed to further elucidate the results. Virulence factors were analysed using the Prokka genome annotation tool. Finally, modified classical bacteriology methods were used for further clarification.
Results: Short paired-end read analysis identified several Campylobacter species in all four samples. After advanced bioinformatic approaches were applied, Candidatus C. infans was suspected as the putative pathogen. This result was further supported by metagenomic long-read sequencing and full-length 16S rRNA metabarcoding. Nevertheless, after the culture conditions were modified based on the mNGS results, a mixed culture of Candidatus C. infans and C. ureolyticus was obtained. Sequencing of the mixed culture yielded 87.48% and 73.47% genome coverage of Candidatus C. infans and C. ureolyticus, respectively. More virulence factor hits were found in the Candidatus C. infans genome than in the C. ureolyticus genome, supporting the former as the most probable cause of symptoms.
Conclusion: This study shows the pivotal role and strengths of mNGS in unravelling an unusual case of diarrhoea and demonstrates how mNGS can guide established microbiological methods to address current limitations. However, it also emphasises the need for careful interpretation of sequencing data, particularly for closely related bacterial species from clinical samples that are known to support complex microbial communities.
Citations: 0
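Genome coverage percentages of the kind reported in this abstract usually refer to breadth of coverage: the fraction of reference positions covered by at least one mapped read. A minimal sketch follows, assuming reads are already mapped and given as half-open (start, end) intervals on a single reference sequence; the coordinates are toy values, not the study's data, and real pipelines would read alignments from BAM files instead.

```python
def breadth_of_coverage(intervals, genome_length, min_depth=1):
    """Percentage of reference positions covered by >= min_depth reads.
    `intervals` are half-open (start, end) mapped-read coordinates."""
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        start, end = max(0, start), min(genome_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    covered = 0
    depth = 0
    for pos in range(genome_length):
        depth += diff[pos]  # running sum gives read depth at this position
        if depth >= min_depth:
            covered += 1
    return 100.0 * covered / genome_length

# Toy 100-bp reference with three mapped reads (two overlapping).
print(breadth_of_coverage([(0, 10), (5, 20), (50, 60)], 100))  # 30.0
```

Raising `min_depth` gives a stricter figure, which matters in a mixed culture where reads from closely related species may map to the same regions.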
CD-Tron: Leveraging large clinical language model for early detection of cognitive decline from electronic health records
IF 4 · Zone 2 · Medicine
Journal of Biomedical Informatics · Pub Date: 2025-05-02 · DOI: 10.1016/j.jbi.2025.104830
Hao Guan, John Novoa-Laurentiev, Li Zhou
Vol. 166, Article 104830
Background: Early detection of cognitive decline during the preclinical stage of Alzheimer’s disease and related dementias (AD/ADRD) is crucial for timely intervention and treatment. Clinical notes in the electronic health record (EHR) contain valuable information that can aid in the early identification of cognitive decline. In this study, we use advanced large clinical language models, fine-tuned on clinical notes, to improve the early detection of cognitive decline.
Methods: We collected clinical notes from 2,166 patients, spanning the 4 years preceding their initial mild cognitive impairment (MCI) diagnosis, from the Enterprise Data Warehouse of Mass General Brigham. We developed CD-Tron, built upon a large clinical language model that was fine-tuned on 4,949 expert-labeled note sections. For evaluation, the trained model was applied to 1,996 independent note sections to assess its performance on real-world unstructured clinical data. Additionally, we used explainable AI techniques, specifically SHAP (SHapley Additive exPlanations) values, to interpret the model’s predictions and provide insight into the most influential features. An error analysis was also performed to further examine the model’s predictions.
Results: CD-Tron significantly outperforms baseline models, achieving notable improvements in precision, recall, and AUC for detecting cognitive decline (CD). Tested on many real-world clinical notes, CD-Tron demonstrated high sensitivity with only one false negative, which is crucial for clinical applications that prioritize early and accurate CD detection. SHAP-based interpretability analysis highlighted key textual features contributing to model predictions, supporting transparency and clinician understanding.
Conclusion: CD-Tron offers a novel approach to early cognitive decline detection by applying large clinical language models to free-text EHR data. Pretrained on real-world clinical notes, it accurately identifies early cognitive decline and integrates SHAP for interpretability, enhancing transparency in predictions.
Citations: 0
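SHAP values approximate Shapley values, which split a prediction's difference from a baseline among the input features. For a handful of features they can be computed exactly by enumerating feature coalitions, as in the sketch below. The value function is a hypothetical toy scorer over made-up note-derived cues, not CD-Tron; practical SHAP tooling relies on approximations because exact enumeration grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values: phi_i = sum over coalitions S without i of
    |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S)). Exponential in len(features)."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

# Hypothetical cues and toy scorer (not the paper's model or features).
FEATURES = ["memory complaint", "word-finding difficulty", "follow-up scheduled"]

def toy_score(present):
    """Toy probability of cognitive decline given which cues are present."""
    score = 0.1
    if "memory complaint" in present:
        score += 0.4
    if "word-finding difficulty" in present:
        score += 0.2
    # Interaction: both cues together add extra evidence, split evenly by Shapley.
    if {"memory complaint", "word-finding difficulty"} <= present:
        score += 0.2
    return score

print(shapley_values(FEATURES, toy_score))
```

The attributions sum to the score with all cues present minus the score with none (0.9 - 0.1 = 0.8 here), which is the additivity property that makes SHAP explanations easy for clinicians to read.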
Leveraging undecided cases in chart-reviewed phenotypes to enhance EHR-based association studies
IF 4 · Zone 2 · Medicine
Journal of Biomedical Informatics · Pub Date: 2025-04-30 · DOI: 10.1016/j.jbi.2025.104839
Xinyao Jian, Dazheng Zhang, Zehao Yu, Hua Xu, Jiang Bian, Yonghui Wu, Jiayi Tong, Yong Chen
Vol. 166, Article 104839
Objectives: In electronic health record (EHR)-based association studies, phenotyping algorithms efficiently classify patient clinical outcomes into binary categories but are susceptible to misclassification errors. The gold standard, manual chart review, involves clinicians determining the true disease status based on their assessment of health records. These clinician-labeled phenotypes are labor-intensive to obtain and typically limited to a small subset of patients, and they can introduce a third "undecided" category when phenotypes are indeterminate. We aim to effectively integrate the algorithm-derived and chart-reviewed outcomes when both are available in EHR-based association studies.
Material and Methods: We propose an augmented estimation method that combines the binary algorithm-derived phenotypes for the entire cohort with the trinary chart-reviewed phenotypes for a small, selected subset. Additionally, a cost-effective outcome-dependent sampling strategy is used to address rare-disease scenarios. The proposed trinary chart-reviewed phenotype integrated cost-effective augmented estimation (TriCA) was evaluated across a wide range of simulation settings and real-world applications, including EHR data on Alzheimer’s disease and related dementias (ADRD) from the OneFlorida+ Clinical Research Network and cohort data on second breast cancer events (SBCE) from Kaiser Permanente Washington.
Results: Compared with estimation based on random sampling, our augmented method improved mean squared error by up to 28.3% in simulation studies; compared with estimation using only trinary chart-reviewed phenotypes, our method improved efficiency by up to 33.3% in the ADRD data and 50.8% in the SBCE data.
Discussion: Our simulation studies and real-world applications demonstrate that, compared with existing methods, the proposed method provides unbiased estimates with higher statistical efficiency.
Conclusion: The proposed method effectively combines binary algorithm-derived phenotypes for the whole cohort with trinary chart-reviewed outcomes for a limited validation set, making it applicable to a broader range of applications and enhancing risk factor identification in EHR-based association studies.
Citations: 0
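The outcome-dependent sampling idea in this abstract can be illustrated by drawing the chart-review subset separately from algorithm-positive and algorithm-negative patients, oversampling the rare positive group, and recording inverse-probability weights so that later estimates can correct for the unequal selection. This is a generic sketch with hypothetical sample sizes; it does not implement TriCA's augmented estimator or its handling of "undecided" reviews.

```python
import random

def outcome_dependent_sample(algo_labels, n_pos, n_neg, seed=0):
    """Select patients for chart review, stratified by the algorithm-derived label.
    Returns {patient_index: inverse-probability weight} (illustrative design only)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(algo_labels) if y == 1]
    neg = [i for i, y in enumerate(algo_labels) if y == 0]
    chosen = {}
    for group, n in ((pos, n_pos), (neg, n_neg)):
        n = min(n, len(group))
        if n == 0:
            continue
        # Each stratum member is selected with probability n / len(group),
        # so its weight is the inverse of that probability.
        weight = len(group) / n
        for i in rng.sample(group, n):
            chosen[i] = weight
    return chosen

# Hypothetical rare-disease cohort: 10 algorithm-positive among 1,000 patients.
labels = [1] * 10 + [0] * 990
chosen = outcome_dependent_sample(labels, n_pos=5, n_neg=20, seed=1)
print(len(chosen), sum(chosen.values()))  # 25 reviewed patients; weights sum to cohort size
```

Half of the algorithm-positive patients are reviewed versus about 2% of the negatives, so rare cases are well represented in the validation set, while the weights still let the reviewed subset stand in for the full cohort.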