Journal of Biomedical Informatics: Latest Articles

Missing-modality enabled multi-modal fusion architecture for medical data
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Pub Date: 2025-02-21, DOI: 10.1016/j.jbi.2025.104796
Muyu Wang, Shiyu Fan, Yichen Li, Zhongrang Xie, Hui Chen
Abstract
Background: Fusion of multi-modal data can improve the performance of deep learning models. However, missing modalities are common in medical data due to patient specificity, which is detrimental to the performance of multi-modal models in applications. Therefore, it is critical to adapt the models to missing modalities.
Objective: This study aimed to develop an effective multi-modal fusion architecture for medical data that was robust to missing modalities and further improved the performance for clinical tasks.
Methods: Chest X-ray radiographs for the image modality, radiology reports for the text modality, and structured value data for the tabular modality were fused in this study. Each modality pair was fused with a Transformer-based bi-modal fusion module, and the three bi-modal fusion modules were then combined into a tri-modal fusion framework. Additionally, multivariate loss functions were introduced into the training process to improve the models’ robustness to missing modalities during inference. Finally, we designed comparison and ablation experiments to validate the effectiveness of the fusion, the robustness to missing modalities, and the enhancements from each key component. Experiments were conducted on the MIMIC-IV and MIMIC-CXR datasets with the 14-label disease diagnosis and patient in-hospital mortality prediction tasks. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used to evaluate the models’ performance.
Results: The proposed architecture showed superior predictive performance, achieving average AUROC and AUPRC of 0.916 and 0.551 in the 14-label classification task and 0.816 and 0.392 in the mortality prediction task, whereas the best average AUROC and AUPRC among the comparison methods were 0.876 and 0.492 in the 14-label classification task and 0.806 and 0.366 in the mortality prediction task. Both metrics decreased only slightly when tested with modal-incomplete data. Each of the three key components contributed a different level of enhancement.
Conclusions: The proposed multi-modal fusion architecture effectively fused three modalities and showed strong robustness to missing modalities. This architecture holds promise for scaling up to more modalities to enhance the clinical practicality of the model.
Volume 164, Article 104796.
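The abstract above reports an average AUROC over 14 labels. As a rough illustration of how such a figure can be computed, the sketch below evaluates AUROC per label as the probability that a positive sample outscores a negative one, then takes the unweighted mean across labels. The function names and the choice of an unweighted (macro) average are assumptions for illustration; the abstract does not say how the authors averaged.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(label_rows, score_rows):
    """Unweighted mean of per-label AUROC; rows are samples, columns are labels."""
    label_cols = list(zip(*label_rows))
    score_cols = list(zip(*score_rows))
    per_label = [auroc(y, s) for y, s in zip(label_cols, score_cols)]
    return sum(per_label) / len(per_label)


# Toy data with two labels instead of 14 (hypothetical values).
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.4]]
print(macro_auroc(toy_labels, toy_scores))
```

The pairwise form is quadratic in the number of samples, which is fine for a small check; a rank-based implementation would be preferable at dataset scale.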
Citations: 0
Imputation of missing aggregate EHR audit log data across individual and multiple organizations
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Pub Date: 2025-02-18, DOI: 10.1016/j.jbi.2025.104805
Huan Li, Nate C. Apathy, A Jay Holmgren, Edward R. Melnick, Robert A. McDougal
Abstract
Objective: To compare the efficacy of naive versus machine learning imputation strategies for imputing missing data in EHR-vendor-generated data, explore subgrouping criteria, and evaluate performance and feasibility for in-house implementation.
Materials and Methods: Missing data imputation experiments involving various types and sizes of organizations were conducted using physician-only aggregate EHR audit log data. Organizations were categorized by teaching status. Based on the coefficient of variation and missing percentage, variables were classified into three categories before imputation. The model with the highest R² value was selected as the most robust option.
Results: Teaching and non-teaching organizations showed similar R² trends in model selection, though some differences existed within each class. Moreover, the rolling average provided more consistent R² results across various organization sizes, especially for medium and small-sized organizations. XGBoost performed slightly better in large organizations than in small organizations. Comparisons between single- and multi-site organizations revealed higher R² values for single organizations using their own data for imputation as opposed to merging.
Discussion/Conclusion: The study introduced a systematic method for classifying variables and determining the best imputation strategy for each class. It also tested the scalability of this approach for individual organizations. Organizations can effectively use this method, including variable classification and tailored imputation methods. Organization size did not significantly affect the imputation process. The rolling average time-series method outperformed the machine learning method, which used non-time-series approaches. Combining data from diverse sites does not necessarily improve machine learning imputation.
Volume 163, Article 104805.
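To make the rolling-average idea in the abstract above concrete, here is a minimal sketch that fills each missing point with the mean of the preceding window and scores imputations with R². The window size, the use of a trailing window that includes earlier imputed values, and the function names are all assumptions; the abstract does not specify these details.

```python
def rolling_average_impute(series, window=3):
    """Fill None entries with the mean of up to `window` preceding values.

    Earlier imputed values are reused; leading gaps with no history stay None.
    """
    filled = []
    for value in series:
        if value is None:
            history = [v for v in filled[-window:] if v is not None]
            value = sum(history) / len(history) if history else None
        filled.append(value)
    return filled


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the actual values are constant")
    return 1 - ss_res / ss_tot


# Hypothetical monthly metric with two gaps.
print(rolling_average_impute([10.0, 12.0, None, 14.0, None], window=2))
```

In an evaluation like the one described, known values would be masked, imputed, and compared against the held-out truth with `r_squared`.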
Citations: 0
A multimodal machine learning algorithm improved diagnostic accuracy for otitis media in a school aged Aboriginal population
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Pub Date: 2025-02-17, DOI: 10.1016/j.jbi.2025.104801
Jacqueline H. Stephens, Phong Phu Nguyen, Amanda Machell, Linnett Sanchez, Eng H. Ooi, A. Simon Carney, Trent Lewis
Abstract
Objective: Otitis media (OM), or ear infection, can lead to hearing loss and associated developmental delay. There are several subgroups of OM which can be difficult to diagnose accurately, even for experienced clinicians. AI and machine learning algorithms for OM diagnosis are evolving but typically focus on only one defined diagnostic feature of OM. This study aimed to establish whether combining otoscopic and tympanometry data improves the diagnostic accuracy of an ML algorithm for diagnosing OM and its various subgroups.
Methods: We used an existing dataset containing data from 813 school-aged children (aged five to eight years) from 10 Aboriginal communities in remote South Australia. Data were collected between 2009 and 2011. All children underwent video otoscopy and tympanometry assessment of both ears, and diagnosis of OM was made by otorhinolaryngology (ENT) surgeons. After data augmentation and preprocessing, the database contained 15,057 samples with matched video otoscopy and tympanometry data (normal: n = 8,239; abnormal: n = 6,746). Support Vector Machine models were used to build the ML system.
Results: By combining tympanometry data with the probability prediction of the single otoscopy model, the accuracy of the system increased from 78% (otoscopy data) to 82% (otoscopy and tympanometry data).
Conclusion: Compared to diagnosis based solely on otoscopy data, combining otoscopy and tympanometry data increased the diagnostic accuracy of the ML algorithm. This approach could be used to support the accurate diagnosis of OM in children and can facilitate timely and appropriate treatment, especially in rural and remote areas.
Volume 164, Article 104801.
Citations: 0
Developing libraries of semantically-augmented graphics as visual standards for biomedical information systems
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Pub Date: 2025-02-16, DOI: 10.1016/j.jbi.2025.104804
Melissa D. Clarkson, Steven Roggenkamp, Landon T. Detwiler
Abstract
Objective: Visual representations generally serve as supplements to information, rather than as bearers of computable information themselves. Our objective is to develop a method for creating semantically-augmented graphic libraries that will serve as visual standards and can be implemented as visual assets in intelligent information systems.
Methods: Graphics were developed using a composable approach and specified using SVG. OWL was used to represent the entities of our system, which include elements, units, graphics, graphic libraries, and library collections. A graph database serves as our data management system. Semantics are applied at multiple levels: (a) each element is associated with a semantic style class to link visual style to semantic meaning, (b) graphics are described using object properties and data properties, (c) relationships are specified between graphics, and (d) mappings are made between the graphics and outside resources.
Results: The Graphic Library web application enables users to browse the libraries, view information pages for each graphic, and download individual graphics. We demonstrate how SPARQL can be employed to query the graphics database and how the APIs can be used to retrieve the graphics and associated data for applications. In addition, this work shows that our method of designing composable graphics is well-suited to depicting variations in human anatomy.
Conclusion: This work provides a bridge between visual communication and the field of knowledge representation. We demonstrate a method for creating visual standards that are compatible with practices in biomedical ontology and implement a system for making them accessible to information systems.
Volume 163, Article 104804.
Citations: 0
Adaptable graph neural networks design to support generalizability for clinical event prediction
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Pub Date: 2025-02-15, DOI: 10.1016/j.jbi.2025.104794
Amara Tariq, Gurkiran Kaur, Leon Su, Judy Gichoya, Bhavik Patel, Imon Banerjee
Abstract
Objective: While many machine learning and deep learning-based models for clinical event prediction leverage various data elements from electronic healthcare records, such as patient demographics and billing codes, such models face severe challenges when tested outside of their institution of training. These challenges are rooted not only in differences in patient population characteristics but also in the medical practice patterns of different institutions.
Method: We propose a solution to this problem through systematically adaptable design of graph-based convolutional neural networks (GCNN) for clinical event prediction. Our solution relies on the unique property of GCNN whereby data encoded as graph edges is only implicitly used during the prediction process and can be adapted after model training without requiring model re-training.
Results: Our adaptable GCNN-based prediction models outperformed all comparative models during external validation for two different clinical problems, while supporting multimodal data integration. For prediction of hospital discharge and mortality, the comparative fusion baseline model achieved 0.58 [0.52–0.59] and 0.81 [0.80–0.82] AUROC on the external dataset, while the GCNN achieved 0.70 [0.68–0.70] and 0.91 [0.90–0.92], respectively. For prediction of future unplanned transfusion, we observed even larger gaps in performance due to missing or incomplete data in the external dataset: late fusion achieved 0.44 [0.31–0.56], while the GCNN model achieved 0.70 [0.62–0.84].
Conclusion: These results support our hypothesis that carefully designed GCNN-based models can overcome generalization challenges faced by prediction models.
Volume 163, Article 104794.
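The property described in the Method paragraph, that graph edges can be changed after training while the learned weights stay fixed, can be illustrated with a toy graph-convolution layer. The sketch below is not the authors' architecture: the mean-aggregation normalization, the single layer, and every name in it are assumptions chosen to keep the example small.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize_with_self_loops(adj):
    """Add self-loops, then scale each row to sum to 1 (mean aggregation)."""
    n = len(adj)
    looped = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in looped]


def graph_conv(adj, features, weights):
    """One layer: ReLU(normalized_adjacency @ features @ weights)."""
    propagated = matmul(row_normalize_with_self_loops(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


# Two patient nodes with one feature each; `weights` stands in for trained parameters.
features = [[1.0], [3.0]]
weights = [[2.0]]

training_site_edges = [[0, 0], [0, 0]]  # no similarity edge between the patients
external_site_edges = [[0, 1], [1, 0]]  # edge redefined for the new institution

print(graph_conv(training_site_edges, features, weights))
print(graph_conv(external_site_edges, features, weights))
```

Both calls reuse the same `weights`; only the adjacency matrix differs, which is the kind of post-training adaptation the abstract refers to.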
Citations: 0
Enhanced heart failure mortality prediction through model-independent hybrid feature selection and explainable machine learning
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Pub Date: 2025-02-15, DOI: 10.1016/j.jbi.2025.104800
Georgios Petmezas, Vasileios E. Papageorgiou, Vassilios Vassilikos, Efstathios Pagourelias, Dimitrios Tachmatzidis, George Tsaklidis, Aggelos K. Katsaggelos, Nicos Maglaveras
Abstract
Heart failure (HF) remains a significant public health challenge with high mortality rates. Machine learning (ML) techniques offer a promising approach to predict HF mortality, potentially improving clinical outcomes. However, the effectiveness of these techniques heavily depends on the quality and relevance of the features used. This study introduces a novel hybrid feature selection methodology that combines Extremely Randomized Trees (Extra-Trees) and non-linear correlation measures to enhance 1-year all-cause mortality prediction in HF patients using echocardiographic and key demographic data. Unlike existing feature selection methods that are often tied to specific ML models and produce inconsistent feature sets across different algorithms, our proposed approach is model-independent, ensuring robustness and generalizability. Moreover, the optimal number of predictive features is identified through loss graph inspection, leading to a compact and highly informative subset of seven features. We trained and evaluated seven widely-used ML models on both the full feature set and the selected subset, finding that most models maintained or improved their predictive performance despite an 80% reduction in features. Model interpretability was enhanced using SHapley Additive exPlanations (SHAP), allowing for a detailed examination of how individual features influence predictions. To further assess its effectiveness, we compared our methodology against widely known feature selection techniques across all seven ML models. The results underscore the superiority of our proposed feature set in accurately predicting HF mortality over conventional methods, offering new opportunities for personalized management strategies based on a streamlined and explainable feature subset.
Volume 163, Article 104800.
Citations: 0
Enhancing clinical data warehousing with provenance data to support longitudinal analyses and large file management: The gitOmmix approach for genomic and image data
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Pub Date: 2025-02-12, DOI: 10.1016/j.jbi.2025.104788
Maxime Wack, Adrien Coulet, Anita Burgun, Bastien Rance
Abstract
Background: If hospital Clinical Data Warehouses (CDWs) are to address today’s focus on personalized medicine, they need to be able to track patients longitudinally and manage the large data sets generated by whole genome sequencing, RNA analyses, and complex imaging studies. Current Clinical Data Warehouses address neither issue. This paper reports on methods to enrich current systems by providing provenance data that allow patient histories to be followed longitudinally and by managing the linking and versioning of large data sets from whatever source. The methods are open source and applicable to any clinical data warehouse system, whatever data schema it uses.
Method: We introduce gitOmmix, an approach that overcomes these limitations, and illustrate its usefulness in the management of medical omics data. gitOmmix relies on (i) a file versioning system: git, (ii) an extension that handles large files: git-annex, (iii) a provenance knowledge graph: PROV-O, and (iv) an alignment between the git versioning information and the provenance knowledge graph.
Results: Capabilities inherited from git and git-annex enable retracing the history of a clinical interpretation back to the patient sample, through supporting data and analyses. In addition, the provenance knowledge graph, aligned with the git versioning information, enables querying and browsing provenance relationships between these elements.
Conclusion: gitOmmix adds a provenance layer to CDWs, while scaling to large files and being agnostic of the CDW system. For these reasons, we think that it is a viable and generalizable solution for omics clinical studies.
Volume 163, Article 104788.
Citations: 0
Precision Drug Repurposing (PDR): Patient-level modeling and prediction combining foundational knowledge graph with biobank data
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Pub Date: 2025-02-12, DOI: 10.1016/j.jbi.2025.104786
Çerağ Oğuztüzün, Zhenxiang Gao, Hui Li, Rong Xu
Abstract
Objective: Drug repurposing accelerates therapeutic development by finding new indications for approved drugs. However, accounting for individual patient differences is challenging. This study introduces a Precision Drug Repurposing (PDR) framework at single-patient resolution, integrating individual-level data with a foundational biomedical knowledge graph to enable personalized drug discovery.
Methods: We developed a framework integrating patient-specific data from the UK Biobank (Polygenic Risk Scores, biomarker expressions, and medical history) with a comprehensive biomedical knowledge graph (61,146 entities, 1,246,726 relations). Using Alzheimer’s Disease as a case study, we compared three diverse patient-specific models with a foundational model through standard link prediction metrics. We evaluated top predicted candidate drugs using patient medication history and literature review.
Results: Our framework maintained the robust prediction capabilities of the foundational model. The integration of patient data, particularly Polygenic Risk Scores (PRS), significantly influenced drug prioritization (Cohen’s d = 1.05 for scoring differences). Ablation studies demonstrated the crucial role of PRS, with the effect size decreasing to 0.77 upon removal. Each patient model identified novel drug candidates that were missed by the foundational model but showed therapeutic relevance when evaluated using the patient’s own medication history. These candidates were further supported by literature evidence aligned with the patient-level genetic risk profiles based on PRS.
Conclusion: This exploratory study demonstrates a promising approach to precision drug repurposing by integrating patient-specific data with a foundational knowledge graph.
Volume 163, Article 104786.
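The Results paragraph above reports effect sizes as Cohen's d. For readers unfamiliar with the statistic, the sketch below computes the common pooled-standard-deviation form for two independent groups of scores. Which variant the authors used is not stated in the abstract, so the pooled form and the sample values here are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("Cohen's d is undefined when both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical drug scores from a patient-specific model and a baseline model.
patient_scores = [2.0, 4.0, 6.0]
baseline_scores = [1.0, 3.0, 5.0]
print(cohens_d(patient_scores, baseline_scores))
```

By the usual rule of thumb, values near 0.8 or above are read as large effects, which is consistent with how the abstract interprets 1.05 and 0.77.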
Citations: 0
Improving entity recognition using ensembles of deep learning and fine-tuned large language models: A case study on adverse event extraction from VAERS and social media
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Pub Date: 2025-02-07, DOI: 10.1016/j.jbi.2025.104789
Yiming Li, Deepthi Viswaroopan, William He, Jianfu Li, Xu Zuo, Hua Xu, Cui Tao
Abstract
Objective: Adverse event (AE) extraction following COVID-19 vaccines from text data is crucial for monitoring and analyzing the safety profiles of immunizations, identifying potential risks, and ensuring the safe use of these products. Traditional deep learning models are adept at learning intricate feature representations and dependencies in sequential data, but often require extensive labeled data. In contrast, large language models (LLMs) excel in understanding contextual information, but exhibit unstable performance on named entity recognition (NER) tasks, possibly due to their broad but unspecific training. This study aims to evaluate the effectiveness of LLMs and traditional deep learning models in AE extraction, and to assess the impact of ensembling these models on performance.
Methods: In this study, we utilized reports and posts from the Vaccine Adverse Event Reporting System (VAERS) (n = 230), Twitter (n = 3,383), and Reddit (n = 49) as our corpora. Our goal was to extract three types of entities: vaccine, shot, and adverse event (ae). We explored and fine-tuned (except GPT-4) multiple LLMs, including GPT-2, GPT-3.5, GPT-4, Llama-2 7b, and Llama-2 13b, as well as traditional deep learning models such as the recurrent neural network (RNN) and Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT). To enhance performance, we created ensembles of the three models with the best performance. For evaluation, we used strict and relaxed F1 scores to evaluate the performance for each entity type, and micro-average F1 was used to assess the overall performance.
Results: The ensemble demonstrated the best performance in identifying the entities “vaccine,” “shot,” and “ae,” achieving strict F1-scores of 0.878, 0.930, and 0.925, respectively, and a micro-average score of 0.903. These results underscore the significance of fine-tuning models for specific tasks and demonstrate the effectiveness of ensemble methods in enhancing performance.
Conclusion: This study demonstrates the effectiveness and robustness of ensembling fine-tuned traditional deep learning models and LLMs for extracting AE-related information following COVID-19 vaccination. This study contributes to the advancement of natural language processing in the biomedical domain, providing valuable insights into improving AE extraction from text data for pharmacovigilance and public health surveillance.
Volume 163, Article 104789.
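The evaluation described above distinguishes strict from relaxed F1 and reports a micro-average. A minimal sketch of one common reading follows: strict matching requires identical span boundaries and entity type, relaxed matching accepts any overlap with the same type, and the micro-average pools true positives, false positives, and false negatives across all types. These definitions, the greedy one-to-one matching, and the span format `(start, end, type)` are assumptions; the abstract does not define them.

```python
from collections import Counter


def spans_overlap(a, b):
    """True when two half-open (start, end, ...) spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def count_matches(gold, pred, relaxed=False):
    """Pool TP/FP/FN over all entity types, matching each gold span at most once."""
    counts = Counter()
    unmatched = list(gold)
    for p in pred:
        hit = next(
            (
                g
                for g in unmatched
                if g[2] == p[2]
                and (g[:2] == p[:2] or (relaxed and spans_overlap(g, p)))
            ),
            None,
        )
        if hit is None:
            counts["fp"] += 1
        else:
            unmatched.remove(hit)
            counts["tp"] += 1
    counts["fn"] += len(unmatched)
    return counts


def micro_f1(counts):
    """F1 from pooled counts: 2TP / (2TP + FP + FN)."""
    denom = 2 * counts["tp"] + counts["fp"] + counts["fn"]
    return 2 * counts["tp"] / denom if denom else 0.0


# Hypothetical annotations as (start, end, entity_type).
gold = [(0, 6, "vaccine"), (10, 14, "shot"), (20, 30, "ae")]
pred = [(0, 6, "vaccine"), (10, 13, "shot"), (40, 45, "ae")]
print(micro_f1(count_matches(gold, pred)))
print(micro_f1(count_matches(gold, pred, relaxed=True)))
```

In this toy case the "shot" prediction misses the gold boundary by one character, so it counts as an error under strict matching and as a hit under relaxed matching.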
Citations: 0
Patient deep spatio-temporal encoding and medication substructure mapping for safe medication recommendation
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Pub Date: 2025-02-06, DOI: 10.1016/j.jbi.2025.104785
Haoqin Yang, Yuandong Liu, Longbo Zhang, Hongzhen Cai, Kai Che, Linlin Xing
Abstract
Medication recommendations are designed to provide physicians and patients with personalized, accurate, and safe medication choices to maximize patient outcomes. Although significant progress has been made in related research, three major challenges remain: inadequate modeling of patients’ multidimensional and time-series information, insufficient representation of medication substructures, and poor balance between model accuracy and drug-drug interactions. To address these issues, this paper proposes SDRBT, a safe medication recommendation model based on patient deep spatio-temporal encoding and medication substructure mapping. SDRBT includes a patient deep temporal and spatial coding module, which combines symptom information, disease diagnosis information, and treatment information from the patient’s electronic health record data. It innovatively utilizes the Block Recurrent Transformer to model longitudinal temporal information of patients in different dimensions to obtain the horizontal representation of the patient’s current visit. A dual-domain mapping module for medication substructures is designed to perform global and local mapping of medications, fully learning and aggregating medication substructure representations. Finally, a PID LOSS control unit was designed, in which we studied a drug interaction control module based on the similarity calculation between the electronic health map and the drug interaction graph. This module ensures the safety of the recommended medication combination, improves recommendation efficiency, and reduces model training time. Experiments on the public MIMIC-III dataset demonstrate SDRBT’s superior accuracy in medication recommendation.
Volume 163, Article 104785.
Citations: 0