{"title":"MedPodGPT: A multilingual audio-augmented large language model for medical research and education","authors":"Shuyue Jia, Subhrangshu Bit, Edward Searls, Lindsey Claus, Pengrui Fan, Varuna H. Jasodanand, Meagan V. Lauber, Divya Veerapaneni, William M. Wang, Rhoda Au, Vijaya B Kolachalama","doi":"10.1101/2024.07.11.24310304","DOIUrl":"https://doi.org/10.1101/2024.07.11.24310304","url":null,"abstract":"The proliferation of medical podcasts has generated an extensive repository of audio content, rich in specialized terminology, diverse medical topics, and expert dialogues. Here we introduce a computational framework designed to enhance large language models (LLMs) by leveraging the informational content of publicly accessible medical podcast data. This dataset, comprising over 4,300 hours of audio content, was transcribed to generate over 39 million text tokens. Our model, MedPodGPT, integrates the varied dialogue found in medical podcasts to improve understanding of natural language nuances, cultural contexts, and medical knowledge. Evaluated across multiple benchmarks, MedPodGPT demonstrated an average improvement of 2.31% over standard open-source benchmarks and showcased an improvement of 2.58% in its zero-shot multilingual transfer ability, effectively generalizing to different linguistic contexts. By harnessing the untapped potential of podcast content, MedPodGPT advances natural language processing, offering enhanced capabilities for various applications in medical research and education.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"46 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141609846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2D Transfer Learning for ECG Classification using Continuous Wavelet Transform","authors":"Wei Zhang","doi":"10.1101/2024.07.11.24310258","DOIUrl":"https://doi.org/10.1101/2024.07.11.24310258","url":null,"abstract":"Advanced deep neural networks, when trained on extensive datasets, can outperform cardiologists in diagnosing cardiac arrhythmias. However, the availability of large-scale training data is often impractical. This study explores the use of transfer learning to identify and classify three ECG patterns. It applies knowledge gained from 2D image classification tasks to the domain of 1D time-series ECG signal classification. The research leverages various deep learning models to classify continuous wavelet transform (2D representations) of ECG signals. The effectiveness of these transferred deep learning models in classifying ECG time-series data is then evaluated.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141609848","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EARLY LUNG CANCER SCREENING: A COMPARATIVE STUDY OF CNN AND RADIOMICS MODELS WITH PULMONARY NODULE BIOLOGIC CHARACTERIZATION","authors":"Mukund Gupta, Edbert Victor Fandy, Krrish Ghindani","doi":"10.1101/2024.07.06.24309995","DOIUrl":"https://doi.org/10.1101/2024.07.06.24309995","url":null,"abstract":"Lung cancer has become an increasingly prevalent disease, with an estimated 125,070 deaths in the United States alone in 2024 (5). To improve patient outcomes and assist doctors in differentiating between benign and malignant pulmonary nodules, this paper developed a Convolutional Neural Network (CNN) model for early binary detection of pulmonary nodules and assessed its effectiveness compared to other approaches. The CNN model showed an accuracy of 98.47%, while the radiomics-based SVM-LASSO model and the Lung-RADS system showed accuracies of 84.6% and 72.2% respectively. This demonstrates that the CNN model is significantly more effective for the early binary detection of pulmonary nodules than both the radiomics-based model and the Lung-RADS system. The paper also discusses the applications of Deep Learning in healthcare, concluding that although AI proves to be an effective method for early lung cancer detection, more research is needed to carefully assess the role and impact of AI in healthcare.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"70 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141566861","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characteristics of Suicide Prevention Apps: A Content Analysis of Apps Available in Canada and the United Kingdom","authors":"Laura Bennett-Poynter, Samantha Groves, Jessica Kemp, Hwayeon Danielle Shin, Lydia Sequeira, Karen Lascelles, Gillian Strudwick","doi":"10.1101/2024.07.10.24310091","DOIUrl":"https://doi.org/10.1101/2024.07.10.24310091","url":null,"abstract":"Objective: We aimed to examine the characteristics, features, and content of suicide prevention mobile apps available in app stores in Canada and the United Kingdom.\nDesign: Suicide prevention apps were identified from Apple and Android app stores between March-April 2023. Apps were screened against predefined inclusion criteria, and duplicate apps were removed. Data were then extracted based on descriptive (e.g., genre, app developer), security (e.g., password protection), and design features (e.g., personalization options). Content of apps were assessed using the Essential Features Framework. Extracted data were analyzed using a content analysis approach including narrative frequencies and descriptive statistics.\nResults: Fifty-two (n=52) suicide prevention apps were included within the review. Most were tailored for the general population and were in English language only. One app had the option to increase app accessibility by offering content presented using sign language. Many apps allowed some form of personalization by adding text content, however most did not facilitate further customization such as the ability to upload photo and audio content. All identified apps included content from at least one of the domains of the Essential Features Framework. The most commonly included domains were sources of suicide prevention support, and information about suicide. The domain least frequently included was screening tools followed by wellness content. No identified apps had the ability to be linked to patient medical records.\nConclusions: The findings of this research present implications for the development of future suicide prevention apps. Development of a co-produced suicide prevention app which is accessible, allows for personalization, and can be integrated into clinical care may present an opportunity to enhance suicide prevention support for individuals experiencing suicidal thoughts and behaviours.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"434 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141587377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Impact of a Primary Aldosteronism Predictive Model in Secondary Hypertension Decision Support","authors":"Peter Bowman Mack, Casey Cole, Mintaek Lee, Lisa Peterson, Matthew Lundy, Karen Elizabeth Hegarty, William Espinoza","doi":"10.1101/2024.07.09.24310088","DOIUrl":"https://doi.org/10.1101/2024.07.09.24310088","url":null,"abstract":"Objective: To determine whether the addition of a primary aldosteronism (PA) predictive model to a secondary hypertension decision support tool increases screening for PA in a primary care setting.\nMaterials and Methods: 153 primary care clinics were randomized to receive a secondary hypertension decision support tool with or without an integrated predictive model between August 2023 and April 2024.\nResults: For patients with risk scores in the top 1 percentile, 63/2,896 (2.2%) patients where the alert was displayed in model clinics had the order set launched while 12/1,210 (1.0%) in no model clinics had the order set launched (P = 0.014). 19/2,896 (0.66%) of these highest risk patients in model clinics had an ARR ordered compared to 0/1,210 (0.0%) patients in no model clinics (P = 0.010). For patients with scores not in the top 1 percentile, 438/20,493 (2.1%) patients in model clinics had the order set launched compared to 273/17,820 (1.5%) in no model clinics (P < 0.001). 124/20,493 (0.61%) in model clinics had an ARR ordered compared to 34/17,820 (0.19%) in the no model clinics (P < 0.001).\nDiscussion: The addition of a PA predictive model to secondary hypertension alert displays and triggering criteria along with order set displays and order preselection criteria results in a statistically and clinically significant increase in screening for PA, a condition that clinicians insufficiently screen for currently.\nConclusion: Addition of a predictive model for an under-screened condition to traditional clinical decision support may increase screening for these conditions.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141577415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TLFT: Transfer Learning and Fourier Transform for ECG Classification","authors":"Erick Wang, Sarah Lee","doi":"10.1101/2024.07.09.24310152","DOIUrl":"https://doi.org/10.1101/2024.07.09.24310152","url":null,"abstract":"Electrocardiogram (ECG) provides a non-invasive method for identifying cardiac issues, particularly arrhythmias or irregular heartbeats. In recent years, the fields of artificial intelligence and machine learning have made significant inroads into various healthcare applications, including the development of arrhythmia classifiers using deep learning techniques. However, a persistent challenge in this domain is the limited availability of large, well-annotated ECG datasets, which are crucial for building and evaluating robust machine learning models.\nTo address this limitation, we propose a novel deep transfer learning framework designed to perform effectively on small training datasets. Our approach involves fine-tuning ResNet-18, a general-purpose image classifier, using the MIT-BIH arrhythmia dataset. This method aims to leverage the power of transfer learning to overcome the constraints of limited data availability.\nFurthermore, this paper conducts a critical examination of existing deep learning models in the field of ECG analysis. Our investigation reveals that many of these models suffer from methodological flaws, particularly in terms of data leakage. This issue potentially leads to overly optimistic performance estimates and raises concerns about the reliability and generalizability of these models in real-world clinical applications.\nBy addressing these challenges, our work contributes to the advancement of more robust and reliable ECG analysis techniques, potentially improving the accuracy and applicability of automated arrhythmia detection in clinical settings.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141566866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing a natural language processing system using transformer-based models for adverse drug event detection in electronic health records","authors":"Jingyuan Wu, Xiaodi Ruan, Elizabeth McNeer, Katelyn M. Rossow, Leena Choi","doi":"10.1101/2024.07.09.24310100","DOIUrl":"https://doi.org/10.1101/2024.07.09.24310100","url":null,"abstract":"Objective: To develop a transformer-based natural language processing (NLP) system for detecting adverse drug events (ADEs) from clinical notes in electronic health records (EHRs).\nMaterials and Methods: We fine-tuned BERT Short-Formers and Clinical-Longformer using the processed dataset of the 2018 National NLP Clinical Challenges (n2c2) shared task Track 2. We investigated two data processing methods, window-based and split-based approaches, to find an optimal processing method. We evaluated the generalization capabilities on a dataset extracted from Vanderbilt University Medical Center (VUMC) EHRs.\nResults: On the n2c2 dataset, the best average macro F-scores of 0.832 and 0.868 were achieved using a 15-word window with PubMedBERT and a 10-chunk split with Clinical-Longformer. On the VUMC dataset, the best average macro F-scores of 0.720 and 0.786 were achieved using a 4-chunk split with PubMedBERT and Clinical-Longformer.\nDiscussion: Our study provided a comparative analysis of data processing methods. The fine-tuned transformer models showed good performance for ADE-related tasks. Especially, Clinical-Longformer model with split-based approach had a great potential for practical implementation of ADE detection. While the token limit was crucial, the chunk size also significantly influenced model performance, even when the text length was within the token limit.\nConclusion: We provided guidance on model development, including data processing methods for ADE detection from clinical notes using transformer-based models. Our results on two datasets indicated that data processing methods and models should be carefully selected based on the type of clinical notes and the allocation trade-offs of human and computational power in annotation and model fine-tuning.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141587471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting the Sample Size From Randomized Controlled Trials in Explainable Fashion Using Natural Language Processing","authors":"Paul Windisch, Fabio Dennstaedt, Carole Koechli, Robert Foerster, Christina Schroeder, Daniel M. Aebersold, Daniel R. Zwahlen","doi":"10.1101/2024.07.09.24310155","DOIUrl":"https://doi.org/10.1101/2024.07.09.24310155","url":null,"abstract":"Background: Extracting the sample size from randomized controlled trials (RCTs) remains a challenge to developing better search functionalities or automating systematic reviews. Most current approaches rely on the sample size being explicitly mentioned in the abstract. Methods: 847 RCTs from high-impact medical journals were tagged with six different entities that could indicate the sample size. A named entity recognition (NER) model was trained to extract the entities and then deployed on a test set of 150 RCTs. The entities' performance in predicting the actual number of trial participants who were randomized was assessed and possible combinations of the entities were evaluated to create predictive models.\nResults: The most accurate model could make predictions for 64.7% of trials in the test set, and the resulting predictions were within 10% of the ground truth in 96.9% of cases. A less strict model could make a prediction for 96.0% of trials, and its predictions were within 10% of the ground truth in 88.2% of cases.\nConclusion: Training a named entity recognition model to predict the sample size from randomized controlled trials is feasible, not only if the sample size is explicitly mentioned but also if the sample size can be calculated, e.g., by adding up the number of patients in each arm.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141566865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ongoing and planned Randomized Controlled Trials of AI in medicine: An analysis of Clinicaltrials.gov registration data","authors":"Mattia Andreoletti, Berkay Senkalfa, Alessandro Blasimme","doi":"10.1101/2024.07.09.24310133","DOIUrl":"https://doi.org/10.1101/2024.07.09.24310133","url":null,"abstract":"The integration of Artificial Intelligence (AI) technologies into clinical practice holds significant promise for revolutionizing healthcare. However, the realization of this potential requires rigorous evaluation and validation of AI applications to ensure their safety, efficacy, and clinical significance. Despite increasing awareness of the need for robust testing, the majority of AI-related Randomized Controlled Trials (RCTs) so far have exhibited notable limitations, impeding the generalizability and proper integration of their findings into clinical settings. To understand whether the field is progressing towards more robust testing, we conducted an analysis of the registration data of ongoing and planned RCTs of AI in medicine available in the Clinicaltrials.gov database. Our analysis highlights several key trends and challenges. Effectively addressing these challenges is essential for advancing the field of medical AI and ensuring its successful integration into clinical practice.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141566863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Autoencoder to Identify Sex-Specific Sub-phenotypes in Alzheimer's Disease Progression Using Longitudinal Electronic Health Records","authors":"Weimin Meng, Jie Xu, Yu Huang, Cankun Wang, Qianqian Song, Anjun Ma, Lixin Song, Jiang Bian, Qin Ma, Rui Yin","doi":"10.1101/2024.07.07.24310055","DOIUrl":"https://doi.org/10.1101/2024.07.07.24310055","url":null,"abstract":"Alzheimer's Disease (AD) is a complex neurodegenerative disorder significantly influenced by sex differences, with approximately two-thirds of AD patients being women. Characterizing the sex-specific AD progression and identifying its progression trajectory is a crucial step to developing effective risk stratification and prevention strategies. In this study, we developed an autoencoder to uncover sex-specific sub-phenotypes in AD progression leveraging longitudinal electronic health record (EHR) data from OneFlorida+ Clinical Research Consortium. Specifically, we first constructed temporal patient representation using longitudinal EHRs from sex-stratified AD cohort. We used a long short-term memory (LSTM)-based autoencoder to extract and generate latent representation embeddings from sequential clinical records of patients. We then applied hierarchical agglomerative clustering to the learned representations, grouping patients based on their progression sub-phenotypes. The experimental results show that we successfully identified five primary sex-based AD sub-phenotypes with corresponding progression pathways with high confidence. These sex-specific sub-phenotypes not only illustrated distinct AD progression patterns but also revealed differences in clinical characteristics and comorbidities between females and males in AD development. These findings could provide valuable insights for advancing personalized AD intervention and treatment strategies.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141566864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}