{"title":"Domain-Specific Pretraining of NorDeClin-Bidirectional Encoder Representations From Transformers for <i>International Statistical Classification of Diseases, Tenth Revision,</i> Code Prediction in Norwegian Clinical Texts: Model Development and Evaluation Study.","authors":"Phuong Dinh Ngo, Miguel Ángel Tejedor Hernández, Taridzo Chomutare, Andrius Budrionis, Therese Olsen Svenning, Torbjørn Torsvik, Anastasios Lamproudis, Hercules Dalianis","doi":"10.2196/66153","DOIUrl":"10.2196/66153","url":null,"abstract":"<p><strong>Background: </strong>Accurately assigning ICD-10 (International Statistical Classification of Diseases, Tenth Revision) codes is critical for clinical documentation, reimbursement processes, epidemiological studies, and health care planning. Manual coding is time-consuming, labor-intensive, and prone to errors, underscoring the need for automated solutions within the Norwegian health care system. Recent advances in natural language processing (NLP) and transformer-based language models have shown promising results in automating ICD (International Classification of Diseases) coding in several languages. However, prior work has focused primarily on English and other high-resource languages, leaving a gap in Norwegian-specific clinical NLP research.</p><p><strong>Objective: </strong>This study introduces 2 versions of NorDeClin-BERT (NorDeClin Bidirectional Encoder Representations from Transformers), domain-specific Norwegian BERT-based models pretrained on a large corpus of Norwegian clinical text to enhance their understanding of medical language. Both models were subsequently fine-tuned to predict ICD-10 diagnosis codes. 
We aimed to evaluate the impact of domain-specific pretraining and model size on classification performance and to compare NorDeClin-BERT with general-purpose and cross-lingual BERT models in the context of Norwegian ICD-10 coding.</p><p><strong>Methods: </strong>Two versions of NorDeClin-BERT were pretrained on the ClinCode Gastro Corpus, a large-scale dataset comprising 8.8 million deidentified Norwegian clinical notes, to enhance domain-specific language modeling. The base model builds upon NorBERT3-base and was pretrained on a large, relevant subset of the corpus, while the large model builds upon NorBERT3-large and was trained on the full dataset. Both models were benchmarked against SweDeClin-BERT, ScandiBERT, NorBERT3-base, and NorBERT3-large, using standard evaluation metrics: accuracy, precision, recall, and F1-score.</p><p><strong>Results: </strong>The results show that both versions of NorDeClin-BERT outperformed general-purpose Norwegian BERT models and Swedish clinical BERT models in classifying both prevalent and less common ICD-10 codes. Notably, NorDeClin-BERT-large achieved the highest overall performance across evaluation metrics, demonstrating the impact of domain-specific clinical pretraining in Norwegian. These results highlight that domain-specific pretraining on Norwegian clinical text, combined with model capacity, improves ICD-10 classification accuracy compared with general-domain Norwegian models and Swedish models pretrained on clinical text. 
Furthermore, while Swedish clinical models demonstrated some transferability to Norwegian, their performance remained suboptimal, emphasizing the necessity of Norwegian-specific clinical pretraining.</p><p><strong>Conclusions: </strong>This study highlights the potential of NorDeClin-BERT to improve ICD-10 code classification for the gastroenterology do","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"4 ","pages":"e66153"},"PeriodicalIF":2.0,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12377785/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144981197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
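The NorDeClin-BERT study benchmarks models with accuracy, precision, recall, and F1-score over ICD-10 codes. As a minimal sketch of how such per-code metrics are macro-averaged (the codes and predictions below are hypothetical, and the paper's exact averaging scheme is not stated in the abstract):

```python
from collections import defaultdict

def macro_prf1(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over the label set."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but true label was t
            fn[t] += 1  # true label t was missed
    precs, recs, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precs.append(prec); recs.append(rec); f1s.append(f1)
    n = len(labels)
    return sum(precs) / n, sum(recs) / n, sum(f1s) / n

# Hypothetical gold vs. predicted ICD-10 codes for four clinical notes
gold = ["K21.0", "K25.9", "K21.0", "K57.3"]
pred = ["K21.0", "K21.0", "K21.0", "K57.3"]
p, r, f = macro_prf1(gold, pred)
```

Macro-averaging weights rare and prevalent codes equally, which matters here because the abstract emphasizes performance on "less common ICD-10 codes".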
Haya Engelstein, Roni Ramon-Gonen, Avi Sabbag, Eyal Klang, Karin Sudri, Michal Cohen-Shelly, Israel Barbash
{"title":"Effectiveness of the GPT-4o Model in Interpreting Electrocardiogram Images for Cardiac Diagnostics: Diagnostic Accuracy Study.","authors":"Haya Engelstein, Roni Ramon-Gonen, Avi Sabbag, Eyal Klang, Karin Sudri, Michal Cohen-Shelly, Israel Barbash","doi":"10.2196/74426","DOIUrl":"10.2196/74426","url":null,"abstract":"<p><strong>Background: </strong>Recent progress has demonstrated the potential of deep learning models in analyzing electrocardiogram (ECG) pathologies. However, this method is intricate, expensive to develop, and designed for specific purposes. Large language models show promise in medical image interpretation, and yet their effectiveness in ECG analysis remains understudied. Generative Pretrained Transformer 4 Omni (GPT-4o), a multimodal artificial intelligence model, capable of processing images and text without task-specific training, may offer an accessible alternative.</p><p><strong>Objective: </strong>This study aimed to evaluate GPT-4o's effectiveness in interpreting 12-lead ECGs, assessing classification accuracy, and exploring methods to enhance its performance.</p><p><strong>Methods: </strong>A total of 6 common ECG diagnoses were evaluated: normal ECG, ST-segment elevation myocardial infarction, atrial fibrillation, right bundle branch block, left bundle branch block, and paced rhythm, with 30 normal ECGs and 10 of each abnormal pattern, totaling 80 cases. Deidentified ECGs were analyzed using OpenAI's GPT-4o. Our study used both zero-shot and few-shot learning methodologies to investigate three main scenarios: (1) ECG image recognition, (2) binary classification of normal versus abnormal ECGs, and (3) multiclass classification into 6 categories.</p><p><strong>Results: </strong>The model excelled in recognizing ECG images, achieving an accuracy of 100%. In the classification of normal or abnormal ECG cases, the few-shot learning approach improved GPT-4o's accuracy by 30% from the baseline, reaching 83% (95% CI 81.8%-84.6%). 
However, multiclass classification for a specific pathology remained limited, achieving only 41% accuracy.</p><p><strong>Conclusions: </strong>GPT-4o effectively differentiates normal from abnormal ECGs, suggesting its potential as an accessible artificial intelligence-assisted triage tool. Although limited in diagnosing specific cardiac conditions, GPT-4o's capability to interpret ECG images without specialized training highlights its potential for preliminary ECG interpretation in clinical and remote settings.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"4 ","pages":"e74426"},"PeriodicalIF":2.0,"publicationDate":"2025-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12375907/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144981313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
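The GPT-4o study reports classification accuracy with a 95% CI. A minimal sketch of a point accuracy estimate with a normal-approximation (Wald) interval — the counts are hypothetical, and the paper's CI may instead come from repeated model runs:

```python
import math

def accuracy_ci(correct, total, z=1.96):
    """Point accuracy with a normal-approximation (Wald) 95% CI,
    clipped to the [0, 1] range."""
    acc = correct / total
    half = z * math.sqrt(acc * (1 - acc) / total)
    return acc, max(0.0, acc - half), min(1.0, acc + half)

# Hypothetical: 66 of 80 ECGs classified correctly as normal/abnormal
acc, lo, hi = accuracy_ci(66, 80)
```

For small samples such as 80 ECGs, a Wilson interval would be the more robust choice; the Wald form is shown only because it is the simplest to read.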
{"title":"Heterogeneity in Effects of Automated Results Feedback After Online Depression Screening: Secondary Machine-Learning Based Analysis of the DISCOVER Trial.","authors":"Matthias Klee, Byron C Jaeger, Franziska Sikorski, Bernd Löwe, Sebastian Kohlmann","doi":"10.2196/70001","DOIUrl":"10.2196/70001","url":null,"abstract":"<p><strong>Background: </strong>Online depression screening tools may increase uptake of evidence-based care and consequently lead to symptom reduction. However, results of the DISCOVER trial suggested that automated results feedback after online depression screening, compared with no feedback, had no effect on depressive symptom reduction six months after screening. Interpersonal variation in symptom representation, health care needs, and treatment preferences may nonetheless have led to differential response to feedback mode on an individual level.</p><p><strong>Objective: </strong>The aim of this study was to examine heterogeneity of treatment effects (HTE), that is, differential responses to two feedback modes (tailored or nontailored) versus no feedback (control) following online depression screening.</p><p><strong>Methods: </strong>We used causal forests, a machine learning method that applies recursive partitioning to estimate conditional average treatment effects (CATEs). In this secondary data analysis of the DISCOVER trial, eligible participants screened positive for at least moderate depression severity but had not been diagnosed or treated for depression in the preceding year. The primary outcome was heterogeneity in depression severity change over a six-month follow-up period, measured with the Patient Health Questionnaire-9. Analysis comprised exploration of average treatment effects (ATE), HTE, operationalized with the area under the targeting operator characteristic curve (AUTOC), and differences in ATE when allocating feedback based on predicted CATE. 
We extracted the top predictors of depression severity change given feedback and explored high-CATE covariate profiles. Prior to analysis, data were split into training and test sets (1:1) to minimize the risk of overfitting and evaluate predictions in held-out test data.</p><p><strong>Results: </strong>Data from 946 participants of the DISCOVER trial without missing data were analyzed. We did not detect HTE; no versus nontailored feedback, AUTOC -0.48 (95% CI -1.62 to 0.67, P=.41); no versus tailored feedback, AUTOC 0.06 (95% CI -1.21 to 1.33, P=.93); and no versus any feedback, AUTOC -0.20 (95% CI -1.30 to 0.89, P=.72). There was no evidence of alteration to the ATE in the test set when allocating feedback (tailored or nontailored) based on the predicted CATE. By examining covariate profiles, we observed a potentially detrimental role of control beliefs, given feedback compared with no feedback.</p><p><strong>Conclusions: </strong>We applied causal forests to describe higher-level interactions among a broad range of predictors to detect HTE. In the absence of evidence for HTE, treatment prioritization based on trained models did not improve ATEs. We did not find evidence of harm or benefit from providing tailored or nontailored feedback after online depression screening regarding depression severity change after six months. 
Future studies may test whether screening alone prompts behavioral a","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"4 ","pages":"e70001"},"PeriodicalIF":2.0,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12375799/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144981276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
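Causal forests estimate CATEs by recursive partitioning; the underlying quantities are simpler to see in a toy form. A minimal sketch of a difference-in-means ATE and subgroup CATEs on hypothetical PHQ-9 change scores (this illustrates the estimands only, not the forest's honest-splitting machinery or the AUTOC):

```python
def ate(outcomes, treated):
    """Difference-in-means estimate of the average treatment effect."""
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)

def cate_by_group(outcomes, treated, group):
    """Conditional ATE within each covariate-defined subgroup."""
    effects = {}
    for g in set(group):
        idx = [i for i, x in enumerate(group) if x == g]
        effects[g] = ate([outcomes[i] for i in idx],
                         [treated[i] for i in idx])
    return effects

# Hypothetical PHQ-9 change scores (negative = improvement)
y = [-6, -5, -4, -3, -7, -6, -2, -1]
d = [1, 1, 0, 0, 1, 1, 0, 0]  # 1 = received feedback
g = ["low", "low", "low", "low",
     "high", "high", "high", "high"]  # a covariate, e.g. control beliefs
overall = ate(y, d)
per_group = cate_by_group(y, d, g)
```

HTE is present exactly when the per-group effects diverge from the overall ATE; the AUTOC in the paper quantifies whether such divergence is detectable and exploitable for targeting.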
{"title":"Personalization of AI Using Personal Foundation Models Can Lead to More Precise Digital Therapeutics.","authors":"Peter Washington","doi":"10.2196/55530","DOIUrl":"10.2196/55530","url":null,"abstract":"<p><p>Digital health interventions often use machine learning (ML) models to make predictions of repeated adverse health events. For example, models may be used to analyze patient data to identify patterns that can anticipate the likelihood of disease exacerbations, enabling timely interventions and personalized treatment plans. However, many digital health applications require the prediction of highly heterogeneous and nuanced health events. The cross-subject variability of these events makes traditional ML approaches, where a single generalized model is trained to classify a particular condition, unlikely to generalize to patients outside of the training set. A natural solution is to train a separate model for each individual or subgroup, essentially tailoring the model to the unique characteristics of the individual without overfitting on the desired prediction task. Such an approach has traditionally required extensive data labels from each individual, a reality that has rendered personalized ML infeasible for precision health care. The recent popularization of self-supervised learning, however, provides a solution to this issue: by pretraining deep learning models on the vast array of unlabeled data streams arising from patient-generated health data, personalized models can be fine-tuned to predict the health outcome of interest with fewer labels than purely supervised approaches, making personalization of deep learning models much more achievable from a practical perspective. This perspective describes the current state of the art in both self-supervised learning and ML personalization for health care as well as growing efforts to combine these two ideas by conducting self-supervised pretraining on an individual's data. 
However, there are practical challenges that must be addressed in order to fully realize this potential, such as human-computer interaction innovations to ensure consistent labeling practices within a single participant.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"4 ","pages":"e55530"},"PeriodicalIF":2.0,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12411786/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144981295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Real-Time Signal-Based Wavelet Long Short-Term Memory Method for Length-of-Stay Prediction for the Intensive Care Unit: Development and Evaluation Study.","authors":"Yiqun Jiang, Qing Li, Wenli Zhang","doi":"10.2196/71247","DOIUrl":"10.2196/71247","url":null,"abstract":"<p><strong>Background: </strong>Efficient allocation of health care resources is essential for long-term hospital operation. Effective intensive care unit (ICU) management is essential for alleviating the financial strain on health care systems. Accurate prediction of length-of-stay in ICUs is vital for optimizing capacity planning and resource allocation, with the challenge of achieving early, real-time predictions.</p><p><strong>Objective: </strong>This study aimed to develop a predictive model, namely wavelet long short-term memory model (WT-LSTM), for ICU length-of-stay using only real-time vital sign data. The model is designed for urgent care settings where demographic and historical patient data or laboratory results may be unavailable; the model leverages real-time inputs to deliver early and accurate ICU length-of-stay predictions.</p><p><strong>Methods: </strong>The proposed model integrates discrete wavelet transformation and long short-term memory (LSTM) neural networks to filter noise from patients' vital sign series and improve length-of-stay prediction accuracy. Model performance was evaluated using the electronic ICU database, focusing on 10 common ICU admission diagnoses in the database.</p><p><strong>Results: </strong>The results demonstrate that WT-LSTM consistently outperforms baseline models, including linear regression, LSTM, and bidirectional long short-term memory, in predicting ICU length-of-stay using vital sign data, achieving significant improvements in mean square error. Specifically, the wavelet transformation component of the model enhances the overall performance of WT-LSTM. 
Removing this component results in an average decrease of 3.3% in mean square error; such a phenomenon is particularly pronounced in specific patient cohorts. The model's adaptability is highlighted through real-time predictions using only 3-hour, 6-hour, 12-hour, and 24-hour input data. Using only 3 hours of input data, the WT-LSTM model delivers competitive results across the 10 most common ICU admission diagnoses, often outperforming Acute Physiology and Chronic Health Evaluation IV, the leading ICU outcome prediction system currently implemented in clinical practice. WT-LSTM effectively captures patterns from vital signs recorded during the initial hours of a patient's ICU stay, making it a promising tool for early prediction and resource optimization in the ICU.</p><p><strong>Conclusions: </strong>Our proposed WT-LSTM model, based on real-time vital sign data, offers a promising solution for ICU length-of-stay prediction. Its high accuracy and early prediction capabilities hold significant potential for enhancing clinical practice, optimizing resource allocation, and supporting critical clinical and administrative decisions in ICU management.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"4 ","pages":"e71247"},"PeriodicalIF":2.0,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12367335/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144981206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
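The WT-LSTM pipeline filters noise from vital-sign series with a discrete wavelet transform before the LSTM sees them. A minimal one-level Haar sketch with soft thresholding of the detail coefficients — the paper's abstract does not specify the wavelet family, decomposition depth, or thresholding rule, so all of those are assumptions here:

```python
def haar_denoise(signal, threshold):
    """One-level Haar DWT: split into pairwise averages (approximation)
    and pairwise half-differences (detail), soft-threshold the detail
    coefficients, then invert. Signal length must be even."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]

    def soft(x):
        if abs(x) <= threshold:
            return 0.0
        return x - threshold if x > 0 else x + threshold

    detail = [soft(d) for d in detail]
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])  # inverse Haar step
    return out

# Hypothetical heart-rate series with one noisy jitter at samples 2-3
hr = [80.0, 80.0, 82.0, 78.0, 81.0, 81.0]
smoothed = haar_denoise(hr, threshold=1.0)
```

The thresholding shrinks the transient 82/78 swing toward the local mean while leaving the flat segments untouched, which is the property that makes wavelet filtering attractive as an LSTM preprocessing step.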
{"title":"Deep Learning Multi-Modal Melanoma Detection: Algorithm Development and Validation.","authors":"Nithika Vivek, Karthik Ramesh","doi":"10.2196/66561","DOIUrl":"10.2196/66561","url":null,"abstract":"<p><strong>Background: </strong>The visual similarity of melanoma and seborrheic keratosis has made it difficult for older patients with disabilities to know when to seek medical attention, contributing to the metastasis of melanoma.</p><p><strong>Objective: </strong>This study aimed to present a novel multimodal deep learning-based technique to distinguish between melanoma and seborrheic keratosis.</p><p><strong>Methods: </strong>Our strategy is three-fold: (1) use patient image data to train and test three deep learning models using transfer learning (ResNet50, InceptionV3, and VGG16) and one author-designed model, (2) use patient metadata to train and test a deep learning model, and (3) combine the predictions of the image model with the best accuracy and the metadata model, using nonlinear least squares regression to assign ideal weights to each model for a combined prediction.</p><p><strong>Results: </strong>The accuracy of the combined model was 88% (195/221 classified correctly) on test data from the HAM10000 dataset. Model reliability was assessed by visualizing the output activation map of each model and comparing the diagnosis patterns to those of dermatologists. The addition of metadata to the image dataset was key to reducing the false-negative and false-positive rates simultaneously, thereby producing better metrics and improving overall model accuracy.</p><p><strong>Conclusions: </strong>Results from this experiment could be used to eliminate late diagnosis of melanoma via easy access to an app. 
Future experiments can use text data (subjective data pertaining to how the patient felt over a certain period of time) to allow this model to reflect the real hospital setting to a greater extent.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"4 ","pages":"e66561"},"PeriodicalIF":2.0,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12346184/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144849967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
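Step (3) above learns weights that blend the image-model and metadata-model predictions. The paper reports nonlinear least squares; the sketch below simplifies to an ordinary (linear) least-squares fit of two weights via the 2x2 normal equations, purely to show the weighting idea, with hypothetical probabilities and labels:

```python
def combine_weights(p_img, p_meta, labels):
    """Closed-form least-squares weights (w1, w2) minimizing
    sum((w1*p_img + w2*p_meta - y)^2), via the 2x2 normal equations."""
    a11 = sum(p * p for p in p_img)
    a12 = sum(p * q for p, q in zip(p_img, p_meta))
    a22 = sum(q * q for q in p_meta)
    b1 = sum(p * y for p, y in zip(p_img, labels))
    b2 = sum(q * y for q, y in zip(p_meta, labels))
    det = a11 * a22 - a12 * a12
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2

# Hypothetical per-lesion melanoma probabilities from each model;
# labels constructed so the true blend is 0.6*image + 0.4*metadata
p_img = [1.0, 0.0, 1.0, 0.0]
p_meta = [0.0, 1.0, 1.0, 0.0]
y = [0.6, 0.4, 1.0, 0.0]
w1, w2 = combine_weights(p_img, p_meta, y)
```

With a nonlinear link (e.g. a sigmoid over the weighted sum) the fit would require iterative optimization, which is presumably what the authors' nonlinear regression does.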
{"title":"AI-Supported Shared Decision-Making (AI-SDM): Conceptual Framework.","authors":"Mohammed As'ad, Nawarh Faran, Hala Joharji","doi":"10.2196/75866","DOIUrl":"10.2196/75866","url":null,"abstract":"<p><strong>Unlabelled: </strong>Shared decision-making is central to patient-centered care but is often hampered by artificial intelligence (AI) systems that focus on technical transparency rather than delivering context-rich, clinically meaningful reasoning. Although AI explainability methods elucidate how decisions are made, they fall short of addressing the \"why\" that supports effective patient-clinician dialogue. To bridge this gap, we introduce artificial intelligence-supported shared decision-making (AI-SDM), a conceptual framework designed to integrate AI-based reasoning into shared decision-making to enhance care quality while preserving patient autonomy. AI-SDM is a structured, multimodel framework that synthesizes predictive modeling, evidence-based recommendations, and generative AI techniques to produce adaptive, context-sensitive explanations. The framework distinguishes conventional AI explainability from AI reasoning, prioritizing the generation of tailored, narrative justifications that inform shared decisions. A hypothetical clinical scenario in stroke management is used to illustrate how AI-SDM facilitates an iterative, triadic deliberation process between health care providers, patients, and AI outputs. 
This integration is intended to transform raw algorithmic data into actionable insights that directly support the decision-making process without supplanting human judgment.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"4 ","pages":"e75866"},"PeriodicalIF":2.0,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12331219/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144801173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessing Revisit Risk in Emergency Department Patients: Machine Learning Approach.","authors":"Wang-Chuan Juang, Zheng-Xun Cai, Chia-Mei Chen, Zhi-Hong You","doi":"10.2196/74053","DOIUrl":"10.2196/74053","url":null,"abstract":"<p><strong>Background: </strong>Overcrowded emergency rooms might degrade the quality of care and overload the clinic staff. Assessing unscheduled return visits (URVs) to the emergency department (ED) is a quality assurance procedure to identify ED-discharged patients with a high likelihood of bounce-back, to ensure patient safety, and ultimately to reduce medical costs by decreasing the frequency of URVs. The field of machine learning (ML) has evolved considerably in the past decades, and many ML applications have been deployed in various contexts.</p><p><strong>Objective: </strong>This study aims to develop an ML-assisted framework that identifies high-risk patients who may revisit the ED within 72 hours after the initial visit. Furthermore, this study evaluates different ML models, feature sets, and feature encoding methods in order to build an effective prediction model.</p><p><strong>Methods: </strong>This study proposes an ML-assisted system that extracts the features from both structured and unstructured medical data to predict patients who are likely to revisit the ED, where the structured data includes patients' electronic health records, and the unstructured data is their medical notes (subjective, objective, assessment, and plan). 
A 5-year dataset consisting of 184,687 ED visits, along with 324,111 historical electronic health records and the associated medical notes, was obtained from Kaohsiung Veterans General Hospital, a tertiary medical center in Taiwan, to evaluate the proposed system.</p><p><strong>Results: </strong>The evaluation results indicate that incorporating convolutional neural network-based feature extraction from unstructured ED physician narrative notes, combined with structured vital signs and demographic data, significantly enhances predictive performance. The proposed approach achieves an area under the receiver operating characteristic curve of 0.705 and a recall of 0.718, demonstrating its effectiveness in predicting URVs. These findings highlight the potential of integrating structured and unstructured clinical data to improve predictive accuracy in this context.</p><p><strong>Conclusions: </strong>The study demonstrates that an ML-assisted framework may be applied as a decision support tool to assist ED clinicians in identifying revisiting patients, although the model's performance may not be sufficient for clinical implementation. 
Given the improvement in the area under the receiver operating characteristic curve, the proposed framework should be further explored as a workable decision support tool to pinpoint ED patients with a high risk of revisit and provide them with appropriate and timely care.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"4 ","pages":"e74053"},"PeriodicalIF":2.0,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12332214/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144801174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
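The headline metric here, the area under the receiver operating characteristic curve (0.705), has a direct probabilistic reading: the chance that a randomly chosen revisiting patient receives a higher risk score than a randomly chosen non-revisiting one. A minimal rank-based sketch on hypothetical scores:

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a
    random negative, with ties counting half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical 72-hour revisit-risk scores for six discharged patients
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 1, 0, 0]  # 1 = returned within 72 hours
auc = auroc(scores, labels)
```

The quadratic pairwise loop is fine for illustration; production code would use a sort-based O(n log n) formulation or a library routine.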
{"title":"Training Language Models for Estimating Priority Levels in Ultrasound Examination Waitlists: Algorithm Development and Validation.","authors":"Kanato Masayoshi, Masahiro Hashimoto, Naoki Toda, Hirozumi Mori, Goh Kobayashi, Hasnine Haque, Mizuki So, Masahiro Jinzaki","doi":"10.2196/68020","DOIUrl":"10.2196/68020","url":null,"abstract":"<p><strong>Background: </strong>Ultrasound examinations, while valuable, are time-consuming and often limited in availability. Consequently, many hospitals implement reservation systems; however, these systems typically lack prioritization for examination purposes. Hence, our hospital uses a waitlist system that prioritizes examination requests based on their clinical value when slots become available due to cancellations. This system, however, requires a manual review of examination purposes, which are recorded in free-form text. We hypothesized that artificial intelligence language models could preliminarily estimate the priority of requests before manual reviews.</p><p><strong>Objective: </strong>This study aimed to investigate potential challenges associated with using language models for estimating the priority of medical examination requests and to evaluate the performance of language models in processing Japanese medical texts.</p><p><strong>Methods: </strong>We retrospectively collected ultrasound examination requests from the waitlist system at Keio University Hospital, spanning January 2020 to March 2023. Each request comprised an examination purpose documented by the requesting physician and a 6-tier priority level assigned by a radiologist during the clinical workflow. We fine-tuned JMedRoBERTa, Luke, OpenCalm, and LLaMA2 under two conditions: (1) tuning only the final layer and (2) tuning all layers using either standard backpropagation or low-rank adaptation.</p><p><strong>Results: </strong>We had 2335 and 204 requests in the training and test datasets post cleaning. 
When only the final layers were tuned, JMedRoBERTa outperformed the other models (Kendall coefficient=0.225). With full fine-tuning, JMedRoBERTa continued to perform best (Kendall coefficient=0.254), though with reduced margins compared with the other models. The radiologist's retrospective re-evaluation yielded a Kendall coefficient of 0.221.</p><p><strong>Conclusions: </strong>Language models can estimate the priority of examination requests with accuracy comparable with that of human radiologists. The fine-tuning results indicate that general-purpose language models can be adapted to domain-specific texts (ie, Japanese medical texts) with sufficient fine-tuning. Further research is required to address priority rank ambiguity, expand the dataset across multiple institutions, and explore more recent language models with potentially higher performance or better suitability for this task.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"4 ","pages":"e68020"},"PeriodicalIF":2.0,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12325119/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144692629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
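Model quality above is reported as a Kendall coefficient between predicted and radiologist-assigned priority tiers. The abstract does not say which variant was used; a minimal tau-a sketch (no tie correction) on hypothetical 6-tier priorities:

```python
def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs divided by the
    total number of pairs; tie corrections are omitted."""
    n = len(x)
    conc = disc = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    return (conc - disc) / (n * (n - 1) / 2)

# Hypothetical priority tiers: radiologist vs. fine-tuned model
human = [1, 2, 3, 4, 5, 6]
model = [2, 1, 3, 4, 6, 5]
tau = kendall_tau(human, model)
```

Because the 6-tier labels contain many ties in practice, the study likely used a tie-corrected variant (tau-b); scipy's `kendalltau` would be the usual choice there.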
Taisuke Sato, Emily D Grussing, Ruchi Patel, Jessica Ridgway, Joji Suzuki, Benjamin Sweigart, Robert Miller, Alysse G Wurcel
{"title":"Natural Language Processing for Identification of Hospitalized People Who Use Drugs: Cohort Study.","authors":"Taisuke Sato, Emily D Grussing, Ruchi Patel, Jessica Ridgway, Joji Suzuki, Benjamin Sweigart, Robert Miller, Alysse G Wurcel","doi":"10.2196/63147","DOIUrl":"10.2196/63147","url":null,"abstract":"<p><strong>Background: </strong>People who use drugs (PWUD) are at heightened risk of severe injection-related infections. Current research relies on billing codes to identify PWUD-a methodology with suboptimal accuracy that may underestimate the economic, racial, and ethnic diversity of hospitalized PWUD.</p><p><strong>Objective: </strong>The goal of this study is to examine the impact of natural language processing (NLP) on enhancing identification of PWUD in electronic medical records, with a specific focus on identifying populations who may previously have been missed, including people who have low income or those from racially and ethnically minoritized populations.</p><p><strong>Methods: </strong>Health informatics specialists assisted in querying a cohort of likely PWUD hospital admissions at Tufts Medical Center between 2020 and 2022 using the following criteria: (1) ICD-10 codes indicative of drug use, (2) positive drug toxicology results, (3) prescriptions for medications for opioid use disorder, and (4) NLP-detected presence of \"token\" keywords in the electronic medical records likely indicative of the patient being a PWUD. Hospital admissions were split into two groups: highly documented (all four criteria present) and minimally documented (NLP-only). These groups were examined to assess the impact of race, ethnicity, and social vulnerability index. 
With chart review as the \"gold standard,\" the positive predictive value was calculated.</p><p><strong>Results: </strong>The cohort included 4548 hospitalization admissions, with broad heterogeneity in how people entered the cohort and subcohorts; a total of 288 hospital admissions entered the cohort through NLP token presence alone. NLP demonstrated a 54% positive predictive value, outperforming biomarkers, prescription for medications for opioid use disorder, and ICD codes in identifying hospitalizations of PWUD. Additionally, NLP significantly enhanced these methods when integrated into the identification algorithm. The study also found that people from racially and ethnically minoritized communities and those with lower social vulnerability index were significantly more likely to have lower rates of PWUD-related documentation.</p><p><strong>Conclusions: </strong>NLP proved effective in identifying hospitalizations of PWUD, surpassing traditional methods. While further refinement is needed, NLP shows promising potential in minimizing health care disparities.</p>","PeriodicalId":73551,"journal":{"name":"JMIR AI","volume":"4 ","pages":"e63147"},"PeriodicalIF":2.0,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12294639/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144664120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
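The study's headline number for the NLP criterion is a 54% positive predictive value against chart review. A minimal sketch of that calculation on hypothetical flags (PPV conditions only on flagged admissions, which is why it can be computed without reviewing the unflagged ones):

```python
def positive_predictive_value(flagged, confirmed):
    """PPV: among admissions flagged by a method, the fraction
    confirmed by chart review (the study's gold standard)."""
    tp = sum(1 for f, c in zip(flagged, confirmed) if f and c)
    fp = sum(1 for f, c in zip(flagged, confirmed) if f and not c)
    return tp / (tp + fp)

# Hypothetical NLP-token flags vs. chart-review confirmation
flagged   = [1, 1, 1, 1, 1, 0, 0, 1]
confirmed = [1, 0, 1, 1, 0, 0, 1, 0]
ppv = positive_predictive_value(flagged, confirmed)
```

Note that PPV says nothing about the admissions the method missed; the study's observation that NLP added 288 admissions beyond the other criteria is a sensitivity gain that PPV alone cannot capture.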