JAMIA Open | Pub Date: 2025-02-21 | eCollection Date: 2025-02-01 | DOI: 10.1093/jamiaopen/ooaf013
Michael Albrecht, Denton Shanks, Tina Shah, Taina Hudson, Jeffrey Thompson, Tanya Filardi, Kelli Wright, Gregory A Ator, Timothy Ryan Smith
{"title":"Enhancing clinical documentation with ambient artificial intelligence: a quality improvement survey assessing clinician perspectives on work burden, burnout, and job satisfaction.","authors":"Michael Albrecht, Denton Shanks, Tina Shah, Taina Hudson, Jeffrey Thompson, Tanya Filardi, Kelli Wright, Gregory A Ator, Timothy Ryan Smith","doi":"10.1093/jamiaopen/ooaf013","DOIUrl":"10.1093/jamiaopen/ooaf013","url":null,"abstract":"<p><strong>Objective: </strong>This study evaluates the impact of an ambient artificial intelligence (AI) documentation platform on clinicians' perceptions of documentation workflow.</p><p><strong>Materials and methods: </strong>An anonymous pre- and non-anonymous post-implementation survey evaluated ambulatory clinician perceptions on impact of Abridge, an ambient AI documentation platform. Outcomes included clinical documentation burden, work after-hours, clinician burnout, and work satisfaction. Data were analyzed using descriptive statistics and proportional odds logistic regression to compare changes for concordant questions across pre- and post-surveys. Covariate analysis examined effect of specialty type and duration of AI tool usage.</p><p><strong>Results: </strong>Survey response rates were 51.9% (93/181) pre-implementation and 74.4% (99/133) post-implementation. Clinician perception of ease of documentation workflow (OR = 6.91, 95% CI: 3.90-12.56, <i>P</i> <.001) and in completing notes associated with usage of the AI tool (OR = 4.95, 95% CI: 2.87-8.69, <i>P </i><.001) was significantly improved. Most respondents agreed that the AI tool decreased documentation burden, decreased the time spent documenting outside clinical hours, reduced burnout risk, and increased job satisfaction, with 48% agreeing that an additional patient could be seen if needed. 
Clinician specialty type and number of days using the AI tool did not significantly affect survey responses.</p><p><strong>Discussion: </strong>Clinician experience and efficiency were improved with use of Abridge across a breadth of specialties.</p><p><strong>Conclusion: </strong>An ambient AI documentation platform had tremendous impact on improving clinician experience within a short time frame. Future studies should utilize validated instruments for clinician efficiency and burnout and compare impact across AI platforms.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 1","pages":"ooaf013"},"PeriodicalIF":2.5,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11843214/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143484522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An empirical study of using radiology reports and images to improve intensive care unit mortality prediction.","authors":"Mingquan Lin, Song Wang, Ying Ding, Lihui Zhao, Fei Wang, Yifan Peng","doi":"10.1093/jamiaopen/ooae137","DOIUrl":"10.1093/jamiaopen/ooae137","url":null,"abstract":"<p><strong>Objectives: </strong>The predictive intensive care unit (ICU) scoring system is crucial for predicting patient outcomes, particularly mortality. Traditional scoring systems rely mainly on structured clinical data from electronic health records, which can overlook important clinical information in narratives and images.</p><p><strong>Materials and methods: </strong>In this work, we build a deep learning-based survival prediction model that utilizes multimodality data for ICU mortality prediction. Four sets of features are investigated: (1) physiological measurements of Simplified Acute Physiology Score (SAPS) II, (2) common thorax diseases predefined by radiologists, (3) bidirectional encoder representations from transformers-based text representations, and (4) chest X-ray image features. The model was evaluated using the Medical Information Mart for Intensive Care IV dataset.</p><p><strong>Results: </strong>Our model achieves an average C-index of 0.7829 (95% CI, 0.7620-0.8038), surpassing the baseline using only SAPS-II features, which had a C-index of 0.7470 (95% CI: 0.7263-0.7676). Ablation studies further demonstrate the contributions of incorporating predefined labels (2.00% improvement), text features (2.44% improvement), and image features (2.82% improvement).</p><p><strong>Discussion and conclusion: </strong>The deep learning model demonstrated superior performance to traditional machine learning methods under the same feature fusion setting for ICU mortality prediction. 
This study highlights the potential of integrating multimodal data into deep learning models to enhance the accuracy of ICU mortality prediction.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 1","pages":"ooae137"},"PeriodicalIF":2.5,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11841685/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143469420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
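The C-index reported in the record above is a concordance measure for survival models. The following is a minimal sketch of the standard Harrell-style pairwise definition, not the authors' evaluation code; the function name and the toy inputs are hypothetical, and it assumes a higher risk score means earlier expected death.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risks: Sequence[float],
) -> float:
    """Fraction of comparable patient pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had the event (death)
    and did so strictly before patient j's last observed time.
    The pair is concordant when the model gave i the higher risk;
    tied risk scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot anchor a comparison
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: three patients, the last one censored.
    print(concordance_index([1, 2, 3], [True, True, False], [0.9, 0.5, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.7470 and 0.7829 should be read.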
JAMIA Open | Pub Date: 2025-02-19 | eCollection Date: 2025-02-01 | DOI: 10.1093/jamiaopen/ooaf005
Yiming Li, Jianfu Li, Manqi Li, Evan Yu, Danniel Rhee, Muhammad Amith, Lu Tang, Lara S Savas, Licong Cui, Cui Tao
{"title":"VaxBot-HPV: a GPT-based chatbot for answering HPV vaccine-related questions.","authors":"Yiming Li, Jianfu Li, Manqi Li, Evan Yu, Danniel Rhee, Muhammad Amith, Lu Tang, Lara S Savas, Licong Cui, Cui Tao","doi":"10.1093/jamiaopen/ooaf005","DOIUrl":"10.1093/jamiaopen/ooaf005","url":null,"abstract":"<p><strong>Objective: </strong>Human Papillomavirus (HPV) vaccine is an effective measure to prevent and control the diseases caused by HPV. However, widespread misinformation and vaccine hesitancy remain significant barriers to its uptake. This study focuses on the development of VaxBot-HPV, a chatbot aimed at improving health literacy and promoting vaccination uptake by providing information and answering questions about the HPV vaccine.</p><p><strong>Methods: </strong>We constructed the knowledge base (KB) for VaxBot-HPV, which consists of 451 documents from biomedical literature and web sources on the HPV vaccine. We extracted 202 question-answer pairs from the KB and 39 questions generated by GPT-4 for training and testing purposes. To comprehensively understand the capabilities and potential of GPT-based chatbots, 3 models were involved in this study: GPT-3.5, VaxBot-HPV, and GPT-4. The evaluation criteria included answer relevancy and faithfulness.</p><p><strong>Results: </strong>VaxBot-HPV demonstrated superior performance in answer relevancy and faithfulness compared to baselines. For test questions in KB, it achieved an answer relevancy score of 0.85 and a faithfulness score of 0.97. Similarly, it attained scores of 0.85 for answer relevancy and 0.96 for faithfulness on GPT-generated questions.</p><p><strong>Discussion: </strong>VaxBot-HPV demonstrates the effectiveness of fine-tuned large language models in healthcare, outperforming generic GPT models in accuracy and relevance. Fine-tuning mitigates hallucinations and misinformation, ensuring reliable information on HPV vaccination while allowing dynamic and tailored responses. 
The specific fine-tuning, which includes context in addition to question-answer pairs, enables VaxBot-HPV to provide explanations and reasoning behind its answers, enhancing transparency and user trust.</p><p><strong>Conclusions: </strong>This study underscores the importance of leveraging large language models and fine-tuning techniques in the development of chatbots for healthcare applications, with implications for improving medical education and public health communication.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 1","pages":"ooaf005"},"PeriodicalIF":2.5,"publicationDate":"2025-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11837857/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143459111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JAMIA Open | Pub Date: 2025-02-10 | eCollection Date: 2025-02-01 | DOI: 10.1093/jamiaopen/ooaf008
Yufeng Zhang, Joseph G Kohne, Katherine Webster, Rebecca Vartanian, Emily Wittrup, Kayvan Najarian
{"title":"AXpert: human expert facilitated privacy-preserving large language models for abdominal X-ray report labeling.","authors":"Yufeng Zhang, Joseph G Kohne, Katherine Webster, Rebecca Vartanian, Emily Wittrup, Kayvan Najarian","doi":"10.1093/jamiaopen/ooaf008","DOIUrl":"10.1093/jamiaopen/ooaf008","url":null,"abstract":"<p><strong>Importance: </strong>The lack of a publicly accessible abdominal X-ray (AXR) dataset has hindered necrotizing enterocolitis (NEC) research. While significant strides have been made in applying natural language processing (NLP) to radiology reports, most efforts have focused on chest radiology. Development of an accurate NLP model to identify features of NEC on abdominal radiograph can support efforts to improve diagnostic accuracy for this and other rare pediatric conditions.</p><p><strong>Objectives: </strong>This study aims to develop privacy-preserving large language models (LLMs) and their distilled version to efficiently annotate pediatric AXR reports.</p><p><strong>Materials and methods: </strong>Utilizing pediatric AXR reports collected from C.S. Mott Children's Hospital, we introduced AXpert in 2 formats: one based on the instruction-fine-tuned 7-B Gemma model, and a distilled version employing a BERT-based model derived from the fine-tuned model to improve inference and fine-tuning efficiency. AXpert aims to detect NEC presence and classify its subtypes-pneumatosis, portal venous gas, and free air.</p><p><strong>Results: </strong>Extensive testing shows that LLMs, including AXpert, outperform baseline BERT models on all metrics. Specifically, Gemma-7B (F1 score: 0.9 ± 0.015) improves upon BlueBERT by 132% in F1 score for detecting NEC positive samples. The distilled BERT model matches the performance of the LLM labelers and surpasses expert-trained baseline BERT models.</p><p><strong>Discussion: </strong>Our findings highlight the potential of using LLMs for clinical NLP tasks. 
With minimal expert knowledge injections, LLMs can achieve human-like performance, greatly reducing manual labor. Privacy concerns are alleviated as all models are trained and deployed locally.</p><p><strong>Conclusion: </strong>AXpert demonstrates potential to reduce human labeling efforts while maintaining high accuracy in automating NEC diagnosis with AXR, offering precise image labeling capabilities.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 1","pages":"ooaf008"},"PeriodicalIF":2.5,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11809431/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143392140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JAMIA Open | Pub Date: 2025-02-08 | eCollection Date: 2025-02-01 | DOI: 10.1093/jamiaopen/ooaf003
Jeffery L Painter, Venkateswara Rao Chalamalasetti, Raymond Kassekert, Andrew Bate
{"title":"Automating pharmacovigilance evidence generation: using large language models to produce context-aware structured query language.","authors":"Jeffery L Painter, Venkateswara Rao Chalamalasetti, Raymond Kassekert, Andrew Bate","doi":"10.1093/jamiaopen/ooaf003","DOIUrl":"10.1093/jamiaopen/ooaf003","url":null,"abstract":"<p><strong>Objective: </strong>To enhance the accuracy of information retrieval from pharmacovigilance (PV) databases by employing Large Language Models (LLMs) to convert natural language queries (NLQs) into Structured Query Language (SQL) queries, leveraging a business context document.</p><p><strong>Materials and methods: </strong>We utilized OpenAI's GPT-4 model within a retrieval-augmented generation (RAG) framework, enriched with a business context document, to transform NLQs into executable SQL queries. Each NLQ was presented to the LLM randomly and independently to prevent memorization. The study was conducted in 3 phases, varying query complexity, and assessing the LLM's performance both with and without the business context document.</p><p><strong>Results: </strong>Our approach significantly improved NLQ-to-SQL accuracy, increasing from 8.3% with the database schema alone to 78.3% with the business context document. This enhancement was consistent across low, medium, and high complexity queries, indicating the critical role of contextual knowledge in query generation.</p><p><strong>Discussion: </strong>The integration of a business context document markedly improved the LLM's ability to generate accurate SQL queries (ie, both executable and returning semantically appropriate results). Performance achieved a maximum of 85% when high complexity queries are excluded, suggesting promise for routine deployment.</p><p><strong>Conclusion: </strong>This study presents a novel approach to employing LLMs for safety data retrieval and analysis, demonstrating significant advancements in query generation accuracy. 
The methodology offers a framework applicable to various data-intensive domains, enhancing the accessibility of information retrieval for non-technical users.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 1","pages":"ooaf003"},"PeriodicalIF":2.5,"publicationDate":"2025-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11806702/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143383748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JAMIA Open | Pub Date: 2025-02-06 | eCollection Date: 2025-02-01 | DOI: 10.1093/jamiaopen/ooaf001
Ian Braun, Emily Hartley, Daniel Olson, Nicolas Matentzoglu, Kevin Schaper, Ramona Walls, Nicole Vasilevsky
{"title":"Increased discoverability of rare disease datasets through knowledge graph integration.","authors":"Ian Braun, Emily Hartley, Daniel Olson, Nicolas Matentzoglu, Kevin Schaper, Ramona Walls, Nicole Vasilevsky","doi":"10.1093/jamiaopen/ooaf001","DOIUrl":"10.1093/jamiaopen/ooaf001","url":null,"abstract":"<p><strong>Objectives: </strong>Demonstrate a methodology for improving discoverability of rare disease datasets by enriching source data with biological associations.</p><p><strong>Materials and methods: </strong>We developed an extension of the Biolink semantic model to incorporate patient data and generated a knowledge graph (KG) comprising patient data and associations between biological entities in an existing KG, leveraging existing mappings and mapping standards.</p><p><strong>Results: </strong>The enriched model of patient data can support a search application that is aware of biological associations and provides a semantic search interface to discover and summarize patient datasets within the broader biological context.</p><p><strong>Discussion and conclusion: </strong>Our methodology enriches datasets with a wealth of additional biological knowledge, improving discoverability. Using condition concepts, we illustrate techniques that could be applied to other entities within source data such as measurements and observations. 
This work provides a foundational framework for how source data can be modeled to improve accuracy of upstream language models for natural language querying.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 1","pages":"ooaf001"},"PeriodicalIF":2.5,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11806703/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143383833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JAMIA Open | Pub Date: 2025-02-05 | eCollection Date: 2025-02-01 | DOI: 10.1093/jamiaopen/ooae151
David Hua, Neysa Petrina, Alan J Sacks, Noel Young, Jin-Gun Cho, Ross Smith, Simon K Poon
{"title":"Towards human-AI collaboration in radiology: a multidimensional evaluation of the acceptability of AI for chest radiograph analysis in supporting pulmonary tuberculosis diagnosis.","authors":"David Hua, Neysa Petrina, Alan J Sacks, Noel Young, Jin-Gun Cho, Ross Smith, Simon K Poon","doi":"10.1093/jamiaopen/ooae151","DOIUrl":"10.1093/jamiaopen/ooae151","url":null,"abstract":"<p><strong>Objective: </strong>Artificial intelligence (AI) technology promises to be a powerful tool in addressing the global health challenges posed by tuberculosis (TB). However, evidence for its real-world impact is lacking, which may hinder safe, responsible adoption. This case study addresses this gap by assessing the technical performance, usability and workflow aspects, and health impact of implementing a commercial AI system (qXR by Qure.ai) to support Australian radiologists in diagnosing pulmonary TB.</p><p><strong>Materials and methods: </strong>A retrospective diagnostic accuracy evaluation was conducted to establish the technical performance of qXR in detecting TB compared to a human radiologist and microbiological reference standard. A qualitative human factors assessment was performed to investigate the user experience and clinical decision-making process of radiologists using qXR. A task productivity analysis was completed to quantify how the radiological screening turnaround time is impacted.</p><p><strong>Results: </strong>qXR displays near-human performance satisfying the World Health Organization's suggested accuracy profile. Radiologists reported high satisfaction with using qXR based on minimal workflow disruptions, respect for their professional autonomy, and limited increases in workload burden despite poor algorithm explainability. 
qXR delivers considerable productivity gains for normal cases and optimizes resource allocation through redistributing time from normal to abnormal cases.</p><p><strong>Discussion and conclusion: </strong>This study provides preliminary evidence of how an AI system with reasonable diagnostic accuracy and a human-centered user experience can meaningfully augment the TB diagnostic workflow. Future research needs to investigate the impact of AI on clinician accuracy, its relationship with efficiency, and best practices for optimizing the impact of clinician-AI collaboration.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 1","pages":"ooae151"},"PeriodicalIF":2.5,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11796096/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143256923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JAMIA Open | Pub Date: 2025-02-04 | eCollection Date: 2025-02-01 | DOI: 10.1093/jamiaopen/ooaf004
Jamie M Faro, Emily Obermiller, Corey Obermiller, Katy E Trinkley, Garth Wright, Rajani S Sadasivam, Kristie L Foley, Sarah L Cutrona, Thomas K Houston
{"title":"Using routinely available electronic health record data elements to develop and validate a digital divide risk score.","authors":"Jamie M Faro, Emily Obermiller, Corey Obermiller, Katy E Trinkley, Garth Wright, Rajani S Sadasivam, Kristie L Foley, Sarah L Cutrona, Thomas K Houston","doi":"10.1093/jamiaopen/ooaf004","DOIUrl":"10.1093/jamiaopen/ooaf004","url":null,"abstract":"<p><strong>Background: </strong>Digital health (patient portals, remote monitoring devices, video visits) is a routine part of health care, though the digital divide may affect access.</p><p><strong>Objectives: </strong>To test and validate an electronic health record (EHR) screening tool to identify patients at risk of the digital divide.</p><p><strong>Materials and methods: </strong>We conducted a retrospective EHR data extraction and cross-sectional survey of participants within 1 health care system. We identified 4 potential digital divide markers from the EHR: (1) mobile phone number, (2) email address, (3) active patient portal, and (4) >2 patient portal logins in the last year. We mailed surveys to patients at higher risk (missing all 4 markers), intermediate risk (missing 1-3 markers), or lower risk (missing no markers). Combining EHR and survey data, we summarized the markers into a risk score and evaluated its association with patients' report of lack of Internet access. Then, we assessed the association of EHR markers and eHealth Literacy Scale survey outcomes.</p><p><strong>Results: </strong>A total of 249 patients (39.4%) completed the survey (53% >65 years, 51% female, 50% minority race, 55% rural/small town residents, 46% private insurance, 45% Medicare). Individually, the 4 EHR markers had high sensitivity (range 81%-95%) and specificity (range 65%-79%) compared with survey responses. The EHR marker-based score (high risk, intermediate risk, low risk) predicted absence of Internet access (receiver operating characteristic <i>c</i>-statistic = 0.77). 
Mean digital health literacy scores significantly decreased as EHR marker-based digital divide risk increased (<i>P</i> <.001).</p><p><strong>Discussion: </strong>Each of the four EHR markers (cell phone, email address, patient portal active, and patient portal actively used) compared with self-report yielded high levels of sensitivity, specificity, and overall accuracy.</p><p><strong>Conclusion: </strong>Using these markers, health care systems could target interventions and implementation strategies to support equitable patient access to digital health.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 1","pages":"ooaf004"},"PeriodicalIF":2.5,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11792649/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143190786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
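The three-tier risk grouping described in the record above (missing all 4 markers, missing 1-3, missing none) can be sketched as a small function. This is an illustrative sketch only: the class, field, and function names are hypothetical and do not come from the study, and the only rules encoded are the four markers and tier cut-offs stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class EhrMarkers:
    """The four EHR markers named in the abstract (field names are hypothetical)."""
    has_mobile_phone: bool
    has_email: bool
    portal_active: bool
    portal_logins_last_year: int


def missing_marker_count(m: EhrMarkers) -> int:
    """Count how many of the four markers are absent for one patient."""
    present = [
        m.has_mobile_phone,
        m.has_email,
        m.portal_active,
        m.portal_logins_last_year > 2,  # marker 4: more than 2 logins in the last year
    ]
    return sum(1 for p in present if not p)


def risk_tier(m: EhrMarkers) -> str:
    """Map the missing-marker count to the abstract's three risk groups."""
    missing = missing_marker_count(m)
    if missing == 4:
        return "higher"
    if missing == 0:
        return "lower"
    return "intermediate"


if __name__ == "__main__":
    patient = EhrMarkers(True, False, True, 1)
    print(missing_marker_count(patient), risk_tier(patient))
```

Because the tiers depend only on fields that most EHRs already store, a rule of this shape could run as a routine report without any survey data.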
{"title":"Leveraging deep learning to detect stance in Spanish tweets on COVID-19 vaccination.","authors":"Guillermo Blanco, Rubén Yáñez Martínez, Anália Lourenço","doi":"10.1093/jamiaopen/ooaf007","DOIUrl":"10.1093/jamiaopen/ooaf007","url":null,"abstract":"<p><strong>Objectives: </strong>The automatic detection of stance on social media is an important task for public health applications, especially in the context of health crises. Unfortunately, existing models are typically trained on English corpora. Considering the benefits of extending research to other widely spoken languages, the goal of this study is to develop stance detection models for social media posts in Spanish.</p><p><strong>Materials and methods: </strong>A corpus of 6170 tweets about COVID-19 vaccination, posted between March 1, 2020 and January 4, 2022, was manually annotated by native speakers. Traditional predictive models were compared with deep learning models to ascertain a baseline performance for the detection of stance in Spanish tweets. The evaluation focused on the ability of multilingual and language-specific embeddings to contextualize the topic of those short texts adequately.</p><p><strong>Results: </strong>The BERT-Multi+BiLSTM combination yielded the best results (macroaveraged F1 and Matthews correlation coefficient scores of 0.86 and 0.79, respectively; interpolated area under the receiver operating characteristic curve [AUC] of 0.95 for tweets against vaccination, 0.85 for tweets in favor of vaccination, and 0.97 for tweets containing no stance information), closely followed by the BETO+BiLSTM and RoBERTa BNE-LSTM Spanish models and the term frequency-inverse document frequency+SVM model (average AUC decrease of 0.01). The main differentiating factor among these models was the ability to predict tweets against vaccination.</p><p><strong>Discussion: </strong>The BERT-Multi+BiLSTM model outperformed the other models in terms of per-class prediction capacity. 
The working hypothesis is that language-specific embeddings do not outperform multilingual embeddings or TF-IDF features because of the context of the topic. The context captured by BERT or RoBERTa embeddings is general, so these embeddings are unfamiliar with the slang commonly used on Twitter and, more specifically, during the pandemic.</p><p><strong>Conclusion: </strong>The best performing model detects tweet stance with performance high enough to ensure its usefulness for public health applications, namely awareness campaigns, misinformation detection and other early intervention and prevention actions seeking to improve an individual's well-being based on self-reported experiences and opinions. The dataset and code of the study are available on GitHub.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 1","pages":"ooaf007"},"PeriodicalIF":2.5,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11854073/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143504472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JAMIA Open | Pub Date: 2025-01-22 | eCollection Date: 2025-02-01 | DOI: 10.1093/jamiaopen/ooae152
Vaakesan Sundrelingam, Shireen Parimoo, Frances Pogacar, Radha Koppula, Saeha Shin, Chloe Pou-Prom, Surain B Roberts, Amol A Verma, Fahad Razak
{"title":"pyDeid: an improved, fast, flexible, and generalizable rule-based approach for deidentification of free-text medical records.","authors":"Vaakesan Sundrelingam, Shireen Parimoo, Frances Pogacar, Radha Koppula, Saeha Shin, Chloe Pou-Prom, Surain B Roberts, Amol A Verma, Fahad Razak","doi":"10.1093/jamiaopen/ooae152","DOIUrl":"10.1093/jamiaopen/ooae152","url":null,"abstract":"<p><strong>Objectives: </strong>Deidentification of personally identifiable information in free-text clinical data is fundamental to making these data broadly available for research. However, there exist gaps in the deidentification landscape with regard to the functionality and flexibility of extant tools, as well as suboptimal tradeoffs between deidentification accuracy and speed. To address these gaps and tradeoffs, we develop a new Python-based deidentification software, pyDeid.</p><p><strong>Materials and methods: </strong>pyDeid uses a combination of regular expression-based rules, fixed exclusion lists and inclusion lists to deidentify free-text data. Additional configurations of pyDeid include optional named entity recognition and custom name lists. We measure its deidentification performance and speed on 700 admission notes from a Canadian hospital, the publicly available n2c2 benchmark dataset of American discharge notes, as well as a synthetic dataset of artificial intelligence (AI) generated admission notes. We also compare its performance with the Physionet De-identification Software and the popular open-source Philter tool.</p><p><strong>Results: </strong>Different configurations of pyDeid outperformed other tools on various metrics, with a \"best\" accuracy value of 0.988, best precision of 0.889, best recall of 0.950, and best F1 score of 0.904. 
All configurations of pyDeid were significantly faster than Philter and Physionet De-identification Software, with the fastest deidentification speed of 0.48 s per note.</p><p><strong>Discussion and conclusions: </strong>pyDeid allows the flexibility to prioritize between performance and speed, as well as precision and recall, while addressing some of the gaps in functionality left by other tools. pyDeid is also generalizable to domains outside of clinical data and can be further customized for specific contexts or for particular workflows.</p>","PeriodicalId":36278,"journal":{"name":"JAMIA Open","volume":"8 1","pages":"ooae152"},"PeriodicalIF":2.5,"publicationDate":"2025-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11752853/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143025020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
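The general idea of rule-based deidentification described in the record above, regular-expression rules combined with an exclusion list, can be illustrated with a short sketch. This is not pyDeid's API or rule set: the patterns, the placeholder labels, the `deidentify` function, and the reading of "exclusion list" as "strings that must never be redacted" are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately simple rules; a production tool needs far more.
RULES = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")),
    ("DATE", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
]


def deidentify(text: str, exclude: frozenset = frozenset()) -> str:
    """Replace each rule match with a <LABEL> placeholder.

    Matches whose exact text appears in `exclude` are left untouched,
    which stands in for a fixed exclusion list in this sketch.
    """
    for label, pattern in RULES:
        def replace(match, label=label):
            found = match.group(0)
            return found if found in exclude else f"<{label}>"
        text = pattern.sub(replace, text)
    return text


if __name__ == "__main__":
    note = "Seen 2024-01-05, call 416-555-0199 or jo@example.org"
    print(deidentify(note))
```

Rules like these run in a single pass per pattern, which is one reason rule-based tools can be fast; the cost is that recall depends entirely on how well the patterns anticipate real notes.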