Jihye Kim Scroggins, Ismael I Hulchafo, Sarah Harkins, Danielle Scharp, Hans Moen, Anahita Davoudi, Kenrick Cato, Michele Tadiello, Maxim Topaz, Veronica Barcelona
{"title":"Identifying stigmatizing and positive/preferred language in obstetric clinical notes using natural language processing.","authors":"Jihye Kim Scroggins, Ismael I Hulchafo, Sarah Harkins, Danielle Scharp, Hans Moen, Anahita Davoudi, Kenrick Cato, Michele Tadiello, Maxim Topaz, Veronica Barcelona","doi":"10.1093/jamia/ocae290","DOIUrl":"https://doi.org/10.1093/jamia/ocae290","url":null,"abstract":"<p><strong>Objective: </strong>To identify stigmatizing language in obstetric clinical notes using natural language processing (NLP).</p><p><strong>Materials and methods: </strong>We analyzed electronic health records from birth admissions in the Northeast United States in 2017. We annotated 1771 clinical notes to generate the initial gold standard dataset. Annotators labeled exemplars of 5 stigmatizing language categories and 1 positive/preferred language category. We used a semantic similarity-based search approach to expand the initial dataset with additional exemplars, composing an enhanced dataset. We employed traditional classifiers (Support Vector Machine, Decision Trees, and Random Forest) and transformer-based models, ClinicalBERT (Bidirectional Encoder Representations from Transformers) and BERT base. Models were trained and validated on the initial and enhanced datasets and were tested on the enhanced testing dataset.</p><p><strong>Results: </strong>In the initial dataset, we annotated 963 exemplars as stigmatizing or positive/preferred. The most frequently identified category was marginalized language/identities (n = 397, 41%), and the least frequent was questioning patient credibility (n = 51, 5%). After employing a semantic similarity-based search approach, 502 additional exemplars were added, increasing the number of exemplars in low-frequency categories. All NLP models also showed improved performance, with Decision Trees demonstrating the greatest improvement (21%). 
ClinicalBERT outperformed other models, with the highest average F1-score of 0.78.</p><p><strong>Discussion: </strong>ClinicalBERT seems to most effectively capture the nuanced and context-dependent stigmatizing language found in obstetric clinical notes, demonstrating its potential clinical applications for real-time monitoring and alerts to prevent the use of stigmatizing language and reduce healthcare bias. Future research should explore stigmatizing language in diverse geographic locations and clinical settings to further contribute to high-quality and equitable perinatal care.</p><p><strong>Conclusion: </strong>ClinicalBERT effectively captures the nuanced stigmatizing language in obstetric clinical notes. Our semantic similarity-based search approach to rapidly extract additional exemplars enhanced model performance while reducing the need for labor-intensive annotation.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142683210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yufei Yu, Maxim Edelson, Anh Pham, Jonathan E Pekar, Brian Johnson, Kai Post, Tsung-Ting Kuo
{"title":"Distributed, immutable, and transparent biomedical limited data set request management on multi-capacity network.","authors":"Yufei Yu, Maxim Edelson, Anh Pham, Jonathan E Pekar, Brian Johnson, Kai Post, Tsung-Ting Kuo","doi":"10.1093/jamia/ocae288","DOIUrl":"https://doi.org/10.1093/jamia/ocae288","url":null,"abstract":"<p><strong>Objective: </strong>Our study aimed to expedite data sharing requests of Limited Data Sets (LDS) through the development of a streamlined platform that allows distributed, immutable management of network activities and provides transparent and intuitive auditing of data access history, and to systematically evaluate it in a multi-capacity network setting for meaningful efficiency metrics.</p><p><strong>Materials and methods: </strong>We developed a blockchain-based system with six types of smart contracts to automate the LDS sharing process among major stakeholders. Our workflow included metadata initialization, access-request processing, and audit-log querying. We evaluated our system using synthetic data on three machines with varying specifications to emulate real-world scenarios. The data employed included ∼1000 researcher requests and ∼360 000 log queries.</p><p><strong>Results: </strong>On average, it took ∼2.5 s to register and respond to a researcher access request. The average runtime for an audit-log query with non-empty output was ∼3 ms. The runtime metrics at each institution showed general trends associated with their computational capacity.</p><p><strong>Discussion: </strong>Our system can reduce the LDS sharing request time from potentially hours to seconds, while enhancing data access transparency in a multi-institutional setting. There were variations in performance across sites that could be attributed to differences in hardware specifications. 
The performance gains became marginal beyond certain hardware thresholds, pointing to the influence of external factors such as network speeds.</p><p><strong>Conclusion: </strong>Our blockchain-based system can potentially accelerate clinical research by strengthening the data access process, expediting access and delivery of data links, increasing transparency with clear audit trails, and reinforcing trust in medical data management. Our smart contracts are available at: https://github.com/graceyufei/LDS-Request-Management.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142683182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Laura G Militello, Julie Diiulio, Debbie L Wilson, Khoa A Nguyen, Christopher A Harle, Walid Gellad, Wei-Hsuan Lo-Ciganic
{"title":"Using human factors methods to mitigate bias in artificial intelligence-based clinical decision support.","authors":"Laura G Militello, Julie Diiulio, Debbie L Wilson, Khoa A Nguyen, Christopher A Harle, Walid Gellad, Wei-Hsuan Lo-Ciganic","doi":"10.1093/jamia/ocae291","DOIUrl":"https://doi.org/10.1093/jamia/ocae291","url":null,"abstract":"<p><strong>Objectives: </strong>To highlight the often overlooked role of user interface (UI) design in mitigating bias in artificial intelligence (AI)-based clinical decision support (CDS).</p><p><strong>Materials and methods: </strong>This perspective paper discusses the interdependency between AI-based algorithm development and UI design and proposes strategies for increasing the safety and efficacy of CDS.</p><p><strong>Results: </strong>The role of design in biasing user behavior is well documented in behavioral economics and other disciplines. We offer an example of how UI designs play a role in how bias manifests in our machine learning-based CDS development.</p><p><strong>Discussion: </strong>Much discussion on bias in AI revolves around data quality and algorithm design; less attention is given to how UI design can exacerbate or mitigate limitations of AI-based applications.</p><p><strong>Conclusion: </strong>This work highlights important considerations including the role of UI design in reinforcing/mitigating bias, human factors methods for identifying issues before an application is released, and risk communication strategies.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-11-21","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142683165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DySurv: dynamic deep learning model for survival analysis with conditional variational inference.","authors":"Munib Mesinovic, Peter Watkinson, Tingting Zhu","doi":"10.1093/jamia/ocae271","DOIUrl":"https://doi.org/10.1093/jamia/ocae271","url":null,"abstract":"<p><strong>Objective: </strong>Machine learning applications for longitudinal electronic health records often forecast the risk of events at fixed time points, whereas survival analysis achieves dynamic risk prediction by estimating time-to-event distributions. Here, we propose a novel conditional variational autoencoder-based method, DySurv, which uses a combination of static and longitudinal measurements from electronic health records to estimate the individual risk of death dynamically.</p><p><strong>Materials and methods: </strong>DySurv directly estimates the cumulative risk incidence function without making any parametric assumptions on the underlying stochastic process of the time-to-event. We evaluate DySurv on 6 time-to-event benchmark datasets in healthcare, as well as 2 real-world intensive care unit (ICU) electronic health records (EHR) datasets extracted from the eICU Collaborative Research (eICU) and the Medical Information Mart for Intensive Care database (MIMIC-IV).</p><p><strong>Results: </strong>DySurv outperforms other existing statistical and deep learning approaches to time-to-event analysis across concordance and other metrics. It achieves time-dependent concordance of over 60% in the eICU case. It is also over 12% more accurate and 22% more sensitive than in-use ICU scores like Acute Physiology and Chronic Health Evaluation (APACHE) and Sequential Organ Failure Assessment (SOFA) scores. 
The predictive capacity of DySurv is consistent and the survival estimates remain disentangled across different datasets.</p><p><strong>Discussion: </strong>Our interdisciplinary framework successfully incorporates deep learning, survival analysis, and intensive care to create a novel method for time-to-event prediction from longitudinal health records. We test our method on several held-out test sets from a variety of healthcare datasets and compare it to existing in-use clinical risk scoring benchmarks.</p><p><strong>Conclusion: </strong>While our method leverages non-parametric extensions to deep learning-guided estimations of the survival distribution, further deep learning paradigms could be explored.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142683187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rebecca Schnall, Thomas Foster Scherr, Lisa M Kuhns, Patrick Janulis, Haomiao Jia, Olivia R Wood, Michael Almodovar, Robert Garofalo
{"title":"Efficacy of the mLab App: a randomized clinical trial for increasing HIV testing uptake using mobile technology.","authors":"Rebecca Schnall, Thomas Foster Scherr, Lisa M Kuhns, Patrick Janulis, Haomiao Jia, Olivia R Wood, Michael Almodovar, Robert Garofalo","doi":"10.1093/jamia/ocae261","DOIUrl":"10.1093/jamia/ocae261","url":null,"abstract":"<p><strong>Objective: </strong>To determine the efficacy of the mLab App, a mobile-delivered HIV prevention intervention to increase HIV self-testing in MSM and TGW.</p><p><strong>Materials and methods: </strong>This was a randomized (2:2:1) clinical trial of the efficacy of the mLab App as compared to standard of care vs mailed home HIV test arm among 525 MSM and TGW aged 18-29 years to increase HIV testing.</p><p><strong>Results: </strong>The mLab App arm participants demonstrated an increase from 35.1% reporting HIV testing in the prior 6 months to 88.5% at 6 months. In contrast, 28.8% of control participants reported an HIV test at baseline, which only increased to 65.1% at 6 months. In a generalized linear mixed model estimating this change and controlling for multiple observations of participants, this equated to control participants reporting a 61.2% smaller increase in HIV testing relative to mLab participants (P = .001) at 6 months. 
This difference was maintained at 12 months with control participants reporting an 82.6% smaller increase relative to mLab App participants (P < .001) from baseline to 12 months.</p><p><strong>Discussion and conclusion: </strong>Findings suggest that the mLab App is a well-supported, evidence-based, behavioral risk-reduction intervention for increasing HIV testing rates as compared to the standard of care, suggesting that this may be a useful behavioral risk-reduction intervention for increasing HIV testing among young MSM.</p><p><strong>Trial registration: </strong>This trial was registered with Clinicaltrials.gov NCT03803683.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142669892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jessica Sperling, Whitney Welsh, Erin Haseley, Stella Quenstedt, Perusi B Muhigaba, Adrian Brown, Patti Ephraim, Tariq Shafi, Michael Waitzkin, David Casarett, Benjamin A Goldstein
{"title":"Machine learning-based prediction models in medical decision-making in kidney disease: patient, caregiver, and clinician perspectives on trust and appropriate use.","authors":"Jessica Sperling, Whitney Welsh, Erin Haseley, Stella Quenstedt, Perusi B Muhigaba, Adrian Brown, Patti Ephraim, Tariq Shafi, Michael Waitzkin, David Casarett, Benjamin A Goldstein","doi":"10.1093/jamia/ocae255","DOIUrl":"10.1093/jamia/ocae255","url":null,"abstract":"<p><strong>Objectives: </strong>This study aims to improve the ethical use of machine learning (ML)-based clinical prediction models (CPMs) in shared decision-making for patients with kidney failure on dialysis. We explore factors that inform acceptability, interpretability, and implementation of ML-based CPMs among multiple constituent groups.</p><p><strong>Materials and methods: </strong>We collected and analyzed qualitative data from focus groups with varied end users, including: dialysis support providers (clinical providers and additional dialysis support providers such as dialysis clinic staff and social workers); patients; patients' caregivers (n = 52).</p><p><strong>Results: </strong>Participants were broadly accepting of ML-based CPMs, but with concerns on data sources, factors included in the model, and accuracy. Use was desired in conjunction with providers' views and explanations. Differences among respondent types were minimal overall but most prevalent in discussions of CPM presentation and model use.</p><p><strong>Discussion and conclusion: </strong>Evidence of acceptability of ML-based CPM usage provides support for ethical use, but numerous specific considerations in acceptability, model construction, and model use for shared clinical decision-making must be considered. There are specific steps that could be taken by data scientists and health systems to engender use that is accepted by end users and facilitates trust, but there are also ongoing barriers or challenges in addressing desires for use. 
This study contributes to emerging literature on interpretability, mechanisms for sharing complexities, including uncertainty regarding the model results, and implications for decision-making. It examines numerous stakeholder groups including providers, patients, and caregivers to provide specific considerations that can influence health system use and provide a basis for future research.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rubin Baskir, Minnkyong Lee, Sydney J McMaster, Jessica Lee, Faith Blackburne-Proctor, Romuladus Azuine, Nakia Mack, Sheri D Schully, Martin Mendoza, Janeth Sanchez, Yong Crosby, Erica Zumba, Michael Hahn, Naomi Aspaas, Ahmed Elmi, Shanté Alerté, Elizabeth Stewart, Danielle Wilfong, Meag Doherty, Margaret M Farrell, Grace B Hébert, Sula Hood, Cheryl M Thomas, Debra D Murray, Brendan Lee, Louisa A Stark, Megan A Lewis, Jen D Uhrig, Laura R Bartlett, Edgar Gil Rico, Adolph Falcón, Elizabeth Cohn, Mitchell R Lunn, Juno Obedin-Maliver, Linda Cottler, Milton Eder, Fornessa T Randal, Jason Karnes, KiTani Lemieux, Nelson Lemieux, Nelson Lemieux, Lilanta Bradley, Ronnie Tepp, Meredith Wilson, Monica Rodriguez, Chris Lunt, Karriem Watson
{"title":"Research for all: building a diverse researcher community for the All of Us Research Program.","authors":"Rubin Baskir, Minnkyong Lee, Sydney J McMaster, Jessica Lee, Faith Blackburne-Proctor, Romuladus Azuine, Nakia Mack, Sheri D Schully, Martin Mendoza, Janeth Sanchez, Yong Crosby, Erica Zumba, Michael Hahn, Naomi Aspaas, Ahmed Elmi, Shanté Alerté, Elizabeth Stewart, Danielle Wilfong, Meag Doherty, Margaret M Farrell, Grace B Hébert, Sula Hood, Cheryl M Thomas, Debra D Murray, Brendan Lee, Louisa A Stark, Megan A Lewis, Jen D Uhrig, Laura R Bartlett, Edgar Gil Rico, Adolph Falcón, Elizabeth Cohn, Mitchell R Lunn, Juno Obedin-Maliver, Linda Cottler, Milton Eder, Fornessa T Randal, Jason Karnes, KiTani Lemieux, Nelson Lemieux, Nelson Lemieux, Lilanta Bradley, Ronnie Tepp, Meredith Wilson, Monica Rodriguez, Chris Lunt, Karriem Watson","doi":"10.1093/jamia/ocae270","DOIUrl":"10.1093/jamia/ocae270","url":null,"abstract":"<p><strong>Objectives: </strong>The NIH All of Us Research Program (All of Us) is engaging a diverse community of more than 10 000 registered researchers using a robust engagement ecosystem model. We describe strategies used to build an ecosystem that attracts and supports a diverse and inclusive researcher community to use the All of Us dataset and provide metrics on All of Us researcher usage growth.</p><p><strong>Materials and methods: </strong>Researcher audiences and diversity categories were defined to guide a strategy. A researcher engagement strategy was codeveloped with program partners to support a researcher engagement ecosystem. An adapted ecological model guided the ecosystem to address multiple levels of influence to support All of Us data use. 
Statistics from the All of Us Researcher Workbench demographic survey describe trends in researchers' and institutional use of the Workbench and publication numbers.</p><p><strong>Results: </strong>From 2022 to 2024, some 13 partner organizations and their subawardees conducted outreach, built capacity, or supported researchers and institutions in using the data. Trends indicate that Workbench registrations and use have increased over time, including among researchers underrepresented in the biomedical workforce. Data Use and Registration Agreements from minority-serving institutions also increased.</p><p><strong>Discussion: </strong>All of Us built a diverse, inclusive, and growing research community via intentional engagement with researchers and via partnerships to address systemic data access issues. Future programs will provide additional support to researchers and institutions to ameliorate All of Us data use challenges.</p><p><strong>Conclusion: </strong>The approach described helps address structural inequities in the biomedical research field to advance health equity.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Aaron S Eisman, Elizabeth S Chen, Wen-Chih Wu, Karen M Crowley, Dilum P Aluthge, Katherine Brown, Indra Neil Sarkar
{"title":"Learning health system linchpins: information exchange and a common data model.","authors":"Aaron S Eisman, Elizabeth S Chen, Wen-Chih Wu, Karen M Crowley, Dilum P Aluthge, Katherine Brown, Indra Neil Sarkar","doi":"10.1093/jamia/ocae277","DOIUrl":"https://doi.org/10.1093/jamia/ocae277","url":null,"abstract":"<p><strong>Objective: </strong>To demonstrate the potential for a centrally managed health information exchange standardized to a common data model (HIE-CDM) to facilitate semantic data flow needed to support a learning health system (LHS).</p><p><strong>Materials and methods: </strong>The Rhode Island Quality Institute operates the Rhode Island (RI) statewide HIE, which aggregates RI health data for more than half of the state's population from 47 data partners. We standardized HIE data to the Observational Medical Outcomes Partnership (OMOP) CDM. Atherosclerotic cardiovascular disease (ASCVD) risk and primary prevention practices were selected to demonstrate LHS semantic data flow from 2013 to 2023.</p><p><strong>Results: </strong>We calculated longitudinal 10-year ASCVD risk on 62,999 individuals. Nearly two-thirds had ASCVD risk factors from more than one data partner. This enabled granular tracking of individual ASCVD risk, primary prevention (ie, statin therapy), and incident disease. The population was on statins for fewer than half of the guideline-recommended days. We also found that individuals receiving care at Federally Qualified Health Centers were more likely to have unfavorable ASCVD risk profiles and more likely to be on statins. CDM transformation reduced data heterogeneity through a unified health record that adheres to defined terminologies per OMOP domain.</p><p><strong>Discussion: </strong>We demonstrated the potential for an HIE-CDM to enable observational population health research. 
We also showed how to leverage existing health information technology infrastructure and health data best practices to break down LHS barriers.</p><p><strong>Conclusion: </strong>HIE-CDM facilitates knowledge curation and health system intervention development at the individual, health system, and population levels.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Arihant Tripathi, Brett Ecker, Patrick Boland, Saum Ghodoussipour, Gregory R Riedlinger, Subhajyoti De
{"title":"Oncointerpreter.ai enables interactive, personalized summarization of cancer diagnostics data.","authors":"Arihant Tripathi, Brett Ecker, Patrick Boland, Saum Ghodoussipour, Gregory R Riedlinger, Subhajyoti De","doi":"10.1093/jamia/ocae284","DOIUrl":"10.1093/jamia/ocae284","url":null,"abstract":"<p><strong>Objectives: </strong>Cancer diagnosis comes as a shock to many patients, and many of them feel unprepared to handle the complexity of the life-changing event, understand technicalities of the diagnostic reports, and fully engage with the clinical team regarding personalized clinical decision-making.</p><p><strong>Materials and methods: </strong>We develop Oncointerpreter.ai, an interactive resource to offer personalized summarization of clinical cancer genomic and pathological data, and frame questions or address queries about therapeutic opportunities in near-real time via a graphical interface. It is built on the Mistral-7B and Llama-2 7B large language models trained on a local database trained using a large, curated corpus.</p><p><strong>Results: </strong>We showcase its utility with case studies, where Oncointerpreter.ai extracted key clinical and molecular attributes from deidentified pathology and clinical genomics reports, summarized their contextual significance, and answered queries on pertinent treatment options. Oncointerpreter also provided a personalized summary of currently active clinical trials that match the patients' disease status, their selection criteria, and geographic locations. 
Benchmarking and comparative assessment indicated that the model responses were generally consistent, and hallucination (ie, a factually incorrect or nonsensical response) was rare; treatment- and outcome-related queries led to context-aware responses, and response time correlated with verbosity.</p><p><strong>Discussion: </strong>The choice of model and domain-specific training also affected the response quality.</p><p><strong>Conclusion: </strong>Oncointerpreter.ai can aid existing clinical care with interactive, individualized summarization of diagnostics data to promote informed dialogs with patients with new cancer diagnoses.</p><p><strong>Availability: </strong>https://github.com/Siris2314/Oncointerpreter.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joshua Trujeque, R Adams Dudley, Nathan Mesfin, Nicholas E Ingraham, Isai Ortiz, Ann Bangerter, Anjan Chakraborty, Dalton Schutte, Jeremy Yeung, Ying Liu, Alicia Woodward-Abel, Emma Bromley, Rui Zhang, Lisa A Brenner, Joseph A Simonetti
{"title":"Comparison of six natural language processing approaches to assessing firearm access in Veterans Health Administration electronic health records.","authors":"Joshua Trujeque, R Adams Dudley, Nathan Mesfin, Nicholas E Ingraham, Isai Ortiz, Ann Bangerter, Anjan Chakraborty, Dalton Schutte, Jeremy Yeung, Ying Liu, Alicia Woodward-Abel, Emma Bromley, Rui Zhang, Lisa A Brenner, Joseph A Simonetti","doi":"10.1093/jamia/ocae169","DOIUrl":"https://doi.org/10.1093/jamia/ocae169","url":null,"abstract":"<p><strong>Objective: </strong>Access to firearms is associated with increased suicide risk. Our aim was to develop a natural language processing approach to characterizing firearm access in clinical records.</p><p><strong>Materials and methods: </strong>We used clinical notes from 36 685 Veterans Health Administration (VHA) patients between April 10, 2023 and April 10, 2024. We expanded preexisting firearm term sets using subject matter experts and generated 250-character snippets around each firearm term appearing in notes. Annotators labeled 3000 snippets into three classes. Using these annotated snippets, we compared four nonneural machine learning models (random forest, bagging, gradient boosting, logistic regression with ridge penalization) and two versions of Bidirectional Encoder Representations from Transformers, or BERT (specifically, BioBERT and Bio-ClinicalBERT) for classifying firearm access as \"definite access\", \"definitely no access\", or \"other\".</p><p><strong>Results: </strong>Firearm terms were identified in 36 685 patient records (41.3%); 33.7% of snippets were categorized as definite access, 9.0% as definitely no access, and 57.2% as \"other\". 
Among models classifying firearm access, five of six had acceptable performance, with BioBERT and Bio-ClinicalBERT performing best, with F1s of 0.876 (95% confidence interval, 0.874-0.879) and 0.896 (95% confidence interval, 0.894-0.899), respectively.</p><p><strong>Discussion and conclusion: </strong>Firearm-related terminology is common in the clinical records of VHA patients. The ability to use text to identify and characterize patients' firearm access could enhance suicide prevention efforts, and five of our six models could be used to identify patients for clinical interventions.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":" ","pages":""},"PeriodicalIF":4.7,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142631461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}