Journal of Biomedical Informatics: Latest Articles

Identifying task groupings for multi-task learning using pointwise V-usable information.
IF 4.5 Zone 2 Medicine
Journal of Biomedical Informatics Pub Date : 2025-09-01 Epub Date: 2025-07-16 DOI: 10.1016/j.jbi.2025.104881
Yingya Li, Timothy Miller, Steven Bethard, Guergana Savova
{"title":"Identifying task groupings for multi-task learning using pointwise V-usable information.","authors":"Yingya Li, Timothy Miller, Steven Bethard, Guergana Savova","doi":"10.1016/j.jbi.2025.104881","DOIUrl":"10.1016/j.jbi.2025.104881","url":null,"abstract":"<p><strong>Objective: </strong>Even in the era of Large Language Models (LLMs) which are claimed to be solutions for many tasks, fine-tuning language models remains a core methodology used in deployment for a variety of reasons - computational efficiency and performance maximization among them. Fine-tuning could be single-task or multi-task joint learning where the tasks support each other thus boosting their performance. The success of multi-task learning can depend heavily on which tasks are grouped together. Naively grouping all tasks or a random set of tasks can result in negative transfer, with the multi-task models performing worse than single-task models. Though many efforts have been made to identify task groupings and to measure the relatedness among different tasks, it remains a challenging research topic to define a metric to identify the best task grouping out of a pool of many potential task combinations. We propose such a metric.</p><p><strong>Methods: </strong>We propose a metric of task relatedness based on task difficulty measured by pointwise V-usable information (PVI). PVI is a recently proposed metric to estimate how much usable information a dataset contains given a model. We hypothesize that tasks with not statistically different PVI estimates are similar enough to benefit from the joint learning process. We conduct comprehensive experiments to evaluate the feasibility of this metric for task grouping on 15 NLP datasets in the general, biomedical, and clinical domains. 
We compare the results of the joint learners against single learners, existing baseline methods, and recent large language models, including Llama and GPT-4.</p><p><strong>Results: </strong>The results show that by grouping tasks with similar PVI estimates, the joint learners yielded competitive results with fewer total parameters, with consistent performance across domains.</p><p><strong>Conclusion: </strong>For domain-specific tasks, finetuned models may remain a preferable option, and the PVI-based method of grouping tasks for multi-task learning could be particularly beneficial. This metric could be wrapped in the overall recipe of fine-tuning language models.</p>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":" ","pages":"104881"},"PeriodicalIF":4.5,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144667675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
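Illustrative note on the abstract above: pointwise V-usable information is commonly defined as PVI(x → y) = −log2 g′[∅](y) + log2 g[x](y), where g is a model fine-tuned with the input and g′ one fine-tuned with a null input. The sketch below computes PVI from two predicted probabilities and groups two tasks when their PVI estimates are not significantly different. The abstract does not say which statistical test the authors use; the permutation test, the function names, and the 0.05 threshold are assumptions made for illustration only.

```python
import math
import random
from statistics import mean


def pvi(p_with_input: float, p_null_input: float) -> float:
    """PVI in bits: log2 g[x](y) - log2 g'[null](y)."""
    return math.log2(p_with_input) - math.log2(p_null_input)


def permutation_p_value(pvi_a, pvi_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of mean PVI (assumed test)."""
    rng = random.Random(seed)
    observed = abs(mean(pvi_a) - mean(pvi_b))
    pooled = list(pvi_a) + list(pvi_b)
    n_a = len(pvi_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float comparison
            hits += 1
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


def similar_enough(pvi_a, pvi_b, alpha=0.05):
    """Hypothetical grouping rule: keep tasks together if PVI does not differ."""
    return permutation_p_value(pvi_a, pvi_b) >= alpha
```

A grouping procedure could call `similar_enough` on every pair of candidate tasks and train joint learners only on the pairs it accepts.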
Digital twins in increasing diversity in clinical trials: A systematic review.
IF 4.5 Zone 2 Medicine
Journal of Biomedical Informatics Pub Date : 2025-09-01 Epub Date: 2025-08-08 DOI: 10.1016/j.jbi.2025.104879
Abigail Tubbs, Enrique Alvarez Vazquez
{"title":"Digital twins in increasing diversity in clinical trials: A systematic review.","authors":"Abigail Tubbs, Enrique Alvarez Vazquez","doi":"10.1016/j.jbi.2025.104879","DOIUrl":"10.1016/j.jbi.2025.104879","url":null,"abstract":"<p><p>The integration of digital twin (DT) technology and artificial intelligence (AI) into clinical trials holds transformative potential for addressing persistent inequities in participant representation. This systematic review evaluates the role of these technologies in improving diversity, particularly in racial, ethnic, gender, age, and socioeconomic dimensions, minimizing bias, and allowing personalized medicine in clinical research settings. Evidence from 90 studies reveals that digital twins offer dynamic simulation capabilities for trial design, while AI facilitates predictive analytics and recruitment optimization. However, implementation remains hindered by fragmented regulatory frameworks, biased datasets, and infrastructural disparities. Ethical concerns, including privacy, consent, and algorithmic opacity, further complicate the deployment. Inclusive data practices identified in the literature include the use of demographically representative training data, participatory data collection frameworks, and equity audits to detect and correct systemic bias. Fairness in AI and DT models is primarily operationalized through group fairness metrics such as demographic parity and equalized odds, along with fairness-aware model training and validation. Key gaps include the lack of global standards, underrepresentation in model training, and challenges in real-world adoption. To overcome these barriers, the review proposes actionable directions: developing inclusive data practices, harmonizing regulatory oversight, and embedding fairness into computational model design. 
By focusing on diversity as a design principle, AI and DT technologies can support a more equitable and generalizable future for clinical research.</p>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":" ","pages":"104879"},"PeriodicalIF":4.5,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144816690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
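Illustrative note on the abstract above: the two group fairness metrics it names can be computed directly from binary predictions. This is a minimal sketch under stated assumptions, not code from the review: labels and predictions are 0/1, every group contains at least one positive and one negative example, and the function names are hypothetical.

```python
def demographic_parity_difference(y_pred, groups):
    """Largest gap between groups in the rate of positive predictions."""
    rates = []
    for g in set(groups):
        preds = [p for p, gg in zip(y_pred, groups) if gg == g]
        rates.append(sum(preds) / len(preds))
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, groups):
    """Largest gap between groups in either TPR or FPR."""
    gaps = []
    for label in (1, 0):  # label 1 gives the TPR gap, label 0 the FPR gap
        rates = []
        for g in set(groups):
            preds = [
                p for t, p, gg in zip(y_true, y_pred, groups)
                if gg == g and t == label
            ]
            rates.append(sum(preds) / len(preds))
        gaps.append(max(rates) - min(rates))
    return max(gaps)
```

A value of 0 on either metric means the groups are treated identically by that criterion; an equity audit of the kind the abstract mentions would track these gaps per demographic attribute.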
Corrigendum to “A pipeline for harmonising NHS Scotland laboratory data to enable national-level analyses” [J. Biomed. Inform. 2025 Feb;162:104771. https://doi.org/10.1016/j.jbi.2024.104771. Epub 2025 Jan 2. PMID: 39755323]
IF 4.5 Zone 2 Medicine
Journal of Biomedical Informatics Pub Date : 2025-09-01 DOI: 10.1016/j.jbi.2025.104890
Chuang Gao, Shahzad Mumtaz, Sophie McCall, Katherine O'Sullivan, Mark McGilchrist, Daniel R. Morales, Christopher Hall, Katie Wilde, Charlie Mayor, Pamela Linksted, Kathy Harrison, Christian Cole, Emily Jefferson
{"title":"Corrigendum to “A pipeline for harmonising NHS Scotland laboratory data to enable national-level analyses” [J. Biomed. Inform. 2025 Feb;162:104771. https://doi.org/10.1016/j.jbi.2024.104771. Epub 2025 Jan 2. PMID: 39755323]","authors":"Chuang Gao ,&nbsp;Shahzad Mumtaz ,&nbsp;Sophie McCall ,&nbsp;Katherine O'Sullivan ,&nbsp;Mark McGilchrist ,&nbsp;Daniel R. Morales ,&nbsp;Christopher Hall ,&nbsp;Katie Wilde ,&nbsp;Charlie Mayor ,&nbsp;Pamela Linksted ,&nbsp;Kathy Harrison ,&nbsp;Christian Cole ,&nbsp;Emily Jefferson","doi":"10.1016/j.jbi.2025.104890","DOIUrl":"10.1016/j.jbi.2025.104890","url":null,"abstract":"","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"169 ","pages":"Article 104890"},"PeriodicalIF":4.5,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144996487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Learning from multiple data sources for decision making in health care
IF 4.5 Zone 2 Medicine
Journal of Biomedical Informatics Pub Date : 2025-09-01 DOI: 10.1016/j.jbi.2025.104892
Fabio Stella (Guest Editors), Francesco Calimeri, Mauro Dragoni
{"title":"Learning from multiple data sources for decision making in health care","authors":"Fabio Stella (Guest Editors),&nbsp;Francesco Calimeri,&nbsp;Mauro Dragoni","doi":"10.1016/j.jbi.2025.104892","DOIUrl":"10.1016/j.jbi.2025.104892","url":null,"abstract":"","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"169 ","pages":"Article 104892"},"PeriodicalIF":4.5,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144862268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
WoundcareVQA: A multilingual visual question answering benchmark dataset for wound care
IF 4.5 Zone 2 Medicine
Journal of Biomedical Informatics Pub Date : 2025-08-29 DOI: 10.1016/j.jbi.2025.104888
Wen-wai Yim, Asma Ben Abacha, Robert Doerning, Chia-Yu Chen, Jiaying Xu, Anita Subbarao, Zixuan Yu, Fei Xia, M. Kennedy Hall, Meliha Yetisgen
{"title":"WoundcareVQA: A multilingual visual question answering benchmark dataset for wound care","authors":"Wen-wai Yim ,&nbsp;Asma Ben Abacha ,&nbsp;Robert Doerning ,&nbsp;Chia-Yu Chen ,&nbsp;Jiaying Xu ,&nbsp;Anita Subbarao ,&nbsp;Zixuan Yu ,&nbsp;Fei Xia ,&nbsp;M. Kennedy Hall ,&nbsp;Meliha Yetisgen","doi":"10.1016/j.jbi.2025.104888","DOIUrl":"10.1016/j.jbi.2025.104888","url":null,"abstract":"<div><h3>Objective:</h3><div>Introduce the task of wound care multimodal multilingual visual question answering, provide baseline performances, and identify areas of future study.</div></div><div><h3>Methods:</h3><div>A dataset of wound care multimodal multilingual visual question answering (VQA) was created using consumer health questions asked online. Practicing US medical doctors were tasked with providing metadata and expert response labels. Several instruct-enabled, multilingual visual question answering models (GPT-4o, Gemini-1.5-Pro, and Qwen-VL) were tested to benchmark performances. Finally, automatic evaluations were tested against domain expert response ratings.</div></div><div><h3>Results:</h3><div>A multilingual dataset of 477 wound care cases, 768 responses, 748 images, 3k structured data labels, 1362 translation instances, and 10k judgments was constructed (<span><span>https://osf.io/xsj5u/</span><svg><path></path></svg></span>). Metadata scores ranged from 0.32 to 0.78 accuracy depending on classification type; response generation performance was 0.06 BLEU, 0.66 BERTScore, and 0.45 ROUGE-L in English and 0.12 BLEU, 0.69 BERTScore, and 0.50 ROUGE-L in Chinese.</div></div><div><h3>Conclusion:</h3><div>We construct and explore the tasks of multimodal, multilingual VQA. 
We hope the work here can inspire further research in wound care metadata classification, VQA response generation, and open response automatic evaluation.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"170 ","pages":"Article 104888"},"PeriodicalIF":4.5,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144955644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
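Illustrative note on the abstract above: ROUGE-L, one of the reported response generation metrics, scores a candidate answer by its longest common subsequence (LCS) with a reference answer. The sketch below is a simplified version that assumes whitespace tokenization and a balanced F1; the paper's exact tokenization and ROUGE settings are not given in the abstract, and the example sentences in the usage note are invented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as the harmonic mean of LCS precision and LCS recall."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For instance, `rouge_l_f1("the wound is cleaned daily", "the wound should be cleaned daily")` shares a four-token subsequence, giving precision 4/5 and recall 4/6.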
MedVidDeID: Protecting privacy in clinical encounter video recordings
IF 4.5 Zone 2 Medicine
Journal of Biomedical Informatics Pub Date : 2025-08-29 DOI: 10.1016/j.jbi.2025.104901
Sriharsha Mopidevi, Kuk Jin Jang, Basam Alasaly, Sydney Pugh, Jean Park, Ashley Batugo, Sy Hwang, Eric Eaton, Danielle Lee Mowery, Kevin B. Johnson
{"title":"MedVidDeID: Protecting privacy in clinical encounter video recordings","authors":"Sriharsha Mopidevi ,&nbsp;Kuk Jin Jang ,&nbsp;Basam Alasaly ,&nbsp;Sydney Pugh ,&nbsp;Jean Park ,&nbsp;Ashley Batugo ,&nbsp;Sy Hwang ,&nbsp;Eric Eaton ,&nbsp;Danielle Lee Mowery ,&nbsp;Kevin B. Johnson","doi":"10.1016/j.jbi.2025.104901","DOIUrl":"10.1016/j.jbi.2025.104901","url":null,"abstract":"<div><h3>Objective:</h3><div>The increasing use of audio-video (AV) data in healthcare has improved patient care, clinical training, and medical and ethnographic research. However, it has also introduced major challenges in preserving patient-provider privacy due to Protected Health Information (PHI) in such data. Traditional de-identification methods are inadequate for AV data, which can reveal identifiable information such as faces, voices, and environmental details. Our goal was to create a pipeline for de-identifying AV healthcare data that minimized the human effort required to guarantee successful de-identification.</div></div><div><h3>Methods:</h3><div>We combined open-source tools with novel methods and infrastructure into a six-stage pipeline: (1) transcript extraction using WhisperX, (2) transcript de-identification with an adapted PHIlter, (3) audio de-identification through scrubbing, (4) video de-identification using YOLOv11 for pose detection and blurring, (5) recombining de-identified audio and video, and (6) validation and correction via manual quality control (QC). We developed two de-identification strategies to support different tolerances for lossy video images. We evaluated this pipeline using 10 h of simulated clinical AV recordings, comprising nearly 1.1 million video frames and approximately 72,000 words.</div></div><div><h3>Results:</h3><div>In Precision Privacy Preservation (PPP) mode, MedVidDeID achieved a success rate of 50%, while in Greedy Privacy Preservation (GPP) mode, it achieved a 97.5% success rate. 
Compared to manual methods for a 15 min video segment, the pipeline reduced de-identification time by 26.7% in PPP and 64.2% in GPP modes.</div></div><div><h3>Conclusion:</h3><div>The MedVidDeID pipeline offers a viable, efficient hybrid solution for handling AV healthcare data and privacy preservation. Future work will focus on reducing upstream errors at each stage and minimizing the role of the human in the loop.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"170 ","pages":"Article 104901"},"PeriodicalIF":4.5,"publicationDate":"2025-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144955583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Improving large language models for adverse drug reactions named entity recognition via error correction prompt engineering
IF 4.5 Zone 2 Medicine
Journal of Biomedical Informatics Pub Date : 2025-08-28 DOI: 10.1016/j.jbi.2025.104893
Yunfei Zhang, Wei Liao
{"title":"Improving large language models for adverse drug reactions named entity recognition via error correction prompt engineering","authors":"Yunfei Zhang,&nbsp;Wei Liao","doi":"10.1016/j.jbi.2025.104893","DOIUrl":"10.1016/j.jbi.2025.104893","url":null,"abstract":"<div><div>The monitoring and analysis of adverse drug reactions (ADRs) are important for ensuring patient safety and improving treatment outcomes. Accurate identification of drug names, drug components, and ADR entities during named entity recognition (NER) processes is essential for ensuring drug safety and advancing the integration of drug information. Given that existing medical named entity recognition technologies rely on large amounts of manually annotated data for training, they are often less effective when applied to adverse drug reactions due to significant data variability and the high similarity between drug names. This paper proposes a prompt template for ADR that integrates error correction examples. The prompt template includes: 1. Basic prompts with task descriptions, 2. Annotated entity explanations, 3. Annotation guidelines, 4. Annotated samples for few-shot learning, 5. Error correction examples. Additionally, it integrates complex ADR data from the web and constructs a corpus containing three types of entities (drug name, drug components, and adverse drug reactions) using the Begin, Inside, Outside (BIO) annotation method. Finally, we evaluate the effectiveness of each prompt and compare it with the fine-tuned Large Language Model Meta AI (LLaMA) model and the DeepSeek model. Experimental results show that under this prompt template, the F1 score of GPT-3.5 increased from 0.648 to 0.887, and that of GPT-4 increased from 0.757 to 0.921. It is significantly better than the fine-tuned LLaMA model and DeepSeek model. 
It demonstrates the superiority of the proposed method, and provides a solid foundation for extracting drug-related entity relationships and building knowledge graphs.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"170 ","pages":"Article 104893"},"PeriodicalIF":4.5,"publicationDate":"2025-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144926249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
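Illustrative note on the abstract above: the five-part prompt template and the BIO annotation scheme can be sketched as follows. Everything concrete here is an assumption for illustration: the section keys, the `DRUG` and `ADR` label names, the token-index span format, and the example sentence are not taken from the paper.

```python
# Order of the five template parts listed in the abstract (key names are hypothetical).
PROMPT_SECTIONS = (
    "task_description",
    "entity_explanations",
    "annotation_guidelines",
    "few_shot_samples",
    "error_correction_examples",
)


def build_prompt(sections: dict, text: str) -> str:
    """Concatenate the five template parts, then the text to annotate."""
    parts = [sections[name] for name in PROMPT_SECTIONS]
    parts.append(f"Input: {text}")
    return "\n\n".join(parts)


def spans_to_bio(tokens, spans):
    """Convert (start, end_exclusive, label) token spans into BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags


tokens = ["Aspirin", "may", "cause", "stomach", "bleeding", "."]
tags = spans_to_bio(tokens, [(0, 1, "DRUG"), (3, 5, "ADR")])
```

Here `tags` marks "Aspirin" as the start of a drug entity and "stomach bleeding" as a two-token adverse reaction; an error correction example in the prompt would pair a wrong tagging of this kind with its corrected version.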
Strategies for detecting and mitigating dataset shift in machine learning for health predictions: A systematic review
IF 4.5 Zone 2 Medicine
Journal of Biomedical Informatics Pub Date : 2025-08-26 DOI: 10.1016/j.jbi.2025.104902
Gabriel Ferreira dos Santos Silva, Fabiano Novaes Barcellos Filho, Roberta Moreira Wichmann, Francisco Costa da Silva Junior, Alexandre Dias Porto Chiavegatto Filho
{"title":"Strategies for detecting and mitigating dataset shift in machine learning for health predictions: A systematic review","authors":"Gabriel Ferreira dos Santos Silva ,&nbsp;Fabiano Novaes Barcellos Filho ,&nbsp;Roberta Moreira Wichmann ,&nbsp;Francisco Costa da Silva Junior ,&nbsp;Alexandre Dias Porto Chiavegatto Filho","doi":"10.1016/j.jbi.2025.104902","DOIUrl":"10.1016/j.jbi.2025.104902","url":null,"abstract":"<div><h3>Objective</h3><div>This review aims to provide a comprehensive overview of the literature on methods and techniques for identifying and correcting dataset shift in machine learning (ML) applications for health predictions.</div></div><div><h3>Methods</h3><div>A systematic search was conducted across PubMed, IEEE Xplore, Scopus, and Web of Science, targeting articles published between January 1, 2019, and March 15, 2025. Search strings combined terms related to machine learning, healthcare, and dataset shift. A total of 32 studies were included, and were evaluated based on dataset shift types addressed, detection and correction strategies used, algorithmic choices, and reported impacts on model performance.</div></div><div><h3>Results</h3><div>The review identified a wide range of dataset shift types, with temporal shift and concept drift being the most commonly addressed. Model-based monitoring and statistical tests were the most frequent detection strategies, while retraining and feature engineering were the predominant correction approaches. Most methods demonstrate moderate interpretability, computational feasibility, and generalizability. However, a lack of standardized performance metrics and external validations limited the comparability of results across studies.</div></div><div><h3>Conclusion</h3><div>While several promising approaches for managing dataset shift in health-related ML models have been proposed, no single method emerged as broadly generalizable across use cases. 
The implementation of these techniques in real-world clinical workflows remains limited. Future research should prioritize prospective evaluations, subgroup-specific analyses (e.g., by race, age, or geographic region), and integration into clinical decision-support systems to ensure robust and equitable ML deployment in healthcare settings. A structured summary table and conceptual pipeline diagram are provided to support practical adoption.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"170 ","pages":"Article 104902"},"PeriodicalIF":4.5,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144926250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
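Illustrative note on the abstract above: statistical tests are named as a frequent detection strategy, but the abstract does not list specific tests. As one common choice (an assumption here, not a finding of the review), a two-sample Kolmogorov-Smirnov check compares a feature's distribution in a reference period with its distribution in a later period. The threshold uses the standard large-sample approximation, so it is only a rough guide for small samples.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, current):
    """Maximum absolute gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    for x in ref + cur:  # the maximum gap is attained at a sample point
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        d = max(d, abs(f_ref - f_cur))
    return d


def shift_detected(reference, current, alpha=0.05):
    """Flag a shift when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(reference), len(current)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical
```

In a monitoring setup, `reference` would hold a feature's values from the training window and `current` the values from a recent deployment window; a detected shift could then trigger the retraining step the review describes.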
CeRTS: certainty retrieval token search in large language model clinical information extraction
IF 4.5 Zone 2 Medicine
Journal of Biomedical Informatics Pub Date : 2025-08-23 DOI: 10.1016/j.jbi.2025.104900
Lars E. Schimmelpfennig, Kriti Bhattarai, Inez Y. Oh, Jake Lever, Obi L. Griffith, Malachi Griffith, Albert M. Lai, Zachary B. Abrams
{"title":"CeRTS: certainty retrieval token search in large language model clinical information extraction","authors":"Lars E. Schimmelpfennig ,&nbsp;Kriti Bhattarai ,&nbsp;Inez Y. Oh ,&nbsp;Jake Lever ,&nbsp;Obi L. Griffith ,&nbsp;Malachi Griffith ,&nbsp;Albert M. Lai ,&nbsp;Zachary B. Abrams","doi":"10.1016/j.jbi.2025.104900","DOIUrl":"10.1016/j.jbi.2025.104900","url":null,"abstract":"<div><h3>Objective</h3><div>Large language models (LLMs) must effectively communicate their uncertainty to be viable in clinical settings. As such, the need for reliable uncertainty estimation grows increasingly urgent with the expanding use of LLMs for information extraction from electronic health records. Previous token-level uncertainty estimators have only used token probabilities within a single output sequence. Here, by leveraging the constraints of JSON output structure, we instead consider all likely sequences and their respective probabilities to obtain a more robust measure of model confidence. We develop Certainty Retrieval Token Search (CeRTS), a new uncertainty estimator for structured information extraction.</div></div><div><h3>Methods</h3><div>We evaluated CeRTS against a previous gold-standard uncertainty estimator when extracting clinical features from lung cancer discharge summaries across eight open-source LLMs. Calibration (Brier score) and discrimination (AUROC) were used to quantify performance.</div></div><div><h3>Results</h3><div>CeRTS surpassed the previous gold-standard estimator in discriminatory power across every model and achieved better calibration in most cases. CeRTS had the strongest agreement between model confidence and accuracy with Qwen-2.5.</div></div><div><h3>Conclusion</h3><div>CeRTS enhances LLM-based information extraction from unstructured clinical text by assigning well-calibrated confidence scores to each extracted item, providing medical researchers with a quantitative measure of reliability at minimal additional cost. 
Although its performance was generally robust, CeRTS struggled with DeepSeek-R1, which we attribute to the model’s Chain-of-Thought reasoning steps. Our evaluation focused on clinical data, but CeRTS can be applied to any domain requiring reliable uncertainty estimation.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"170 ","pages":"Article 104900"},"PeriodicalIF":4.5,"publicationDate":"2025-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144955653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
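Illustrative note on the abstract above: the two evaluation measures it names, the Brier score for calibration and AUROC for discrimination, can be computed from per-item confidence scores and correctness labels. This sketch covers only the evaluation side and does not implement CeRTS itself; the function names and the toy numbers in the usage note are assumptions.

```python
def brier_score(confidences, correct):
    """Mean squared gap between confidence and the 0/1 correctness label."""
    return sum((c - y) ** 2 for c, y in zip(confidences, correct)) / len(correct)


def auroc(confidences, correct):
    """Probability that a correct item outranks an incorrect one (ties count half)."""
    pos = [c for c, y in zip(confidences, correct) if y == 1]
    neg = [c for c, y in zip(confidences, correct) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

With confidences `[0.9, 0.4, 0.6, 0.3]` and correctness labels `[1, 1, 0, 0]`, three of the four correct/incorrect pairs are ranked properly, so `auroc` returns 0.75. Lower Brier scores and higher AUROC values indicate a better uncertainty estimator.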
Perplexity and proximity: Large language model perplexity complements semantic distance metrics for the detection of incoherent speech
IF 4.5 Zone 2 Medicine
Journal of Biomedical Informatics Pub Date : 2025-08-21 DOI: 10.1016/j.jbi.2025.104899
Weizhe Xu, Serguei Pakhomov, Patrick Heagerty, Eric Horvitz, Ellen R. Bradley, Josh Woolley, Andrew Campbell, Alex Cohen, Dror Ben-Zeev, Trevor Cohen
{"title":"Perplexity and proximity: Large language model perplexity complements semantic distance metrics for the detection of incoherent speech","authors":"Weizhe Xu ,&nbsp;Serguei Pakhomov ,&nbsp;Patrick Heagerty ,&nbsp;Eric Horvitz ,&nbsp;Ellen R. Bradley ,&nbsp;Josh Woolley ,&nbsp;Andrew Campbell ,&nbsp;Alex Cohen ,&nbsp;Dror Ben-Zeev ,&nbsp;Trevor Cohen","doi":"10.1016/j.jbi.2025.104899","DOIUrl":"10.1016/j.jbi.2025.104899","url":null,"abstract":"<div><h3>Objective</h3><div><em>Semantic coherence</em> in speech is characterized by a logical, connected flow of ideas. A lack of coherence in speech may reflect disorganized thinking, a core feature of psychosis in schizophrenia spectrum disorders (SSDs). Developing tools that could help with automated assessment of semantic coherence in language could facilitate early detection of SSDs and improved monitoring of symptoms, enabling more timely intervention. Large language models (LLMs) have demonstrated strong capabilities on numerous language-centric tasks and have shown promise for analyzing semantic coherence due to the natural fit between their innate measures of language perplexity and the surprising turns that incoherent narrative often takes. This study aims to develop a novel representation and associated measure of semantic coherence using LLM-based perplexity metrics and to compare this measure with traditional vector distance-based coherence metrics.</div></div><div><h3>Method</h3><div>We evaluated “bag” and “chain” models based on LLM perplexities as measures of semantic coherence. Regression models were trained using both single and paired combinations of perplexity- and proximity-based features to predict human ratings of semantic coherence using standardized instruments. 
Performance was evaluated on held-out examples from a training set of speeches from individuals experiencing psychotic symptoms and a test set of clinical interviews with patients diagnosed with SSDs, both with labels from human assessments of disorganized thinking severity.</div></div><div><h3>Results</h3><div>The best performance was achieved using a combination of perplexity and proximity features, yielding a Spearman correlation with human ratings of 0.61 (vs. 0.56 with proximity features alone) on leave-one-out cross-validation in the training set, and 0.54 (vs. 0.52 with proximity features alone) on the test set.</div></div><div><h3>Conclusion</h3><div>We developed novel methods for assessing semantic coherence using LLM perplexities and found them complementary to proximity-based methods. Combined, these methods showed improved performance across two datasets, highlighting the potential of LLMs in enhancing automated diagnosis and monitoring of SSDs.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"170 ","pages":"Article 104899"},"PeriodicalIF":4.5,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144908092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
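Illustrative note on the abstract above: perplexity is the exponential of the negative mean token log-probability, and the reported agreement with human ratings is a Spearman (rank) correlation. The sketch below shows both computations with the standard library. It does not reproduce the paper's "bag" and "chain" models, whose details are not in the abstract, and the token log-probabilities are assumed to come from whatever language model is being used.

```python
import math
from statistics import mean


def perplexity(token_log_probs):
    """exp of the negative mean natural-log probability per token."""
    return math.exp(-mean(token_log_probs))


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Pearson correlation of the ranks (assumes neither input is constant)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

A coherence study of this kind would compute `perplexity` per transcript segment, feed it into a regression model alongside proximity features, and report `spearman` between the predictions and the human ratings.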