Leveraging few-shot learning and large language models for analyzing blood pressure variations across biological sex from scientific literature
Yuting Guo, Seyedeh Somayyeh Mousavi, Yao Ge, Madhumita Baskaran, Reza Sameni, Abeed Sarker
Computers in Biology and Medicine, vol. 198, Article 111128 (published 2025-09-27)
DOI: 10.1016/j.compbiomed.2025.111128
Abstract
Background:
Current blood pressure (BP) technologies and standards were established decades ago, and these standards are still used worldwide today, often without adjusting BP readings for individual demographic factors such as sex and age. While these standards provide useful guidelines and help identify at-risk patients, they are not fully reliable for diagnosis due to the lack of demographic considerations. This study aims to assess the feasibility of using large language models (LLMs) for the automated extraction of BP-related information from the scientific literature, with a focus on biological sex-based distinctions in BP distributions.
Method:
We employed natural language processing (NLP) methods to extract the means and standard deviations of BP values from the literature, stratified by biological sex. We developed a Solr-based search engine to retrieve PubMed articles containing BP-related keywords and biological sex indicators. From the retrieved articles, we created a manually reviewed subset of 213 articles, including 90 cases that reported BP values separately by biological sex. We experimented with one few-shot learning method and two zero-shot LLM-based methods (LLaMA3 and GPT-3.5) to extract the means and standard deviations of BP values and the associated biological sex. Based on the automatically extracted information, we generated heatmaps and contour plots to study variations in BP values across biological sex.
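The abstract does not give the prompts or output format used with LLaMA3 and GPT-3.5. As a minimal sketch, assuming a hypothetical JSON output schema with one object per (sex, measure) pair, the zero-shot extraction step could build a prompt and then validate the model's reply as shown below. The model call itself is omitted, and `BPRecord`, `build_prompt` and `parse_response` are illustrative names, not the authors' code.

```python
import json
from dataclasses import dataclass

# Hypothetical record for one extracted value; the paper's schema is not specified.
@dataclass(frozen=True)
class BPRecord:
    sex: str       # "male" or "female"
    measure: str   # "SBP" (systolic) or "DBP" (diastolic)
    mean: float    # mmHg
    sd: float      # mmHg

VALID_SEX = {"male", "female"}
VALID_MEASURE = {"SBP", "DBP"}

# Hypothetical zero-shot prompt wording (an assumption, not the authors' prompt).
PROMPT_TEMPLATE = (
    "Extract every blood pressure value reported separately for males and females "
    "in the text below. Return only a JSON list of objects with keys "
    '"sex" ("male" or "female"), "measure" ("SBP" or "DBP"), "mean" and "sd" '
    "(numbers in mmHg). Return [] if none are reported.\n\nText:\n{text}"
)

def build_prompt(text: str) -> str:
    """Insert an article passage into the zero-shot prompt."""
    return PROMPT_TEMPLATE.format(text=text)

def parse_response(raw: str) -> list[BPRecord]:
    """Turn the model's raw reply into validated records, dropping malformed items."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # treat unparseable output as "nothing extracted"
    if not isinstance(items, list):
        return []
    records = []
    for item in items:
        if not isinstance(item, dict):
            continue
        sex = str(item.get("sex", "")).strip().lower()
        measure = str(item.get("measure", "")).strip().upper()
        try:
            mean = float(item["mean"])
            sd = float(item["sd"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric mean/SD
        if sex in VALID_SEX and measure in VALID_MEASURE and sd >= 0:
            records.append(BPRecord(sex, measure, mean, sd))
    return records

# Example with a canned (made-up) model reply:
raw = ('[{"sex": "Female", "measure": "sbp", "mean": 118.2, "sd": 12.4},'
       ' {"sex": "male", "measure": "SBP", "mean": "126", "sd": 13}]')
print(parse_response(raw))
```

Normalizing case and rejecting items that fail validation keeps downstream aggregation (for example, the per-sex heatmaps) from silently absorbing malformed LLM output.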
Results:
The inter-annotator agreement (IAA) between the two annotators, measured using Cohen's kappa (McHugh, 2012), was 0.74. The best-performing system was LLaMA3, with an F1 score of 0.85 (±0.00). The few-shot learning method (DANN) performed poorly, with an average F1 score of 0.30 (±0.01). GPT-3.5 achieved moderate performance, with an average F1 score of 0.67 (±0.04). The contour plots show that males tend to exhibit higher BP values than females.
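The agreement and performance figures above rest on two standard metrics: Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement from each annotator's label frequencies, and the F1 score, the harmonic mean of precision and recall. The abstract does not state how extracted values were matched against the gold standard, so the sketch below assumes exact matching of hypothetical (sex, measure, mean, SD) tuples; `cohens_kappa` and `tuple_f1` are illustrative names, and the paper's scoring may differ.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

def tuple_f1(predicted, gold):
    """F1 over extracted tuples, assuming a tuple counts only on exact match."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)
    if tp == 0:
        return 0.0  # also covers the case where both sets are empty
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)

# Toy example (made-up data): p_o = 0.75, p_e = 0.5, so kappa = 0.5.
print(cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))
gold = {("female", "SBP", 118.0, 12.0), ("male", "SBP", 126.0, 13.0)}
pred = {("female", "SBP", 118.0, 12.0), ("male", "SBP", 125.0, 13.0)}
print(tuple_f1(pred, gold))  # one of two tuples matches: precision = recall = 0.5
```

Exact tuple matching is strict: a single mis-read digit makes the whole tuple a miss, which is one reason reported F1 scores for numeric extraction can look lower than per-field accuracy.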
Conclusions:
Our results demonstrate that LLMs can reliably extract population-level BP and biological sex information from the clinical literature. They also outperform traditional few-shot information extraction systems in this context, showcasing their ability to extract BP-related information more accurately and efficiently. By employing LLMs, we provide a scalable framework for analyzing demographic differences in BP and emphasize the broader utility of LLMs in addressing similar challenges in biomedical research.