Journal of Biomedical Informatics: Latest Articles

Biomedical document-level relation extraction with thematic capture and localized entity pooling
IF: 4.0 | Zone 2 | Medicine
Journal of Biomedical Informatics Pub Date: 2024-12-01 DOI: 10.1016/j.jbi.2024.104756
Yuqing Li, Xinhui Shao
Abstract: In contrast to sentence-level relation extraction, document-level relation extraction poses greater challenges, as a document typically contains multiple entities and one entity may be associated with multiple other entities. Existing methods often rely on graph structures to capture path representations between entity pairs. This paper instead introduces a novel approach called local entity pooling that relies solely on the pre-trained model to identify the bridge entity related to the current entity pair and generate the reasoning path representation. This technique effectively mitigates the multi-entity problem. Additionally, the model leverages the multi-entity and multi-label characteristics of the document to acquire the document’s thematic representation, thereby enhancing the document-level relation extraction task. Experimental evaluations were conducted on two biomedical datasets, CDR and GDA. Our TCLEP (Thematic Capture and Localized Entity Pooling) model achieved Macro-F1 scores of 71.7% and 85.3%, respectively. We also incorporated the local entity pooling and thematic capture modules into the state-of-the-art model, resulting in performance improvements of 1.5% and 0.2% on the respective datasets. These results highlight the advanced performance of our proposed approach.
Volume 160, Article 104756.
Citations: 0
Taxonomy-based prompt engineering to generate synthetic drug-related patient portal messages
IF: 4.0 | Zone 2 | Medicine
Journal of Biomedical Informatics Pub Date: 2024-12-01 DOI: 10.1016/j.jbi.2024.104752
Natalie Wang, Sukrit Treewaree, Ayah Zirikly, Yuzhi L. Lu, Michelle H. Nguyen, Bhavik Agarwal, Jash Shah, James Michael Stevenson, Casey Overby Taylor
Objective: The objectives of this study were to: (1) create a corpus of synthetic drug-related patient portal messages to address the current lack of publicly available datasets for model development, (2) assess differences in language used and linguistics among the synthetic patient portal messages, and (3) assess the accuracy of patient-reported drug side effects for different racial groups.
Methods: We leveraged a taxonomy for patient- and clinician-generated content to guide prompt engineering for synthetic drug-related patient portal messages. We generated two groups of messages: the first group (200 messages) used a subset of the taxonomy relevant to a broad range of drug-related messages, and the second group (250 messages) used a subset of the taxonomy relevant to a narrow range of messages focused on side effects. Prompts also included one of five racial groups. Next, we assessed linguistic characteristics among message parts (subject, beginning, body, ending) across different prompt specifications (urgency, patient portal taxa, race). We also assessed the performance and frequency of patient-reported side effects across different racial groups and compared them to data present in a real-world data source (SIDER).
Results: The study generated 450 synthetic patient portal messages, and we assessed linguistic patterns, the accuracy of drug-side effect pairs, and the frequency of pairs compared to real-world data. Linguistic analysis revealed variations in language usage and politeness, and analysis of positive predictive values identified differences in symptoms reported based on urgency levels and racial groups in the prompt. We also found that low-incidence SIDER drug-side effect pairs were observed less frequently in our dataset.
Conclusion: This study demonstrates the potential of synthetic patient portal messages as a valuable resource for healthcare research. After creating a corpus of synthetic drug-related patient portal messages, we identified significant language differences and provided evidence that drug-side effect pairs observed in messages are comparable to what is expected in real-world settings.
Volume 160, Article 104752.
Citations: 0
Sleep apnea test prediction based on Electronic Health Records
IF: 4.0 | Zone 2 | Medicine
Journal of Biomedical Informatics Pub Date: 2024-12-01 DOI: 10.1016/j.jbi.2024.104737
Lama Abu Tahoun, Amit Shay Green, Tal Patalon, Yaron Dagan, Robert Moskovitch
Abstract: Obstructive Sleep Apnea (OSA) is identified by a polysomnography test, which is often performed at later ages. Being able to notify potential insured members at earlier ages is desirable. To that end, we develop predictive models that rely on Electronic Health Records (EHR) and predict whether a person will go through a sleep apnea test after the age of 50. A major challenge is the variability in EHR records across insured members over the years, which this study also investigates in the context of control matching and prediction. Since there are many temporal variables, the RankLi method was introduced for temporal variable selection. This approach employs the t-test to calculate a divergence score for each temporal variable between the target classes. We also investigate the need to consider the number of EHR records as part of control matching, and whether modeling separately for subgroups according to the number of EHR records is more effective. For each prediction task, we trained 4 different classifiers, including 1-CNN, LSTM, Random Forest, and Logistic Regression, on data until the age of 40 or 50, and on several numbers of temporal variables. Using the number of EHR records for control matching was found to be crucial, and learning separate models for subsets of the population according to the number of EHR records they have was found to be more effective. The deep learning models, particularly the 1-CNN, achieved the highest balanced accuracy and AUC scores in both male and female groups. In the male group, the highest results were also observed at age 50 with 100 temporal variables, resulting in a balanced accuracy of 90% and an AUC of 93%.
Volume 160, Article 104737.
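The t-test-based divergence score mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' RankLi implementation: the abstract does not say how each temporal variable is summarized or which t-test variant is used, so the sketch assumes one summary value per person per variable and uses the absolute Welch t-statistic as the score. The function names and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (assumed variant)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_variables(cases, controls, top_k):
    """Score each variable by |t| between the two classes; return the top_k names."""
    scores = {name: abs(welch_t(cases[name], controls[name])) for name in cases}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical per-person summaries of two temporal variables.
cases = {"bmi": [31.0, 33.5, 29.8, 35.2], "heart_rate": [72.0, 75.0, 70.0, 74.0]}
controls = {"bmi": [24.1, 26.0, 25.3, 23.8], "heart_rate": [71.0, 76.0, 69.0, 73.0]}

print(rank_variables(cases, controls, top_k=1))
```

In this toy data the class means differ far more for `bmi` than for `heart_rate`, so `bmi` is ranked first. A real selection step would also need to handle missing measurements and variables with zero variance.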
Citations: 0
Structural analysis and intelligent classification of clinical trial eligibility criteria based on deep learning and medical text mining
IF: 4.0 | Zone 2 | Medicine
Journal of Biomedical Informatics Pub Date: 2024-12-01 DOI: 10.1016/j.jbi.2024.104753
Yongzhong Han, Qianmin Su, Liang Liu, Ying Li, Jihan Huang
Objective: To enhance the efficiency, quality, and innovation capability of clinical trials, this paper introduces a novel model called CTEC-AC (Clinical Trial Eligibility Criteria Automatic Classification), aimed at structuring clinical trial eligibility criteria into computationally explainable classifications.
Methods: We obtained detailed information on the latest 2,500 clinical trials from ClinicalTrials.gov, generating over 20,000 eligibility criteria data entries. To enhance the expressiveness of these criteria, we integrated two powerful methods: ClinicalBERT and MetaMap. The resulting enhanced features were used as input for a hierarchical clustering algorithm. Post-processing included expert validation of the algorithm’s output to ensure the accuracy of the constructed annotated eligibility text corpus. Ultimately, our model was employed to automate the classification of eligibility criteria.
Results: We identified 31 distinct categories to summarize the eligibility criteria written by clinical researchers and uncovered common themes in how these criteria are expressed. Using our automated classification model on a labeled dataset, we achieved a macro-average F1 score of 0.94.
Conclusion: This work can automatically extract structured representations from unstructured eligibility criteria text, significantly advancing the informatization of clinical trials. This, in turn, can significantly enhance the intelligence of automated participant recruitment for clinical researchers.
Volume 160, Article 104753.
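For readers unfamiliar with the macro-average F1 score reported here, the following minimal sketch shows how it is computed: F1 is calculated per class and then averaged with equal weight per class. The category names and predictions are hypothetical and are not taken from the paper; averaging over the union of true and predicted labels is an assumed convention.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical eligibility-criteria categories.
y_true = ["age", "age", "disease", "consent"]
y_pred = ["age", "disease", "disease", "consent"]

print(round(macro_f1(y_true, y_pred), 4))
```

Because every class counts equally, a rare category that is classified poorly lowers the macro average as much as a frequent one would, which matters when there are many categories of uneven size.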
Citations: 0
Importance of variables from different time frames for predicting self-harm using health system data
IF: 4.0 | Zone 2 | Medicine
Journal of Biomedical Informatics Pub Date: 2024-11-16 DOI: 10.1016/j.jbi.2024.104750
Charles J. Wolock, Brian D. Williamson, Susan M. Shortreed, Gregory E. Simon, Karen J. Coleman, Rodney Yeargans, Brian K. Ahmedani, Yihe Daida, Frances L. Lynch, Rebecca C. Rossom, Rebecca A. Ziebell, Maricela Cruz, Robert D. Wellman, R. Yates Coley
Objective: Self-harm risk prediction models developed using health system data (electronic health records and insurance claims information) often use patient information from up to several years prior to the index visit when the prediction is made. Measurements from some time periods may not be available for all patients. Using the framework of algorithm-agnostic variable importance, we study the predictive potential of variables corresponding to different time horizons prior to the index visit and demonstrate the application of variable importance techniques in the biomedical informatics setting.
Materials and Methods: We use variable importance to quantify the potential of recent (up to three months before the index visit) and distant (more than one year before the index visit) patient mental health information for predicting self-harm risk using data from seven health systems. We quantify importance as the decrease in predictiveness when the variable set of interest is excluded from the prediction task. We define predictiveness using discriminative metrics: area under the receiver operating characteristic curve (AUC), sensitivity, and positive predictive value.
Results: Mental health predictors corresponding to the three months prior to the index visit show a strong signal of importance; in one setting, excluding these variables decreased AUC from 0.85 to 0.77. Predictors corresponding to more distant information were less important.
Discussion: Predictors from the months immediately preceding the index visit are highly important. Implementation of self-harm prediction models may be challenging in settings where recent data are not completely available (e.g., due to lags in insurance claims processing) at the time a prediction is made.
Conclusion: Clinically derived variables from different time frames exhibit varying levels of importance for predicting self-harm. Variable importance analyses can inform whether and how to implement risk prediction models into clinical practice given real-world data limitations. These analyses can be applied more broadly in biomedical informatics research to provide insight into general clinical risk prediction tasks.
Volume 160, Article 104750.
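The definition of importance used in this abstract, the decrease in predictiveness when a variable set is excluded, can be sketched with AUC as the metric. This is an illustration only and not the authors' estimator: it assumes risk scores from a full model and from a model fit without the variable set are already available on the same held-out patients, and the labels and scores below are hypothetical.

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def importance(labels, full_scores, reduced_scores):
    """Drop in AUC when the variable set of interest is excluded."""
    return auc(labels, full_scores) - auc(labels, reduced_scores)


# Hypothetical held-out outcomes and risk scores.
labels = [1, 1, 0, 0, 0]
full_scores = [0.9, 0.8, 0.3, 0.2, 0.1]     # model using all predictors
reduced_scores = [0.9, 0.2, 0.3, 0.4, 0.1]  # model without the recent predictors

print(round(importance(labels, full_scores, reduced_scores), 3))
```

Applied to the figures quoted in the abstract, the same subtraction gives 0.85 − 0.77 = 0.08 as the AUC-based importance of the recent mental health predictors in that setting.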
Citations: 0
Machine learning approaches for the discovery of clinical pathways from patient data: A systematic review
IF: 4.0 | Zone 2 | Medicine
Journal of Biomedical Informatics Pub Date: 2024-11-12 DOI: 10.1016/j.jbi.2024.104746
Lillian Muyama, Antoine Neuraz, Adrien Coulet
Background: Clinical pathways are sequences of events followed during the clinical care of a group of patients who meet pre-defined criteria. They have many applications ranging from healthcare evaluation and optimization to clinical decision support. These pathways can be discovered from existing healthcare data, in particular with machine learning, which is a family of methods used to learn patterns from data. This review provides a comprehensive overview of the literature concerning the use of machine learning methods for clinical pathway discovery from patient data.
Methods: Guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) method, we conducted a systematic review of the existing literature. We searched 6 databases, i.e., ACM Digital Library, ScienceDirect, Web of Science, PubMed, IEEE Xplore, and Scopus, spanning from January 2004 to December 2023, using search terms pertinent to clinical pathways and their development. Subsequently, the retrieved papers were analyzed to assess their relevance to the scope of this study.
Results: In total, 131 papers that met the specified inclusion criteria were identified. These papers expressed diverse motivations behind data-driven clinical pathway discovery, ranging from knowledge discovery to conformance checking with established clinical guidelines (derived from existing literature and clinical experts). Notably, the predominant methods employed (67.2%, n = 88) involved unsupervised machine learning techniques, such as clustering and process mining.
Conclusions: Relevant clinical pathways can be discovered from patient data using machine learning methods, with the desirable potential to aid clinical decision-making in healthcare. However, to reach this objective, the methods used to discover pathways should be reproducible, and rigorous performance evaluation by clinical experts needs to be conducted for validation.
Volume 160, Article 104746.
Citations: 0
Cross-Modal self-supervised vision language pre-training with multiple objectives for medical visual question answering
IF: 4.0 | Zone 2 | Medicine
Journal of Biomedical Informatics Pub Date: 2024-11-12 DOI: 10.1016/j.jbi.2024.104748
Gang Liu, Jinlong He, Pengfei Li, Zixu Zhao, Shenjun Zhong
Abstract: Medical Visual Question Answering (VQA) is a task that aims to provide answers to questions about medical images, utilizing both visual and textual information in the reasoning process. The absence of large-scale annotated medical VQA datasets presents a formidable obstacle to training a medical VQA model from scratch in an end-to-end manner. Existing works have used image captioning datasets in the pre-training stage and fine-tuned to downstream VQA tasks. Following the same paradigm, we use a collection of public medical image captioning datasets to pre-train multimodal models in a self-supervised setup and fine-tune to downstream medical VQA tasks. In this work, we propose a method featuring Cross-Modal pre-training with Multiple Objectives (CMMO), which includes masked image modeling, masked language modeling, image-text matching, and image-text contrastive learning. The proposed method is designed to associate the visual features of medical images with corresponding medical concepts in captions, for learning aligned vision and language feature representations and multi-modal interactions. The experimental results reveal that our proposed CMMO method outperforms state-of-the-art methods on three public medical VQA datasets, showing absolute improvements of 2.6%, 0.9%, and 4.0% on the VQA-RAD, PathVQA, and SLAKE datasets, respectively. We also conduct comprehensive ablation studies to validate our method and visualize the attention maps, which show strong interpretability. The code and pre-trained weights will be released at https://github.com/pengfeiliHEU/CMMO.
Volume 160, Article 104748.
Citations: 0
MultiADE: A Multi-domain benchmark for Adverse Drug Event extraction
IF: 4.0 | Zone 2 | Medicine
Journal of Biomedical Informatics Pub Date: 2024-11-12 DOI: 10.1016/j.jbi.2024.104744
Xiang Dai, Sarvnaz Karimi, Abeed Sarker, Ben Hachey, Cecile Paris
Objective: Active adverse event surveillance monitors Adverse Drug Events (ADE) from different data sources, such as electronic health records, medical literature, social media and search engine logs. Over the years, many datasets have been created, and shared tasks have been organised to facilitate active adverse event surveillance. However, most, if not all, datasets or shared tasks focus on extracting ADEs from a particular type of text. Domain generalisation, the ability of a machine learning model to perform well on new, unseen domains (text types), is under-explored. Given the rapid advancements in natural language processing, one unanswered question is how far we are from having a single ADE extraction model that is effective on various types of text, such as scientific literature and social media posts.
Methods: We contribute to answering this question by building a multi-domain benchmark for adverse drug event extraction, which we named MultiADE. The new benchmark comprises several existing datasets sampled from different text types and our newly created dataset, CADECv2, which is an extension of CADEC (Karimi et al., 2015), covering online posts regarding more diverse drugs than CADEC. Our new dataset is carefully annotated by human annotators following detailed annotation guidelines.
Conclusion: Our benchmark results show that the generalisation of the trained models is far from perfect, making it infeasible to deploy them to process different types of text. In addition, although intermediate transfer learning is a promising approach to utilising existing resources, further investigation is needed on methods of domain adaptation, particularly cost-effective methods to select useful training instances.
The newly created CADECv2 and the scripts for building the benchmark are publicly available at CSIRO’s Data Portal (https://data.csiro.au/collection/csiro:62387). These resources enable the research community to further information extraction, leading to more effective active adverse drug event surveillance.
Volume 160, Article 104744.
Citations: 0
Disentangling the phenotypic patterns of hypertension and chronic hypotension
IF: 4.0 | Zone 2 | Medicine
Journal of Biomedical Informatics Pub Date: 2024-11-01 DOI: 10.1016/j.jbi.2024.104743
William W. Stead, Adam Lewis, Nunzia B. Giuse, Annette M. Williams, Italo Biaggioni, Lisa Bastarache
Objective: 2017 blood pressure (BP) categories focus on cardiac risk. We hypothesize that studying the balance between mechanisms that increase or decrease BP across the medical phenome will lead to new insights. We devised a classifier that uses BP measures to assign individuals to mutually exclusive categories centered in the upper (Htn), lower (Hotn) and middle (Naf) zones of the BP spectrum, and examined the epidemiologic and phenotypic patterns of these BP-categories.
Methods: We classified a cohort of 832,560 deidentified electronic health records by BP-category; compared the frequency of BP-categories and four subtypes of Htn and Hotn by sex and age-decade; visualized the distributions of systolic, diastolic, mean arterial and pulse pressures stratified by BP-category; and ran Phenome-wide Association Studies (PheWAS) for Htn and Hotn. We paired knowledgebases for hypertension and hypotension and computed aggregate knowledgebase status (KB-status) indicating known associations. We assessed alignment of PheWAS results with KB-status for phecodes in the knowledgebase, and paired PheWAS correlations with KB-status to surface phenotypic patterns.
Results: BP-categories represent distinct distributions within the multimodal distributions of systolic and diastolic pressure. They are centered in the upper, lower, and middle zones of mean arterial pressure and provide a different signal than pulse pressure. For phecodes in the knowledgebase, 85% of positive correlations align with KB-status. Phenotypic patterns for Htn and Hotn overlap for several phecodes and are separate for others. Our analysis suggests five candidates for hypothesis-testing research: two where the prevalence of the association with Htn or Hotn may be underappreciated, and three where mechanisms that increase and decrease blood pressure may be affecting one another’s expression.
Conclusion: PairedPheWAS methods may open a phenome-wide path to disentangling hypertension and chronic hypotension. Our classifier provides a starting point for assigning individuals to BP-categories representing the upper, lower, and middle zones of the BP spectrum. 4.7% of individuals matching 2017 BP categories for normal, elevated BP or isolated hypertension have diastolic pressure < 60. Research is needed to fine-tune the classifier, provide external validation, evaluate the clinical significance of diastolic pressure < 60, and test the candidate hypotheses.
Volume 159, Article 104743.
Citations: 0
Demonstration-based learning for few-shot biomedical named entity recognition under machine reading comprehension
IF: 4.0 | Zone 2 | Medicine
Journal of Biomedical Informatics Pub Date: 2024-11-01 DOI: 10.1016/j.jbi.2024.104739
Leilei Su, Jian Chen, Yifan Peng, Cong Sun
Objective: Although deep learning techniques have shown significant achievements, they frequently depend on extensive amounts of hand-labeled data and tend to perform inadequately in few-shot scenarios. The objective of this study is to devise a strategy that can improve the model’s capability to recognize biomedical entities in scenarios of few-shot learning.
Methods: By redefining biomedical named entity recognition (BioNER) as a machine reading comprehension (MRC) problem, we propose a demonstration-based learning method to address few-shot BioNER, which involves constructing appropriate task demonstrations. We compared the proposed method with existing advanced methods using six benchmark datasets: BC4CHEMD, BC5CDR-Chemical, BC5CDR-Disease, NCBI-Disease, BC2GM, and JNLPBA.
Results: We examined the models’ efficacy by reporting F1 scores from both the 25-shot and 50-shot learning experiments. In 25-shot learning, we observed 1.1% improvements in the average F1 scores compared to the baseline method, reaching 61.7%, 84.1%, 69.1%, 70.1%, 50.6%, and 59.9% on six datasets, respectively. In 50-shot learning, we further improved the average F1 scores by 1.0% compared to the baseline method, reaching 73.1%, 86.8%, 76.1%, 75.6%, 61.7%, and 65.4%, respectively.
Conclusion: We reported that in the realm of few-shot learning BioNER, MRC-based language models are much more proficient in recognizing biomedical entities compared to the sequence labeling approach. Furthermore, our MRC-language models can compete successfully with fully-supervised learning methodologies that rely heavily on the availability of abundant annotated data. These results highlight possible pathways for future advancements in few-shot BioNER methodologies.
Volume 159, Article 104739.
Citations: 0