Journal of Biomedical Informatics: Latest Articles

Unsupervised discovery of clinical disease signatures using probabilistic independence
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Volume 166, Article 104837. Pub Date: 2025-04-23. DOI: 10.1016/j.jbi.2025.104837
Thomas A. Lasko, William W. Stead, John M. Still, Thomas Z. Li, Michael Kammer, Marco Barbero-Mota, Eric V. Strobl, Bennett A. Landman, Fabien Maldonado
Objective: This study uses probabilistic independence to disentangle patient-specific sources of disease and their signatures in Electronic Health Record (EHR) data.
Materials and Methods: We model a disease source as an unobserved root node in the causal graph of observed EHR variables (laboratory test results, medication exposures, billing codes, and demographics), and a signature as the set of downstream effects that a given source has on those observed variables. We used probabilistic independence to infer 2000 sources and their signatures from 9195 variables in 630,000 cross-sectional training instances sampled at random times from 269,099 longitudinal patient records. We evaluated the learned sources by using them to infer and explain the causes of benign vs. malignant pulmonary nodules in 13,252 records, comparing the inferred causes to an external reference list and other medical literature. We compared models trained by three different algorithms and used corresponding models trained directly from the observed variables as baselines.
Results: The model recovered 92% of malignant and 30% of benign causes in the reference standard. Of the top 20 inferred causes of malignancy, 14 were not listed in the reference standard but had supporting evidence in the literature, as did 11 of the top 20 inferred causes of benign nodules. The model decomposed listed malignant causes by an average factor of 5.5 and benign causes by 4.1, with most stratifying by disease course or treatment regimen. Predictive accuracy of causal predictive models trained on source expressions (Random Forest AUC 0.788) was similar to (p = 0.058) their associational baselines (0.738).
Discussion: Most of the unrecovered causes were due to the rarity of the condition or lack of sufficient detail in the input data. Surprisingly, the causal model found many patients with apparently undiagnosed cancer as the source of the malignant nodules. Causal model AUC also suggests that some sources remained undiscovered in this cohort.
Conclusion: These promising results demonstrate the potential of using probabilistic independence to disentangle complex clinical signatures from noisy, asynchronous, and incomplete EHR data that represent the confluence of multiple simultaneous conditions, and to identify patient-specific causes that support precise treatment decisions.
Citations: 0
Leveraging natural language processing to elucidate real-world clinical decision-making paradigms: A proof of concept study
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Volume 166, Article 104829. Pub Date: 2025-04-22. DOI: 10.1016/j.jbi.2025.104829
Yaniv Alon, Etti Naimi, Chedva Levin, Hila Videl, Mor Saban
Background: Understanding how clinicians arrive at decisions in actual practice settings is vital for advancing personalized, evidence-based care. However, systematic analysis of qualitative decision data poses challenges.
Methods: We analyzed transcribed interviews with Hebrew-speaking clinicians on decision processes using natural language processing (NLP). Word frequency analysis characterized terminology use, while large language models (ChatGPT from OpenAI and Gemini by Google) identified potential cognitive paradigms.
Results: Word frequency analysis of clinician interviews identified experience and knowledge as most influential on decision-making. NLP tentatively recognized heuristics-based reasoning grounded in past cases and intuition as dominant cognitive paradigms. Elements of shared decision-making through individualizing care with patients and families were also observed. Limited Hebrew clinical language resources required developing preliminary lexicons and dynamically adjusting stopwords. Findings also provided preliminary support for heuristics guiding clinical judgment while highlighting needs for broader sampling and enhanced analytical frameworks.
Conclusions: This study represents the first use of integrated qualitative and computational methods to systematically elucidate clinical decision-making. Findings supported experience-based heuristics guiding cognition. With methodological enhancements, similar analyses could transform global understanding of tailored care delivery. Standardizing interdisciplinary collaborations on developing NLP tools and analytical frameworks may advance equitable, evidence-based healthcare by elucidating real-world clinical reasoning processes across diverse populations and settings.
Citations: 0
Causal inference for time series datasets with partially overlapping variables
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Volume 166, Article 104828. Pub Date: 2025-04-22. DOI: 10.1016/j.jbi.2025.104828
Louis Adedapo Gomez, Jan Claassen, Samantha Kleinberg
Objective: Healthcare data provides a unique opportunity to learn causal relationships, but the largest datasets, such as from hospitals or intensive care units, are often observational and do not standardize variables collected for all patients. Rather, the variables depend on a patient’s health status, treatment plan, and differences between providers. This poses major challenges for causal inference, which either must restrict analysis to patients with complete data (reducing power) or learn patient-specific models (making it difficult to generalize). While missing variables can lead to confounding, variables absent for one individual are often measured in another.
Methods: We propose a novel method, called Causal Model Combination for Time Series (CMC-TS), to learn causal relationships from time series with partially overlapping variable sets. CMC-TS overcomes errors by specifically leveraging partial overlap between datasets (e.g., patients) to iteratively reconstruct missing variables and correct errors by reweighting inferences using shared information across datasets. We evaluated CMC-TS and compared it to the state of the art on both simulated data and real-world data from stroke patients admitted to a neurological intensive care unit.
Results: On simulated data, CMC-TS had the fewest false discoveries and highest F1-score compared to baselines. On real data from stroke patients in a neurological intensive care unit, we found fewer implausible and more highly ranked plausible causes of a clinically important adverse event.
Conclusion: Our approach may lead to better use of observational healthcare data for causal inference, by enabling causal inference from patient data with partially overlapping variable sets.
Citations: 0
Defining phenotypes of disease severity for long-term cardiovascular, renal, metabolic, and mental health conditions in primary care electronic health records: A mixed-methods study using the nominal group technique
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Volume 166, Article 104831. Pub Date: 2025-04-21. DOI: 10.1016/j.jbi.2025.104831
Jennifer Cooper, Thomas Jackson, Shamil Haroon, Francesca L. Crowe, Eleanor Hathaway, Leah Fitzsimmons, Krishnarajah Nirantharakumar
Objective: Inclusion of severity measures for long-term conditions (LTC) could improve prediction models for multiple long-term conditions (MLTC), but some severity measures have limited availability in electronic health records (EHR). We aimed to develop consensus on feasible severity phenotypes for nine cardio-renal-metabolic and mental health conditions.
Methods: This was a mixed-methods study using novel methodology. From existing literature, we identified potential severity phenotypes and explored feasibility of their use in EHR through analysis of data from 31 randomly selected general practices in the Clinical Practice Research Datalink (CPRD) Aurum database, a large UK-based primary care EHR database. We recruited clinical academic experts to participate in a survey and nominal group technique workshop. Participants used a Likert scale to rate clinical importance and feasibility for each severity phenotype independently (informed by the exploratory analysis). For the optimal severity phenotype (highest combined score) for each condition, adjusted hazard ratios (aHR) of five-year mortality were calculated using Cox regression on the full CPRD database.
Results: Fifteen existing severity indexes for nine conditions informed the survey. Eighteen clinical academics participated in the survey; twelve also participated in the workshops. Combined mean scores for clinical importance and feasibility were highest for estimated glomerular filtration rate (eGFR) for chronic kidney disease (CKD) (9.42/10) and for microvascular complications of diabetes (9.08/10). Mortality was higher for each reduction in eGFR stage (Stage 3b aHR 1.42, 95% CI 1.41–1.44 versus Stage 3a CKD) and for each additional microvascular complication of diabetes (one complication aHR 1.44, 95% CI 1.32–1.57 versus none). Some phenotypes (e.g., aneurysm diameter) were not well recorded within the database and could not feasibly be applied.
Conclusion: We developed a methodology for identifying severity phenotypes in EHRs. Severity phenotypes were identified for diabetes (type 1 and 2), ischaemic heart disease, CKD and peripheral vascular disease. Data quality in EHR should be improved for under-recorded severity measures.
Citations: 0
ICPPNet: A semantic segmentation network model based on inter-class positional prior for scoliosis reconstruction in ultrasound images
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Volume 166, Article 104827. Pub Date: 2025-04-19. DOI: 10.1016/j.jbi.2025.104827
Changlong Wang, You Zhou, Yuanshu Li, Wei Pang, Liupu Wang, Wei Du, Hui Yang, Ying Jin
Objective: Considering the radiation hazard of X-ray, safer, more convenient and cost-effective ultrasound methods are gradually becoming new diagnostic approaches for scoliosis. For ultrasound images of spine regions, it is challenging to accurately identify spine regions in images due to relatively small target areas and the presence of a lot of interfering information. Therefore, we developed a novel neural network that incorporates prior knowledge to precisely segment spine regions in ultrasound images.
Materials and methods: We constructed a dataset of ultrasound images of spine regions for semantic segmentation. The dataset contains 3136 images of 30 patients with scoliosis. We propose a network model (ICPPNet) that fully utilizes inter-class positional prior knowledge by combining an inter-class positional probability heatmap to achieve accurate segmentation of target areas.
Results: ICPPNet achieved an average Dice similarity coefficient of 70.83% and an average 95% Hausdorff distance of 11.28 mm on the dataset, demonstrating its excellent performance. The average error between the Cobb angle measured by our method and the Cobb angle measured by X-ray images is 1.41 degrees, and the coefficient of determination is 0.9879, indicating a strong correlation.
Discussion and conclusion: ICPPNet provides a new solution for the medical image segmentation task with positional prior knowledge between target classes. ICPPNet also strongly supports the subsequent reconstruction of spine models using ultrasound images.
Citations: 0
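The Dice similarity coefficient reported in the ICPPNet results can be illustrated with a small sketch. This is not the authors' evaluation code; the function name `dice_coefficient`, the nested-list mask format, and the convention of returning 1.0 for two empty masks are all assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks.

    Masks are given as nested lists of 0/1 values (a hypothetical format
    chosen to keep this sketch dependency-free).
    """
    pred_flat = [bool(v) for row in pred for v in row]
    target_flat = [bool(v) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels labelled foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred_flat, target_flat))
    total = sum(pred_flat) + sum(target_flat)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total
```

A per-image score like this would typically be averaged over the test images to obtain a dataset-level figure; how the averaging was done in the paper is not stated in the abstract.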
RoBIn: A Transformer-based model for risk of bias inference with machine reading comprehension
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Volume 166, Article 104819. Pub Date: 2025-04-16. DOI: 10.1016/j.jbi.2025.104819
Abel Corrêa Dias, Viviane Pereira Moreira, João Luiz Dihl Comba
Objective: Scientific publications are essential for uncovering insights, testing new drugs, and informing healthcare policies. Evaluating the quality of these publications often involves assessing their Risk of Bias (RoB), a task traditionally performed by human reviewers. The goal of this work is to create a dataset and develop models that allow automated RoB assessment in clinical trials.
Methods: We use data from the Cochrane Database of Systematic Reviews (CDSR) as ground truth to label open-access clinical trial publications from PubMed. This process enabled us to develop training and test datasets specifically for machine reading comprehension and RoB inference. Additionally, we created extractive (RoBIn^Ext) and generative (RoBIn^Gen) Transformer-based approaches to extract relevant evidence and classify the RoB effectively.
Results: RoBIn was evaluated across various settings and benchmarked against state-of-the-art methods, including large language models (LLMs). In most cases, the best-performing RoBIn variant surpasses traditional machine learning and LLM-based approaches, achieving an AUROC of 0.83.
Conclusion: This work addresses RoB assessment in clinical trials by introducing RoBIn, two Transformer-based models for RoB inference and evidence retrieval, which outperform traditional models and LLMs, demonstrating its potential to improve efficiency and scalability in clinical research evaluation. We also introduce a public dataset that is automatically annotated and can be used to enable future research to enhance automated RoB assessment.
Citations: 0
Benchmarking domain-specific pretrained language models to identify the best model for methodological rigor in clinical studies
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Volume 166, Article 104825. Pub Date: 2025-04-15. DOI: 10.1016/j.jbi.2025.104825
Fangwen Zhou, Rick Parrish, Muhammad Afzal, Ashirbani Saha, R. Brian Haynes, Alfonso Iorio, Cynthia Lokker
Objective: Encoder-only transformer-based language models have shown promise in automating critical appraisal of clinical literature. However, a comprehensive evaluation of the models for classifying the methodological rigor of randomized controlled trials is necessary to identify the more robust ones. This study benchmarks several state-of-the-art transformer-based language models using a diverse set of performance metrics.
Methods: Seven transformer-based language models were fine-tuned on the title and abstract of 42,575 articles from 2003 to 2023 in McMaster University’s Premium LiteratUre Service database under different configurations. The studies reported in the articles addressed questions related to treatment, prevention, or quality improvement for which randomized controlled trials are the gold standard with defined criteria for rigorous methods. Models were evaluated on the validation set using 12 schemes and metrics, including optimization for cross-entropy loss, Brier score, AUROC, average precision, sensitivity, specificity, and accuracy, among others. Threshold tuning was performed to optimize threshold-dependent metrics. Models that achieved the best performance in one or more schemes on the validation set were further tested in hold-out and external datasets.
Results: A total of 210 models were fine-tuned. Six models achieved top performance in one or more evaluation schemes. Three BioLinkBERT models outperformed others on 8 of the 12 schemes. BioBERT, BiomedBERT, and SciBERT were best on 1, 1 and 2 schemes, respectively. While model performance remained robust on the hold-out test set, it declined in external datasets. Class weight adjustments improved performance in most instances.
Conclusion: BioLinkBERT generally outperformed the other models. Using comprehensive evaluation metrics and threshold tuning optimizes model selection for real-world applications. Future work should assess generalizability to other datasets, explore alternate imbalance strategies, and examine training on full-text articles.
Citations: 0
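The abstract above mentions the Brier score and threshold tuning without saying how the threshold was chosen. The following is a minimal sketch under a stated assumption: it picks the decision threshold that maximizes Youden's J (sensitivity + specificity − 1) on validation data, which may differ from the criterion the authors used. All function names are hypothetical.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def sensitivity_specificity(probs, labels, threshold):
    """Sensitivity and specificity when predicting positive if p >= threshold."""
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if y == 1:
            tp += predicted_positive
            fn += not predicted_positive
        else:
            fp += predicted_positive
            tn += not predicted_positive
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def tune_threshold(probs, labels):
    """Return the candidate threshold with the highest Youden's J.

    Assumption: Youden's J is used as the tuning criterion; ties keep the
    lowest threshold. Candidates are the distinct predicted probabilities.
    """
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(probs)):
        sensitivity, specificity = sensitivity_specificity(probs, labels, threshold)
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold
```

In practice the threshold would be tuned on the validation set and then held fixed when scoring hold-out or external data, so that the reported sensitivity and specificity are not optimistically biased.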
A novel machine learning-based workflow to capture intra-patient heterogeneity through transcriptional multi-label characterization and clinically relevant classification
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Volume 166, Article 104817. Pub Date: 2025-04-09. DOI: 10.1016/j.jbi.2025.104817
Silvia Cascianelli, Iva Milojkovic, Marco Masseroli
Objectives: Patient classification into specific molecular subtypes is paramount in biomedical research and clinical practice to face complex, heterogeneous diseases. Existing methods, especially for gene expression-based cancer subtyping, often simplify patient molecular portraits, neglecting the potential co-occurrence of traits from multiple subtypes. Yet, recognizing intra-sample heterogeneity is essential for more precise patient characterization and improved personalized treatments.
Methods: We developed a novel computational workflow, named MULTI-STAR, which addresses current limitations and provides tailored solutions for reliable multi-label patient subtyping. MULTI-STAR uses state-of-the-art subtyping methods to obtain promising machine learning-based multi-label classifiers, leveraging gene expression profiles. It modifies standard single-label similarity-based techniques to obtain multi-label patient characterizations. Then, it employs these characterizations to train single-sample predictors using different multi-label strategies and find the best-performing classifiers.
Results: MULTI-STAR classifiers offer advanced multi-label recognition of all the subtypes contributing to the molecular and clinical traits of a patient, also distinguishing the primary from the additional relevant secondary subtype(s). The efficacy was demonstrated by developing multi-label solutions for breast and colorectal cancer subtyping that outperform existing methods in terms of prognostic value, primarily for overall survival predictions, and ability to work on a single sample at a time, as required in clinical practice.
Conclusions: This work emphasizes the importance of moving to multi-label subtyping to capture all the molecular traits of individual patients, considering also previously overlooked secondary assignments and paving the way for improved clinical decision-making processes in diverse heterogeneous disease contexts. Indeed, MULTI-STAR's novel, reproducible and generalizable approach provides comprehensive representations of patient inner heterogeneity and clinically relevant insights, contributing to precision medicine and personalized treatments.
Citations: 0
Low-cost algorithms for clinical notes phenotype classification to enhance epidemiological surveillance: A case study
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Volume 166, Article 104795. Pub Date: 2025-04-08. DOI: 10.1016/j.jbi.2025.104795
Javier Petri, Pilar Barcena Barbeira, Martina Pesce, Verónica Xhardez, Rodrigo Laje, Viviana Cotik
Objective: Our study aims to enhance epidemic intelligence through event-based surveillance in an emerging pandemic context. We classified electronic health records (EHRs) from La Rioja, Argentina, focusing on predicting COVID-19-related categories in a scenario with limited disease knowledge, evolving symptoms, non-standardized coding practices, and restricted training data due to privacy issues.
Methods: Using natural language processing techniques, we developed rapid, cost-effective methods suitable for implementation with limited resources. We annotated a corpus for training and testing classification models, ranging from simple logistic regression to more complex fine-tuned transformers.
Results: The transformer-based, Spanish-adapted models BETO Clínico and RoBERTa Clínico, further pre-trained with an unannotated portion of our corpus, were the best-performing models (F1 = 88.13% and 87.01%). A simple logistic regression (LR) model ranked third (F1 = 85.09%), outperforming more complex models like XGBoost and BiLSTM. Data classified as COVID-confirmed using LR and BETO Clínico exhibit stronger time-series Pearson correlation with official COVID-19 case counts from the National Health Surveillance System (SNVS 2.0) in La Rioja province compared to the correlations observed between the International Classification of Diseases (ICD-10) codes and the SNVS 2.0 data (0.840, 0.873, and 0.663, p-values $\le 3 \times 10^{-7}$). Both models have a good Pearson correlation with ICD-10 codes assigned to the clinical notes for confirmed (0.940 and 0.902) and for suspected cases (0.960 and 0.954), p-values $\le 1.7 \times 10^{-18}$.
Conclusion: This study shows that simple, resource-efficient methods can achieve results comparable to complex approaches. BETO Clínico and LR strongly correlate with official data, revealing uncoded confirmed cases at the pandemic’s onset. Our results suggest that annotating a smaller set of EHRs and training a simple model may be more cost-effective than manual coding. This points to potentially efficient strategies in public health emergencies, particularly in resource-limited settings, and provides valuable insights for future epidemic response efforts.
Citations: 0
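The time-series Pearson correlations reported above compare two aligned count series, such as model-classified confirmed cases per period against official case counts. Below is a minimal sketch of that statistic. The function name `pearson_r` is hypothetical, and how the authors aligned and aggregated their series is not stated in the abstract.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of centred cross-products and centred squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")

    return cov / math.sqrt(var_x * var_y)
```

Both series need to cover the same periods in the same order before calling the function. The p-values quoted in the abstract would additionally require a significance test, which this sketch does not implement.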
Transfer learning for a tabular-to-image approach: A case study for cardiovascular disease prediction
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, Volume 165, Article 104821. Pub Date: 2025-04-08. DOI: 10.1016/j.jbi.2025.104821
Francisco J. Lara-Abelenda, David Chushig-Muzo, Pablo Peiro-Corbacho, Vanesa Gómez-Martínez, Ana M. Wägner, Conceição Granja, Cristina Soguero-Ruiz
Objective: Machine learning (ML) models have been extensively used for tabular data classification, but recent works have been developed to transform tabular data into images, aiming to leverage the predictive performance of convolutional neural networks (CNNs). However, most of these approaches fail to convert data with a low number of samples and mixed-type features. This study aims to evaluate the performance of the tabular-to-image method named low mixed-image generator for tabular data (LM-IGTD), and to assess the effectiveness of transfer learning and fine-tuning for improving predictions on tabular data.
Methods: We employed two public tabular datasets with patients diagnosed with cardiovascular diseases (CVDs): Framingham and Steno. First, both datasets were transformed into images using LM-IGTD. Then, Framingham, which contains a larger set of samples than Steno, was used to train CNN-based models. Finally, we performed transfer learning and fine-tuning using the pre-trained CNN on the Steno dataset to predict CVD risk.
Results: The CNN-based model with transfer learning achieved the highest AUROC in Steno (0.855), outperforming ML models such as decision trees, K-nearest neighbours, least absolute shrinkage and selection operator (LASSO), support vector machine, and TabPFN. This approach improved accuracy by 2% over the best-performing traditional model, TabPFN.
Conclusion: To the best of our knowledge, this is the first study that evaluates the effectiveness of applying transfer learning and fine-tuning to tabular data using tabular-to-image approaches. Through the use of CNNs’ predictive capabilities, our work also advances the diagnosis of CVD by providing a framework for early clinical intervention and decision-making support.
Citations: 0