Journal of Biomedical Informatics: Latest Articles (Page 4)

BiFDR: Brain-Inspired Federated Diffusion Transformer with Reinforcement for privacy-preserving molecular generation

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-09-13, DOI: 10.1016/j.jbi.2025.104910

Hongming Hou, Jing Zhang, Meirun Zhang, Xiucai Ye

Objective: Generative drug discovery is hampered by challenges in data privacy and the immense computational cost of state-of-the-art (SOTA) models. To surmount these barriers, we developed Brain-Inspired Federated Diffusion with Reinforcement (BiFDR), a privacy-preserving and resource-efficient framework.

Methods: BiFDR integrates three synergistic modules. A Neuro-inspired Federated Coordinator (NeuroFed) orchestrates secure collaboration via synaptic plasticity-inspired principles, combining server-side pruning with client-side Low-Rank Adaptation (LoRA) and sparse asynchronous updates. A Transformer-based diffusion generator (TransFuse) efficiently creates chemically valid molecules in a compressed latent space using attention mechanisms. Finally, a reinforcement learning agent (T-JORM) steers the generative process towards novel 2D and 3D molecular structures, guided by a multi-faceted, Tanimoto-based reward function.

Results: Benchmarked against baseline models, BiFDR improved the Quantitative Estimate of Drug-likeness by 13.7%, the Molecular-level Structural Information Score by 5.7%, and the Molecular Interaction Analysis Index by 52.3%. The framework also enhanced synthetic feasibility, reflected by a 9.5% reduction in the Synthetic Accessibility Score. Critically, BiFDR substantially strengthened data privacy, achieving a 43.6% reduction in the mutual information metric.

Conclusion: BiFDR establishes an effective and efficient paradigm for generative drug discovery. It consistently produces molecules with superior drug-likeness, structural novelty, and interaction potential. By ensuring synthetic accessibility while rigorously preserving privacy and minimizing computational overhead, BiFDR presents a viable and scalable solution for modern, collaborative drug development pipelines.

Volume 170, Article 104910. Publication type: Journal Article.
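
The Methods mention a Tanimoto-based reward but do not give its form. As a minimal sketch only, assuming each molecule is represented by the set of "on" bits in a fingerprint, the Tanimoto coefficient is the size of the intersection divided by the size of the union. The function names, the toy fingerprints, and the novelty-style reward below are hypothetical illustrations, not the T-JORM reward.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two fingerprint bit sets."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints (assumption)
    return len(a & b) / len(a | b)


def novelty_reward(candidate: set[int], references: list[set[int]]) -> float:
    """Hypothetical reward: 1 minus the highest similarity to any reference."""
    if not references:
        return 1.0
    return 1.0 - max(tanimoto(candidate, ref) for ref in references)


# Toy fingerprints, invented for illustration
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}
print(tanimoto(fp_a, fp_b))                  # 2 shared bits / 5 distinct bits = 0.4
print(novelty_reward(fp_a, [fp_b, {2, 3}]))  # 1 - 0.4 = 0.6
```

A real pipeline would derive the bit sets from a cheminformatics fingerprint and would likely combine several such terms, as the phrase "multi-faceted" suggests.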

Citations: 0

A machine learning approach for automating review of a RxNorm medication mapping pipeline output

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-09-11, DOI: 10.1016/j.jbi.2025.104909

Matthias Hüser, John Doole, Vinicius Pinho, Hossein Rouhizadeh, Douglas Teodoro, Ahson Saiyed, Matvey B. Palchuk

Objective: Medication mapping to standardized terminologies is an important prerequisite for performing analytics on a federated EHR network. TriNetX LLC operates the largest such network in the world.

Methods: Here we report on a novel pipeline, called RxEmbed, for the mapping and binding of local medication descriptions to RxNorm ingredient codes using LLMs, and for automated mapping review using machine learning.

Results: Performance of RxEmbed was assessed on a public data set from France as well as on 6 Healthcare Organizations from the TriNetX federated EHR network across the United States and Brazil. On the public data set, RxEmbed outperformed two recently reported LLM-based baselines in terms of recall and precision of the generated mappings. In TriNetX network data, RxEmbed obtained RxNorm mapping recalls of 84%–93%, at a precision of 99.5%–100%.

Conclusion: We built and evaluated an LLM-based medication mapping pipeline that binds local medication descriptions from EHR systems to RxNorm ingredient codes. The high precision of the pipeline output implies very limited need for human review of the generated mappings.

Volume 170, Article 104909. Publication type: Journal Article.

Citations: 0

Secondary use of radiological imaging data: Vanderbilt’s ImageVU approach

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-09-10, DOI: 10.1016/j.jbi.2025.104905

David S. Smith, Karthik Ramadass, Laura Jones, Jennifer Morse, Daniel Fabbri, Joseph R. Coco, Shunxing Bao, Melissa Basford, Peter J. Embi, Reed A. Omary, John C. Gore, Jill M. Pulley, Bennett A. Landman

Objective: To develop ImageVU, a scalable research imaging infrastructure that integrates clinical imaging data with metadata-driven cohort discovery, enabling secure, efficient, and regulatory-compliant access to imaging for secondary and opportunistic research use. This manuscript presents a detailed description of ImageVU’s key components and lessons learned to assist other institutions in developing similar research imaging services and infrastructure.

Methods: ImageVU was designed to support the secondary use of radiological imaging data through a dedicated research imaging store. The system comprises four interconnected components: a Research PACS, an Ad Hoc Backfill Host, a Cloud Storage System, and a De-Identification System. Imaging metadata are extracted and stored in the Research Derivative (RD), an identified clinical data repository, and the Synthetic Derivative (SD), a de-identified research data repository, with access facilitated through the RD Discover web portal. Researchers interact with the system via structured metadata queries and multiple data delivery options, including web-based viewing, bulk downloads, and dataset preparation for high-performance computing environments.

Results: The integration of metadata-driven search capabilities has streamlined cohort discovery and improved imaging data accessibility. As of December 2024, ImageVU has processed 12.9 million MRI and CT series from 1.36 million studies across 453,403 patients. The system has supported 75 project requests, delivering over 50 TB of imaging data to 55 investigators, leading to 66 published research papers.

Conclusion: ImageVU demonstrates a scalable and efficient approach for integrating clinical imaging into research workflows. By combining institutional data infrastructure with cloud-based storage and metadata-driven cohort identification, the platform enables secure and compliant access to imaging for translational research.

Volume 170, Article 104905. Publication type: Journal Article.

Citations: 0

Comparing clinical decision support systems for improving follow-up of abnormal cervical cancer screening test results

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-09-09, DOI: 10.1016/j.jbi.2025.104908

Steven J. Atlas, Timothy E. Burdick, Adam Wright, Wenyan Zhao, Shoshana Hort, David G. Aman, Mathan Thillaiyapillai, E. John Orav, Amy J. Wint, Rebecca E. Smith, Katherine L. Gallagher, Molly L. Housman, Frank Y. Chang, Courtney J. Diamond, Li Zhou, Jennifer S. Haas, Anna N.A. Tosteson

Background: Many individuals with abnormal cervical cancer screening test results do not receive timely follow-up care. Clinical decision support systems (CDSS) to improve follow-up are challenged by difficulty identifying clinical elements and applying complex guideline recommendations. As part of a multisite trial, two CDSS models were implemented: one used natural language processing to evaluate extracted data outside of the electronic health record (EHR) (System A); the other used commercial EHR functionality with LOINC-defined result fields (System B). This secondary analysis compared the accuracy and trial outcomes among sites using these two CDSS models.

Methods: Primary care clinics (32 in System A and 12 in System B) were randomly assigned to usual care, CDSS alone, or CDSS with patient outreach with or without navigation. CDSS identified individuals with overdue abnormal screening results and specified the recommended follow-up and time interval. CDSS accuracy was assessed by manual chart review. Patient outreach consisted of portal/mailed letters plus a single phone call. Navigation included one or more phone calls to address barriers to care. Completion of recommended follow-up at 120 days after enrollment was the primary outcome. Clinic was the unit of randomization, and the patient was the unit of analysis.

Results: Between October 2020 and December 2021, 2596 patients with abnormal results were identified by the CDSS. CDSS true positives were 61.3% in System A and 70.4% in System B. CDSS alone versus usual care did not improve outcomes in either system. CDSS with patient outreach with or without navigation versus usual care significantly increased follow-up rates in System A (38.2% or 37.2% vs. 23.5%, p < 0.001) and System B (25.4% or 23% vs. 19.7%, p = 0.044).

Conclusions: Two CDSS models developed to identify overdue abnormal cervical cancer screening test results had moderate accuracy. Both models with patient outreach with or without navigation (but not CDSS alone) increased recommended follow-up. Future CDSS for cervical cancer screening may be improved with open-source tools developed in public–private partnerships.

Volume 170, Article 104908. Publication type: Journal Article.

Citations: 0

An optimized code-free AI approach for efficient and accurate literature screening in bone organoid research

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-09-09, DOI: 10.1016/j.jbi.2025.104911

Jiaxu Zheng, Janak Lal Pathak, Shangyan Li, Anqi Li, Jiechun Fang, Zonghua Li, Zhisheng Bi, Yin Xiao, Qing Zhang

The exponential growth of biomedical literature has rendered traditional screening methods inefficient and unsustainable, making knowledge discovery akin to finding a needle in a haystack. While recent advances in artificial intelligence (AI) offer new opportunities for rapid literature retrieval, many clinicians and researchers lack familiarity with these tools. In this study, we optimized LitSuggest, a user-friendly, code-free AI-based literature screening system, and established a standardized operational workflow. Using the field of organoid-based bone tissue engineering as a case study, the optimized system achieved an accuracy of 98.83%, precision of 76.19%, recall of 83.33%, and an F1-score of 79.60%, while reducing manual screening workload by over 90%. Furthermore, we innovatively integrated correlation scoring into literature analysis, revealing that China and the United States are leading contributors to bone organoid regeneration research, and that complex and genetic disease organoid models hold significant research potential. This AI-driven approach enables researchers to focus on high-value literature, improving efficiency while guiding future research in bone organoid regeneration and broader biomedical fields.

Volume 170, Article 104911. Publication type: Journal Article.
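
The reported F1-score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the helper name `f1_score` is ours, and only the two percentages come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 76.19% and recall 83.33%, as reported in the abstract
f1 = f1_score(0.7619, 0.8333)
print(f"{f1:.2%}")  # 79.60%
```

The result agrees with the stated F1-score of 79.60%. The much higher accuracy (98.83%) is what one would expect if relevant papers are a small minority of the screened set, although the abstract does not give the class balance.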

Citations: 0

PCGMMF: a prediction method for breast cancer prognostic recurrence and metastasis risk based on enhanced multimodal feature fusion

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-09-09, DOI: 10.1016/j.jbi.2025.104907

Wei Du, Liang Gao, Xianhua Xu, Yuhua Yao, Zhong Li

Background: Breast cancer is a highly heterogeneous disease with high morbidity and mortality rates. Despite the availability of various treatments, a significant number of patients still face a high probability of recurrence or metastasis, which severely impacts their survival status. Traditional prognostic methods based on single-modality data and machine learning algorithms often fail to adequately capture the complex biological relationships and heterogeneous characteristics of breast cancer, leading to suboptimal prognostic performance. Therefore, there is an urgent need for a more accurate and effective method to predict the risk of recurrence and metastasis in breast cancer prognosis.

Methods: In this study, we propose a novel method termed PCGMMF for breast cancer prognostic analysis. This method integrates histopathological images, clinical data, gene expression data, and DNA methylation data through multimodal fusion. We leverage a pre-trained Vision-LSTM model based on transfer learning to extract features from histopathological images. Additionally, we design a comprehensive feature selection strategy that includes support vector machine (SVM), Mantel test, and correlation analysis to filter features from gene expression data and DNA methylation data. Furthermore, to address the high heterogeneity of breast cancer and the independence and intersectionality of multimodal features, we propose a bidirectional attention and self-attention based enhanced multimodal feature fusion module called BSAMF.

Results: Through a series of experiments, we evaluate the performance of PCGMMF. When predicting the recurrence and metastasis risk of breast cancer prognosis, PCGMMF achieves an accuracy of 0.903 and an AUC value of 0.924, outperforming other state-of-the-art methods. Furthermore, we provide an interpretability analysis of highly significant regions from histopathological images, which can serve as a reference for clinical practice.

Conclusion: PCGMMF offers a robust and innovative solution for breast cancer prognostic analysis by effectively integrating multimodal data and utilizing advanced deep learning techniques. It can effectively conduct breast cancer prognostic analysis and provide significant references for personalized precision treatment and clinical practice.

Volume 170, Article 104907. Publication type: Journal Article.

Citations: 0

Overcoming data challenges through enriched validation and targeted sampling to measure whole-person health in electronic health records

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-09-02, DOI: 10.1016/j.jbi.2025.104904

Sarah C. Lotspeich, Sheetal Kedar, Rabeya Tahir, Aidan D. Keleghan, Amelia Miranda, Stephany N. Duda, Michael P. Bancks, Brian J. Wells, Ashish K. Khanna, Joseph Rigdon

Objective: The allostatic load index (ALI) is a 10-component composite measure of whole-person health, which reflects the multiple interrelated physiological regulatory systems that underlie healthy functioning. Data from electronic health records (EHR) present a huge opportunity to operationalize the ALI in learning health systems; however, these data are prone to missingness and errors. Validation (e.g., through chart reviews) can provide better-quality data, but realistically, only a subset of patients’ data can be validated, and most protocols do not recover missing data.

Methods: Using a representative sample of 1000 patients from the EHR at an extensive learning health system (100 of whom could be validated), we propose methods to design, conduct, and analyze statistically efficient and robust studies of ALI and healthcare utilization. Employing semiparametric maximum likelihood estimation, we robustly incorporate all available patient information into statistical models. Using targeted design strategies, we examine ways to select the most informative patients for validation. Incorporating clinical expertise, we devise a novel validation protocol to promote EHR data quality and completeness.

Results: Chart reviews uncovered few errors (99% matched source documents) and recovered some missing data through auxiliary information in patients’ charts. On average, validation increased the number of non-missing ALI components per patient from 6 to 7. Through simulations based on preliminary data, residual sampling was identified as the most informative strategy for completing our validation study. Incorporating validation data, statistical models indicated that worse whole-person health (higher ALI) was associated with higher odds of engaging in the healthcare system, adjusting for age.

Conclusion: Targeted validation with an enriched protocol can ensure the quality and promote the completeness of EHR data. Findings from our validation study were incorporated into analyses as we operationalize the ALI as a scalable whole-person health measure that predicts healthcare utilization in the learning health system.

Volume 170, Article 104904. Publication type: Journal Article.

Citations: 0

Corrigendum to “A pipeline for harmonising NHS Scotland laboratory data to enable national-level analyses”. [J. Biomed. Inform. 162 (2025) 104771]

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-09-01, DOI: 10.1016/j.jbi.2025.104891

Chuang Gao, Shahzad Mumtaz, Sophie McCall, Katherine O’Sullivan, Mark McGilchrist, Daniel R. Morales, Christopher Hall, Katie Wilde, Charlie Mayor, Pamela Linksted, Kathy Harrison, Christian Cole, Emily Jefferson

Objective: Medical laboratory data, together with prescribing and hospitalisation records, are three of the most used electronic health records (EHRs) for data-driven health research. In Scotland, hospitalisation, prescribing, and death register data are available nationally, whereas laboratory data are captured, stored, and reported from local health board systems with significant heterogeneity. For researchers or other users of these regionally curated data, working on laboratory datasets across regional cohorts requires effort and time. As part of this study, the Scottish Safe Haven Network has developed an open-source software pipeline to generate a harmonised laboratory dataset.

Methods: We obtained sample laboratory data from the four regional Safe Havens in Scotland covering people within the SHARE consented cohort. We compared the variables collected by each regional Safe Haven and mapped these to 11 FHIR and 2 Scottish-specific standardised terms (i.e., one to indicate the regional health board and a second to describe the source clinical code description).

Results: We compared the laboratory data and found that 182 test codes covered 98.7% of test records performed across Scotland. Focusing on the 182 test codes, we developed a set of transformations to convert test results captured in different units to the same unit. We included both Read Codes and SNOMED CT to encode the tests within the pipeline.

Conclusion: We validated our harmonisation pipeline by comparing the results across the different regional datasets. The pipeline can be reused by researchers and/or Safe Havens to generate clean, harmonised laboratory data at a national level with minimal effort.

Volume 169, Article 104891. Publication type: Journal Article.
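
The Results describe transformations that convert test results captured in different units to the same unit. The sketch below shows one way such a step could look. It is not the published pipeline: the lookup table, the test names, and the `harmonise` function are hypothetical, and only standard unit arithmetic is assumed (glucose at roughly 180.16 g/mol, and 1 dL = 0.1 L).

```python
# Hypothetical lookup: (test name, source unit) -> (target unit, multiplicative factor)
CONVERSIONS = {
    ("glucose", "mg/dL"): ("mmol/L", 1 / 18.016),  # mg/dL to mmol/L for glucose
    ("haemoglobin", "g/dL"): ("g/L", 10.0),        # g/dL to g/L
}


def harmonise(test: str, value: float, unit: str, target_unit: str) -> float:
    """Convert a result to target_unit, failing loudly on unknown conversions."""
    if unit == target_unit:
        return value
    entry = CONVERSIONS.get((test, unit))
    if entry is None or entry[0] != target_unit:
        raise ValueError(f"no conversion for {test}: {unit} -> {target_unit}")
    return value * entry[1]


print(round(harmonise("glucose", 90.0, "mg/dL", "mmol/L"), 2))  # 5.0
print(harmonise("haemoglobin", 13.5, "g/dL", "g/L"))            # 135.0
```

Raising an error for unmapped unit pairs, rather than passing values through unchanged, keeps silently mixed units out of the harmonised output.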

Citations: 0

Identifying task groupings for multi-task learning using pointwise V-usable information

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-09-01, Epub Date: 2025-07-16, DOI: 10.1016/j.jbi.2025.104881

Yingya Li, Timothy Miller, Steven Bethard, Guergana Savova

Objective: Even in the era of Large Language Models (LLMs), which are claimed to be solutions for many tasks, fine-tuning language models remains a core methodology used in deployment for a variety of reasons, computational efficiency and performance maximization among them. Fine-tuning can be single-task or multi-task joint learning, in which the tasks support each other and thus boost their performance. The success of multi-task learning can depend heavily on which tasks are grouped together. Naively grouping all tasks or a random set of tasks can result in negative transfer, with the multi-task models performing worse than single-task models. Though many efforts have been made to identify task groupings and to measure the relatedness among different tasks, it remains a challenging research topic to define a metric that identifies the best task grouping out of a pool of many potential task combinations. We propose such a metric.

Methods: We propose a metric of task relatedness based on task difficulty measured by pointwise V-usable information (PVI). PVI is a recently proposed metric to estimate how much usable information a dataset contains given a model. We hypothesize that tasks whose PVI estimates are not statistically different are similar enough to benefit from the joint learning process. We conduct comprehensive experiments to evaluate the feasibility of this metric for task grouping on 15 NLP datasets in the general, biomedical, and clinical domains. We compare the results of the joint learners against single learners, existing baseline methods, and recent large language models, including Llama and GPT-4.

Results: The results show that by grouping tasks with similar PVI estimates, the joint learners yielded competitive results with fewer total parameters, with consistent performance across domains.

Conclusion: For domain-specific tasks, fine-tuned models may remain a preferable option, and the PVI-based method of grouping tasks for multi-task learning could be particularly beneficial. This metric could be wrapped in the overall recipe of fine-tuning language models.

Article 104881. Publication type: Journal Article.
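
The abstract does not restate how PVI is computed. As we understand the original definition of pointwise V-usable information, for an instance (x, y) it is the log-probability of the gold label under a model fine-tuned with the input minus that under a model fine-tuned with a null input: PVI(x → y) = −log₂ g′[∅](y) + log₂ g[x](y). The sketch below applies that formula to made-up probabilities; the function name and the numbers are hypothetical, and no statistical test for comparing tasks is implied.

```python
import math


def pvi(p_with_input: float, p_null_input: float) -> float:
    """PVI in bits, from the gold-label probability given by a model that sees
    the input (g) and by one fine-tuned on null inputs (g')."""
    return math.log2(p_with_input) - math.log2(p_null_input)


# Made-up gold-label probabilities: (with input, with null input)
examples = [(0.9, 0.5), (0.5, 0.5), (0.25, 0.5)]
scores = [pvi(p, q) for p, q in examples]

print([round(s, 3) for s in scores])        # positive, zero, negative PVI
print(round(sum(scores) / len(scores), 3))  # dataset-level average
```

A higher PVI means the input makes the label easier to predict for that model family, a value of zero means the input adds nothing over the label prior, and a negative value means the input misleads the model.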

Citations: 0

Digital twins in increasing diversity in clinical trials: A systematic review

IF 4.5, Zone 2 (Medicine)

Journal of Biomedical Informatics, Pub Date: 2025-09-01, Epub Date: 2025-08-08, DOI: 10.1016/j.jbi.2025.104879

Abigail Tubbs, Enrique Alvarez Vazquez

The integration of digital twin (DT) technology and artificial intelligence (AI) into clinical trials holds transformative potential for addressing persistent inequities in participant representation. This systematic review evaluates the role of these technologies in improving diversity, particularly in racial, ethnic, gender, age, and socioeconomic dimensions, minimizing bias, and allowing personalized medicine in clinical research settings. Evidence from 90 studies reveals that digital twins offer dynamic simulation capabilities for trial design, while AI facilitates predictive analytics and recruitment optimization. However, implementation remains hindered by fragmented regulatory frameworks, biased datasets, and infrastructural disparities. Ethical concerns, including privacy, consent, and algorithmic opacity, further complicate deployment. Inclusive data practices identified in the literature include the use of demographically representative training data, participatory data collection frameworks, and equity audits to detect and correct systemic bias. Fairness in AI and DT models is primarily operationalized through group fairness metrics such as demographic parity and equalized odds, along with fairness-aware model training and validation. Key gaps include the lack of global standards, underrepresentation in model training, and challenges in real-world adoption. To overcome these barriers, the review proposes actionable directions: developing inclusive data practices, harmonizing regulatory oversight, and embedding fairness into computational model design. By focusing on diversity as a design principle, AI and DT technologies can support a more equitable and generalizable future for clinical research.

Article 104879. Publication type: Journal Article.
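
The review names demographic parity and equalized odds as the main group fairness metrics. The sketch below shows how both are commonly measured as gaps between groups, assuming binary labels and predictions. The function names and the toy data are ours and do not come from any reviewed study.

```python
def positive_rate(preds: list[int]) -> float:
    """Share of positive predictions (0.0 for an empty list)."""
    return sum(preds) / len(preds) if preds else 0.0


def demographic_parity_gap(y_pred: list[int], groups: list[str]) -> float:
    """Largest difference in positive-prediction rate between groups."""
    rates = [
        positive_rate([p for p, g in zip(y_pred, groups) if g == name])
        for name in set(groups)
    ]
    return max(rates) - min(rates)


def equalized_odds_gap(y_true: list[int], y_pred: list[int], groups: list[str]) -> float:
    """Larger of the between-group gaps in TPR (label 1) and FPR (label 0)."""
    gaps = []
    for label in (1, 0):
        rates = [
            positive_rate(
                [p for t, p, g in zip(y_true, y_pred, groups) if g == name and t == label]
            )
            for name in set(groups)
        ]
        gaps.append(max(rates) - min(rates))
    return max(gaps)


# Toy data, invented for illustration: two groups of four participants each
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a"] * 4 + ["b"] * 4

print(demographic_parity_gap(y_pred, groups))      # 0.75 - 0.25 = 0.5
print(equalized_odds_gap(y_true, y_pred, groups))  # TPR gap 0.5, FPR gap 0.5 -> 0.5
```

A gap of zero on either metric means the groups are treated identically by that criterion. In practice the two criteria can conflict, so an equity audit would normally report both.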

Citations: 0