Journal of Biomedical Informatics: Latest Articles

Improving tabular data extraction in scanned laboratory reports using deep learning models
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, vol. 159, Article 104735. Pub Date: 2024-10-10. DOI: 10.1016/j.jbi.2024.104735
Yiming Li, Qiang Wei, Xinghan Chen, Jianfu Li, Cui Tao, Hua Xu
Abstract
Objective: Medical laboratory testing is essential in healthcare, providing crucial data for diagnosis and treatment. Nevertheless, patients’ lab testing results are often transferred via fax across healthcare organizations and are not immediately available for timely clinical decision making. Thus, it is important to develop new technologies to accurately extract lab testing information from scanned laboratory reports. This study aims to develop an advanced deep learning-based Optical Character Recognition (OCR) method to identify tables containing lab testing results in scanned laboratory reports.
Methods: Extracting tabular data from scanned lab reports involves two stages: table detection (i.e., identifying the area of a table object) and table recognition (i.e., identifying and extracting tabular structures and contents). The DETR R18 and YOLOv8s models were used for table detection, and we compared the performance of PaddleOCR and the encoder-dual-decoder (EDD) model for table recognition. A total of 650 tables from 632 randomly selected laboratory test reports were annotated and used to train and evaluate those models. For table detection evaluation, we used Average Precision (AP), Average Recall (AR), AP50, and AP75. For table recognition evaluation, we employed Tree-Edit-Distance-based Similarity (TEDS).
Results: For table detection, fine-tuned DETR R18 demonstrated superior performance (AP50: 0.774; AP75: 0.644; AP: 0.601; AR: 0.766). In terms of table recognition, fine-tuned EDD outperformed other models with a TEDS score of 0.815. The proposed OCR pipeline (fine-tuned DETR R18 and fine-tuned EDD) achieved a TEDS score of 0.699 and a TEDS structure score of 0.764.
Conclusions: Our study presents a dedicated OCR pipeline for scanned clinical documents, utilizing state-of-the-art deep learning models for region-of-interest detection and table recognition. The high TEDS scores demonstrate the effectiveness of our approach, which has significant implications for clinical data analysis and decision-making.
Citations: 0
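The detection metrics named in the abstract above (AP50, AP75) are usually defined through an Intersection-over-Union (IoU) threshold between a predicted and an annotated table box. The sketch below illustrates only that thresholding step, assuming the common COCO-style convention that AP50 and AP75 use IoU cut-offs of 0.50 and 0.75; the function names and the box layout `(x_min, y_min, x_max, y_max)` are illustrative assumptions, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """A predicted table box counts as a hit when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical boxes: the prediction covers the top 60% of the annotated table.
gt = (0, 0, 100, 100)
pred = (0, 0, 100, 60)
print(iou(pred, gt), is_match(pred, gt, 0.5), is_match(pred, gt, 0.75))
```

Under these assumptions the same prediction would count as a hit at the 0.50 threshold but not at 0.75, which is why AP75 is typically lower than AP50.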
Learning to match patients to clinical trials using large language models
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, vol. 159, Article 104734. Pub Date: 2024-10-09. DOI: 10.1016/j.jbi.2024.104734
Maciej Rybinski, Wojciech Kusa, Sarvnaz Karimi, Allan Hanbury
Abstract
Objective: This study investigates the use of Large Language Models (LLMs) for matching patients to clinical trials (CTs) within an information retrieval pipeline. Our objective is to enhance the process of patient-trial matching by leveraging the semantic processing capabilities of LLMs, thereby improving the effectiveness of patient recruitment for clinical trials.
Methods: We employed a multi-stage retrieval pipeline integrating various methodologies, including BM25 and Transformer-based rankers, along with LLM-based methods. Our primary datasets were the TREC Clinical Trials 2021–23 track collections. We compared LLM-based approaches, focusing on methods that leverage LLMs in query formulation, filtering, relevance ranking, and re-ranking of CTs.
Results: Our results indicate that LLM-based systems, particularly those involving re-ranking with a fine-tuned LLM, outperform traditional methods in terms of nDCG and Precision measures. The study demonstrates that fine-tuning LLMs enhances their ability to find eligible trials. Moreover, our LLM-based approach is competitive with state-of-the-art systems in the TREC challenges.
The study shows the effectiveness of LLMs in CT matching, highlighting their potential in handling complex semantic analysis and improving patient-trial matching. However, the use of LLMs increases the computational cost and reduces efficiency. We provide a detailed analysis of effectiveness-efficiency trade-offs.
Conclusion: This research demonstrates the promising role of LLMs in enhancing the patient-to-clinical trial matching process, offering a significant advancement in the automation of patient recruitment. Future work should explore optimising the balance between computational cost and retrieval effectiveness in practical applications.
Citations: 0
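The abstract above reports effectiveness in terms of nDCG and Precision. As a minimal sketch of how those two ranking measures are commonly computed, the code below scores one ranked list of trials for one patient. The graded gains (0, 1, 2) and the function names are hypothetical illustrations, and the exact evaluation settings used in the study are not stated in the abstract.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain: each gain is discounted by log2(rank + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_gains, judged_gains, k):
    """DCG of the system ranking divided by the DCG of the ideal ordering.

    ranked_gains: relevance grades in the order the system returned the trials.
    judged_gains: relevance grades of all judged trials for this patient.
    """
    ideal = dcg_at_k(sorted(judged_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0


def precision_at_k(ranked_gains, k, min_gain=1):
    """Fraction of the top-k results whose grade is at least min_gain."""
    return sum(1 for g in ranked_gains[:k] if g >= min_gain) / k


# Hypothetical judgments: 2 = highly relevant, 1 = partially relevant, 0 = not relevant.
judged = [2, 1, 0]
print(ndcg_at_k([2, 1, 0], judged, 3))   # ideal ordering
print(ndcg_at_k([0, 1, 2], judged, 3))   # reversed ordering scores lower
print(precision_at_k([0, 1, 2], 3))
```

A re-ranking stage changes only the order of `ranked_gains`, so its effect shows up directly in nDCG even when the set of retrieved trials is unchanged.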
Augmenting biomedical named entity recognition with general-domain resources
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, vol. 159, Article 104731. Pub Date: 2024-10-04. DOI: 10.1016/j.jbi.2024.104731
Yu Yin, Hyunjae Kim, Xiao Xiao, Chih Hsuan Wei, Jaewoo Kang, Zhiyong Lu, Hua Xu, Meng Fang, Qingyu Chen
Abstract
Objective: Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. While several studies have employed multi-task learning with multiple BioNER datasets to reduce human effort, this approach does not consistently yield performance improvements and may introduce label ambiguity in different biomedical corpora. We aim to tackle those challenges through transfer learning from easily accessible resources with fewer concept overlaps with biomedical datasets.
Methods: We proposed GERBERA, a simple yet effective method that utilized general-domain NER datasets for training. We performed multi-task learning to train a pre-trained biomedical language model with both the target BioNER dataset and the general-domain dataset. Subsequently, we fine-tuned the models specifically for the BioNER dataset.
Results: We systematically evaluated GERBERA on five datasets of eight entity types, collectively consisting of 81,410 instances. Despite using fewer biomedical resources, our models demonstrated superior performance compared to baseline models trained with additional BioNER datasets. Specifically, our models consistently outperformed the baseline models in six out of eight entity types, achieving an average improvement of 0.9% over the best baseline performance across eight entities. Our method was especially effective in amplifying performance on BioNER datasets characterized by limited data, with a 4.7% improvement in F1 scores on the JNLPBA-RNA dataset.
Conclusion: This study introduces a new training method that leverages cost-effective general-domain NER datasets to augment BioNER models. This approach significantly improves BioNER model performance, making it a valuable asset for scenarios with scarce or costly biomedical datasets. We make data, code, and models publicly available via https://github.com/qingyu-qc/bioner_gerbera.
Citations: 0
Clinical outcome-guided deep temporal clustering for disease progression subtyping
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, vol. 158, Article 104732. Pub Date: 2024-10-01. DOI: 10.1016/j.jbi.2024.104732
Dulin Wang, Xiaotian Ma, Paul E. Schulz, Xiaoqian Jiang, Yejin Kim
Abstract
Objective: Complex diseases exhibit heterogeneous progression patterns, necessitating effective capture and clustering of longitudinal changes to identify disease subtypes for personalized treatments. However, existing studies often fail to design clustering-specific representations or neglect clinical outcomes, thereby limiting the interpretability and clinical utility.
Method: We design a unified framework for subtyping longitudinal progressive diseases. We focus on effectively integrating all data from disease progressions and improving patient representation for downstream clustering. Specifically, we propose a clinical Outcome-Guided Deep Temporal Clustering (OG-DTC) that generates representations informed by clustering and clinical outcomes. A GRU-based seq2seq architecture captures the temporal dynamics, and the model integrates k-means clustering and outcome regression to facilitate the formation of clustering structures and the integration of clinical outcomes. The learned representations are clustered using a Gaussian mixture model to identify distinct subtypes. The clustering results are extensively validated through reproducibility, stability, and significance tests.
Results: We demonstrated the efficacy of our framework by applying it to three Alzheimer’s Disease (AD) clinical trials. Through the AD case study, we identified three distinct subtypes with unique patterns associated with differentiated clinical declines across multiple measures. The ablation study revealed the contributions of each component in the model and showed that jointly optimizing the full model improved patient representations for clustering. Extensive validations showed that the derived clustering is reproducible, stable, and significant.
Conclusion: Our temporal clustering framework can derive robust clustering applicable for subtyping longitudinal progressive diseases and has the potential to account for subtype variability in clinical outcomes.
Citations: 0
FuseLinker: Leveraging LLM’s pre-trained text embeddings and domain knowledge to enhance GNN-based link prediction on biomedical knowledge graphs
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, vol. 158, Article 104730. Pub Date: 2024-10-01. DOI: 10.1016/j.jbi.2024.104730
Yongkang Xiao, Sinian Zhang, Huixue Zhou, Mingchen Li, Han Yang, Rui Zhang
Abstract
Objective: To develop FuseLinker, a novel link prediction framework for biomedical knowledge graphs (BKGs) that fully exploits the graph’s structural, textual and domain knowledge information. We evaluated the utility of FuseLinker in the graph-based drug repurposing task through detailed case studies.
Methods: FuseLinker leverages fused pre-trained text embedding and domain knowledge embedding to enhance the graph neural network (GNN)-based link prediction model tailored for BKGs. This framework includes three parts: (a) obtaining text embeddings for BKGs using embedding-visible large language models (LLMs), (b) learning the representations of medical ontology as domain knowledge information by employing the Poincaré graph embedding method, and (c) fusing these embeddings and further learning the graph structure representations of BKGs by applying a GNN-based link prediction model. We evaluated FuseLinker against traditional knowledge graph embedding models and a conventional GNN-based link prediction model across four public BKG datasets. Additionally, we examined the impact of using different embedding-visible LLMs on FuseLinker’s performance. Finally, we investigated FuseLinker’s ability to generate medical hypotheses through two drug repurposing case studies for Sorafenib and Parkinson’s disease.
Results: Compared with baseline models on four BKGs, FuseLinker demonstrates superior performance. The Mean Reciprocal Rank (MRR) and Area Under receiver operating characteristic Curve (AUROC) are, respectively, 0.969 and 0.987 for KEGG50k, 0.548 and 0.903 for Hetionet, 0.739 and 0.928 for SuppKG, and 0.831 and 0.890 for ADInt.
Conclusion: Our study demonstrates that FuseLinker is an effective novel link prediction framework that integrates multiple graph information and shows significant potential for practical applications in biomedical and clinical tasks. Source code and data are available at https://github.com/YKXia0/FuseLinker.
Citations: 0
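Two quantities from the abstract above lend themselves to a short illustration: the distance function that underlies Poincaré graph embeddings, and the Mean Reciprocal Rank used to score link prediction. The sketch below uses the standard Poincaré-ball distance and the standard MRR definition; the function names and the toy inputs are illustrative assumptions and are not drawn from the FuseLinker code base.

```python
import math


def poincare_distance(u, v):
    """Distance between two points inside the unit ball (Poincaré ball model).

    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    Both points must have Euclidean norm strictly below 1.
    """
    sq_norm = lambda x: sum(c * c for c in x)
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank, where each rank is the 1-based position of the true entity."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Points near the boundary are far apart even when they look close in Euclidean terms,
# which is what lets the ball represent tree-like ontologies compactly.
print(poincare_distance([0.0, 0.0], [0.5, 0.0]))
print(poincare_distance([0.9, 0.0], [0.0, 0.9]))
# Hypothetical ranks of the correct tail entity for three test triples.
print(mean_reciprocal_rank([1, 2, 4]))
```

An MRR close to 1, such as the 0.969 reported for KEGG50k, means the correct entity is ranked first for most test triples.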
BAMRE: Joint extraction model of Chinese medical entities and relations based on Biaffine transformation with relation attention
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, vol. 158, Article 104733. Pub Date: 2024-10-01. DOI: 10.1016/j.jbi.2024.104733
Jiaqi Sun, Chen Zhang, Linlin Xing, Longbo Zhang, Hongzhen Cai, Maozu Guo
Abstract
Electronic Health Records (EHRs) contain various valuable medical entities and their relationships. Although the extraction of biomedical relationships has achieved good results in the mining of electronic health records and the construction of biomedical knowledge bases, there are still some problems. There may be implied complex associations between entities and relationships in overlapping triplets, and ignoring these interactions may lead to a decrease in the accuracy of entity extraction. To address this issue, a joint extraction model for medical entity relations based on a relation attention mechanism is proposed. The relation extraction module identifies candidate relationships within a sentence. The attention mechanism based on these relationships assigns weights to contextual words in the sentence that are associated with different relationships. Additionally, it extracts the subject and object entities. Under a specific relationship, entity vector representations are utilized to construct a global entity matching matrix based on Biaffine transformations. This matrix is designed to enhance the semantic dependencies and relational representations between entities, enabling triplet extraction. This allows the two subtasks of named entity recognition and relation extraction to be interrelated, fully utilizing contextual information within the sentence, and effectively addresses the issue of overlapping triplets.
Experiments on the CMeIE Chinese medical relation extraction dataset and the Baidu2019 Chinese dataset confirm that our approach achieves a higher F1 score than all state-of-the-art baselines. Moreover, it offers substantial performance improvements in intricate situations involving diverse overlapping patterns, multitudes of triplets, and cross-sentence triplets.
Citations: 0
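The entity matching matrix described in the abstract above is built from Biaffine transformations. As a rough sketch of what a biaffine scorer computes, the code below scores every (subject, object) pair with a bilinear term plus a linear term and a bias. The parameter names `U`, `w`, `b`, the vector sizes, and the toy values are assumptions for illustration; the abstract does not specify BAMRE's exact parameterization, and in the model such a matrix would be computed per candidate relation with learned parameters.

```python
def biaffine_score(subj, obj, U, w, b):
    """Biaffine score: subj^T U obj + w . [subj; obj] + b.

    subj, obj: entity vectors as lists of floats.
    U: bilinear weight matrix of shape len(subj) x len(obj).
    w: linear weights over the concatenation [subj; obj].
    b: scalar bias.
    """
    bilinear = sum(
        subj[i] * U[i][j] * obj[j]
        for i in range(len(subj))
        for j in range(len(obj))
    )
    linear = sum(wi * xi for wi, xi in zip(w, subj + obj))
    return bilinear + linear + b


def matching_matrix(entities, U, w, b):
    """Score every ordered (subject, object) pair of candidate entity vectors."""
    return [[biaffine_score(s, o, U, w, b) for o in entities] for s in entities]


# Toy two-dimensional entity vectors and hand-picked (not learned) parameters.
entities = [[1.0, 0.0], [0.0, 1.0]]
U = [[0.0, 2.0], [0.0, 0.0]]
w = [0.5, 0.0, 0.0, 0.5]
b = 0.1
for row in matching_matrix(entities, U, w, b):
    print(row)
```

Because the bilinear term is not symmetric, the score for (subject, object) can differ from the score for (object, subject), which is what lets a single matrix encode the direction of a relation.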
Fairness and inclusion methods for biomedical informatics research
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, vol. 158, Article 104713. Pub Date: 2024-10-01. DOI: 10.1016/j.jbi.2024.104713
Shyam Visweswaran, Yuan Luo, Mor Peleg
Citations: 0
Cross-domain visual prompting with spatial proximity knowledge distillation for histological image classification
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, vol. 158, Article 104728. Pub Date: 2024-09-21. DOI: 10.1016/j.jbi.2024.104728
Xiaohong Li, Guoheng Huang, Lianglun Cheng, Guo Zhong, Weihuang Liu, Xuhang Chen, Muyan Cai
Abstract
Objective: Histological classification is a challenging task due to the diverse appearances, unpredictable variations, and blurry edges of histological tissues. Recently, many approaches based on large networks have achieved satisfactory performance. However, most of these methods rely heavily on substantial computational resources and large high-quality datasets, limiting their practical application. Knowledge Distillation (KD) offers a promising solution by enabling smaller networks to achieve performance comparable to that of larger networks. Nonetheless, KD is hindered by the problem of high-dimensional characteristics, which makes it difficult to capture tiny scattered features and often leads to the loss of edge feature relationships.
Methods: A novel cross-domain visual prompting distillation approach is proposed, compelling the teacher network to facilitate the extraction of significant high-dimensional features into low-dimensional feature maps, thereby aiding the student network in achieving superior performance. Additionally, a dynamic learnable temperature module based on novel vector-based spatial proximity is introduced to further encourage the student to imitate the teacher.
Results: Experiments conducted on widely accepted histological datasets, NCT-CRC-HE-100K and LC25000, demonstrate the effectiveness of the proposed method and validate its robustness on the popular dermoscopic dataset ISIC-2019. Compared to state-of-the-art knowledge distillation methods, the proposed method achieves better performance and greater robustness with optimal domain adaptation.
Conclusion: A novel distillation architecture, termed VPSP, tailored for histological classification, is proposed. This architecture achieves superior performance with optimal domain adaptation, enhancing the clinical application of histological classification. The source code will be released at https://github.com/xiaohongji/VPSP.
Citations: 0
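The abstract above builds on knowledge distillation with a temperature. The sketch below shows the conventional temperature-scaled distillation loss (KL divergence between softened teacher and student distributions, scaled by the squared temperature) as background only. It uses a fixed scalar temperature, whereas the paper describes a dynamic learnable temperature module whose details are not given in the abstract; the function names and logits are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) on softened outputs, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a three-class tissue classifier.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 1.5, 0.5]
print(softmax(teacher, 1.0))
print(softmax(teacher, 4.0))  # softer: more probability mass on the non-top classes
print(distillation_loss(student, teacher, temperature=4.0))
```

Making the temperature learnable, as the abstract proposes, would replace the fixed `temperature` argument with a value produced by the model during training.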
Advancing cancer driver gene identification through an integrative network and pathway approach
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, vol. 158, Article 104729. Pub Date: 2024-09-19. DOI: 10.1016/j.jbi.2024.104729
Junrong Song, Zhiming Song, Yuanli Gong, Lichang Ge, Wenlu Lou
Abstract
Objective: Cancer is a complex genetic disease characterized by the accumulation of various mutations, with driver genes playing a crucial role in cancer initiation and progression. Distinguishing driver genes from passenger mutations is essential for understanding cancer biology and discovering therapeutic targets. However, the majority of existing methods ignore the mutational heterogeneity and commonalities among patients, which hinders effective identification of driver genes.
Methods: This study introduces MCSdriver, a novel computational model that integrates network and pathway information to prioritize the identification of cancer driver genes. MCSdriver employs a bidirectional random walk algorithm to quantify the mutual exclusivity and functional relationships between mutated genes within patient cohorts. It calculates similarity scores based on a mutual exclusivity-weighted network and pathway coverage patterns, accounting for patient-specific heterogeneity and molecular profile similarity.
Results: This approach enhances the accuracy and quality of driver gene identification. MCSdriver demonstrates superior performance in identifying cancer driver genes across four cancer types from The Cancer Genome Atlas, showing a higher F-score, Recall and Precision compared to existing ranking list-based and module-based models.
Conclusion: The MCSdriver model not only outperforms other models in identifying known cancer driver genes but also effectively identifies novel driver genes involved in cancer-related biological processes. The model’s consideration of patient-specific heterogeneity and similarity in molecular profiles significantly enhances the accuracy and quality of driver gene identification. Validation through Gene Ontology enrichment analysis and literature mining further underscores its potential application value in personalized cancer therapy, offering a promising tool for advancing our understanding and treatment of cancer.
Citations: 0
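The abstract above relies on a random walk over a gene network. As background, the sketch below implements a plain random walk with restart on a small weighted graph. It is deliberately a simplified, one-directional stand-in and is not the bidirectional algorithm of MCSdriver, whose update rule is not given in the abstract; the adjacency matrix, the seed set, and the restart probability are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, iters=100):
    """Propagate scores from seed nodes over a weighted graph.

    adj: square adjacency matrix (list of lists), adj[i][j] = weight of edge j -> i.
    seeds: set of node indices where the walk restarts.
    Returns the score vector after a fixed number of iterations of
    p <- (1 - restart) * W p + restart * p0, with W the column-normalized adj.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(iters):
        p = [
            (1.0 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
    return p


# Toy path graph 0 - 1 - 2 with the walk restarting at node 0.
adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(random_walk_with_restart(adj, seeds={0}))
```

In a driver-gene setting the edge weights would come from something like the mutual exclusivity-weighted network mentioned in the abstract, so that scores flow preferentially along edges that carry more evidence.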
Proxy endpoints — bridging clinical trials and real world data
IF 4, Zone 2, Medicine
Journal of Biomedical Informatics, vol. 158, Article 104723. Pub Date: 2024-09-17. DOI: 10.1016/j.jbi.2024.104723
Maxim Kryukov, Kathleen P. Moriarty, Macarena Villamea, Ingrid O’Dwyer, Ohn Chow, Flavio Dormont, Ramon Hernandez, Ziv Bar-Joseph, Brandon Rufino
Abstract
Objective: Disease severity scores, or endpoints, are routinely measured during Randomized Controlled Trials (RCTs) to closely monitor the effect of treatment. In real-world clinical practice, although a larger set of patients is observed, the specific RCT endpoints are often not captured, which makes it hard to utilize real-world data (RWD) to evaluate drug efficacy in larger populations.
Methods: To overcome this challenge, we developed an ensemble technique which learns proxy models of disease endpoints in RWD. Using a multi-stage learning framework applied to RCT data, we first identify features considered significant drivers of disease available within RWD. To create endpoint proxy models, we use Explainable Boosting Machines (EBMs) which allow for both end-user interpretability and modeling of non-linear relationships.
Results: We demonstrate our approach on two diseases, rheumatoid arthritis (RA) and atopic dermatitis (AD). As we show, our combined feature selection and prediction method achieves good results for both disease areas, improving upon prior methods proposed for predictive disease severity scoring.
Conclusion: Having disease severity over time for a patient is important to further disease understanding and management. Our results open the door to more use cases in the space of RA and AD such as treatment effect estimates or prognostic scoring on RWD. Our framework may be extended beyond RA and AD to other diseases where the severity score is not well measured in electronic health records.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S1532046424001412/pdfft?md5=7711cb401e9e3526c4adf1c9e025c587&pid=1-s2.0-S1532046424001412-main.pdf
Citations: 0