Yiming Li , Qiang Wei , Xinghan Chen , Jianfu Li , Cui Tao , Hua Xu
{"title":"Improving tabular data extraction in scanned laboratory reports using deep learning models","authors":"Yiming Li , Qiang Wei , Xinghan Chen , Jianfu Li , Cui Tao , Hua Xu","doi":"10.1016/j.jbi.2024.104735","DOIUrl":"10.1016/j.jbi.2024.104735","url":null,"abstract":"<div><h3>Objective</h3><div>Medical laboratory testing is essential in healthcare, providing crucial data for diagnosis and treatment. Nevertheless, patients’ lab testing results are often transferred via fax across healthcare organizations and are not immediately available for timely clinical decision making. Thus, it is important to develop new technologies to accurately extract lab testing information from scanned laboratory reports. This study aims to develop an advanced deep learning-based Optical Character Recognition (OCR) method to identify tables containing lab testing results in scanned laboratory reports.</div></div><div><h3>Methods</h3><div>Extracting tabular data from scanned lab reports involves two stages: table detection (i.e., identifying the area of a table object) and table recognition (i.e., identifying and extracting tabular structures and contents). We evaluated DETR R18 and YOLOv8s for table detection, and we compared the performance of PaddleOCR and the encoder-dual-decoder (EDD) model for table recognition. A total of 650 tables from 632 randomly selected laboratory test reports were annotated and used to train and evaluate these models. For table detection evaluation, we used metrics such as Average Precision (AP), Average Recall (AR), AP50, and AP75. For table recognition evaluation, we employed Tree-Edit-Distance-based Similarity (TEDS).</div></div><div><h3>Results</h3><div>For table detection, fine-tuned DETR R18 demonstrated superior performance (AP50: 0.774; AP75: 0.644; AP: 0.601; AR: 0.766). In terms of table recognition, fine-tuned EDD outperformed other models with a TEDS score of 0.815.
The proposed OCR pipeline (fine-tuned DETR R18 and fine-tuned EDD) achieved a TEDS score of 0.699 and a TEDS structure score of 0.764.</div></div><div><h3>Conclusions</h3><div>Our study presents a dedicated OCR pipeline for scanned clinical documents, utilizing state-of-the-art deep learning models for region-of-interest detection and table recognition. The high TEDS scores demonstrate the effectiveness of our approach, which has significant implications for clinical data analysis and decision-making.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"159 ","pages":"Article 104735"},"PeriodicalIF":4.0,"publicationDate":"2024-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142406431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maciej Rybinski , Wojciech Kusa , Sarvnaz Karimi , Allan Hanbury
{"title":"Learning to match patients to clinical trials using large language models","authors":"Maciej Rybinski , Wojciech Kusa , Sarvnaz Karimi , Allan Hanbury","doi":"10.1016/j.jbi.2024.104734","DOIUrl":"10.1016/j.jbi.2024.104734","url":null,"abstract":"<div><h3>Objective:</h3><div>This study investigates the use of Large Language Models (LLMs) for matching patients to clinical trials (CTs) within an information retrieval pipeline. Our objective is to enhance the process of patient-trial matching by leveraging the semantic processing capabilities of LLMs, thereby improving the effectiveness of patient recruitment for clinical trials.</div></div><div><h3>Methods:</h3><div>We employed a multi-stage retrieval pipeline integrating various methodologies, including BM25 and Transformer-based rankers, along with LLM-based methods. Our primary datasets were the TREC Clinical Trials 2021–23 track collections. We compared LLM-based approaches, focusing on methods that leverage LLMs in query formulation, filtering, relevance ranking, and re-ranking of CTs.</div></div><div><h3>Results:</h3><div>Our results indicate that LLM-based systems, particularly those involving re-ranking with a fine-tuned LLM, outperform traditional methods in terms of nDCG and Precision measures. The study demonstrates that fine-tuning LLMs enhances their ability to find eligible trials. Moreover, our LLM-based approach is competitive with state-of-the-art systems in the TREC challenges.</div><div>The study shows the effectiveness of LLMs in CT matching, highlighting their potential in handling complex semantic analysis and improving patient-trial matching. However, the use of LLMs increases the computational cost and reduces efficiency. 
We provide a detailed analysis of effectiveness-efficiency trade-offs.</div></div><div><h3>Conclusion:</h3><div>This research demonstrates the promising role of LLMs in enhancing the patient-to-clinical trial matching process, offering a significant advancement in the automation of patient recruitment. Future work should explore optimising the balance between computational cost and retrieval effectiveness in practical applications.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"159 ","pages":"Article 104734"},"PeriodicalIF":4.0,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142400333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yu Yin , Hyunjae Kim , Xiao Xiao , Chih Hsuan Wei , Jaewoo Kang , Zhiyong Lu , Hua Xu , Meng Fang , Qingyu Chen
{"title":"Augmenting biomedical named entity recognition with general-domain resources","authors":"Yu Yin , Hyunjae Kim , Xiao Xiao , Chih Hsuan Wei , Jaewoo Kang , Zhiyong Lu , Hua Xu , Meng Fang , Qingyu Chen","doi":"10.1016/j.jbi.2024.104731","DOIUrl":"10.1016/j.jbi.2024.104731","url":null,"abstract":"<div><h3>Objective</h3><div>Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. While several studies have employed multi-task learning with multiple BioNER datasets to reduce human effort, this approach does not consistently yield performance improvements and may introduce label ambiguity in different biomedical corpora. We aim to tackle those challenges through transfer learning from easily accessible resources with fewer concept overlaps with biomedical datasets.</div></div><div><h3>Methods</h3><div>We proposed GERBERA, a simple-yet-effective method that utilized general-domain NER datasets for training. We performed multi-task learning to train a pre-trained biomedical language model with both the target BioNER dataset and the general-domain dataset. Subsequently, we fine-tuned the models specifically for the BioNER dataset.</div></div><div><h3>Results</h3><div>We systematically evaluated GERBERA on five datasets of eight entity types, collectively consisting of 81,410 instances. Despite using fewer biomedical resources, our models demonstrated superior performance compared to baseline models trained with additional BioNER datasets. Specifically, our models consistently outperformed the baseline models in six out of eight entity types, achieving an average improvement of 0.9% over the best baseline performance across eight entities. 
Our method was especially effective in improving performance on BioNER datasets with limited data, yielding a 4.7% improvement in F1 score on the JNLPBA-RNA dataset.</div></div><div><h3>Conclusion</h3><div>This study introduces a new training method that leverages cost-effective general-domain NER datasets to augment BioNER models. This approach significantly improves BioNER model performance, making it a valuable asset for scenarios with scarce or costly biomedical datasets. We make the data, code, and models publicly available via <span><span>https://github.com/qingyu-qc/bioner_gerbera</span></span>.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"159 ","pages":"Article 104731"},"PeriodicalIF":4.0,"publicationDate":"2024-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142377852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dulin Wang , Xiaotian Ma , Paul E. Schulz , Xiaoqian Jiang , Yejin Kim
{"title":"Clinical outcome-guided deep temporal clustering for disease progression subtyping","authors":"Dulin Wang , Xiaotian Ma , Paul E. Schulz , Xiaoqian Jiang , Yejin Kim","doi":"10.1016/j.jbi.2024.104732","DOIUrl":"10.1016/j.jbi.2024.104732","url":null,"abstract":"<div><h3>Objective</h3><div>Complex diseases exhibit heterogeneous progression patterns, necessitating effective capture and clustering of longitudinal changes to identify disease subtypes for personalized treatments. However, existing studies often fail to design clustering-specific representations or neglect clinical outcomes, thereby limiting the interpretability and clinical utility.</div></div><div><h3>Method</h3><div>We design a unified framework for subtyping longitudinal progressive diseases. We focus on effectively integrating all data from disease progressions and improving patient representation for downstream clustering. Specifically, we propose a clinical <strong>O</strong>utcome-<strong>G</strong>uided <strong>D</strong>eep <strong>T</strong>emporal <strong>C</strong>lustering (OG-DTC) that generates representations informed by clustering and clinical outcomes. A GRU-based seq2seq architecture captures the temporal dynamics, and the model integrates <em>k</em>-means clustering and outcome regression to facilitate the formation of clustering structures and the integration of clinical outcomes. The learned representations are clustered using a Gaussian mixture model to identify distinct subtypes. The clustering results are extensively validated through reproducibility, stability, and significance tests.</div></div><div><h3>Results</h3><div>We demonstrated the efficacy of our framework by applying it to three Alzheimer’s Disease (AD) clinical trials. Through the AD case study, we identified three distinct subtypes with unique patterns associated with differentiated clinical declines across multiple measures. 
The ablation study revealed the contributions of each component in the model and showed that jointly optimizing the full model improved patient representations for clustering. Extensive validations showed that the derived clustering is reproducible, stable, and significant.</div></div><div><h3>Conclusion</h3><div>Our temporal clustering framework can derive robust clustering applicable for subtyping longitudinal progressive diseases and has the potential to account for subtype variability in clinical outcomes.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"158 ","pages":"Article 104732"},"PeriodicalIF":4.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142365373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yongkang Xiao , Sinian Zhang , Huixue Zhou , Mingchen Li , Han Yang , Rui Zhang
{"title":"FuseLinker: Leveraging LLM’s pre-trained text embeddings and domain knowledge to enhance GNN-based link prediction on biomedical knowledge graphs","authors":"Yongkang Xiao , Sinian Zhang , Huixue Zhou , Mingchen Li , Han Yang , Rui Zhang","doi":"10.1016/j.jbi.2024.104730","DOIUrl":"10.1016/j.jbi.2024.104730","url":null,"abstract":"<div><h3>Objective</h3><div>To develop the FuseLinker, a novel link prediction framework for biomedical knowledge graphs (BKGs), which fully exploits the graph’s structural, textual and domain knowledge information. We evaluated the utility of FuseLinker in the graph-based drug repurposing task through detailed case studies.</div></div><div><h3>Methods</h3><div>FuseLinker leverages fused pre-trained text embedding and domain knowledge embedding to enhance the graph neural network (GNN)-based link prediction model tailored for BKGs. This framework includes three parts: a) obtain text embeddings for BKGs using embedding-visible large language models (LLMs), b) learn the representations of medical ontology as domain knowledge information by employing the Poincaré graph embedding method, and c) fuse these embeddings and further learn the graph structure representations of BKGs by applying a GNN-based link prediction model. We evaluated FuseLinker against traditional knowledge graph embedding models and a conventional GNN-based link prediction model across four public BKG datasets. Additionally, we examined the impact of using different embedding-visible LLMs on FuseLinker’s performance. Finally, we investigated FuseLinker’s ability to generate medical hypotheses through two drug repurposing case studies for Sorafenib and Parkinson’s disease.</div></div><div><h3>Results</h3><div>By comparing FuseLinker with baseline models on four BKGs, our method demonstrates superior performance. 
The Mean Reciprocal Rank (MRR) and Area Under receiver operating characteristic Curve (AUROC) for KEGG50k, Hetionet, SuppKG and ADInt are 0.969 and 0.987, 0.548 and 0.903, 0.739 and 0.928, and 0.831 and 0.890, respectively.</div></div><div><h3>Conclusion</h3><div>Our study demonstrates that FuseLinker is an effective novel link prediction framework that integrates multiple graph information and shows significant potential for practical applications in biomedical and clinical tasks. Source code and data are available at https://github.com/YKXia0/FuseLinker.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"158 ","pages":"Article 104730"},"PeriodicalIF":4.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142347388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jiaqi Sun , Chen Zhang , Linlin Xing , Longbo Zhang , Hongzhen Cai , Maozu Guo
{"title":"BAMRE: Joint extraction model of Chinese medical entities and relations based on Biaffine transformation with relation attention","authors":"Jiaqi Sun , Chen Zhang , Linlin Xing , Longbo Zhang , Hongzhen Cai , Maozu Guo","doi":"10.1016/j.jbi.2024.104733","DOIUrl":"10.1016/j.jbi.2024.104733","url":null,"abstract":"<div><div>Electronic Health Records (EHRs) contain many valuable medical entities and the relationships between them. Although biomedical relation extraction has achieved good results in mining electronic health records and constructing biomedical knowledge bases, challenges remain. Overlapping triplets may carry implicit, complex associations between entities and relations, and ignoring these interactions can reduce the accuracy of entity extraction. To address this issue, we propose a joint medical entity and relation extraction model based on a relation attention mechanism. The relation extraction module identifies candidate relations within a sentence. A relation-based attention mechanism then assigns weights to the contextual words in the sentence that are associated with each relation. The model also extracts the subject and object entities. For a given relation, the entity vector representations are used to construct a global entity matching matrix via a Biaffine transformation. This matrix strengthens the semantic dependencies and relational representations between entities, enabling triplet extraction.
This links the two subtasks of named entity recognition and relation extraction, makes full use of the contextual information within the sentence, and effectively addresses the issue of overlapping triplets.</div><div>Experiments on the CMeIE Chinese medical relation extraction dataset and the Baidu2019 Chinese dataset show that our approach achieves a higher <span><math><mrow><mi>F</mi><mn>1</mn></mrow></math></span> score than all state-of-the-art baselines. Moreover, it yields substantial performance improvements in complex cases involving diverse overlapping patterns, large numbers of triplets, and cross-sentence triplets.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"158 ","pages":"Article 104733"},"PeriodicalIF":4.0,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142377853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaohong Li , Guoheng Huang , Lianglun Cheng , Guo Zhong , Weihuang Liu , Xuhang Chen , Muyan Cai
{"title":"Cross-domain visual prompting with spatial proximity knowledge distillation for histological image classification","authors":"Xiaohong Li , Guoheng Huang , Lianglun Cheng , Guo Zhong , Weihuang Liu , Xuhang Chen , Muyan Cai","doi":"10.1016/j.jbi.2024.104728","DOIUrl":"10.1016/j.jbi.2024.104728","url":null,"abstract":"<div><h3>Objective:</h3><div>Histological classification is a challenging task due to the diverse appearances, unpredictable variations, and blurry edges of histological tissues. Recently, many approaches based on large networks have achieved satisfactory performance. However, most of these methods rely heavily on substantial computational resources and large high-quality datasets, limiting their practical application. Knowledge Distillation (KD) offers a promising solution by enabling smaller networks to achieve performance comparable to that of larger networks. Nonetheless, KD struggles with high-dimensional features: it has difficulty capturing tiny, scattered features and often loses the relationships between edge features.</div></div><div><h3>Methods:</h3><div>A novel cross-domain visual prompting distillation approach is proposed, in which the teacher network is driven to condense significant high-dimensional features into low-dimensional feature maps, thereby helping the student network achieve superior performance. Additionally, a dynamic learnable temperature module based on a novel vector-based spatial proximity measure is introduced to further encourage the student to imitate the teacher.</div></div><div><h3>Results:</h3><div>Experiments on the widely used histological datasets NCT-CRC-HE-100K and LC25000 demonstrate the effectiveness of the proposed method, and experiments on the popular dermoscopic dataset ISIC-2019 validate its robustness.
Compared to state-of-the-art knowledge distillation methods, the proposed method achieves better performance and greater robustness with optimal domain adaptation.</div></div><div><h3>Conclusion:</h3><div>A novel distillation architecture tailored for histological classification, termed VPSP, is proposed. This architecture achieves superior performance with optimal domain adaptation, enhancing the clinical applicability of histological classification. The source code will be released at <span><span>https://github.com/xiaohongji/VPSP</span></span>.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"158 ","pages":"Article 104728"},"PeriodicalIF":4.0,"publicationDate":"2024-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142288115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junrong Song , Zhiming Song , Yuanli Gong , Lichang Ge , Wenlu Lou
{"title":"Advancing cancer driver gene identification through an integrative network and pathway approach","authors":"Junrong Song , Zhiming Song , Yuanli Gong , Lichang Ge , Wenlu Lou","doi":"10.1016/j.jbi.2024.104729","DOIUrl":"10.1016/j.jbi.2024.104729","url":null,"abstract":"<div><h3>Objective</h3><div>Cancer is a complex genetic disease characterized by the accumulation of various mutations, with driver genes playing a crucial role in cancer initiation and progression. Distinguishing driver genes from passenger genes is essential for understanding cancer biology and discovering therapeutic targets. However, most existing methods ignore the mutational heterogeneity and commonalities among patients, which limits their effectiveness in identifying driver genes.</div></div><div><h3>Methods</h3><div>This study introduces MCSdriver, a novel computational model that integrates network and pathway information to prioritize cancer driver genes. MCSdriver employs a bidirectional random walk algorithm to quantify the mutual exclusivity and functional relationships between mutated genes within patient cohorts. It calculates similarity scores based on a mutual exclusivity-weighted network and pathway coverage patterns, accounting for patient-specific heterogeneity and molecular profile similarity.</div></div><div><h3>Results</h3><div>This approach enhances the accuracy and quality of driver gene identification. MCSdriver demonstrates superior performance in identifying cancer driver genes across four cancer types from The Cancer Genome Atlas, achieving higher F-score, recall, and precision than existing ranking list-based and module-based models.</div></div><div><h3>Conclusion</h3><div>The MCSdriver model not only outperforms other models in identifying known cancer driver genes but also effectively identifies novel driver genes involved in cancer-related biological processes.
The model’s consideration of patient-specific heterogeneity and similarity in molecular profiles significantly enhances the accuracy and quality of driver gene identification. Validation through Gene Ontology enrichment analysis and literature mining further underscores its potential application value in personalized cancer therapy, offering a promising tool for advancing our understanding and treatment of cancer.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"158 ","pages":"Article 104729"},"PeriodicalIF":4.0,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142288114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maxim Kryukov , Kathleen P. Moriarty , Macarena Villamea , Ingrid O’Dwyer , Ohn Chow , Flavio Dormont , Ramon Hernandez , Ziv Bar-Joseph , Brandon Rufino
{"title":"Proxy endpoints — bridging clinical trials and real world data","authors":"Maxim Kryukov , Kathleen P. Moriarty , Macarena Villamea , Ingrid O’Dwyer , Ohn Chow , Flavio Dormont , Ramon Hernandez , Ziv Bar-Joseph , Brandon Rufino","doi":"10.1016/j.jbi.2024.104723","DOIUrl":"10.1016/j.jbi.2024.104723","url":null,"abstract":"<div><h3>Objective:</h3><p>Disease severity scores, or endpoints, are routinely measured during Randomized Controlled Trials (RCTs) to closely monitor the effect of treatment. In real-world clinical practice, although a larger set of patients is observed, the specific RCT endpoints are often not captured, which makes it hard to utilize real-world data (RWD) to evaluate drug efficacy in larger populations.</p></div><div><h3>Methods:</h3><p>To overcome this challenge, we developed an ensemble technique that learns proxy models of disease endpoints in RWD. Using a multi-stage learning framework applied to RCT data, we first identify features that are available within RWD and are considered significant drivers of disease. To create endpoint proxy models, we use Explainable Boosting Machines (EBMs), which allow for both end-user interpretability and modeling of non-linear relationships.</p></div><div><h3>Results:</h3><p>We demonstrate our approach on two diseases, rheumatoid arthritis (RA) and atopic dermatitis (AD). Our combined feature selection and prediction method performs well in both disease areas, improving upon prior methods for predictive disease severity scoring.</p></div><div><h3>Conclusion:</h3><p>Tracking a patient’s disease severity over time is important for furthering disease understanding and management. Our results open the door to further use cases in RA and AD, such as treatment effect estimation or prognostic scoring on RWD.
Our framework may be extended beyond RA and AD to other diseases where the severity score is not well measured in electronic health records.</p></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"158 ","pages":"Article 104723"},"PeriodicalIF":4.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S1532046424001412/pdfft?md5=7711cb401e9e3526c4adf1c9e025c587&pid=1-s2.0-S1532046424001412-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142274254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}