Xiang Dai , Sarvnaz Karimi , Abeed Sarker , Ben Hachey , Cecile Paris
{"title":"MultiADE: A Multi-domain benchmark for Adverse Drug Event extraction","authors":"Xiang Dai , Sarvnaz Karimi , Abeed Sarker , Ben Hachey , Cecile Paris","doi":"10.1016/j.jbi.2024.104744","DOIUrl":"10.1016/j.jbi.2024.104744","url":null,"abstract":"<div><h3>Objective:</h3><div>Active adverse event surveillance monitors Adverse Drug Events (ADEs) from different data sources, such as electronic health records, medical literature, social media and search engine logs. Over the years, many datasets have been created, and shared tasks have been organised to facilitate active adverse event surveillance. However, most – if not all – datasets or shared tasks focus on extracting ADEs from a particular type of text. Domain generalisation – the ability of a machine learning model to perform well on new, unseen domains (text types) – is under-explored. Given the rapid advancements in natural language processing, one unanswered question is how far we are from having a single ADE extraction model that is effective on various <em>types of text</em>, such as scientific literature and social media posts.</div></div><div><h3>Methods:</h3><div>We contribute to answering this question by building a multi-domain benchmark for adverse drug event extraction, which we named <span>MultiADE</span>. The new benchmark comprises several existing datasets sampled from different text types and our newly created dataset, <span>CADECv2</span>, which is an extension of <span>CADEC</span> (Karimi et al., 2015), covering online posts regarding more diverse drugs than CADEC. Our new dataset is carefully annotated by human annotators following detailed annotation guidelines.</div></div><div><h3>Conclusion:</h3><div>Our benchmark results show that the generalisation of the trained models is far from perfect, making them infeasible to deploy across different types of text. 
In addition, although intermediate transfer learning is a promising approach to utilising existing resources, further investigation is needed on methods of domain adaptation, particularly cost-effective methods to select useful training instances.</div><div>The newly created <span>CADECv2</span> and the scripts for building the benchmark are publicly available at CSIRO’s Data Portal (<span><span>https://data.csiro.au/collection/csiro:62387</span></span>). These resources enable the research community to further information extraction, leading to more effective active adverse drug event surveillance.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"160 ","pages":"Article 104744"},"PeriodicalIF":4.0,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142621233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
William W. Stead , Adam Lewis , Nunzia B. Giuse , Annette M. Williams , Italo Biaggioni , Lisa Bastarache
{"title":"Disentangling the phenotypic patterns of hypertension and chronic hypotension","authors":"William W. Stead , Adam Lewis , Nunzia B. Giuse , Annette M. Williams , Italo Biaggioni , Lisa Bastarache","doi":"10.1016/j.jbi.2024.104743","DOIUrl":"10.1016/j.jbi.2024.104743","url":null,"abstract":"<div><h3>Objective</h3><div>2017 blood pressure (BP) categories focus on cardiac risk. We hypothesize that studying the balance between mechanisms that increase or decrease BP across the medical phenome will lead to new insights. We devised a classifier that uses BP measures to assign individuals to mutually exclusive categories centered in the upper (Htn), lower (Hotn) and middle (Naf) zones of the BP spectrum; and examined the epidemiologic and phenotypic patterns of these BP-categories.</div></div><div><h3>Methods</h3><div>We classified a cohort of 832,560 deidentified electronic health records by BP-category; compared the frequency of BP-categories and four subtypes of Htn and Hotn by sex and age-decade; visualized the distributions of systolic, diastolic, mean arterial and pulse pressures stratified by BP-category; and ran Phenome-wide Association Studies (PheWAS) for Htn and Hotn. We paired knowledgebases for hypertension and hypotension and computed aggregate knowledgebase status (KB-status) indicating known associations. We assessed alignment of PheWAS results with KB-status for phecodes in the knowledgebase, and paired PheWAS correlations with KB-status to surface phenotypic patterns.</div></div><div><h3>Results</h3><div>BP-categories represent distinct distributions within the multimodal distributions of systolic and diastolic pressure. They are centered in the upper, lower, and middle zones of mean arterial pressure and provide a different signal than pulse pressure. For phecodes in the knowledgebase, 85% of positive correlations align with KB-status. Phenotypic patterns for Htn and Hotn overlap for several phecodes and are separate for others. 
Our analysis suggests five candidates for hypothesis-testing research: two where the prevalence of the association with Htn or Hotn may be underappreciated, and three where mechanisms that increase and decrease blood pressure may be affecting one another’s expression.</div></div><div><h3>Conclusion</h3><div>PairedPheWAS methods may open a phenome-wide path to disentangling hypertension and chronic hypotension. Our classifier provides a starting point for assigning individuals to BP-categories representing the upper, lower, and middle zones of the BP spectrum. 4.7 % of individuals matching 2017 BP categories for normal, elevated BP or isolated hypertension have diastolic pressure < 60. Research is needed to fine-tune the classifier, provide external validation, evaluate the clinical significance of diastolic pressure < 60, and test the candidate hypotheses.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"159 ","pages":"Article 104743"},"PeriodicalIF":4.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142564529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demonstration-based learning for few-shot biomedical named entity recognition under machine reading comprehension","authors":"Leilei Su , Jian Chen , Yifan Peng , Cong Sun","doi":"10.1016/j.jbi.2024.104739","DOIUrl":"10.1016/j.jbi.2024.104739","url":null,"abstract":"<div><h3>Objective:</h3><div>Although deep learning techniques have shown significant achievements, they frequently depend on extensive amounts of hand-labeled data and tend to perform inadequately in few-shot scenarios. The objective of this study is to devise a strategy that can improve the model’s capability to recognize biomedical entities in scenarios of few-shot learning.</div></div><div><h3>Methods:</h3><div>By redefining biomedical named entity recognition (BioNER) as a machine reading comprehension (MRC) problem, we propose a demonstration-based learning method to address few-shot BioNER, which involves constructing appropriate task demonstrations. We compared the proposed method with existing advanced methods on six benchmark datasets: BC4CHEMD, BC5CDR-Chemical, BC5CDR-Disease, NCBI-Disease, BC2GM, and JNLPBA.</div></div><div><h3>Results:</h3><div>We examined the models’ efficacy by reporting F1 scores from both the 25-shot and 50-shot learning experiments. In 25-shot learning, we observed a 1.1% improvement in the average F1 score compared to the baseline method, reaching 61.7%, 84.1%, 69.1%, 70.1%, 50.6%, and 59.9% on the six datasets, respectively. In 50-shot learning, we improved the average F1 score by 1.0% compared to the baseline method, reaching 73.1%, 86.8%, 76.1%, 75.6%, 61.7%, and 65.4%, respectively.</div></div><div><h3>Conclusion:</h3><div>We reported that, in few-shot BioNER, MRC-based language models are much more proficient in recognizing biomedical entities than the sequence labeling approach. 
Furthermore, our MRC-language models can compete successfully with fully-supervised learning methodologies that rely heavily on the availability of abundant annotated data. These results highlight possible pathways for future advancements in few-shot BioNER methodologies.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"159 ","pages":"Article 104739"},"PeriodicalIF":4.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142553603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tinglin Huang , Syed Asad Rizvi , Rohan Krishna Thakur , Vimig Socrates , Meili Gupta , David van Dijk , R. Andrew Taylor , Rex Ying
{"title":"HEART: Learning better representation of EHR data with a heterogeneous relation-aware transformer","authors":"Tinglin Huang , Syed Asad Rizvi , Rohan Krishna Thakur , Vimig Socrates , Meili Gupta , David van Dijk , R. Andrew Taylor , Rex Ying","doi":"10.1016/j.jbi.2024.104741","DOIUrl":"10.1016/j.jbi.2024.104741","url":null,"abstract":"<div><h3>Objective:</h3><div>Pretrained language models have recently demonstrated their effectiveness in modeling Electronic Health Record (EHR) data by modeling the encounters of patients as sentences. However, existing methods fall short of utilizing the inherent heterogeneous correlations between medical entities, which include diagnoses, medications, procedures, and lab tests. Existing studies either focus merely on diagnosis entities or encode different entities in a homogeneous space, leading to suboptimal performance. Motivated by this, we aim to develop a foundational language model pre-trained on EHR data that explicitly incorporates the heterogeneous correlations among these entities.</div></div><div><h3>Methods:</h3><div>In this study, we propose <span>HEART</span>, a heterogeneous relation-aware transformer for EHR. Our model includes a range of heterogeneous entities within each input sequence and represents pairwise relationships between entities as a relation embedding. Such a higher-order representation allows the model to perform complex reasoning and derive attention weights in the heterogeneous context. Additionally, a multi-level attention scheme is employed to exploit the connection between different encounters while alleviating the high computational costs. 
For pretraining, <span>HEART</span> engages with two tasks, missing entity prediction and anomaly detection, which both effectively enhance the model’s performance on various downstream tasks.</div></div><div><h3>Results:</h3><div>Extensive experiments on two EHR datasets and five downstream tasks demonstrate <span>HEART</span>’s superior performance compared to four SOTA foundation models. For instance, <span>HEART</span> achieves improvements of 12.1% and 4.1% over Med-BERT in death and readmission prediction, respectively. Additionally, case studies show that <span>HEART</span> offers interpretable insights into the relationships between entities through the learned relation embeddings.</div></div><div><h3>Conclusion:</h3><div>We study the problem of EHR representation learning and propose HEART, a model that leverages the heterogeneous relationships between medical entities. Our approach includes a multi-level encoding scheme and two specialized pretraining objectives, designed to boost both the efficiency and effectiveness of the model. We have comprehensively evaluated HEART across five clinically significant downstream tasks using two EHR datasets. The experimental results verify the model’s strong performance and validate its practical utility in healthcare applications. 
Code: <span><span>https://github.com/Graph-and-Geometric-Learning/HEART</span></span>.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"159 ","pages":"Article 104741"},"PeriodicalIF":4.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142545685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marcin Radom , Agnieszka Rybarczyk , Igor Piekarz , Piotr Formanowicz
{"title":"Algorithms for evaluation of minimal cut sets","authors":"Marcin Radom , Agnieszka Rybarczyk , Igor Piekarz , Piotr Formanowicz","doi":"10.1016/j.jbi.2024.104740","DOIUrl":"10.1016/j.jbi.2024.104740","url":null,"abstract":"<div><h3>Objective:</h3><div>We propose a way to enhance the evaluation of minimal cut sets (MCSs) in biological systems modeled by Petri nets, by providing criteria and methodology for determining their optimality in disabling specific processes without affecting critical system components.</div></div><div><h3>Methods:</h3><div>This study uses Petri nets to model biological systems and applies two primary approaches to MCS evaluation. The first analyzes the impact on t-invariants to identify structural dependencies. The second assesses the impact on potentially starved transitions caused by the inactivity of specific MCSs. This approach deals with net dynamics. These methodologies aim to offer practical tools for assessing the quality and effectiveness of MCSs.</div></div><div><h3>Results:</h3><div>The proposed methodologies were applied to two case studies. In the first case, a cholesterol metabolism network was analyzed to investigate how local inflammation and oxidative stress, in conjunction with cholesterol imbalances, influence the progression of atherosclerosis. The MCSs were ranked, with the top sets presented, focusing on those that disabled the fewest t-invariants. In the second case, a carbohydrate metabolism disorder model was examined to understand its impact on atherosclerosis progression. The analysis aimed to identify MCSs that could inhibit the atherosclerosis process by targeting specific transitions. 
Both studies utilized the Holmes software for calculations, demonstrating the effectiveness of the proposed evaluation methodologies in ranking MCSs for practical biological applications.</div></div><div><h3>Conclusion:</h3><div>The algorithms proposed in this paper offer an analytical approach for evaluating the quality of MCSs in biological systems. By providing criteria for MCS optimality, these approaches have the potential to enhance the utility of MCS analysis in systems biology, aiding in the understanding and manipulation of complex biological networks.</div><div>The algorithms are implemented within the Holmes software, an open-source project available at <span><span>https://github.com/bszawulak/HolmesPN</span></span>.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"159 ","pages":"Article 104740"},"PeriodicalIF":4.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142501142","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yi-Kai Zheng , Bi Zeng , Yi-Chun Feng , Lu Zhou , Yi-Xue Li
{"title":"PLRTE: Progressive learning for biomedical relation triplet extraction using large language models","authors":"Yi-Kai Zheng , Bi Zeng , Yi-Chun Feng , Lu Zhou , Yi-Xue Li","doi":"10.1016/j.jbi.2024.104738","DOIUrl":"10.1016/j.jbi.2024.104738","url":null,"abstract":"<div><div>Document-level relation triplet extraction is crucial in biomedical text mining, aiding in drug discovery and the construction of biomedical knowledge graphs. Current language models face challenges in generalizing to unseen datasets and relation types in biomedical relation triplet extraction, which limits their effectiveness in these crucial tasks. To address this challenge, our study optimizes models from two critical dimensions: data-task relevance and granularity of relations, aiming to enhance their generalization capabilities significantly. We introduce a novel progressive learning strategy to obtain the PLRTE model. This strategy not only enhances the model’s capability to comprehend diverse relation types in the biomedical domain but also implements a structured four-level progressive learning process through semantic relation augmentation, compositional instruction, and dual-axis level learning. Our experiments on the DDI and BC5CDR document-level biomedical relation triplet datasets demonstrate a significant performance improvement of 5% to 20% over the current state-of-the-art baselines. 
Furthermore, our model exhibits exceptional generalization capabilities on the unseen Chemprot and GDA datasets, further validating the effectiveness of optimizing data-task association and relation granularity for enhancing model generalizability.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"159 ","pages":"Article 104738"},"PeriodicalIF":4.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142466410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Meredith C.B. Adams , Colin Griffin , Hunter Adams , Stephen Bryant , Robert W. Hurley , Umit Topaloglu
{"title":"Adapting the open-source Gen3 platform and kubernetes for the NIH HEAL IMPOWR and MIRHIQL clinical trial data commons: Customization, cloud transition, and optimization","authors":"Meredith C.B. Adams , Colin Griffin , Hunter Adams , Stephen Bryant , Robert W. Hurley , Umit Topaloglu","doi":"10.1016/j.jbi.2024.104749","DOIUrl":"10.1016/j.jbi.2024.104749","url":null,"abstract":"<div><h3>Objective</h3><div>This study aims to provide the decision-making framework, strategies, and software used to successfully deploy the first combined chronic pain and opioid use clinical trial data commons using the Gen3 platform.</div></div><div><h3>Materials and Methods</h3><div>The approach involved adapting the open-source Gen3 platform and Kubernetes for the needs of the NIH HEAL IMPOWR and MIRHIQL networks. Key steps included customizing the Gen3 architecture, transitioning from Amazon to Google Cloud, adapting data ingestion and harmonization processes, ensuring security and compliance for the Kubernetes environment, and optimizing performance and user experience.</div></div><div><h3>Results</h3><div>The primary result was a fully operational IMPOWR data commons built on Gen3. Key features include a modular architecture supporting diverse clinical trial data types, automated processes for data management, fine-grained access control and auditing, and researcher-friendly interfaces for data exploration and analysis.</div></div><div><h3>Discussion</h3><div>The successful development of the Wake Forest IDEA-CC data commons represents a significant milestone for chronic pain and addiction research. Harmonized, FAIR data from diverse studies can be discovered in a secure, scalable repository. Challenges remain in long-term maintenance and governance, but the commons provides a foundation for accelerating scientific progress. 
Key lessons learned include the importance of engaging both technical and domain experts, the need for flexible yet robust infrastructure, and the value of building on established open-source platforms.</div></div><div><h3>Conclusion</h3><div>The WF IDEA-CC Gen3 data commons demonstrates the feasibility and value of developing a shared data infrastructure for chronic pain and opioid use research. The lessons learned can inform similar efforts in other clinical domains.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"159 ","pages":"Article 104749"},"PeriodicalIF":4.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142604422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Segundo, M. Far, C.I. Rodríguez-Casado, J.M. Elorza, J. Carrere-Molina, R. Mallol-Parera, M. Aragón
{"title":"A mother-child data linkage approach using data from the information system for the development of research in primary care (SIDIAP) in Catalonia","authors":"E. Segundo, M. Far, C.I. Rodríguez-Casado, J.M. Elorza, J. Carrere-Molina, R. Mallol-Parera, M. Aragón","doi":"10.1016/j.jbi.2024.104747","DOIUrl":"10.1016/j.jbi.2024.104747","url":null,"abstract":"<div><h3>Background</h3><div>Large-scale clinical databases containing routinely collected electronic health records (EHRs) data are a valuable source of information for research studies. For example, they can be used in pharmacoepidemiology studies to evaluate the effects of maternal medication exposure on neonatal and pediatric outcomes. Yet, this type of study is infeasible without proper mother–child linkage.</div></div><div><h3>Methods</h3><div>We leveraged all eligible active records (N = 8,553,321) of the Information System for Research in Primary Care (SIDIAP) database. Mothers and infants were linked using a deterministic approach and linkage accuracy was evaluated in terms of the number of records from candidate mothers that failed to link. We validated the identified mother–child links by comparing linked and unlinked records for both candidate mothers and descendants. Differences across these two groups were evaluated by means of effect size calculations instead of <em>p</em>-values. Overall, we described our data linkage process following the GUidance for Information about Linking Data sets (GUILD) principles.</div></div><div><h3>Results</h3><div>We were able to identify 744,763 unique mother–child relationships, linking 83.8 % of candidate mothers with delivery dates within a period of 15 years. Of note, we provide a record-level category label used to derive a global confidence metric for the presented linkage process. 
Our validation analysis showed that the two groups were similar in terms of a number of aggregated attributes.</div></div><div><h3>Conclusions</h3><div>Complementing the SIDIAP database with mother–child links will allow clinical researchers to expand their epidemiologic studies with the ultimate goal of improving outcomes for pregnant women and their children. Importantly, the reported information at each step of the data linkage process will contribute to the validity of analyses and interpretation of results in future studies using this resource.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"159 ","pages":"Article 104747"},"PeriodicalIF":4.0,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142604420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Triple and quadruple optimization for feature selection in cancer biomarker discovery","authors":"L. Cattelani, V. Fortino","doi":"10.1016/j.jbi.2024.104736","DOIUrl":"10.1016/j.jbi.2024.104736","url":null,"abstract":"<div><div>The proliferation of omics data has advanced cancer biomarker discovery but often falls short in external validation, mainly due to a narrow focus on prediction accuracy that neglects clinical utility and validation feasibility. We introduce three- and four-objective optimization strategies based on genetic algorithms to identify clinically actionable biomarkers in omics studies, addressing classification tasks aimed at distinguishing hard-to-differentiate cancer subtypes beyond histological analysis alone. Our hypothesis is that by optimizing more than one characteristic of cancer biomarkers, we may identify biomarkers that are more likely to succeed in external validation. Our objectives are to: (i) assess the biomarker panel’s accuracy using a machine learning (ML) framework; (ii) ensure the biomarkers exhibit significant fold-changes across subtypes, thereby boosting the success rate of PCR or immunohistochemistry validations; (iii) select a concise set of biomarkers to simplify the validation process and reduce clinical costs; and (iv) identify biomarkers crucial for predicting overall survival, which plays a significant role in determining the prognostic value of cancer subtypes. We implemented and applied triple and quadruple optimization algorithms to renal carcinoma gene expression data from TCGA. The study targets kidney cancer subtypes that are difficult to distinguish through histopathology methods. Selected RNA-seq biomarkers were assessed against the gold standard method, which relies solely on clinical information, and in external microarray-based validation datasets. 
Notably, these biomarkers achieved an accuracy above 0.8 in external validations and added significant value to survival predictions, outperforming the use of clinical data alone with a superior c-index. The provided tool also helps explore the trade-off between objectives, offering multiple solutions for clinical evaluation before proceeding to costly validation or clinical trials.</div></div>","PeriodicalId":15263,"journal":{"name":"Journal of Biomedical Informatics","volume":"159 ","pages":"Article 104736"},"PeriodicalIF":4.0,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142466411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}