Towards a standard benchmark for phenotype-driven variant and gene prioritisation algorithms: PhEval - Phenotypic inference Evaluation framework.

bioRxiv : the preprint server for biology Pub Date : 2025-02-20 DOI:10.1101/2024.06.13.598672

Yasemin Bridges, Vinicius de Souza, Katherina G Cortes, Melissa Haendel, Nomi L Harris, Daniel R Korn, Nikolaos M Marinakis, Nicolas Matentzoglu, James A McLaughlin, Christopher J Mungall, Aaron Odell, David Osumi-Sutherland, Peter N Robinson, Damian Smedley, Julius Ob Jacobsen

{"title":"Towards a standard benchmark for phenotype-driven variant and gene prioritisation algorithms: PhEval - Phenotypic inference Evaluation framework.","authors":"Yasemin Bridges, Vinicius de Souza, Katherina G Cortes, Melissa Haendel, Nomi L Harris, Daniel R Korn, Nikolaos M Marinakis, Nicolas Matentzoglu, James A McLaughlin, Christopher J Mungall, Aaron Odell, David Osumi-Sutherland, Peter N Robinson, Damian Smedley, Julius Ob Jacobsen","doi":"10.1101/2024.06.13.598672","DOIUrl":null,"url":null,"abstract":"Background: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the performance of VGPAs has been difficult to measure and is impacted by many factors, for example, ontology structure, annotation completeness or changes to the underlying algorithm. Assertions of the capabilities of VGPAs are often not reproducible, in part because there is no standardised, empirical framework and openly available patient data to assess the efficacy of VGPAs - ultimately hindering the development of effective prioritisation tools.Results: In this paper, we present our benchmarking tool, PhEval, which aims to provide a standardised and empirical framework to evaluate phenotype-driven VGPAs. The inclusion of standardised test corpora and test corpus generation tools in the PhEval suite of tools allows open benchmarking and comparison of methods on standardised data sets.Conclusions: PhEval and the standardised test corpora solve the issues of patient data availability and experimental tooling configuration when benchmarking and comparing rare disease VGPAs. By providing standardised data on patient cohorts from real-world case-reports and controlling the configuration of evaluated VGPAs, PhEval enables transparent, portable, comparable and reproducible benchmarking of VGPAs. As these tools are often a key component of many rare disease diagnostic pipelines, a thorough and standardised method of assessment is essential for improving patient diagnosis and care.","PeriodicalId":72407,"journal":{"name":"bioRxiv : the preprint server for biology","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11195176/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"bioRxiv : the preprint server for biology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1101/2024.06.13.598672","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

Abstract

Background: Computational approaches to support rare disease diagnosis are challenging to build, requiring the integration of complex data types such as ontologies, gene-to-phenotype associations, and cross-species data into variant and gene prioritisation algorithms (VGPAs). However, the performance of VGPAs has been difficult to measure and is impacted by many factors, for example, ontology structure, annotation completeness or changes to the underlying algorithm. Assertions of the capabilities of VGPAs are often not reproducible, in part because there is no standardised, empirical framework and openly available patient data to assess the efficacy of VGPAs - ultimately hindering the development of effective prioritisation tools.

Results: In this paper, we present our benchmarking tool, PhEval, which aims to provide a standardised and empirical framework to evaluate phenotype-driven VGPAs. The inclusion of standardised test corpora and test corpus generation tools in the PhEval suite of tools allows open benchmarking and comparison of methods on standardised data sets.

Conclusions: PhEval and the standardised test corpora solve the issues of patient data availability and experimental tooling configuration when benchmarking and comparing rare disease VGPAs. By providing standardised data on patient cohorts from real-world case-reports and controlling the configuration of evaluated VGPAs, PhEval enables transparent, portable, comparable and reproducible benchmarking of VGPAs. As these tools are often a key component of many rare disease diagnostic pipelines, a thorough and standardised method of assessment is essential for improving patient diagnosis and care.

查看原文本刊更多论文

为变异和基因优先算法制定标准基准：PhEval - 表型推断评估框架。

背景：建立支持罕见病诊断的计算方法具有挑战性，需要将本体、基因与表型关联以及跨物种数据等复杂数据类型整合到变异体和基因优先排序算法（VGPA）中。然而，VGPA 的性能一直难以衡量，而且受到许多因素的影响，例如本体结构、注释完整性或底层算法的变化。对VGPA能力的断言往往不可复制，部分原因是没有标准化的实证框架和公开可用的患者数据来评估VGPA的功效--这最终阻碍了有效的优先排序工具的开发：在本文中，我们介绍了我们的基准工具 PhEval，该工具旨在提供一个标准化的经验框架来评估表型驱动的 VGPA。PhEval工具套件中包含了标准化测试语料库和测试语料库生成工具，可在标准化数据集上对各种方法进行公开基准测试和比较：结论：PhEval 和标准化测试语料库解决了在对罕见病 VGPA 进行基准测试和比较时患者数据可用性和实验工具配置的问题。通过提供来自真实世界病例报告的标准化患者队列数据并控制所评估 VGPA 的配置，PhEval 实现了 VGPA 透明、可移植、可比较和可重复的基准测试。由于这些工具通常是许多罕见病诊断管道的关键组成部分，因此全面和标准化的评估方法对于改善患者诊断和护理至关重要。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

bioRxiv : the preprint server for biology

自引率

0.00%

发文量