Application and clinical utility assessment of natural language processing-based software for copy-number variants interpretation
Authors: Songchang Chen, Chang Liu, Xiaorui Luan, Yuling Wang, Yuexin Xu, Yunshuang Li, Fenjiao Zhang, Weihui Shi, Xuanyou Zhou, Chenming Xu
Journal of Translational Medicine, 23(1):1052. Published 2025-10-03.
DOI: https://doi.org/10.1186/s12967-025-07063-4
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12495639/pdf/
Impact factor: 7.5; JCR: Q1 (Medicine, Research & Experimental)
Citations: 0
Abstract
Background: Manual interpretation of copy-number variants (CNVs) according to the guideline published in 2020 by the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen) is labor-intensive and time-consuming. Natural language processing (NLP)-based software such as CNVisi can reduce the burden of CNV interpretation, but its clinical utility needs further evaluation.
Methods: We first used 1000 CNVs that had previously been manually classified to assess the performance of CNVisi. To assess its clinical utility, we then collected 5861 CNVs from 2443 samples analyzed by next-generation sequencing (NGS)-based CNV sequencing (CNV-seq). The CNVs were first classified by CNVisi and then reviewed by genetic experts. After removing duplicates, the remaining 3384 CNVs were used to assess classification consistency, and 154 CNVs that met the reporting rules were selected for further analysis.
Results: In the preliminary performance assessment, the overall accuracy of CNVisi in distinguishing pathogenic or likely pathogenic CNVs (pCNVs) was 97.7% (977/1000). In the clinical utility assessment, its accuracy was 99.6% (3370/3384). Among the 154 CNVs that met the clinical reporting rules, 23 were classified differently by CNVisi and the genetic experts. These inconsistencies were mainly caused by overlap between CNVs and low-penetrance regions and by differences in the scoring of literature-related evidence. Under the reporting rules, all CNVs were classified with a high consistency of 98.6% (5781/5861) between the genetic experts and CNVisi, and CNVisi accurately and efficiently interpreted the CNV-seq results of 96.9% (2367/2443) of samples. Furthermore, CNVisi outperformed previous tools for CNV interpretation and classification and showed excellent clinical utility.
Conclusions: Applying clinically useful CNV interpretation software such as CNVisi can reduce the burden on genetic experts and improve the efficiency of CNV interpretation.
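The percentages quoted in the Results can be checked directly from the counts given there. The short Python sketch below is only an arithmetic check, not part of the study's method. The counts and stated percentages are copied from the abstract. The labels, the `percent` helper, and the derived disagreement rate for the 154 reportable CNVs are illustrative additions that do not appear in the source.

```python
# Counts and stated percentages copied from the Results paragraph.
# Keys are descriptive labels chosen for this sketch, not terms from the paper.
reported = {
    "preliminary pCNV accuracy": (977, 1000, 97.7),
    "clinical utility accuracy (deduplicated CNVs)": (3370, 3384, 99.6),
    "consistency under reporting rules (all CNVs)": (5781, 5861, 98.6),
    "samples interpreted accurately": (2367, 2443, 96.9),
}


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


for label, (num, den, stated) in reported.items():
    computed = percent(num, den)
    status = "matches" if computed == stated else "differs from"
    print(f"{label}: {num}/{den} = {computed}% ({status} stated {stated}%)")

# Derived here for illustration only; the abstract gives the counts (23 of 154)
# but does not state this percentage.
disagreement_rate = percent(23, 154)
print(f"disagreement among reportable CNVs: 23/154 = {disagreement_rate}%")
```

Each of the four stated percentages agrees with its counts when rounded to one decimal place.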
About the journal:
The Journal of Translational Medicine is an open-access journal that publishes articles focusing on information derived from human experimentation to enhance communication between basic and clinical science. It covers all areas of translational medicine.