Disease candidate genes prediction using positive labeled and unlabeled instances.

IF 2, Zone 4 (Medicine), Q3 GENETICS & HEREDITY

BMC Medical Genomics Pub Date : 2025-04-16 DOI:10.1186/s12920-025-02109-4

Sepideh Molaei, Saeed Jalili

BMC Medical Genomics, 18(1): 73. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12004746/pdf/

Citations: 0

Abstract

Identifying disease genes and understanding their function is critical for developing drugs for genetic diseases. Disease genes are now identified not only through laboratory approaches; computational approaches such as machine learning are increasingly used for this purpose. In machine learning, only two types of data are available for predicting disease candidate genes: known disease genes and unknown (unlabeled) genes. Notably, there is no source of negative data. The proposed method is a two-step process. The first step extracts reliable negative genes from the set of unlabeled genes using one-class learning and a filter based on distance indicators from known disease genes; this step is performed separately for each disease. The second step learns a binary model, using the causal genes of each disease as the positive training set and the reliable negative genes extracted for that disease as the negative set. In the prediction and ranking step, each unlabeled gene is assigned a normalized score based on the two filters and the learned model, and disease genes are then predicted and ranked. Evaluation of the proposed method on six diseases and a cancer class shows better results than those of other studies.
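The abstract does not specify the one-class learner, the distance indicators, the binary classifier, or how the two filters and the model score are combined. The sketch below is therefore only a minimal, dependency-free illustration of the two-step positive-unlabeled workflow under assumed choices: a sphere around the centroid of the known disease genes stands in for the one-class learner, a nearest-disease-gene distance threshold stands in for the distance filter, logistic regression stands in for the binary model, and an unweighted mean of three scores in [0, 1] gives the final ranking. All function names, the parameters `quantile` and `min_dist`, and the toy 2-D feature vectors are hypothetical; real inputs would be per-gene feature vectors derived from whatever biological data the authors used.

```python
import math
from statistics import mean

Matrix = list[list[float]]  # one row of features per gene (hypothetical encoding)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def sigmoid(z):
    # Numerically stable logistic function (avoids math.exp overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def minmax(values):
    # Rescale to [0, 1]; constant inputs map to 0.5.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_class_outliers(pos: Matrix, unl: Matrix, quantile=0.9):
    """Assumed stand-in one-class model: a sphere around the centroid of the
    known disease genes; unlabeled genes outside its radius are flagged."""
    c = [mean(col) for col in zip(*pos)]
    radii = sorted(euclidean(p, c) for p in pos)
    radius = radii[min(len(radii) - 1, int(quantile * len(radii)))]
    return {i for i, u in enumerate(unl) if euclidean(u, c) > radius}


def distance_filter(pos: Matrix, unl: Matrix, min_dist=1.0):
    """Keep genes whose nearest known disease gene is at least min_dist away."""
    return {i for i, u in enumerate(unl)
            if min(euclidean(u, p) for p in pos) >= min_dist}


def train_logistic(X: Matrix, y, lr=0.5, epochs=500):
    # Plain stochastic gradient descent on the logistic loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(dot(w, xi) + b) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def rank_candidates(pos: Matrix, unl: Matrix, quantile=0.9, min_dist=1.0):
    # Step 1: a reliable negative must pass BOTH filters.
    negatives = sorted(one_class_outliers(pos, unl, quantile)
                       & distance_filter(pos, unl, min_dist))
    if not negatives:
        raise ValueError("no reliable negatives; relax quantile or min_dist")
    # Step 2: binary model, disease genes (1) vs reliable negatives (0).
    X = pos + [unl[i] for i in negatives]
    y = [1] * len(pos) + [0] * len(negatives)
    w, b = train_logistic(X, y)
    # Scoring: mean of the model probability and two proximity scores,
    # each in [0, 1], where higher means more disease-like.
    c = [mean(col) for col in zip(*pos)]
    model = [sigmoid(dot(w, u) + b) for u in unl]
    near = [1 - s for s in minmax([min(euclidean(u, p) for p in pos) for u in unl])]
    cent = [1 - s for s in minmax([euclidean(u, c) for u in unl])]
    scores = [(m + n + k) / 3 for m, n, k in zip(model, near, cent)]
    ranking = sorted(range(len(unl)), key=lambda i: scores[i], reverse=True)
    return ranking, scores, negatives


# Toy usage: four known disease genes and five unlabeled genes.
pos = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
unl = [[0.15, 0.2], [3.0, 3.0], [2.5, 3.5], [0.5, 0.4], [4.0, 2.0]]
ranking, scores, negatives = rank_candidates(pos, unl)
print(negatives)  # [1, 2, 4]
print(ranking)
```

Requiring a gene to pass both filters trades recall for purity of the negative set. That is the point of a "reliable" negative: a true disease gene mislabeled as negative would directly bias the binary model trained in the second step.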

Source journal: BMC Medical Genomics (Medicine: Genetics)

CiteScore: 3.90
Self-citation rate: 0.00%
Articles published: 243
Review time: 3.5 months

Journal description: BMC Medical Genomics is an open access journal publishing original peer-reviewed research articles in all aspects of functional genomics, genome structure, genome-scale population genetics, epigenomics, proteomics, systems analysis, and pharmacogenomics in relation to human health and disease.