A machine learning-based method for feature reduction of methylation data for the classification of cancer tissue origin

IF 2.4, Zone 3 (Medicine), JCR Q3 (ONCOLOGY)
Marco A. De Velasco, Kazuko Sakai, Seiichiro Mitani, Yurie Kura, Shuji Minamoto, Takahiro Haeno, Hidetoshi Hayashi, Kazuto Nishio
International Journal of Clinical Oncology. Journal Article. Published 2024-09-18. DOI: 10.1007/s10147-024-02617-w (https://doi.org/10.1007/s10147-024-02617-w)
Citations: 0

Abstract

A machine learning-based method for feature reduction of methylation data for the classification of cancer tissue origin

Background

Genome-wide DNA methylation profiling is a promising yet costly method for cancer classification that involves substantial amounts of data. We developed an ensemble learning model to identify cancer types using methylation profiles from a limited number of CpG sites.

Methods

Analyzing methylation data from 890 samples across 10 cancer types from the TCGA database, we utilized ANOVA and Gain Ratio to select the most significant CpG sites, then employed Gradient Boosting to reduce these to just 100 sites.
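
As an illustration of the filter step described in Methods, the following minimal sketch scores CpG sites with a one-way ANOVA F statistic and with gain ratio. It is not the authors' implementation: the toy beta values, the site names (`cpg_A` and so on), the binarisation threshold of 0.5 used for gain ratio, and the choice to rank by F alone are all assumptions made for the example. The abstract does not state how the two scores were combined.

```python
import math
from collections import Counter


def anova_f(values, labels):
    """One-way ANOVA F statistic for one feature across class labels."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels, threshold=0.5):
    """Gain ratio after binarising the feature at `threshold` (an assumption)."""
    bins = [v >= threshold for v in values]
    split_info = entropy(bins)
    if split_info == 0:
        return 0.0  # the feature does not split the samples at all
    conditional = 0.0
    for b in set(bins):
        subset = [c for x, c in zip(bins, labels) if x == b]
        conditional += len(subset) / len(labels) * entropy(subset)
    return (entropy(labels) - conditional) / split_info


# Toy methylation beta values (hypothetical, not TCGA data): 3 samples per class.
labels = ["breast"] * 3 + ["lung"] * 3
beta = {
    "cpg_A": [0.90, 0.80, 0.85, 0.10, 0.20, 0.15],  # separates the classes
    "cpg_B": [0.50, 0.60, 0.40, 0.55, 0.45, 0.50],  # uninformative
    "cpg_C": [0.70, 0.60, 0.80, 0.40, 0.55, 0.30],  # partly informative
}

scores = {
    site: (anova_f(v, labels), gain_ratio(v, labels)) for site, v in beta.items()
}
# Rank by the F statistic; ranking by gain ratio would work the same way.
ranked = sorted(scores, key=lambda s: scores[s][0], reverse=True)
for site in ranked:
    f_stat, gr = scores[site]
    print(f"{site}: F={f_stat:.2f}, gain ratio={gr:.3f}")
```

On a real matrix the same scores would be computed per CpG site and the highest-ranked sites passed on to the next selection stage.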

Results

This approach maintained high accuracy across multiple machine learning models, with classification accuracy rates between 87.7% and 93.5% for methods including Extreme Gradient Boosting, CatBoost, and Random Forest. This method effectively minimizes the number of features needed without losing performance, helping to classify primary organs and uncover subgroups within specific cancers such as breast and lung cancer.
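
One way to check that a reduced feature set preserves accuracy is to compare a classifier trained on all features with one trained only on the features ranked highest by a gradient boosting model. The sketch below does this with scikit-learn on synthetic data. It is an assumption-laden illustration rather than the study's pipeline: the data come from `make_classification` (not TCGA), only 10 of 60 features are kept instead of 100 CpG sites, and Random Forest stands in for the several classifiers named above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a samples x CpG-sites matrix (NOT TCGA data).
X, y = make_classification(
    n_samples=300, n_features=60, n_informative=8, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Rank features by gradient boosting importance, fitted on the training
# split only so that the held-out samples do not influence the selection.
selector = GradientBoostingClassifier(n_estimators=30, max_depth=2, random_state=0)
selector.fit(X_train, y_train)
n_keep = 10  # the study keeps 100 CpG sites; 10 suits this toy matrix
top_idx = np.argsort(selector.feature_importances_)[::-1][:n_keep]


def holdout_accuracy(columns):
    """Train a Random Forest on the given columns and score the test split."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train[:, columns], y_train)
    return accuracy_score(y_test, clf.predict(X_test[:, columns]))


acc_full = holdout_accuracy(np.arange(X.shape[1]))
acc_reduced = holdout_accuracy(top_idx)
print(f"all {X.shape[1]} features: {acc_full:.3f}")
print(f"top {n_keep} features: {acc_reduced:.3f}")
```

Fitting the selector on the training split alone avoids an optimistic bias in the reported accuracy; other classifiers could be swapped into `holdout_accuracy` to repeat the comparison.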

Conclusions

Using a gradient boosting feature selector shows potential for streamlining methylation-based cancer classification.

Source journal
CiteScore: 6.80
Self-citation rate: 3.00%
Articles published: 175
Review time: 2 months
Journal description: The International Journal of Clinical Oncology (IJCO) welcomes original research papers on all aspects of clinical oncology that report the results of novel and timely investigations. Reports on clinical trials are encouraged. Experimental studies will also be accepted if they have obvious relevance to clinical oncology. Membership in the Japan Society of Clinical Oncology is not a prerequisite for submission to the journal. Papers are received on the understanding that: their contents have not been published in whole or in part elsewhere; that they are subject to peer review by at least two referees and the Editors, and to editorial revision of the language and contents; and that the Editors are responsible for their acceptance, rejection, and order of publication.