Marco A. De Velasco, Kazuko Sakai, Seiichiro Mitani, Yurie Kura, Shuji Minamoto, Takahiro Haeno, Hidetoshi Hayashi, Kazuto Nishio
International Journal of Clinical Oncology (IF 2.4, JCR Q3, Oncology). Published 2024-09-18. DOI: 10.1007/s10147-024-02617-w (https://doi.org/10.1007/s10147-024-02617-w)
A machine learning-based method for feature reduction of methylation data for the classification of cancer tissue origin
Background
Genome-wide DNA methylation profiling is a promising yet costly method for cancer classification that involves large volumes of data. We developed an ensemble learning model to identify cancer types using methylation profiles from a limited number of CpG sites.
Methods
Analyzing methylation data from 890 samples across 10 cancer types from the TCGA database, we utilized ANOVA and Gain Ratio to select the most significant CpG sites, then employed Gradient Boosting to reduce these to just 100 sites.
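The two filter scores named above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the equal-width binning used for Gain Ratio, the function names, and the toy beta values are all assumptions made for illustration.

```python
import math
from collections import Counter


def anova_f(values, labels):
    """One-way ANOVA F statistic for one feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def entropy(items):
    """Shannon entropy (bits) of a list of discrete items."""
    n = len(items)
    return -sum(c / n * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels, bins=3):
    """Information gain of the binned feature divided by its split information.

    Equal-width binning is an assumption; the abstract does not say how
    methylation values were discretized.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features
    binned = [min(int((v - lo) / width), bins - 1) for v in values]
    n = len(values)
    conditional = 0.0
    for b in set(binned):
        subset = [y for x, y in zip(binned, labels) if x == b]
        conditional += len(subset) / n * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(binned)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy methylation beta values for two hypothetical CpG sites (not TCGA data).
labels = ["breast"] * 4 + ["lung"] * 4
informative = [0.10, 0.15, 0.12, 0.08, 0.85, 0.90, 0.88, 0.92]
noisy = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.52, 0.48]

for name, site in [("informative", informative), ("noisy", noisy)]:
    print(name, anova_f(site, labels), gain_ratio(site, labels))
```

A site whose values separate the classes scores high on both measures, so ranking sites by either score and keeping the top ones gives a first-pass filter before any model-based reduction.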
Results
This approach maintained high accuracy across multiple machine learning models, with classification accuracy rates between 87.7% and 93.5% for methods including Extreme Gradient Boosting, CatBoost, and Random Forest. This method effectively minimizes the number of features needed without losing performance, helping to classify primary organs and uncover subgroups within specific cancers like breast and lung.
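The kind of evaluation described here can be sketched as follows, assuming scikit-learn is available. The sketch uses synthetic data in place of TCGA profiles, keeps a hypothetical 20 features rather than 100, and scores only a Random Forest. The hyperparameters and the choice to nest selection inside cross-validation are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a samples x CpG-sites matrix (NOT TCGA data).
X, y = make_classification(
    n_samples=240,
    n_features=300,
    n_informative=15,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)

N_SITES = 20  # hypothetical; the abstract reports reducing to 100 sites

# Keep the N_SITES features with the highest gradient-boosting importance.
selector = SelectFromModel(
    GradientBoostingClassifier(n_estimators=30, max_depth=2, random_state=0),
    max_features=N_SITES,
    threshold=-np.inf,  # disable the importance threshold; rank only
)

# Selection sits inside the pipeline so each CV fold selects on its own
# training split, which avoids leaking test labels into feature choice.
model = make_pipeline(
    selector, RandomForestClassifier(n_estimators=100, random_state=0)
)

scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"mean CV accuracy with {N_SITES} features: {mean_accuracy:.3f}")

# Fit the selector once on all data to inspect which columns it keeps.
n_selected = int(selector.fit(X, y).get_support().sum())
```

Swapping the final estimator for another classifier while leaving the selector unchanged is one way to check whether a reduced feature set holds up across model families.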
Conclusions
Using a gradient boosting feature selector shows potential for streamlining methylation-based cancer classification.
About the journal:
The International Journal of Clinical Oncology (IJCO) welcomes original research papers on all aspects of clinical oncology that report the results of novel and timely investigations. Reports on clinical trials are encouraged. Experimental studies will also be accepted if they have obvious relevance to clinical oncology. Membership in the Japan Society of Clinical Oncology is not a prerequisite for submission to the journal. Papers are received on the understanding that: their contents have not been published in whole or in part elsewhere; that they are subject to peer review by at least two referees and the Editors, and to editorial revision of the language and contents; and that the Editors are responsible for their acceptance, rejection, and order of publication.