An adaptive machine learning pipeline for predicting the recurrence of gastric cancer

Yifan Gao, Haoran Wang, Minhan Guo, Yajin Li
Published in: 2020 5th International Conference on Information Science, Computer Technology and Transportation (ISCTT)
Publication date: 2020-11-01
DOI: 10.1109/ISCTT51595.2020.00076 (https://doi.org/10.1109/ISCTT51595.2020.00076)
Citations: 4

Abstract

An adaptive machine learning pipeline for predicting the recurrence of gastric cancer
Advances in medical science and technology have provided more methods for diagnosing and treating malignant tumors, and the survival of cancer patients has been significantly extended. However, many patients with malignant tumors still experience recurrence and metastasis after effective treatment. Understanding the mechanisms of tumor recurrence and metastasis well enough to predict them is a major clinical issue. At the same time, the rapid development of the Human Genome Project and gene microarray technology has made it possible to measure the activity of many genes in a patient directly on a chip, and the rapid development of machine learning has advanced data mining and medical research on such DNA microarray data. This project therefore addresses the problems above by predicting the location and time of recurrence from a large amount of clinical and genetic data. First, we perform simple data cleaning and normalization on the clinical and genetic data. Second, we screen for differentially expressed genes. Third, we reduce the dimensionality of the data using principal component analysis (PCA), sparse PCA, kernel PCA, and multidimensional scaling (MDS). Fourth, we train models on the genetic data with random forest, support vector machine (SVM), linear SVM, bootstrap aggregating (bagging), gradient boosting, and ensemble learning, find the best parameters and methods through grid search, and select an appropriate model evaluation method. Features of the clinical data are selected manually and then classified using machine learning. Finally, the results from the clinical and genetic data are combined to predict the site of recurrence. This method gave good results in predicting both recurrence and its location.
Taking the prediction of whether recurrence occurs as an example, accuracy on the validation set reached 0.825, recall reached 0.801, and the F1 score reached 0.800. Based on this retrospective study and prediction of gastric cancer recurrence, the model proposed in this paper has potential clinical value.
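The genetic-data steps described in the abstract (normalization, gene screening, dimensionality reduction, classifier selection by grid search, and evaluation by accuracy, recall, and F1) could be sketched roughly as follows with scikit-learn. This is an illustrative sketch only, not the authors' implementation: the data are synthetic, `SelectKBest` with an ANOVA F-test stands in for differential gene screening, and every hyperparameter value (`k=100`, `n_components=10`, the `C` grid, and so on) is an assumption chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA, KernelPCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a (samples x genes) expression matrix with a binary
# recurrence label. This is NOT the paper's data.
X, y = make_classification(
    n_samples=200, n_features=500, n_informative=20, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Normalization -> gene screening -> dimensionality reduction -> classifier.
# The "reduce" and "clf" steps are placeholders that the grid below swaps out.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("screen", SelectKBest(score_func=f_classif, k=100)),  # assumed proxy for screening
    ("reduce", PCA(n_components=10)),
    ("clf", SVC()),
])

# Grid search over both the reduction method and the classifier family.
param_grid = [
    {
        "reduce": [PCA(n_components=10), KernelPCA(n_components=10, kernel="rbf")],
        "clf": [SVC(kernel="rbf")],
        "clf__C": [0.1, 1.0, 10.0],
    },
    {
        "reduce": [PCA(n_components=10)],
        "clf": [
            RandomForestClassifier(n_estimators=100, random_state=0),
            GradientBoostingClassifier(random_state=0),
        ],
    },
]

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the held-out validation split.
y_pred = search.predict(X_val)
scores = {
    "accuracy": accuracy_score(y_val, y_pred),
    "recall": recall_score(y_val, y_pred),
    "f1": f1_score(y_val, y_pred),
}
print(search.best_params_)
print(scores)
```

Placing screening and reduction inside the `Pipeline` means they are refit on each cross-validation fold, so no information from held-out samples leaks into feature selection. MDS is left out of this sketch because scikit-learn's `MDS` has no `transform` method for unseen samples and so cannot act as an intermediate pipeline step.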