Analyzing Lung Cancer Data for Machine Learning

Proceedings of the West Virginia Academy of Science. Pub Date: 2023-04-18. DOI: 10.55632/pwvas.v95i2.974

Annalee Corcoran, Jason Rafe Miller



Abstract



Data preparation is a critical step in any machine learning experiment. We analyzed a dataset derived from images of human male lung cancer tumors. The tumors had previously been analyzed with genetic markers to identify Y-chromosome loss, which was found in about half of the samples. Collaborators had collected and H&E-stained the whole slide images (WSI). We had processed the images with the CellProfiler software to extract numeric features. In this study, we analyzed the data in preparation for training a convolutional neural network to predict Y-chromosome loss from the extracted features, thereby recapitulating the genetic marker analysis. Using Excel and Python, we identified uninformative features and missing data. We predict that data cleaning informed by these results will improve the chances of successful machine learning.
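The abstract does not say how the uninformative features and missing data were identified. As an illustration only, the following minimal Python sketch shows one common way to run such a check on a table of extracted features. It assumes a feature counts as uninformative when all of its non-missing values are identical. The column names, the sample values, and the helper names `is_missing` and `profile_features` are hypothetical and are not taken from the authors' dataset or code.

```python
import csv
import io

# Hypothetical feature table; names and values are made up for illustration
# and do not come from the study's CellProfiler output.
SAMPLE_CSV = """sample_id,y_loss,Area_Mean,Intensity_Max,ImageNumber_Scale
s1,1,10.5,0.91,1
s2,0,,0.88,1
s3,1,12.1,nan,1
s4,0,9.8,0.95,1
"""


def is_missing(value):
    """Treat None, empty strings, and NaN-like tokens as missing."""
    if value is None:
        return True
    return str(value).strip().lower() in {"", "nan", "na", "null"}


def profile_features(rows, skip_columns=()):
    """Return (missing_counts, uninformative) for a list of dict rows.

    Assumption: a feature is flagged as uninformative when all of its
    non-missing values are identical. Values are compared as text, so
    "1" and "1.0" would count as different values.
    """
    if not rows:
        return {}, []
    columns = [c for c in rows[0] if c not in skip_columns]
    missing_counts = {}
    uninformative = []
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [str(v).strip() for v in values if not is_missing(v)]
        missing_counts[col] = len(values) - len(present)
        if len(set(present)) <= 1:
            uninformative.append(col)
    return missing_counts, uninformative


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing_counts, uninformative = profile_features(
    rows, skip_columns=("sample_id", "y_loss")
)
print(missing_counts)
print(uninformative)
```

In this toy table, `Area_Mean` and `Intensity_Max` each have one missing entry, and `ImageNumber_Scale` is flagged because it never varies. A real analysis would likely add further criteria, such as near-zero variance or highly correlated features, which this sketch does not attempt.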
