Aggregating Multiple Biological Measurements Per Patient
V. Zubek, F. Khan
2010 Ninth International Conference on Machine Learning and Applications, published 2010-12-12
DOI: 10.1109/ICMLA.2010.120
Citations: 2
Abstract
Many machine learning algorithms require a single value per feature per record for modeling. However, in some applications, particularly in the medical domain, a single record may have multiple observations of the same feature. For example, a patient could have the same gene analyzed in multiple tissue slides of a biopsy, or could have the same genetic test performed on multiple subsequent biopsies. The challenge in these applications is how to integrate multiple observations of the same predictor feature per record. In this paper, two data aggregation methods are compared: a simple median aggregation of feature values, and a novel method that constructs intervals of values for each feature. The aggregated features are passed as input to a novel support vector regression method for modeling survival data in a prostate cancer setting. Both methods performed similarly in predicting prostate cancer progression on three data cohorts.
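The contrast between collapsing repeated observations to one value and keeping a range of values can be illustrated with a minimal sketch. It assumes long-format input of hypothetical (patient_id, feature, value) tuples. `median_aggregate` follows the median aggregation described above; `interval_aggregate` uses a plain [min, max] interval only as an assumed stand-in, since the abstract does not say how the authors construct their intervals. All names and the toy data are illustrative and not taken from the paper.

```python
from collections import defaultdict
from statistics import median

# Hypothetical long-format records: (patient_id, feature_name, observed_value).
# A patient may have several observations of the same feature,
# e.g. the same gene scored on multiple tissue slides.
Observation = tuple[str, str, float]


def group_observations(rows: list[Observation]) -> dict[tuple[str, str], list[float]]:
    """Collect every observed value for each (patient, feature) pair."""
    groups: dict[tuple[str, str], list[float]] = defaultdict(list)
    for patient, feature, value in rows:
        groups[(patient, feature)].append(value)
    return groups


def median_aggregate(rows: list[Observation]) -> dict[tuple[str, str], float]:
    """One value per feature per patient: the median of its observations."""
    return {key: median(values) for key, values in group_observations(rows).items()}


def interval_aggregate(rows: list[Observation]) -> dict[tuple[str, str], tuple[float, float]]:
    """Assumed illustrative alternative (not the paper's method):
    summarize the observations as a closed interval [min, max]."""
    return {key: (min(values), max(values)) for key, values in group_observations(rows).items()}


rows: list[Observation] = [
    ("p1", "geneA", 0.25), ("p1", "geneA", 0.75), ("p1", "geneA", 0.5),
    ("p1", "geneB", 1.0),
    ("p2", "geneA", 0.25), ("p2", "geneA", 0.75),
]

print(median_aggregate(rows))
# → {('p1', 'geneA'): 0.5, ('p1', 'geneB'): 1.0, ('p2', 'geneA'): 0.5}
print(interval_aggregate(rows))
# → {('p1', 'geneA'): (0.25, 0.75), ('p1', 'geneB'): (1.0, 1.0), ('p2', 'geneA'): (0.25, 0.75)}
```

In this toy data, patients p1 and p2 receive the same median for geneA even though p1 has three observations and p2 has two. The median yields the single value per feature that most learners expect, but it discards how spread out the observations were; an interval keeps that spread at the cost of requiring a downstream model that can consume two values, or an interval, per feature.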