Aggregating Multiple Biological Measurements Per Patient

2010 Ninth International Conference on Machine Learning and Applications. Pub Date: 2010-12-12. DOI: 10.1109/ICMLA.2010.120

V. Zubek, F. Khan

Citations: 2

Abstract

Many machine learning algorithms require a single value per feature per record for modeling. However, there are applications, particularly in the medical domain, where a single record may have multiple observations for the same feature. For example, a patient could have the same gene analyzed in multiple tissue slides of a biopsy, or could have the same genetic test performed on multiple subsequent biopsies. The challenge in these applications is how to integrate multiple observations of the same predictor feature per record. In this paper, two data aggregation methods are compared: one is a simple median aggregation of feature values, while the other is a novel method that constructs intervals of values for each feature. The aggregated features are passed as input to a novel support vector regression method for modeling survival data in a prostate cancer setting. The two methods performed similarly in predicting prostate cancer progression on three data cohorts.
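The two aggregation styles named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the intervals are constructed, so the (minimum, maximum) interval used here is an assumption made purely for illustration, not the authors' method. The function name `aggregate`, the tuple layout, and the sample values are likewise hypothetical.

```python
from collections import defaultdict
from statistics import median


def aggregate(observations):
    """Collapse repeated measurements into one entry per (patient, feature).

    `observations` is an iterable of (patient_id, feature, value) tuples.
    Returns two dicts keyed by (patient_id, feature):
      - medians:   the median of the observed values
      - intervals: a (min, max) pair; this interval definition is an
                   assumption, since the abstract does not specify one
    """
    grouped = defaultdict(list)
    for patient_id, feature, value in observations:
        grouped[(patient_id, feature)].append(value)

    medians = {key: median(values) for key, values in grouped.items()}
    intervals = {key: (min(values), max(values)) for key, values in grouped.items()}
    return medians, intervals


# Hypothetical data: one gene measured on three slides for patient p1,
# and on a single slide for patient p2.
observations = [
    ("p1", "gene_a", 0.2),
    ("p1", "gene_a", 0.6),
    ("p1", "gene_a", 0.4),
    ("p2", "gene_a", 1.0),
]
medians, intervals = aggregate(observations)
```

The median yields a single value that a standard learner can consume directly, whereas an interval keeps the spread of the observations and therefore needs a model that accepts interval-valued inputs.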


