Multivariate functional partial least squares for classification using longitudinal data.

IF 1.2 Zone 4 Biology Q4 BIOLOGY

Theoretical Biology Forum Pub Date : 2021-01-01 DOI:10.19272/202111402007

Sonia Dembowska, Alejandro F Frangi, Jeanine Houwing-Duistermaat, Haiyan Liu

Volume: 114 1-2 1. Pages: 75-88. Publication type: Journal Article.

Citations: 1

Abstract

Statistical methods that predict outcomes from high-dimensional datasets are increasingly used in medicine to forecast and monitor patient health. Our work is motivated by a longitudinal dataset containing 1H NMR spectra of metabolites from 18 patients undergoing a kidney transplant, together with their graft outcomes, which fall into one of three categories: acute rejection, delayed graft function and primary function. We propose a functional partial least squares (FPLS) model that extends existing PLS methods for longitudinally measured scalar omics datasets to longitudinally measured functional datasets. We designed an iterative algorithm to link multiple time points and applied the proposed method to the data from the kidney transplant patients. We then compared the AUC of our method with the AUC of univariate methods that use information from only one time point; our method outperformed these existing methods. A simulation study mimicking the kidney transplant dataset, but with a larger sample size, was performed to evaluate the new method on larger datasets under scenarios that vary in how difficult it is to distinguish the two groups. The three-time-point model performed better than any of the individual models, with an average AUC of 0.909 compared with 0.811 for the individual models.
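The comparison described in the abstract (single-time-point models versus a model that uses all time points, scored by AUC) can be illustrated with a small sketch. This is not the authors' iterative FPLS algorithm: it assumes a much simpler stand-in, a one-component PLS fit on discretised curves, with the multi-time-point model formed by naive concatenation of the curves. The simulated data, the function names (`auc`, `fit_pls1`, `pls1_scores`, `simulate`, `evaluate`, `concat`) and all parameter values are hypothetical and chosen only for illustration.

```python
import random

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_pls1(X, y):
    """First PLS component: weights proportional to cov(X_j, y) on centred data."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_bar = sum(y) / n
    w = [sum((X[i][j] - means[j]) * (y[i] - y_bar) for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5
    return [v / norm for v in w], means

def pls1_scores(X, w, means):
    """Project centred rows onto the weight vector; higher means more class-1-like."""
    return [sum((x - m) * wj for x, m, wj in zip(row, means, w)) for row in X]

def simulate(n, n_times=3, grid=20, shift=0.4, seed=0):
    """Hypothetical data: one noisy curve per subject and time point; class 1
    curves are shifted upward by `shift` on the first half of the grid."""
    rng = random.Random(seed)
    curves, labels = [], []
    for i in range(n):
        y = i % 2  # alternate classes so both groups are balanced
        subject = [
            [rng.gauss(shift * y if g < grid // 2 else 0.0, 1.0) for g in range(grid)]
            for _ in range(n_times)
        ]
        curves.append(subject)
        labels.append(y)
    return curves, labels

def evaluate(train, y_train, test, y_test):
    """Fit on the training set, then report AUC on the held-out test set."""
    w, means = fit_pls1(train, y_train)
    return auc(pls1_scores(test, w, means), y_test)

def concat(data):
    """Naive link between time points: stack each subject's curves end to end."""
    return [[v for curve in subject for v in curve] for subject in data]

curves_tr, y_tr = simulate(200, seed=1)
curves_te, y_te = simulate(200, seed=2)

# One model per time point, each seeing only that time point's curves.
single_aucs = [
    evaluate([s[t] for s in curves_tr], y_tr, [s[t] for s in curves_te], y_te)
    for t in range(3)
]
# One model seeing all three time points at once.
combined_auc = evaluate(concat(curves_tr), y_tr, concat(curves_te), y_te)

print("single time point AUCs:", [round(a, 3) for a in single_aucs])
print("all time points AUC:   ", round(combined_auc, 3))
```

The AUC is computed as the Mann-Whitney statistic, so no thresholding of the scores is needed. The printed numbers depend entirely on the assumed simulation settings and are not a reproduction of the AUCs reported in the abstract.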




Source journal

Theoretical Biology Forum Agricultural and Biological Sciences-General Agricultural and Biological Sciences

CiteScore

1.10

Self-citation rate

0.00%

Articles published