From Glucose Patterns to Health Outcomes: A Generalizable Foundation Model for Continuous Glucose Monitor Data Analysis

Guy Lutsker, Gal Sapir, Anastasia Godneva, Smadar Shilo, Jerry R Greenfield, Dorit Samocha-Bonet, Shie Mannor, Eli Meirom, Gal Chechik, Hagai Rossman, Eran Segal

arXiv:2408.11876 (Quantitative Biology: Quantitative Methods), 2024-08-20
Abstract
Recent advances in self-supervised learning have enabled a new class of medical
AI models, known as foundation models (FMs), that offer great potential for
characterizing health from diverse biomedical data. Continuous glucose
monitoring (CGM) provides rich temporal data on glycemic patterns, but its
potential for predicting broader health outcomes has not been fully exploited.
Here, we present GluFormer, a generative foundation model for biomedical
temporal data, based on a transformer architecture and trained on over 10
million CGM measurements from 10,812 non-diabetic individuals. We tokenized the
CGM training data and trained GluFormer autoregressively with a next-token
prediction objective.
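The tokenization and next-token training described above can be sketched as follows. This is one plausible scheme, not the authors' method: it assumes uniform 1 mg/dL bins over 40 to 500 mg/dL, and the constants (`GLUCOSE_MIN`, `GLUCOSE_MAX`, `N_BINS`) and helpers (`tokenize`, `next_token_pairs`) are hypothetical. Each training example is a window of tokens whose target is the same window shifted one step ahead, which is what an autoregressive transformer learns to predict.

```python
"""Hypothetical sketch: discretize a CGM trace into tokens and build
next-token prediction (input, target) pairs. Bin range and count are assumptions."""

from typing import List, Tuple

# Assumed glucose range (mg/dL) and bin count (1 mg/dL per bin); the paper's
# actual tokenization scheme may differ.
GLUCOSE_MIN = 40.0
GLUCOSE_MAX = 500.0
N_BINS = 460


def tokenize(readings: List[float],
             lo: float = GLUCOSE_MIN,
             hi: float = GLUCOSE_MAX,
             n_bins: int = N_BINS) -> List[int]:
    """Map each reading to a bin index in [0, n_bins - 1] using uniform bins."""
    width = (hi - lo) / n_bins
    tokens = []
    for value in readings:
        clipped = min(max(value, lo), hi)  # clamp out-of-range readings
        idx = int((clipped - lo) // width)
        tokens.append(min(idx, n_bins - 1))  # a reading equal to hi goes to the last bin
    return tokens


def next_token_pairs(tokens: List[int],
                     context: int) -> List[Tuple[List[int], List[int]]]:
    """Slide a window over the sequence; the target is the input shifted by one token."""
    pairs = []
    for start in range(len(tokens) - context):
        window = tokens[start:start + context + 1]
        pairs.append((window[:-1], window[1:]))
    return pairs
```

For example, `tokenize([40, 100.5, 500, 600, 10])` returns `[0, 60, 459, 459, 0]`: readings outside the assumed range are clamped to the first or last bin. Uniform bins are only one option; quantile-based or clinically motivated bin edges would feed the same pair-building step.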
We demonstrate that GluFormer generalizes effectively to 15 external datasets
comprising 4,936 individuals across 5 geographical regions, 6 CGM devices, and
a range of metabolic conditions, including normoglycemic, prediabetic, and
diabetic populations, as well as individuals with gestational diabetes and
obesity. GluFormer produces embeddings that outperform traditional CGM analysis
tools, and it achieves high Pearson correlations in predicting clinical
parameters such as HbA1c, liver-related parameters, blood lipids, and
sleep-related indices. Notably, GluFormer can also predict the onset of future
health outcomes as far as 4 years in advance. We also show that CGM embeddings
from the pre-intervention periods of randomized clinical trials (RCTs)
outperform other methods in predicting the trials' primary and secondary
outcomes. When dietary data are integrated into GluFormer, the enhanced model
can accurately generate CGM data from dietary intake data alone, simulate the
outcomes of dietary interventions, and predict individual responses to specific
foods. Overall, we show that GluFormer accurately predicts health outcomes and
generalizes across different populations and metabolic conditions.
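The abstract reports embedding quality as Pearson correlations with clinical parameters such as HbA1c. A common way to obtain such numbers, assumed here and not confirmed by the source, is a linear probe: fit ridge regression from frozen per-participant embedding vectors to a clinical target on training participants, then compute Pearson r between predictions and measured values on held-out participants. The helper names (`fit_ridge`, `evaluate_probe`, etc.) and the ridge penalty `lam` are hypothetical, and the embeddings themselves would come from the model and are not shown.

```python
"""Hypothetical linear-probe sketch: ridge regression from frozen embeddings
to a clinical target, scored by Pearson r on held-out participants.
Standard library only; the penalty and helper names are assumptions."""

import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(x: Sequence[Sequence[float]], y: Sequence[float],
              lam: float = 1.0) -> Tuple[List[float], float]:
    """Closed-form ridge on centered data; the intercept is not penalized."""
    n, d = len(x), len(x[0])
    x_mean = [sum(row[j] for row in x) / n for j in range(d)]
    y_mean = sum(y) / n
    xc = [[row[j] - x_mean[j] for j in range(d)] for row in x]
    yc = [v - y_mean for v in y]
    # Normal equations: (Xc^T Xc + lam * I) w = Xc^T yc
    gram = [[sum(r[i] * r[j] for r in xc) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * t for r, t in zip(xc, yc)) for i in range(d)]
    w = _solve(gram, rhs)
    bias = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, bias


def predict(x: Sequence[Sequence[float]], w: List[float], bias: float) -> List[float]:
    return [bias + sum(wj * xj for wj, xj in zip(w, row)) for row in x]


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def evaluate_probe(x_train, y_train, x_test, y_test, lam: float = 1.0) -> float:
    """Fit on training participants only; report Pearson r on held-out ones."""
    w, bias = fit_ridge(x_train, y_train, lam)
    return pearson_r(predict(x_test, w, bias), y_test)
```

With real data, each row of `x` would be one participant's embedding and `y` a measured target such as HbA1c; `lam` would normally be tuned by cross-validation within the training set, and library routines such as scikit-learn's `Ridge` and SciPy's `pearsonr` would replace these hand-rolled helpers.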