{"title":"Finding Optimal Alphabet for Encoding Daily Continuous Glucose Monitoring Time Series Into Compressed Text.","authors":"Tobore Igbe, Boris Kovatchev","doi":"10.1177/19322968251323913","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>The emergence of continuous glucose monitoring (CGM) devices has not only revolutionized diabetes management but has also opened new avenues for research. This article presents a novel approach to encoding a CGM daily profile into a CGM string and CGM text that preserves clinical metrics information but compresses the data.</p><p><strong>Methods: </strong>Eight alphabets were defined to represent glucose ranges. The Akaike information criterion (AIC) was derived from error, and the compression ratio was estimated for each alphabet to determine the optimal alphabet for encoding the CGM daily profile. The analysis was done with data from six distinct studies, with different treatment modalities, applied to individuals with type 1 diabetes (T1D) or type 2 diabetes (T2D), and without diabetes. The data set was divided into 70% for training and 30% for validation.</p><p><strong>Result: </strong>The result from the training data reveals that a 9-letter alphabet was optimal for encoding daily CGM profiles for T1D or T2D, yielding the lowest AIC score that minimizes information loss. However, in health, fewer letters were needed, and this is to be expected, given the lower variation of the data. Further testing with the Pearson correlation showed that the 9-letter alphabet approximated the coefficient of variation, with correlations between 0.945 and 0.965.</p><p><strong>Conclusion: </strong>Encoding CGM data into text could enhance the classification of CGM profiles and enable the use of well-established search engines with CGM data. Other potential applications include predictive modeling, anomaly detection, indexing, trend analysis, or future generative artificial intelligence applications for diabetes research and clinical practice.</p>","PeriodicalId":15475,"journal":{"name":"Journal of Diabetes Science and Technology","volume":" ","pages":"19322968251323913"},"PeriodicalIF":4.1000,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11924066/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Diabetes Science and Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/19322968251323913","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENDOCRINOLOGY & METABOLISM","Score":null,"Total":0}
引用次数: 0
Abstract
Background: The emergence of continuous glucose monitoring (CGM) devices has not only revolutionized diabetes management but has also opened new avenues for research. This article presents a novel approach to encoding a CGM daily profile into a CGM string and CGM text that preserves clinical metrics information but compresses the data.
Methods: Eight alphabets were defined to represent glucose ranges. The Akaike information criterion (AIC) was derived from error, and the compression ratio was estimated for each alphabet to determine the optimal alphabet for encoding the CGM daily profile. The analysis was done with data from six distinct studies, with different treatment modalities, applied to individuals with type 1 diabetes (T1D) or type 2 diabetes (T2D), and without diabetes. The data set was divided into 70% for training and 30% for validation.
Result: The result from the training data reveals that a 9-letter alphabet was optimal for encoding daily CGM profiles for T1D or T2D, yielding the lowest AIC score that minimizes information loss. However, in health, fewer letters were needed, and this is to be expected, given the lower variation of the data. Further testing with the Pearson correlation showed that the 9-letter alphabet approximated the coefficient of variation, with correlations between 0.945 and 0.965.
Conclusion: Encoding CGM data into text could enhance the classification of CGM profiles and enable the use of well-established search engines with CGM data. Other potential applications include predictive modeling, anomaly detection, indexing, trend analysis, or future generative artificial intelligence applications for diabetes research and clinical practice.
背景:连续血糖监测(CGM)设备的出现不仅彻底改变了糖尿病的管理,而且为研究开辟了新的途径。本文提出了一种将CGM日常档案编码为CGM字符串和CGM文本的新方法,该方法保留了临床指标信息,但压缩了数据。方法:用8个字母表示血糖范围。根据误差推导出赤池信息准则(Akaike information criterion, AIC),并对每个字母表进行压缩比估计,确定编码CGM日剖面的最佳字母表。分析数据来自6项不同的研究,采用不同的治疗方式,适用于1型糖尿病(T1D)或2型糖尿病(T2D)患者,以及非糖尿病患者。数据集分为70%用于训练,30%用于验证。结果:训练数据的结果显示,9个字母的字母表最适合编码T1D或T2D的日常CGM概况,产生最低的AIC分数,最大限度地减少信息损失。然而,在健康方面,需要更少的字母,这是意料之中的,因为数据的变化较小。进一步的Pearson相关检验表明,9个字母的字母表近似变异系数,相关系数在0.945和0.965之间。结论:将CGM数据编码为文本可以增强CGM数据的分类能力,并使CGM数据能够被完善的搜索引擎使用。其他潜在的应用包括预测建模、异常检测、索引、趋势分析,或未来生成人工智能在糖尿病研究和临床实践中的应用。
期刊介绍:
The Journal of Diabetes Science and Technology (JDST) is a bi-monthly, peer-reviewed scientific journal published by the Diabetes Technology Society. JDST covers scientific and clinical aspects of diabetes technology including glucose monitoring, insulin and metabolic peptide delivery, the artificial pancreas, digital health, precision medicine, social media, cybersecurity, software for modeling, physiologic monitoring, technology for managing obesity, and diagnostic tests of glycation. The journal also covers the development and use of mobile applications and wireless communication, as well as bioengineered tools such as MEMS, new biomaterials, and nanotechnology to develop new sensors. Articles in JDST cover both basic research and clinical applications of technologies being developed to help people with diabetes.