Mohammad Sadegh Loeloe, Seyyed Mohammad Tabatabaei, Reyhane Sefidkar, Amir Houshang Mehrparvar, Sara Jambarsang
{"title":"通过一种新的学习方法提高纵向数据的k近邻回归性能。","authors":"Mohammad Sadegh Loeloe, Seyyed Mohammad Tabatabaei, Reyhane Sefidkar, Amir Houshang Mehrparvar, Sara Jambarsang","doi":"10.1186/s12859-025-06205-1","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Longitudinal studies often require flexible methodologies for predicting response trajectories based on time-dependent and time-independent covariates. To address the complexities of longitudinal data, this study proposes a novel extension of K-Nearest Neighbor (KNN) regression, referred to as Clustering-based KNN Regression for Longitudinal Data (CKNNRLD).</p><p><strong>Methods: </strong>In CKNNRLD, data are first clustered using the KML algorithm (K-means for longitudinal data), and the nearest neighbors are then searched within the relevant cluster rather than across the entire dataset. The theoretical framework of CKNNRLD was developed and evaluated through extensive simulation studies. Ultimately, the method was applied to a real longitudinal spirometry dataset.</p><p><strong>Result: </strong>Compared to the standard KNN, CKNNRLD demonstrated improved prediction accuracy, shorter execution time, and reduced computational burden. According to the simulation findings, using the CKNNRLD method for this purpose took less time compared to using the KNN implementation (for N > 100). It predicted the longitudinal responses more accurately and precisely than the equivalent algorithm. For instance, CKNNRLD execution time was approximately 3.7 times faster than the typical KNN execution time in the scenario with N = 2000, T = 5, D = 2, C = 4, E = 1, and R = 1. Since the KNN method needs all of the training data to identify the nearest neighbors, it tends to operate slowly as the number of individuals in longitudinal research increases (for N > 500).</p><p><strong>Conclusion: </strong>The CKNNRLD algorithm significantly improves accuracy and computational efficiency for predicting longitudinal responses compared to traditional KNN methods. These findings highlight its potential as a valuable tool for researchers with large longitudinal datasets.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"232"},"PeriodicalIF":3.3000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482645/pdf/","citationCount":"0","resultStr":"{\"title\":\"Boosting K-nearest neighbor regression performance for longitudinal data through a novel learning approach.\",\"authors\":\"Mohammad Sadegh Loeloe, Seyyed Mohammad Tabatabaei, Reyhane Sefidkar, Amir Houshang Mehrparvar, Sara Jambarsang\",\"doi\":\"10.1186/s12859-025-06205-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Longitudinal studies often require flexible methodologies for predicting response trajectories based on time-dependent and time-independent covariates. To address the complexities of longitudinal data, this study proposes a novel extension of K-Nearest Neighbor (KNN) regression, referred to as Clustering-based KNN Regression for Longitudinal Data (CKNNRLD).</p><p><strong>Methods: </strong>In CKNNRLD, data are first clustered using the KML algorithm (K-means for longitudinal data), and the nearest neighbors are then searched within the relevant cluster rather than across the entire dataset. The theoretical framework of CKNNRLD was developed and evaluated through extensive simulation studies. 
Ultimately, the method was applied to a real longitudinal spirometry dataset.</p><p><strong>Result: </strong>Compared to the standard KNN, CKNNRLD demonstrated improved prediction accuracy, shorter execution time, and reduced computational burden. According to the simulation findings, using the CKNNRLD method for this purpose took less time compared to using the KNN implementation (for N > 100). It predicted the longitudinal responses more accurately and precisely than the equivalent algorithm. For instance, CKNNRLD execution time was approximately 3.7 times faster than the typical KNN execution time in the scenario with N = 2000, T = 5, D = 2, C = 4, E = 1, and R = 1. Since the KNN method needs all of the training data to identify the nearest neighbors, it tends to operate slowly as the number of individuals in longitudinal research increases (for N > 500).</p><p><strong>Conclusion: </strong>The CKNNRLD algorithm significantly improves accuracy and computational efficiency for predicting longitudinal responses compared to traditional KNN methods. These findings highlight its potential as a valuable tool for researchers with large longitudinal datasets.</p>\",\"PeriodicalId\":8958,\"journal\":{\"name\":\"BMC Bioinformatics\",\"volume\":\"26 1\",\"pages\":\"232\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2025-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482645/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s12859-025-06205-1\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-025-06205-1","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
Boosting K-nearest neighbor regression performance for longitudinal data through a novel learning approach.
Background: Longitudinal studies often require flexible methodologies for predicting response trajectories based on time-dependent and time-independent covariates. To address the complexities of longitudinal data, this study proposes a novel extension of K-Nearest Neighbor (KNN) regression, referred to as Clustering-based KNN Regression for Longitudinal Data (CKNNRLD).
Methods: In CKNNRLD, data are first clustered using the KML algorithm (K-means for longitudinal data), and the nearest neighbors are then searched within the relevant cluster rather than across the entire dataset. The theoretical framework of CKNNRLD was developed and evaluated through extensive simulation studies. Ultimately, the method was applied to a real longitudinal spirometry dataset.
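The abstract describes the core of CKNNRLD only at a high level: cluster the training trajectories first, then restrict the neighbor search to the cluster a new subject falls into. The following is a minimal Python sketch of that idea, not the authors' implementation; it substitutes scikit-learn's KMeans on fixed-length trajectory vectors for the KML algorithm, and the function names (fit_cknn, predict_cknn) and parameters (n_clusters, n_neighbors) are illustrative assumptions.

```python
# Minimal sketch: cluster longitudinal trajectories, then run KNN regression
# only within the assigned cluster (KMeans stands in for KML here).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsRegressor


def fit_cknn(X_traj, y, n_clusters=4, n_neighbors=5):
    """Cluster training trajectories, then fit one KNN regressor per cluster.

    X_traj : (N, T) array, one row per subject, one column per time point.
    y      : (N,) array of responses to predict.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_traj)
    models = {}
    for c in range(n_clusters):
        member = km.labels_ == c
        k = min(n_neighbors, int(member.sum()))  # guard against small clusters
        models[c] = KNeighborsRegressor(n_neighbors=k).fit(X_traj[member], y[member])
    return km, models


def predict_cknn(km, models, X_new):
    """Assign each new trajectory to its nearest cluster and predict locally."""
    labels = km.predict(X_new)
    return np.array([models[c].predict(row.reshape(1, -1))[0]
                     for c, row in zip(labels, X_new)])
```

Restricting the neighbor search to a single cluster shrinks the candidate set from the full training sample to roughly N/C subjects (assuming reasonably balanced clusters), which is the intuition behind the speed-up reported in the results below.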
Results: Compared to standard KNN, CKNNRLD demonstrated improved prediction accuracy, shorter execution time, and reduced computational burden. In the simulations, CKNNRLD required less execution time than the standard KNN implementation for N > 100 and predicted the longitudinal responses more accurately and precisely. For instance, CKNNRLD ran approximately 3.7 times faster than standard KNN in the scenario with N = 2000, T = 5, D = 2, C = 4, E = 1, and R = 1. Because standard KNN must search all of the training data to identify the nearest neighbors, it slows markedly as the number of individuals in a longitudinal study grows (N > 500).
Conclusion: The CKNNRLD algorithm significantly improves accuracy and computational efficiency for predicting longitudinal responses compared to traditional KNN methods. These findings highlight its potential as a valuable tool for researchers with large longitudinal datasets.
Journal Introduction:
BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology.
BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.