Mohammad Sadegh Loeloe, Seyyed Mohammad Tabatabaei, Reyhane Sefidkar, Amir Houshang Mehrparvar, Sara Jambarsang
{"title":"通过一种新的学习方法提高纵向数据的k近邻回归性能。","authors":"Mohammad Sadegh Loeloe, Seyyed Mohammad Tabatabaei, Reyhane Sefidkar, Amir Houshang Mehrparvar, Sara Jambarsang","doi":"10.1186/s12859-025-06205-1","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Longitudinal studies often require flexible methodologies for predicting response trajectories based on time-dependent and time-independent covariates. To address the complexities of longitudinal data, this study proposes a novel extension of K-Nearest Neighbor (KNN) regression, referred to as Clustering-based KNN Regression for Longitudinal Data (CKNNRLD).</p><p><strong>Methods: </strong>In CKNNRLD, data are first clustered using the KML algorithm (K-means for longitudinal data), and the nearest neighbors are then searched within the relevant cluster rather than across the entire dataset. The theoretical framework of CKNNRLD was developed and evaluated through extensive simulation studies. Ultimately, the method was applied to a real longitudinal spirometry dataset.</p><p><strong>Result: </strong>Compared to the standard KNN, CKNNRLD demonstrated improved prediction accuracy, shorter execution time, and reduced computational burden. According to the simulation findings, using the CKNNRLD method for this purpose took less time compared to using the KNN implementation (for N > 100). It predicted the longitudinal responses more accurately and precisely than the equivalent algorithm. For instance, CKNNRLD execution time was approximately 3.7 times faster than the typical KNN execution time in the scenario with N = 2000, T = 5, D = 2, C = 4, E = 1, and R = 1. Since the KNN method needs all of the training data to identify the nearest neighbors, it tends to operate slowly as the number of individuals in longitudinal research increases (for N > 500).</p><p><strong>Conclusion: </strong>The CKNNRLD algorithm significantly improves accuracy and computational efficiency for predicting longitudinal responses compared to traditional KNN methods. These findings highlight its potential as a valuable tool for researchers with large longitudinal datasets.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"232"},"PeriodicalIF":3.3000,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482645/pdf/","citationCount":"0","resultStr":"{\"title\":\"Boosting K-nearest neighbor regression performance for longitudinal data through a novel learning approach.\",\"authors\":\"Mohammad Sadegh Loeloe, Seyyed Mohammad Tabatabaei, Reyhane Sefidkar, Amir Houshang Mehrparvar, Sara Jambarsang\",\"doi\":\"10.1186/s12859-025-06205-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Background: </strong>Longitudinal studies often require flexible methodologies for predicting response trajectories based on time-dependent and time-independent covariates. To address the complexities of longitudinal data, this study proposes a novel extension of K-Nearest Neighbor (KNN) regression, referred to as Clustering-based KNN Regression for Longitudinal Data (CKNNRLD).</p><p><strong>Methods: </strong>In CKNNRLD, data are first clustered using the KML algorithm (K-means for longitudinal data), and the nearest neighbors are then searched within the relevant cluster rather than across the entire dataset. The theoretical framework of CKNNRLD was developed and evaluated through extensive simulation studies. 
Ultimately, the method was applied to a real longitudinal spirometry dataset.</p><p><strong>Result: </strong>Compared to the standard KNN, CKNNRLD demonstrated improved prediction accuracy, shorter execution time, and reduced computational burden. According to the simulation findings, using the CKNNRLD method for this purpose took less time compared to using the KNN implementation (for N > 100). It predicted the longitudinal responses more accurately and precisely than the equivalent algorithm. For instance, CKNNRLD execution time was approximately 3.7 times faster than the typical KNN execution time in the scenario with N = 2000, T = 5, D = 2, C = 4, E = 1, and R = 1. Since the KNN method needs all of the training data to identify the nearest neighbors, it tends to operate slowly as the number of individuals in longitudinal research increases (for N > 500).</p><p><strong>Conclusion: </strong>The CKNNRLD algorithm significantly improves accuracy and computational efficiency for predicting longitudinal responses compared to traditional KNN methods. These findings highlight its potential as a valuable tool for researchers with large longitudinal datasets.</p>\",\"PeriodicalId\":8958,\"journal\":{\"name\":\"BMC Bioinformatics\",\"volume\":\"26 1\",\"pages\":\"232\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2025-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482645/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"BMC Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1186/s12859-025-06205-1\",\"RegionNum\":3,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-025-06205-1","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
Boosting K-nearest neighbor regression performance for longitudinal data through a novel learning approach.
Background: Longitudinal studies often require flexible methodologies for predicting response trajectories based on time-dependent and time-independent covariates. To address the complexities of longitudinal data, this study proposes a novel extension of K-Nearest Neighbor (KNN) regression, referred to as Clustering-based KNN Regression for Longitudinal Data (CKNNRLD).
Methods: In CKNNRLD, data are first clustered using the KML algorithm (K-means for longitudinal data), and the nearest neighbors are then searched within the relevant cluster rather than across the entire dataset. The theoretical framework of CKNNRLD was developed and evaluated through extensive simulation studies. Ultimately, the method was applied to a real longitudinal spirometry dataset.
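The abstract describes the core of CKNNRLD only at a high level: cluster the training trajectories first, then restrict the neighbor search to the cluster a new subject falls into. The following is a minimal Python sketch of that idea, not the authors' implementation; it substitutes scikit-learn's KMeans on fixed-length trajectory vectors for the KML algorithm, and the function names (fit_cknn, predict_cknn) and parameters (n_clusters, n_neighbors) are illustrative assumptions.

```python
# Minimal sketch: cluster longitudinal trajectories, then run KNN regression
# only within the assigned cluster (KMeans stands in for KML here).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsRegressor


def fit_cknn(X_traj, y, n_clusters=4, n_neighbors=5):
    """Cluster training trajectories, then fit one KNN regressor per cluster.

    X_traj : (N, T) array, one row per subject, one column per time point.
    y      : (N,) array of responses to predict.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_traj)
    models = {}
    for c in range(n_clusters):
        member = km.labels_ == c
        k = min(n_neighbors, int(member.sum()))  # guard against small clusters
        models[c] = KNeighborsRegressor(n_neighbors=k).fit(X_traj[member], y[member])
    return km, models


def predict_cknn(km, models, X_new):
    """Assign each new trajectory to its nearest cluster and predict locally."""
    labels = km.predict(X_new)
    return np.array([models[c].predict(row.reshape(1, -1))[0]
                     for c, row in zip(labels, X_new)])
```

Restricting the neighbor search to a single cluster shrinks the candidate set from the full training sample to roughly N/C subjects (assuming reasonably balanced clusters), which is the intuition behind the speed-up reported in the results below.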
Results: Compared to standard KNN, CKNNRLD demonstrated improved prediction accuracy, shorter execution time, and reduced computational burden. In the simulations, CKNNRLD required less execution time than the standard KNN implementation for N > 100 and predicted the longitudinal responses more accurately and precisely. For instance, CKNNRLD ran approximately 3.7 times faster than standard KNN in the scenario with N = 2000, T = 5, D = 2, C = 4, E = 1, and R = 1. Because standard KNN must search all of the training data to identify the nearest neighbors, it slows markedly as the number of individuals in a longitudinal study grows (N > 500).
Conclusion: The CKNNRLD algorithm significantly improves accuracy and computational efficiency for predicting longitudinal responses compared to traditional KNN methods. These findings highlight its potential as a valuable tool for researchers with large longitudinal datasets.
Journal Introduction:
BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology.
BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.