{"title":"Advancing Precision Healthcare: A Comparative Study of Machine Learning Approaches for Diabetes Classification","authors":"Xiaohua Li, Yue Teng","doi":"10.1007/s40745-025-00627-7","DOIUrl":"10.1007/s40745-025-00627-7","url":null,"abstract":"<div><p>This report underscores the importance of diabetes classification in medicine as a precursor to personalized treatment protocols and effective disease control. Several methods for classifying diabetes have been described in the literature, with increasing recent focus on machine-learning algorithms for analyzing medical records. Despite the efforts of previous research, the search for the optimal method continues to prompt further investigation. This study experiments with various machine-learning classification models to create stable and efficient models for diabetes classification. The models are built from a diabetic clinical data set under strict training, validation, and testing protocols to determine their efficiency. Experimental results and performance analyses are used to assess the models and compare their performance using standard evaluation measures. Extensive experiments demonstrate the technique’s efficacy, with accurate results that outperform existing methods.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"13 1","pages":"125 - 142"},"PeriodicalIF":0.0,"publicationDate":"2025-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147342961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bivariate semiparametric mixed models to longitudinally measured systolic and diastolic blood pressure of adult diabetic patients","authors":"Tafere Tilahun Aniley, Legesse Kassa Debusho, Tadele Akeba Diriba","doi":"10.1007/s40745-025-00630-y","DOIUrl":"10.1007/s40745-025-00630-y","url":null,"abstract":"<div><p>In biomedical research, physicians collect systolic and diastolic blood pressures simultaneously when a patient visits a clinic for treatment, and these serve as indicators of the patient’s health status. Elevated blood pressure is a common comorbidity in diabetes patients and can increase the risk of developing hypertension over time. The aim of this study was to examine the joint evolution of systolic and diastolic blood pressure and estimate the rate of changes over time. In the analysis of correlated multiple outcomes, multivariate analysis yields satisfactory results compared to univariate analysis. Since the individual and mean profiles of systolic and diastolic blood pressure are nonlinear, we proposed bivariate semiparametric mixed models accounting for the correlation through joint random effects. Smoothing splines and thin-plate splines were specified to capture the nonlinear trends of systolic and diastolic blood pressure over time and the nonlinear interaction effects between covariates, respectively. The bivariate semiparametric mixed models had a better fit with the data than the parametric counterparts. The study revealed a nonlinear association between weight and age with both systolic and diastolic blood pressure of diabetic patients. The results showed that weight had a more pronounced effect on increasing systolic and diastolic blood pressure in adult diabetic patients. There was a strong association between systolic and diastolic blood pressure, while the rate of change decreases with time. 
The proposed method may help physicians to monitor the blood pressure of patients regularly and hence to identify periods of changes early and to manage them effectively.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"13 1","pages":"163 - 190"},"PeriodicalIF":0.0,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-025-00630-y.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147341882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Financial Flows, Economic Integration and Macroeconomic Convergence in Africa: An Interactive and Threshold Effects","authors":"Matthew I. Ogbuagu, Saibu M. Olufemi, Ogunniyi B. Matthew","doi":"10.1007/s40745-025-00631-x","DOIUrl":"10.1007/s40745-025-00631-x","url":null,"abstract":"<div><p>This study investigates the impacts of financial flows and economic integration on macroeconomic convergence by analysing 37 African countries over the period 1994 to 2021. The Augmented Neoclassical theory serves as the theoretical foundation of the study. The analysis employs the Fully Modified Ordinary Least Squares (FMOLS) technique along with other non-parametric methods. The findings confirm both beta and sigma-conditional convergence across Africa. Furthermore, financial flows and economic integration exert mixed effects on economic performance while jointly accelerating macroeconomic convergence. The results also suggest that financial flows and economic integration collectively enhance convergence toward the steady-state equilibrium in Africa. The estimated thresholds of foreign direct investment (FDI), official development assistance (ODA), and remittance inflows required to accelerate convergence are 3.44%, 1.43%, and 2.09%, respectively. Given a convergence speed of 5.3% and a time frame of 25 years, Africa is unlikely to achieve Sustainable Development Goal (SDG) 8 within the designated period. However, the African Union Agenda, which prioritises inclusive growth and regional integration, could be achieved by 2045, ceteris paribus. These findings imply that policymakers should intensify efforts to attract more foreign direct investment, as its threshold of 3.44% indicates a greater potential for enhancing convergence relative to ODA and remittances. 
Therefore, to maximise the benefits of economic integration, which amplifies the impact of financial flow components on macroeconomic convergence, the implementation of comprehensive structural reforms aimed at diversifying intra-African trade away from primary products is crucial.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"13 1","pages":"191 - 214"},"PeriodicalIF":0.0,"publicationDate":"2025-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147341399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning-Based Analysis of College Student Performance and Examination Optimization","authors":"Liu Hao, Yu Zhi, Si Cheng-wei, Yuan An-feng, Liu Da-lian","doi":"10.1007/s40745-025-00629-5","DOIUrl":"10.1007/s40745-025-00629-5","url":null,"abstract":"<div><p>This paper conducts a thorough and rigorous analysis of student performance using machine learning algorithms and proposes corresponding strategies for optimizing examinations. The research is structured into three key aspects: first, the application of nonlinear models, such as Random Forest and Gradient Boosting Trees, to identify critical factors influencing student performance and provide targeted learning recommendations; second, the use of clustering analysis to stratify students and assess the difficulty coefficients of every question in the math paper, thereby optimizing exam difficulty; and third, the employment of linear regression models to predict student performance and validate correlations among different mathematics subjects. Additionally, this study incorporates Generative Adversarial Networks (GANs) to enhance model optimization, further improving their generalization capabilities and prediction accuracy. The findings reveal that midterm performance is a pivotal indicator for final grade warnings and that there is a significant correlation within mathematics subjects. Furthermore, exam question difficulty should be aligned with students’ learning levels. 
These results provide a scientific foundation for enhancing teaching quality, improving student learning efficiency, and offering valuable insights for teachers in implementing personalized teaching and optimizing examinations.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"13 1","pages":"143 - 161"},"PeriodicalIF":0.0,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147340768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SML-AAEA: A Systematic Method for Evaluating Advanced Activity of Daily Living Scale","authors":"Qinyan Wei, Aihua Li, Yuxue Chi, Xinzhu Xing","doi":"10.1007/s40745-025-00623-x","DOIUrl":"10.1007/s40745-025-00623-x","url":null,"abstract":"<div><p>With the intensification of global population aging, the incidence of cognitive impairment such as dementia continues to rise. The Activity of Daily Living (ADL) scale can help to assess daily living functions and early screen for dementia, which is crucial for delaying disease progression and improving the quality of life in older adults. As advanced ADL assessment tools continue to be developed and improved, how to evaluate their effectiveness is particularly important. However, most studies have assessed these tools from a single perspective and have often failed to examine the contribution of individual items within the scales. Therefore, we propose the Statistical and Machine Learning-based Advanced ADL scale Effectiveness Assessment Method (SML-AAEA) to evaluate the psychometric properties and early dementia screening ability of ADL assessment tool, including: (1) scale design and data collection based on traditional scales and advanced items; (2) analysis of scale validity and reliability; and (3) analysis of scale dementia diagnostic ability and item importance using machine learning. We then apply SML-AAEA to investigate the effectiveness of our proposed Advanced ADL scale for Early Dementia Screening (AADLs-EDS), which introduces three new advanced items, namely “Going far away”, “Online shopping” and “Using smartphone”. The results show that AADLs-EDS has excellent construct validity, measurement invariance, and scale reliability. The total score of AADLs-EDS can explain the changes in elderly cognitive functions to some extent. The study also finds that AADLs-EDS outperforms the traditional ADL scale in classifying dementia, with the three new items showing the strongest predictive contributions. 
The findings confirm that AADLs-EDS is a reliable and valid tool for early dementia screening.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 6","pages":"1983 - 2008"},"PeriodicalIF":0.0,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145537797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Transformation Tree Based on Extension Set Theory","authors":"Xingsen Li, Junwen Sun, Jiasheng Li, Yiqing Yan","doi":"10.1007/s40745-025-00605-z","DOIUrl":"10.1007/s40745-025-00605-z","url":null,"abstract":"<div><p>In the current era of information abundance, data mining and strategy generation have become crucial. Although decision trees are widely used across various domains, their direct application to obtain optimal solutions often lacks the flexibility and precision needed to adapt or update themselves under specific conditions. This paper introduces an enhanced data mining algorithm, the Transformation Tree, which addresses these limitations by systematically comparing parameter variables under diverse conditions. The transformation tree algorithm builds on the traditional decision tree methodology to identify optimal transformation schemes, delivering precise and adaptable solutions. The algorithm employs basic-element theory to extract data features, evaluates them using information gain ratio and the comprehensive gain ratio metrics, and constructs a transformation tree structure. This structure facilitates the generation and refinement of transformation schemes and strategies through iterative testing. To validate its effectiveness, we applied the algorithm to a hypertension case study. 
The results indicate that while the transformation tree algorithm exhibits slightly lower efficiency in classification tasks, it excels in discovering multiple transformation strategies, enabling intelligent scheme generation, and providing flexible and accurate decision support.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 6","pages":"1965 - 1981"},"PeriodicalIF":0.0,"publicationDate":"2025-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145537722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stock Prices Forecasting by Using a Novel Hybrid Method Based on the MFO-Optimized GRU Network","authors":"Xinjian Zhang, Guanlin Liu","doi":"10.1007/s40745-025-00616-w","DOIUrl":"10.1007/s40745-025-00616-w","url":null,"abstract":"<div><p>With the social economy growing at a quick pace and the stock market seeing constant developments, more and more people are voicing concerns about investing in stocks. The importance of forecasting stock values has increased in the domain of engineering's use of cognitive computing. Utilizing data-driven tactics for forecasting stock prices, investors can effectively mitigate risks and enhance profits. Investors can use projections based on historical values and textual data to make well-informed judgments about future patterns in stock prices. Stock price anticipation is a pivotal undertaking in the financial sector that has substantial consequences for traders and investors. This article presents an in-depth comparison analysis of machine learning tactics for forecasting price fluctuations in stocks. The research deploys historical stock data and diverse technical indicators. This paper presents the Gated Recurrent Unit (GRU) model for Nasdaq stock index anticipation, which is optimized by Particle swarm optimization (PSO), Biogeography-based optimization (BBO), and Moth flame optimization (MFO). Among these optimizers, MFO has the best outcomes. Compared with the unoptimized GRU scheme, the PSO-GRU, BBO-GRU, and MFO-optimized GRU models achieve coefficients of determination (<span>\\(R^{2}\\)</span>) of 0.9807, 0.9824, and 0.9904, respectively, which shows the improvement gained by the presented scheme. 
The criteria used to evaluate this model are mean absolute error, root mean absolute error, and <span>\\(R^{2}\\)</span>.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 4","pages":"1369 - 1387"},"PeriodicalIF":0.0,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145166219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization of Oil and Gas Pipeline Leakage Data and Defect Identification Based on Graph Neural Processing","authors":"Lizhen Zhang","doi":"10.1007/s40745-025-00619-7","DOIUrl":"10.1007/s40745-025-00619-7","url":null,"abstract":"<div><p>With the increasing complexity of oil and gas pipeline networks, early identification of leaks and defects is crucial to ensure the safe operation of pipelines. This study proposes a graph neural network (GNN) method for data processing and defect identification aimed at optimizing monitoring and maintenance strategies for oil and gas pipelines. Through the analysis of historical leakage data, we constructed a graph database containing 5000 samples, each containing 10 features such as pressure, flow, temperature, etc. Using graph convolutional network and graph attention network (GAT) to perform feature extraction and pattern recognition on nodes in pipeline network, our model achieves 92% accuracy in defect recognition, which is 15% higher than traditional methods. In addition, we have developed a leakage prediction model based on time series analysis, which is able to predict potential leakage risks 24 h in advance with an accuracy of 85%. 
The results of this study not only improve the safety management level of oil and gas pipelines, but also provide a new technical path for future intelligent pipeline maintenance.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 4","pages":"1413 - 1430"},"PeriodicalIF":0.0,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s40745-025-00619-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145165694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study on Quadri-Partitioned Interval-Valued Pythagorean Neutrosophic Fuzzy MCDM","authors":"Manajit Roy, Bhimraj Basumatary, Binod Chandra Tripathy","doi":"10.1007/s40745-025-00621-z","DOIUrl":"10.1007/s40745-025-00621-z","url":null,"abstract":"<div><p>We present two methods for solving multicriteria fuzzy decision-making based on a Quadri partitioned interval-valued Pythagorean neutrosophic set. Firstly, we deduce the alternatives with different weights using averaging and geometric operators, and then we use the accuracy function or score function for choosing the optimal solution. Finally, a practical example is provided to illustrate the practicality and effectiveness of the proposed approach.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"13 1","pages":"105 - 123"},"PeriodicalIF":0.0,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147337795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Predictive Accuracy in Writing Assessment Through Advanced Machine Learning Techniques","authors":"Xiao Zhang","doi":"10.1007/s40745-025-00618-8","DOIUrl":"10.1007/s40745-025-00618-8","url":null,"abstract":"<div><p>This research investigates the application of the Machine Learning (ML) model for effective and equitable essay scoring in education. Unlike their human counterpart, ML models have the capacity to rapidly analyze scores of essays, providing timely and equitable scores that take into account varying student demographics and styles of writing. This function helps in the identification of classroom problems and supports the design of focused teaching methodologies. For the study, a Light Gradient Boosting Classification (LGBC) model was optimized by three optimizers: Black Widow Optimization (BWO), Zebra Optimization Algorithm (ZOA), and Leader Harris Hawks Optimization (LHHO), for the development of the hybrid models with a focus on improved prediction quality. Comparison of these hybrid models with the base LGBC model was performed through different phases, such as Training, Validation, and Testing. The findings show that the LGLH model exhibited improved performance with an accuracy rate of 0.981, followed by the LGZO model with 0.971 and the LGBW model with 0.963. The lowest rate of accuracy was observed in the base LGBC model, which was 0.946. The results demonstrate the efficacy of hybrid models, which harness the optimality of several optimization techniques and provide more robust results for complicated tasks. 
The study emphasizes the importance of selecting the appropriate model architecture to achieve optimal performance, providing valuable insights into model efficacy at various stages of evaluation.</p></div>","PeriodicalId":36280,"journal":{"name":"Annals of Data Science","volume":"12 4","pages":"1389 - 1412"},"PeriodicalIF":0.0,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145161714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}