A hybrid approach to financial big data analysis using extended ensemble learning and optimized Spark Streaming
Muhammad Babar
Journal of Open Innovation: Technology, Market, and Complexity, Volume 11, Issue 3, Article 100602. Published 2025-08-08.
DOI: 10.1016/j.joitmc.2025.100602
https://www.sciencedirect.com/science/article/pii/S2199853125001374
The financial sector faces mounting challenges in processing vast volumes of high-velocity data to support intelligent, real-time decision-making. Traditional machine learning models often fall short in accuracy, scalability, and responsiveness on large, dynamic financial datasets. To address these limitations, this study presents a hybrid architecture that integrates extended ensemble learning with an optimized big data processing pipeline built on Apache Spark Streaming. The core ensemble combines K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and K-Neighbors Classifier (KNC) to improve classification robustness and generalization. The system is designed for distributed, parallel execution and uses Spark's map-reduce capabilities for high-throughput, low-latency data handling. In empirical evaluations on the Portuguese Bank Marketing dataset, the proposed architecture achieves a prediction accuracy of 90.9%, outperforming individual models such as Logistic Regression, SVM, and Random Forest. The ensemble also reports a mean absolute error (MAE) of 0.023 and a mean squared error (MSE) of 0.0018. In terms of system performance, it processes 10,000 records per second with an average latency of 150 ms and keeps memory usage at around 4 GB, making it suitable for real-time financial analytics. The architecture improves precision in predicting client behaviors, such as loan subscription decisions, and supports robust, scalable financial decision-making. The research offers insights for integrating ensemble learning with big data technologies in FinTech, enabling more accurate, transparent, and efficient financial systems.
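
The abstract names the base learners and the reported metrics but does not say how the individual predictions are combined. The sketch below assumes plain majority voting, which may differ from the paper's actual scheme, and uses three hypothetical threshold rules as stand-ins for trained KNN, SVM, and KNC models. The toy features, labels, and function names are illustrative only and are not taken from the Portuguese Bank Marketing dataset.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A base learner maps a feature vector to a class label
# (1 = client subscribes, 0 = client does not).
Classifier = Callable[[Sequence[float]], int]


def majority_vote(models: Sequence[Classifier], x: Sequence[float]) -> int:
    """Return the label chosen by most base models.

    With three binary voters a tie cannot occur.
    """
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE: average of |y_true - y_pred|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MSE: average of (y_true - y_pred)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical stand-ins for the trained base learners (assumption:
# real models would be fitted on the training data, not hand-written).
def model_a(x: Sequence[float]) -> int:
    return int(x[0] > 0.5)


def model_b(x: Sequence[float]) -> int:
    return int(x[1] > 0.5)


def model_c(x: Sequence[float]) -> int:
    return int(x[0] + x[1] > 1.0)


# Toy data: two made-up, pre-scaled features per client.
X: List[List[float]] = [[0.9, 0.2], [0.1, 0.3], [0.7, 0.8], [0.4, 0.9]]
y: List[int] = [1, 0, 1, 0]

ensemble = [model_a, model_b, model_c]
preds = [majority_vote(ensemble, x) for x in X]

print("predictions:", preds)
print("accuracy:", accuracy(y, preds))
print("MAE:", mean_absolute_error(y, preds))
print("MSE:", mean_squared_error(y, preds))
```

On hard 0/1 labels, MAE and MSE both equal the misclassification rate, so in this toy run they coincide. In a distributed setting the same `majority_vote` function could be applied per record inside a map step, with each partition scored independently.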