A hybrid approach to financial big data analysis using extended ensemble learning and optimized Spark Streaming

Q1 Economics, Econometrics and Finance
Muhammad Babar
DOI: 10.1016/j.joitmc.2025.100602
Journal: Journal of Open Innovation: Technology, Market, and Complexity, Volume 11, Issue 3, Article 100602
Publication date: 2025-08-08
Publication type: Journal Article
Open access: No
Source: https://www.sciencedirect.com/science/article/pii/S2199853125001374
Citations: 0

Abstract

A hybrid approach to financial big data analysis using extended ensemble learning and optimized spark streaming
The financial sector faces mounting challenges in processing vast volumes of high-velocity data to support intelligent, real-time decision-making. Traditional machine learning models often fall short in accuracy, scalability, and responsiveness when dealing with large, dynamic financial datasets. This study presents a hybrid architecture that integrates extended ensemble learning with an optimized big data processing pipeline based on Apache Spark Streaming to address these limitations. The core ensemble combines K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and K-Neighbors Classifier (KNC) to improve classification robustness and generalization. The system is designed for distributed and parallel execution, leveraging Spark’s map-reduce capabilities for high-throughput, low-latency data handling. Empirical evaluations using the Portuguese Bank Marketing dataset demonstrate that the proposed architecture achieves a high prediction accuracy of 90.9%, outperforming individual models such as Logistic Regression, SVM, and Random Forest. The ensemble model also reports a mean absolute error (MAE) of 0.023 and a mean squared error (MSE) of 0.0018. Regarding system performance, it processes 10,000 records per second with an average latency of 150 ms and maintains memory usage around 4GB, making it suitable for real-time financial analytics. The proposed architecture significantly enhances precision in predicting client behaviors, such as loan subscription decisions, and supports robust, scalable financial decision-making. This research offers valuable insights for integrating ensemble learning with big data technologies in FinTech, enabling more accurate, transparent, and efficient financial systems.
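The abstract names the three base classifiers but does not say how their outputs are combined. The sketch below assumes hard majority voting, which is one common way to build such an ensemble and may differ from the paper's actual method. The function names, the per-model prediction lists, and the labels are all hypothetical toy values chosen for illustration; they are not taken from the Portuguese Bank Marketing dataset or from the reported results. The sketch also shows how accuracy, MAE, and MSE are computed from predictions.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Combine per-model label predictions by hard majority voting.

    predictions[m][i] is the label that model m assigns to record i.
    Assumption: voting is unweighted; the abstract does not specify this.
    """
    n_records = len(predictions[0])
    if any(len(p) != n_records for p in predictions):
        raise ValueError("all models must predict the same number of records")
    combined = []
    for i in range(n_records):
        votes = Counter(model[i] for model in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of |true - predicted| over all records."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of (true - predicted)^2 over all records."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: 1 = client subscribes, 0 = client does not.
y_true = [1, 0, 0, 1, 0, 1]
knn_pred = [1, 0, 1, 1, 1, 0]  # stand-in for the KNN model's output
svm_pred = [1, 0, 0, 0, 0, 1]  # stand-in for the SVM model's output
knc_pred = [0, 0, 0, 1, 1, 1]  # stand-in for the KNC model's output

ensemble_pred = majority_vote([knn_pred, svm_pred, knc_pred])

print("ensemble prediction:", ensemble_pred)
print("ensemble accuracy:", accuracy(y_true, ensemble_pred))
print("ensemble MAE:", mean_absolute_error(y_true, ensemble_pred))
print("ensemble MSE:", mean_squared_error(y_true, ensemble_pred))
```

With hard 0/1 labels, MAE and MSE both equal the misclassification rate, so distinct MAE and MSE values would have to be computed on something other than hard labels, such as predicted probabilities. The abstract does not say which quantities its error figures were computed on.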
Source journal
Journal of Open Innovation: Technology, Market, and Complexity
Subject area: Economics, Econometrics and Finance (all)
CiteScore: 11.00
Self-citation rate: 0.00%
Articles published: 196
Review time: 1 day