Ashwini Dalvi, Shriya Pingulkar, Aryaman Tiwary, Diti Divekar, Irfan N A Siddavatam, Nilkamal More
Title: Integration of Coalescent Theory and Generative Adversarial Network (GAN) for Synthesizing High-Fidelity Textual Financial Data
Journal: Procedia Computer Science, volume 252, pages 593-602
Publication type: Journal Article
Publication date: 2025-01-01
DOI: 10.1016/j.procs.2025.01.019
URL: https://www.sciencedirect.com/science/article/pii/S1877050925000195
Citation count: 0
Abstract
Financial data analysis faces significant challenges due to limitations in the quality and scope of existing datasets and the biases they contain. This work introduces a novel approach to creating synthetic financial datasets that combines coalescent theory, a principle from evolutionary biology, with deep learning methodologies to address constraints on scope, accessibility, and diversity in financial datasets. While methods such as the Synthetic Minority Over-sampling Technique (SMOTE) and Generative Adversarial Networks (GANs) have shown some success in generating synthetic data, particularly in textual domains, they still face significant challenges in producing realistic and balanced textual data. The proposed method improves the stability and quality of synthetic data generation by integrating coalescent theory with GANs, resulting in a more stable architecture that mitigates mode collapse and captures complex temporal dependencies and non-linear relationships in financial datasets. The generated data accurately mirrors the intricacies of real-world financial markets, enhancing the quality, diversity, and authenticity of synthetic data for robust predictive modelling. This work details the integration of evolutionary algorithms with deep learning to create datasets that authentically represent financial contexts and are nearly indistinguishable from genuine data. By introducing this interdisciplinary approach, this research aims to enrich the toolkit for financial analysis and set a new standard in synthetic data generation.
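For readers unfamiliar with coalescent theory, the sketch below simulates a standard Kingman coalescent genealogy, tracing sampled lineages backwards in time until they merge into one common ancestor. It is an illustration of the underlying concept only, not the authors' method: the abstract does not say how the coalescent is coupled to the GAN, and the function name `simulate_coalescent` and its parameters are hypothetical.

```python
import random


def simulate_coalescent(n_lineages, seed=None):
    """Simulate one Kingman coalescent genealogy (illustrative sketch).

    Returns a list of (time, merged_a, merged_b, new_label) events,
    ordered from the present backwards in time.
    """
    if n_lineages < 2:
        raise ValueError("need at least two lineages")
    rng = random.Random(seed)
    active = list(range(n_lineages))  # labels of lineages not yet merged
    next_label = n_lineages           # label for the next ancestral node
    time = 0.0
    events = []
    while len(active) > 1:
        k = len(active)
        # With k lineages, the waiting time to the next merger is
        # exponential with rate k*(k-1)/2 (in coalescent time units).
        rate = k * (k - 1) / 2
        time += rng.expovariate(rate)
        # Every pair of lineages is equally likely to coalesce.
        a, b = rng.sample(active, 2)
        active.remove(a)
        active.remove(b)
        active.append(next_label)
        events.append((time, a, b, next_label))
        next_label += 1
    return events


# Example: a genealogy for five sampled lineages.
for t, a, b, parent in simulate_coalescent(5, seed=0):
    print(f"t={t:.3f}: lineages {a} and {b} merge into {parent}")
```

A sample of n lineages always produces n - 1 merger events, so the resulting tree gives a simple hierarchical structure that a generative model could, in principle, be conditioned on.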