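Incorporating Multiple Textual Factors into Unbalanced Financial Distress Prediction: A Feature Selection Methods and Ensemble Classifiers Combined Approach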

IF 2.9, Zone 4, Computer Science
Shixuan Li, Wenxuan Shi
Journal: International Journal of Computational Intelligence Systems
Publication date: 2023-10-04
Publication type: Journal Article
DOI: 10.1007/s44196-023-00342-2 (https://doi.org/10.1007/s44196-023-00342-2)
Citations: 0

Incorporating Multiple Textual Factors into Unbalanced Financial Distress Prediction: A Feature Selection Methods and Ensemble Classifiers Combined Approach
Abstract

Textual factors are widely regarded as promising features for financial problems. This study extracts both basic and semantic textual features to supplement the traditionally used financial indicators, with the main aim of improving financial distress prediction (FDP) for Chinese listed companies. It proposes a paradigm that combines financial and multi-type textual predictive factors, feature selection methods, classifiers, and time spans to achieve the optimal FDP. Frequency counts, TF-IDF, TextRank, and word embedding approaches are employed to extract frequency count-based, keyword-based, sentiment, and readability indicators. The experimental results show that financial domain sentiment lexicons, word embedding-based readability analysis approaches, and the basic textual features of the Management Discussion and Analysis can be important elements of FDP. Moreover, the findings highlight that incorporating financial and textual features can achieve optimal performance 4 or 5 years before the expected baseline year, and that the RF-GBDT combined model can outperform other classifiers. The study contributes by expanding the range of text analysis methods used in financial text mining and by providing new findings on how to give early warning signs of financial risk. The approaches developed in this research can serve as a template for resolving other financial issues.
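Two of the textual indicators named in the abstract, TF-IDF keywords and lexicon-based sentiment, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English token lists, the tiny `POSITIVE` and `NEGATIVE` lexicons, the smoothed IDF formula, and the function names `tf_idf`, `top_keywords`, and `tone` are all assumptions made for illustration. Real Management Discussion and Analysis text would first need Chinese word segmentation and a financial domain lexicon, and TextRank, word embeddings, feature selection, and the classifiers are not covered here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: tf = count / len(doc), and a smoothed
    idf = ln((1 + N) / (1 + df)) + 1, where N is the number of documents.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return weights


def top_keywords(weights, k=3):
    """Pick the k highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def tone(doc, positive, negative):
    """Net sentiment tone in [-1, 1]: (pos - neg) / (pos + neg); 0.0 if no hits."""
    pos = sum(tok in positive for tok in doc)
    neg = sum(tok in negative for tok in doc)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


# Hypothetical, already tokenized report excerpts (one list per company-year).
docs = [
    ["revenue", "growth", "strong", "profit", "growth"],
    ["loss", "debt", "default", "risk", "debt"],
    ["revenue", "risk", "stable", "profit"],
]

# Hypothetical miniature sentiment lexicons.
POSITIVE = {"growth", "strong", "profit", "stable"}
NEGATIVE = {"loss", "debt", "default", "risk"}

weights = tf_idf(docs)
for doc, w in zip(docs, weights):
    print(top_keywords(w, k=2), round(tone(doc, POSITIVE, NEGATIVE), 3))
```

Each document ends up with a short keyword list and a single tone score. Under the paradigm described in the abstract, values like these would be placed alongside the financial indicators as candidate features before feature selection and classification.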
Source journal
International Journal of Computational Intelligence Systems
Category: Engineering and Technology, Computer Science: Interdisciplinary Applications
Self-citation rate: 3.40%
Articles published: 94
Journal description: The International Journal of Computational Intelligence Systems publishes original research on all aspects of applied computational intelligence, especially targeting papers demonstrating the use of techniques and methods originating from computational intelligence theory. The core theories of computational intelligence are fuzzy logic, neural networks, evolutionary computation and probabilistic reasoning. The journal publishes only articles related to the use of computational intelligence and broadly covers the following topics: autonomous reasoning, bio-informatics, cloud computing, condition monitoring, data science, data mining, data visualization, decision support systems, fault diagnosis, intelligent information retrieval, human-machine interaction and interfaces, image processing, Internet and networks, noise analysis, pattern recognition, prediction systems, power (nuclear) safety systems, process and system control, real-time systems, risk analysis and safety-related issues, robotics, signal and image processing, IoT and smart environments, systems integration, system control, system modelling and optimization, telecommunications, time series prediction, warning systems, virtual reality, web intelligence, deep learning.