Incorporating metadata: A novel variational neural topic model for bond default prediction

IF 8.1, Zone 1 (Computer Science), JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences Pub Date : 2025-04-22 DOI:10.1016/j.ins.2025.122219

Wang Lu , Cuiqing Jiang , Zhao Wang , Yong Ding , Xiaoya Ni

Information Sciences, volume 715, Article 122219. Journal Article. https://www.sciencedirect.com/science/article/pii/S0020025525003512

Citations: 0

Abstract

Credit rating reports, especially the analyst rating opinions with positive and negative tendencies, offer key insights into the risks influencing bond issuers' ability to repay debt, making them vital for bond default prediction. However, meaningful topics are challenging to extract from these texts. Positive and negative rating opinions provide metadata that could guide topic modeling to capture both strengths and risks, but existing methods struggle to reflect these nuances. Additionally, the brevity of opinions leads to sparse data, which complicates topic extraction. To address these issues, MetaNTM is proposed as a metadata-guided variational neural topic model, which features a variational topic distribution module to manage sparsity and a metadata-guided attention mechanism to integrate rating tendencies. Experiments on Chinese corporate bond data show that MetaNTM outperforms benchmark topic models, achieving higher topic coherence. The topic features also enhance predictive accuracy across multiple bond default prediction models. Logistic regression achieves a 42.0% improvement in H-measure (from 0.355 to 0.504), random forest records a 12.9% increase in KS (from 0.605 to 0.683) and a 22.2% rise in H-measure (from 0.414 to 0.506), and XGBoost achieves a 7.2% increase in H-measure (from 0.581 to 0.623). These results underscore the predictive power of MetaNTM's topic features.
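
The percentage gains quoted in the abstract are relative improvements over each baseline score. As a quick arithmetic check, the sketch below recomputes them from the reported before/after values. It is not the authors' evaluation code; the dictionary layout and the function name `relative_gain` are illustrative assumptions.

```python
# Baseline and topic-augmented scores, copied from the abstract.
# The data structure and names below are illustrative, not from the paper.
reported = {
    ("Logistic regression", "H-measure"): (0.355, 0.504),
    ("Random forest", "KS"): (0.605, 0.683),
    ("Random forest", "H-measure"): (0.414, 0.506),
    ("XGBoost", "H-measure"): (0.581, 0.623),
}


def relative_gain(before: float, after: float) -> float:
    """Return the relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Round to one decimal place, matching the precision used in the abstract.
gains = {
    key: round(relative_gain(before, after), 1)
    for key, (before, after) in reported.items()
}

for (model, metric), gain in gains.items():
    print(f"{model:20s} {metric:10s} +{gain:.1f}%")
```

Under these inputs the recomputed gains agree with the quoted 42.0%, 12.9%, 22.2%, and 7.2%, so the percentages are relative changes, not absolute differences in the metric.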


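Source journal: Information Sciences (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 14.00

Self-citation rate: 17.30%

Articles published: 1322

Review time: 10.4 months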

Journal description: Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences, along with a limited number of timely tutorial and survey contributions. The journal aims to serve a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying current with research in information science, knowledge engineering, and intelligent systems. Readers are expected to share a common interest in information science, but they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.