Wang Lu, Cuiqing Jiang, Zhao Wang, Yong Ding, Xiaoya Ni
Information Sciences, vol. 715, Article 122219. Published 2025-04-22. DOI: 10.1016/j.ins.2025.122219
https://www.sciencedirect.com/science/article/pii/S0020025525003512
Incorporating metadata: A novel variational neural topic model for bond default prediction
Credit rating reports, especially the analyst rating opinions with positive and negative tendencies, offer key insights into the risks influencing bond issuers' ability to repay debt, making them vital for bond default prediction. However, meaningful topics are challenging to extract from these texts. Positive and negative rating opinions provide metadata that could guide topic modeling to capture both strengths and risks, but existing methods struggle to reflect these nuances. Additionally, the brevity of opinions leads to sparse data, which complicates topic extraction. To address these issues, MetaNTM is proposed as a metadata-guided variational neural topic model, which features a variational topic distribution module to manage sparsity and a metadata-guided attention mechanism to integrate rating tendencies. Experiments on Chinese corporate bond data show that MetaNTM outperforms benchmark topic models, achieving higher topic coherence. The topic features also enhance predictive accuracy across multiple bond default prediction models. Logistic regression achieves a 42.0% improvement in H-measure (from 0.355 to 0.504), random forest records a 12.9% increase in KS (from 0.605 to 0.683) and a 22.2% rise in H-measure (from 0.414 to 0.506), and XGBoost achieves a 7.2% increase in H-measure (from 0.581 to 0.623). These results underscore the predictive power of MetaNTM's topic features.
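The percentage gains quoted above are relative improvements over each baseline score, and KS is the Kolmogorov-Smirnov statistic commonly used in credit scoring. The sketch below recomputes the four reported percentages from the score pairs in the abstract and shows one standard way to compute KS. The function names and the toy scores are illustrative assumptions and are not taken from the paper; the H-measure is not implemented here.

```python
# Baseline and topic-augmented scores, copied from the abstract.
reported = {
    ("Logistic regression", "H-measure"): (0.355, 0.504),
    ("Random forest", "KS"): (0.605, 0.683),
    ("Random forest", "H-measure"): (0.414, 0.506),
    ("XGBoost", "H-measure"): (0.581, 0.623),
}


def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage change of `improved` relative to `baseline`."""
    return (improved - baseline) / baseline * 100.0


def ks_statistic(scores, labels):
    """Largest gap between the empirical CDFs of scores for defaulters
    (label 1) and non-defaulters (label 0).

    This is the textbook two-sample KS definition; the paper's exact
    evaluation code is not given in the abstract.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


if __name__ == "__main__":
    for (model, metric), (before, after) in reported.items():
        gain = relative_improvement(before, after)
        print(f"{model:20s} {metric:10s} {before:.3f} -> {after:.3f} ({gain:.1f}%)")

    # Toy, hand-made scores (hypothetical, not from the paper's data).
    toy_scores = [0.1, 0.2, 0.3, 0.8, 0.9]
    toy_labels = [0, 0, 0, 1, 1]
    print("KS on toy data:", ks_statistic(toy_scores, toy_labels))
```

Rounded to one decimal place, the recomputed gains agree with the figures stated in the abstract.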
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.