Three-way decisions with text data and its application in market regulation

IF 6.9 · Zone 1 (Management) · JCR Q1 (Computer Science, Information Systems)

Information Processing & Management · Pub Date: 2025-08-13 · DOI: 10.1016/j.ipm.2025.104307

Tengbiao Li, Junsheng Qiao, Guomin Chao

Volume 63, Issue 1, Article 104307 · https://www.sciencedirect.com/science/article/pii/S0306457325002481

Citations: 0

Abstract

In the era of artificial intelligence, classifying the text of comment letters (CLs) can help stock exchanges predict CL response statuses and improve the efficiency of market regulation. This paper proposes a novel three-way classifier based on word embedding and an aggregation function (AF), called WE-AF-TWC, for multi-label classification using textual information from annual CLs of Chinese listed companies. First, we use the classical Word2Vec model to turn the latent information in each text into numerical vectors. Next, we develop a novel idempotent AF as a mathematical tool that stably fuses multiple word vectors and generates a fuzzy relation matrix, and we discuss several of its properties. Finally, we introduce a parameterized construction of the three-way decision space, which formally simulates human decision logic, and further improve classification performance by training the optimal decision region. The performance of WE-AF-TWC is validated on CNletters, a manually curated dataset of 5,727 Chinese market regulation CL responses, and on three commonly used public datasets. The experimental results show that WE-AF-TWC is accurate and robust on the CL text classification task and also performs well across multiple application scenarios. Specifically, on CNletters, WE-AF-TWC achieves a weighted precision of 82.76%, outperforming several state-of-the-art classifiers. On SciCite, its weighted precision improves on three advanced classifiers by 6.60%, 0.50% and 1.80%, respectively. Similarly, the corresponding improvements are 1.95%, 1.02% and 0.03% on AGNews, and 8.93%, 0.34% and 1.56% on PubMed 200k RCT.
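The pipeline outlined in the abstract (embed words, fuse the vectors with an idempotent aggregation function, then assign each score to one of three decision regions) can be illustrated with a minimal sketch. It is not the authors' method. The arithmetic mean stands in for the paper's novel AF, which the abstract does not specify; the rescaled cosine similarity stands in for an entry of the fuzzy relation matrix; and the thresholds `alpha` and `beta`, the function names, and the toy two-dimensional vectors are all assumptions made for illustration.

```python
from math import sqrt
from typing import List, Sequence


def aggregate(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Fuse word vectors with the component-wise arithmetic mean.

    Assumption: the mean is a stand-in for the paper's AF. It is
    idempotent, i.e. aggregating copies of one vector returns it.
    """
    if not vectors:
        raise ValueError("need at least one word vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def membership(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity rescaled from [-1, 1] to [0, 1].

    Assumption: used here as a simple fuzzy membership degree between
    a document vector and a class prototype vector.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.5  # no direction information: treat as neutral
    return (dot / (norm_u * norm_v) + 1.0) / 2.0


def three_way(score: float, alpha: float = 0.7, beta: float = 0.4) -> str:
    """Map a membership degree to accept, reject, or defer.

    alpha and beta are hypothetical thresholds; the paper instead
    trains the decision region.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if score >= alpha:
        return "accept"   # positive region
    if score <= beta:
        return "reject"   # negative region
    return "defer"        # boundary region: postpone the decision


if __name__ == "__main__":
    # Toy word vectors for one document and one class prototype.
    document = aggregate([[0.2, 0.8], [0.4, 0.6]])
    prototype = [0.3, 0.7]
    score = membership(document, prototype)
    print(three_way(score))
```

The boundary region is what distinguishes a three-way classifier from a binary one: a score that falls strictly between the two thresholds is deferred rather than forced into a label.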


Source journal

Information Processing & Management (Engineering & Technology - Computer Science: Information Systems)

CiteScore

17.00

Self-citation rate

11.60%

Articles published

276

Review time

39 days

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.