Three-way decisions with text data and its application in market regulation
Tengbiao Li, Junsheng Qiao, Guomin Chao
Information Processing & Management, vol. 63, no. 1, Article 104307. Published 2025-08-13.
DOI: 10.1016/j.ipm.2025.104307
https://www.sciencedirect.com/science/article/pii/S0306457325002481
Journal metrics: impact factor 6.9; JCR Q1 (Computer Science, Information Systems)
Citations: 0
Abstract
In the era of artificial intelligence, text classification of comment letter (CL) data can help stock exchanges predict the response status of CLs and improve market regulatory efficiency. In this paper, a novel three-way classifier based on word embedding and an aggregation function (AF), called WE-AF-TWC, is proposed for multi-label classification using textual information from the annual CLs of Chinese listed companies. First, we use the classical Word2Vec model to extract numerical representations of the latent information in each text. Next, we develop a novel AF with the idempotent property as a mathematical tool to stably fuse multiple word vectors and generate a fuzzy relation matrix, and we discuss several of its properties. Finally, we introduce a parameterized construction to build the three-way decision space, which formally simulates human decision logic, and we further improve classification performance by training the optimal decision region. In particular, the performance of WE-AF-TWC is validated on CNletters, a manually curated Chinese market regulation CL response dataset containing 5,727 records, as well as on three commonly used public datasets. The experimental results show that WE-AF-TWC is not only accurate and robust in the CL text classification task but also shows superior performance across multiple application scenarios. Specifically, on CNletters, WE-AF-TWC achieves a weighted precision of 82.76%, higher than that of several state-of-the-art classifiers. On SciCite, compared with three advanced classifiers, WE-AF-TWC improves weighted precision by 6.60%, 0.50%, and 1.80%, respectively. Similarly, on AGNews and PubMed 200k RCT, the corresponding improvements are 1.95%, 1.02%, and 0.03%, and 8.93%, 0.34%, and 1.56%, respectively.
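The fusion step described in the abstract (word vectors fused by an idempotent aggregation function into a fuzzy relation matrix) can be illustrated with a minimal sketch. The paper's own AF is not reproduced here. As a stand-in, this sketch assumes the componentwise arithmetic mean, which is a standard idempotent aggregation function: aggregating n copies of the same vector returns that vector. It also assumes a hypothetical construction of the fuzzy relation matrix, in which entry R[i][j] is the cosine similarity between document i's fused vector and a label prototype vector j, rescaled from [-1, 1] to [0, 1]. The names `mean_aggregate`, `cosine_membership`, and `fuzzy_relation_matrix` are illustrative only. The toy 2-D vectors stand in for real Word2Vec embeddings.

```python
import math
from collections.abc import Sequence

Vector = list[float]


def mean_aggregate(vectors: Sequence[Vector]) -> Vector:
    """Fuse word vectors with the componentwise arithmetic mean.

    Stand-in aggregation function, NOT the paper's AF. It is idempotent
    (up to floating-point rounding): aggregating n copies of x returns x.
    """
    if not vectors:
        raise ValueError("need at least one word vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all word vectors must have the same dimension")
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(dim)]


def cosine_membership(u: Vector, v: Vector) -> float:
    """Rescale cosine similarity from [-1, 1] to a membership degree in [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 0.0  # zero vector: treated as no evidence of a relation (assumption)
    degree = (1.0 + dot / norm) / 2.0
    return min(1.0, max(0.0, degree))  # clamp tiny floating-point overshoot


def fuzzy_relation_matrix(
    documents: Sequence[Sequence[Vector]], prototypes: Sequence[Vector]
) -> list[list[float]]:
    """R[i][j] = membership degree of document i in label j (hypothetical construction)."""
    doc_vectors = [mean_aggregate(words) for words in documents]
    return [[cosine_membership(d, p) for p in prototypes] for d in doc_vectors]


# Toy 2-D "word vectors"; in practice these would come from a trained Word2Vec model.
docs = [
    [[1.0, 0.0], [1.0, 0.0]],  # document 0: two identical word vectors
    [[0.0, 1.0]],              # document 1: a single word vector
]
labels = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical label prototype vectors
print(fuzzy_relation_matrix(docs, labels))  # [[1.0, 0.5], [0.5, 1.0]]
```

Using an idempotent AF means that a document made of repeated copies of one word vector is represented by exactly that vector. This is the kind of stability the abstract attributes to its AF.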
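The three-way decision space mentioned in the abstract splits each decision into accept, defer, and reject. The paper's parameterized construction is not given here, so this sketch assumes the classical threshold pair (alpha, beta) with 0 <= beta < alpha <= 1. For each (document, label) membership degree m, the rule works as follows:

- If m >= alpha, the pair goes to the positive region (assign the label).
- If m <= beta, it goes to the negative region (reject the label).
- Otherwise, it falls in the boundary region (defer the decision).

The function names and toy thresholds are hypothetical. In the paper, the decision region is trained rather than fixed by hand.

```python
from enum import Enum


class Region(Enum):
    POS = "accept"  # positive region: assign the label
    BND = "defer"   # boundary region: postpone the decision, e.g. for manual review
    NEG = "reject"  # negative region: do not assign the label


def three_way(membership: float, alpha: float, beta: float) -> Region:
    """Classical (alpha, beta) three-way rule; requires 0 <= beta < alpha <= 1."""
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if membership >= alpha:
        return Region.POS
    if membership <= beta:
        return Region.NEG
    return Region.BND


def decide(relation: list[list[float]], alpha: float, beta: float) -> list[list[Region]]:
    """Apply the rule independently to every (document, label) membership degree."""
    return [[three_way(m, alpha, beta) for m in row] for row in relation]


R = [[0.9, 0.5], [0.2, 0.75]]  # toy fuzzy relation matrix (documents x labels)
for row in decide(R, alpha=0.7, beta=0.3):
    print([region.name for region in row])
# ['POS', 'BND']
# ['NEG', 'POS']
```

Deciding per label keeps the rule compatible with multi-label output. One simple way to tune the thresholds is a grid search over (alpha, beta) on held-out data, though this is not necessarily how the paper trains its decision region.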
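The results in the abstract are reported as weighted precision. This is the per-class precision averaged with weights equal to each class's share of the true labels:

sum over classes c of (support_c / N) * precision_c

The sketch below computes it from scratch for single-label multi-class predictions. It treats a class that is never predicted as having precision 0, which is an assumed convention. The function name `weighted_precision` is illustrative.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def weighted_precision(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted average of per-class precision (single-label case)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)    # number of true instances per class
    predicted = Counter(y_pred)  # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n = len(y_true)
    total = 0.0
    for label, count in support.items():
        # Precision of a never-predicted class is taken as 0 (assumed convention).
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        total += (count / n) * precision
    return total


y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(round(weighted_precision(y_true, y_pred), 4))  # 0.8667
```

Here class 1 has precision 2/3 (two of its three predictions are correct), while classes 0 and 2 have precision 1. The weighted result is therefore (2 * 1 + 2 * 2/3 + 1 * 1) / 5 = 13/15.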
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.