Hierarchical multi-label text classification of tourism resources using a label-aware dual graph attention network
Authors: Quan Cheng, Wenwan Shi
Journal: Information Processing & Management (Impact Factor 7.4; JCR Q1, Computer Science, Information Systems)
Publication date: 2024-11-05
Publication type: Journal Article
DOI: 10.1016/j.ipm.2024.103952
URL: https://www.sciencedirect.com/science/article/pii/S030645732400311X
In the era of big data, classifying online tourism resource information can help match user needs with tourism resources and make tourism resource integration more efficient. However, most research in this field has concentrated on simple single-level, single-label classification. This paper proposes a Hierarchical Label-Aware Tourism-Informed Dual Graph Attention Network (HLT-DGAT) for the complex multi-level, multi-label classification problem posed by online textual information about Chinese tourism resources. The model integrates domain knowledge into a pre-trained language model and employs attention mechanisms to transform the text representation into a label-based representation. It then applies dual Graph Attention Networks (GATs), with one capturing vertical information and the other capturing horizontal information within the label hierarchy. The model's performance is validated on two commonly used public datasets and on a manually curated Chinese tourism resource dataset, which consists of online textual overviews of Chinese tourism resources above 3A level. Experimental results indicate that HLT-DGAT outperforms the baselines on threshold-based and area-under-curve evaluation metrics. Specifically, $\mathrm{AU}(\overline{\mathrm{PRC}})$ reaches 64.5% on the Chinese tourism resource dataset with enforced leaf nodes, which is 3% higher than the best corresponding result among the baseline models. Furthermore, ablation studies show that (1) integrating domain knowledge, (2) combining local information, (3) considering label dependencies within the same level of the label hierarchy, and (4) merging dynamic reconstruction each enhance overall model performance.
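The abstract does not give the form of the attention step that turns a text representation into a label-based one. As a rough illustration only, the sketch below uses plain dot-product attention between label vectors and token vectors, which is an assumption and not necessarily the mechanism used in HLT-DGAT. The function names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_aware_representation(token_vecs, label_vecs):
    """Build one text vector per label.

    For each label, score every token by its dot product with the label
    vector, normalize the scores with softmax, and return the
    attention-weighted sum of the token vectors.
    (Assumed formulation; the paper's exact mechanism is not given here.)
    """
    dim = len(token_vecs[0])
    reps = []
    for label in label_vecs:
        weights = softmax([dot(label, tok) for tok in token_vecs])
        reps.append([
            sum(w * tok[d] for w, tok in zip(weights, token_vecs))
            for d in range(dim)
        ])
    return reps


# Toy example: three 2-dimensional token vectors and two label vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [[5.0, 0.0], [0.0, 5.0]]
reps = label_aware_representation(tokens, labels)
```

Each row of `reps` is a view of the same text as seen by one label. In a model of the kind described, such per-label vectors would then be treated as node features for graph attention over the label hierarchy.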
About the journal:
Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing.
We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.