{"title":"利用基于主题的对抗神经网络进行跨域关键词提取","authors":"Yanan Wang, Qi Liu, Chuan Qin, Tong Xu, Yijun Wang, Enhong Chen, Hui Xiong","doi":"10.1109/ICDM.2018.00075","DOIUrl":null,"url":null,"abstract":"Keyphrases have been widely used in large document collections for providing a concise summary of document content. While significant efforts have been made on the task of automatic keyphrase extraction, existing methods have challenges in training a robust supervised model when there are insufficient labeled data in the resource-poor domains. To this end, in this paper, we propose a novel Topic-based Adversarial Neural Network (TANN) method, which aims at exploiting the unlabeled data in the target domain and the data in the resource-rich source domain. Specifically, we first explicitly incorporate the global topic information into the document representation using a topic correlation layer. Then, domain-invariant features are learned to allow the efficient transfer from the source domain to the target by utilizing adversarial training on the topic-based representation. Meanwhile, to balance the adversarial training and preserve the domain-private features in the target domain, we reconstruct the target data from both forward and backward directions. Finally, based on the learned features, keyphrase are extracted using a tagging method. Experiments on two realworld cross-domain scenarios demonstrate that our method can significantly improve the performance of keyphrase extraction on unlabeled or insufficiently labeled target domain.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":"{\"title\":\"Exploiting Topic-Based Adversarial Neural Network for Cross-Domain Keyphrase Extraction\",\"authors\":\"Yanan Wang, Qi Liu, Chuan Qin, Tong Xu, Yijun Wang, Enhong Chen, Hui Xiong\",\"doi\":\"10.1109/ICDM.2018.00075\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Keyphrases have been widely used in large document collections for providing a concise summary of document content. While significant efforts have been made on the task of automatic keyphrase extraction, existing methods have challenges in training a robust supervised model when there are insufficient labeled data in the resource-poor domains. To this end, in this paper, we propose a novel Topic-based Adversarial Neural Network (TANN) method, which aims at exploiting the unlabeled data in the target domain and the data in the resource-rich source domain. Specifically, we first explicitly incorporate the global topic information into the document representation using a topic correlation layer. Then, domain-invariant features are learned to allow the efficient transfer from the source domain to the target by utilizing adversarial training on the topic-based representation. Meanwhile, to balance the adversarial training and preserve the domain-private features in the target domain, we reconstruct the target data from both forward and backward directions. Finally, based on the learned features, keyphrase are extracted using a tagging method. Experiments on two realworld cross-domain scenarios demonstrate that our method can significantly improve the performance of keyphrase extraction on unlabeled or insufficiently labeled target domain.\",\"PeriodicalId\":286444,\"journal\":{\"name\":\"2018 IEEE International Conference on Data Mining (ICDM)\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"18\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on Data Mining (ICDM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2018.00075\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Data Mining (ICDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2018.00075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Exploiting Topic-Based Adversarial Neural Network for Cross-Domain Keyphrase Extraction
Keyphrases have been widely used in large document collections for providing a concise summary of document content. While significant efforts have been made on the task of automatic keyphrase extraction, existing methods have challenges in training a robust supervised model when there are insufficient labeled data in the resource-poor domains. To this end, in this paper, we propose a novel Topic-based Adversarial Neural Network (TANN) method, which aims at exploiting the unlabeled data in the target domain and the data in the resource-rich source domain. Specifically, we first explicitly incorporate the global topic information into the document representation using a topic correlation layer. Then, domain-invariant features are learned to allow the efficient transfer from the source domain to the target by utilizing adversarial training on the topic-based representation. Meanwhile, to balance the adversarial training and preserve the domain-private features in the target domain, we reconstruct the target data from both forward and backward directions. Finally, based on the learned features, keyphrase are extracted using a tagging method. Experiments on two realworld cross-domain scenarios demonstrate that our method can significantly improve the performance of keyphrase extraction on unlabeled or insufficiently labeled target domain.