Liwen Ma, Weifeng Liu. 2022 11th International Conference on Control, Automation and Information Sciences (ICCAIS), 2022-11-21. DOI: 10.1109/ICCAIS56082.2022.9990092
A Keyphrase Extraction Method Based on Multi-feature Evaluation and Mask Mechanism
Keyphrase extraction aims to identify the phrases in a document that capture its core content. However, existing unsupervised keyphrase extraction models tend to focus on a single feature, which leads to biased results. To address this problem, the proposed method scores candidate keyphrases using three features: semantic importance, topic diversity, and position. First, each candidate keyphrase is masked from the document, and the Manhattan distance between the masked document and the original document is taken as the semantic importance feature. Second, the topic-word distribution of the candidate keyphrases is computed as the topic diversity feature, and the position features are calculated. Finally, the phrase importance score is obtained by integrating the three sub-models. Experiments on three academic datasets compare the method with six state-of-the-art baseline models, and it outperforms them. The results show that evaluating phrase importance from multiple features significantly improves keyphrase extraction performance.
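The masking step can be illustrated with a minimal sketch. The abstract does not say how documents are represented, so this example assumes a plain bag-of-words count vector as a stand-in; the function names (`mask_phrase`, `semantic_importance`) and the `[MASK]` token are hypothetical and are not taken from the paper. It covers only the semantic importance feature, not topic diversity, position, or the final integration.

```python
from collections import Counter

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption made for this sketch)."""
    return text.lower().split()


def mask_phrase(tokens, phrase_tokens):
    """Replace every occurrence of the phrase with a single MASK token."""
    if not phrase_tokens:
        return list(tokens)
    n = len(phrase_tokens)
    masked, i = [], 0
    while i < len(tokens):
        if tokens[i:i + n] == phrase_tokens:
            masked.append(MASK)
            i += n
        else:
            masked.append(tokens[i])
            i += 1
    return masked


def embed(tokens, vocab):
    """Bag-of-words count vector over vocab; a stand-in for a real encoder."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def semantic_importance(document, phrase):
    """Distance between the original document and its masked version."""
    doc_tokens = tokenize(document)
    masked_tokens = mask_phrase(doc_tokens, tokenize(phrase))
    # The vocabulary comes from the original document only, so MASK is ignored.
    vocab = sorted(set(doc_tokens))
    return manhattan(embed(doc_tokens, vocab), embed(masked_tokens, vocab))


if __name__ == "__main__":
    doc = ("keyphrase extraction finds core phrases and "
           "keyphrase extraction ranks them")
    for candidate in ["keyphrase extraction", "core phrases", "them"]:
        print(candidate, semantic_importance(doc, candidate))
```

With count vectors the distance simply equals the number of tokens removed by masking, so frequent multi-word phrases score highest. A contextual encoder would instead measure how much the document's meaning shifts when the phrase is hidden, which is closer to what the abstract describes.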