Enhancing Thai Keyphrase Extraction Using Syntactic Relations: An Adoption of Universal Dependencies Framework
Chanatip Saetia, Tawunrat Chalothorn, Supawat Taerungruang
2022 17th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), published 2022-11-05
DOI: 10.1109/iSAI-NLP56921.2022.9960284
Abstract
Keyphrases are topical phrases that represent a document, and they are used in various fields. Many methods have been proposed to extract keyphrases automatically. However, most of them rely on candidate selection using linguistic heuristics designed for English. In this work on Thai keyphrase extraction, we propose candidate selection based on Universal Dependencies (UD), rather than on part-of-speech (POS) sequences alone, to make this step language-independent. To further improve candidate selection, we adapt tree-based keyphrase extraction to keep only logical candidates according to the cohesiveness index (CI). In addition, we propose score filtering, which incorporates linguistic heuristics such as stop words and phrase position. In experiments, our method achieved twice the average F1 score of the state-of-the-art method, even though the UD parser was trained on only 1,781 elementary discourse units (EDUs) and reached a labeled attachment score (LAS) of 84%. Ablation studies on each factor in score filtering also reveal which factors matter most for keyphrase extraction.
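The general idea of selecting candidates from UD relations instead of POS patterns, followed by heuristic score filtering, can be sketched in Python on a toy parse. This is an illustration under stated assumptions, not the authors' implementation: the relation set `PHRASE_RELS`, the nominal head tags `HEAD_POS`, the contiguity check, and the position score `1 - (start - 1) / n_tokens` are all hypothetical choices, and the cohesiveness index (CI) is not modeled. The toy sentence is English for readability. Because UD labels are shared across languages, the same code would apply to a Thai parse, where the phrase tokens would be joined without spaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int     # 1-based token ID, as in CoNLL-U
    form: str    # surface form
    upos: str    # Universal POS tag
    head: int    # ID of the syntactic head; 0 marks the root
    deprel: str  # UD relation label, e.g. "compound" or "nmod:poss"


# Hypothetical settings for illustration only; not taken from the paper.
HEAD_POS = {"NOUN", "PROPN"}
PHRASE_RELS = {"compound", "flat", "amod", "nmod"}


def phrase_span(tokens, head):
    """Return IDs of `head` plus descendants reachable through PHRASE_RELS."""
    span, stack = {head.idx}, [head]
    while stack:
        node = stack.pop()
        for child in tokens:
            # Subtypes such as "nmod:poss" are reduced to their base relation.
            if child.head == node.idx and child.deprel.split(":")[0] in PHRASE_RELS:
                span.add(child.idx)
                stack.append(child)
    return span


def select_candidates(tokens):
    """Map (start, end) token IDs to the phrase rooted at each nominal head."""
    forms = {t.idx: t.form for t in tokens}
    found = {}
    for tok in tokens:
        if tok.upos not in HEAD_POS:
            continue
        span = phrase_span(tokens, tok)
        lo, hi = min(span), max(span)
        if hi - lo + 1 == len(span):  # skip discontinuous subtrees
            found[(lo, hi)] = tuple(forms[i] for i in range(lo, hi + 1))
    return found


def score_filter(found, stop_words, n_tokens):
    """Drop candidates with a stop word at either edge; earlier phrases score higher."""
    scored = []
    for (lo, _hi), words in found.items():
        if words[0].lower() in stop_words or words[-1].lower() in stop_words:
            continue
        scored.append((words, 1.0 - (lo - 1) / n_tokens))  # assumed position score
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy UD parse of "Thai keyphrase extraction uses dependency trees ."
    sentence = [
        Token(1, "Thai", "ADJ", 3, "amod"),
        Token(2, "keyphrase", "NOUN", 3, "compound"),
        Token(3, "extraction", "NOUN", 4, "nsubj"),
        Token(4, "uses", "VERB", 0, "root"),
        Token(5, "dependency", "NOUN", 6, "compound"),
        Token(6, "trees", "NOUN", 4, "obj"),
        Token(7, ".", "PUNCT", 4, "punct"),
    ]
    ranked = score_filter(select_candidates(sentence), {"the", "of", "a"}, len(sentence))
    for words, score in ranked:
        print(f"{' '.join(words)}\t{score:.2f}")  # Thai text would use "".join(words)
```

Running the script prints each candidate with its assumed position score, highest first. Overlapping candidates such as `dependency` and `dependency trees` both survive here. In the paper, deciding which candidates are logical is the job of tree-based extraction with the CI, which this sketch leaves out.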