{"title":"专业语料库中的关键词分析——宗教术语提取","authors":"Hsin-Yi Lien","doi":"10.1080/09296174.2020.1865668","DOIUrl":null,"url":null,"abstract":"ABSTRACT This study investigates keyword extraction using a compiled Buddhist corpus. It sets out the fundamental mode of generation and refinement of keywords with statistical measures and manual screening with specific criteria. The Buddhist Word List contains 1244 keywords with 375 Pali words in Buddhist literacy. We compared the results of applying occurring frequency, log-likelihood (LL), and odds ratio (OR) in keyword analyses, each of which resulted in different keyword rankings. Our results show that statistical measures are useful for the identification of particular keywords in specific fields and OR is more effective in identifying technical terms. We demonstrate that multilevel keyword analysis is more effective at the identification of high-frequency technical words than either of these methods used alone. Multilevel methods are recommended for the creation of future domain-specific vocabulary lists to overcome the inherent flaws of individual analytic methods.","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":"29 1","pages":"269 - 282"},"PeriodicalIF":0.7000,"publicationDate":"2021-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/09296174.2020.1865668","citationCount":"3","resultStr":"{\"title\":\"Revisiting Keyword Analysis in a Specialized Corpus: Religious Terminology Extraction\",\"authors\":\"Hsin-Yi Lien\",\"doi\":\"10.1080/09296174.2020.1865668\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"ABSTRACT This study investigates keyword extraction using a compiled Buddhist corpus. It sets out the fundamental mode of generation and refinement of keywords with statistical measures and manual screening with specific criteria. The Buddhist Word List contains 1244 keywords with 375 Pali words in Buddhist literacy. We compared the results of applying occurring frequency, log-likelihood (LL), and odds ratio (OR) in keyword analyses, each of which resulted in different keyword rankings. Our results show that statistical measures are useful for the identification of particular keywords in specific fields and OR is more effective in identifying technical terms. We demonstrate that multilevel keyword analysis is more effective at the identification of high-frequency technical words than either of these methods used alone. Multilevel methods are recommended for the creation of future domain-specific vocabulary lists to overcome the inherent flaws of individual analytic methods.\",\"PeriodicalId\":45514,\"journal\":{\"name\":\"Journal of Quantitative Linguistics\",\"volume\":\"29 1\",\"pages\":\"269 - 282\"},\"PeriodicalIF\":0.7000,\"publicationDate\":\"2021-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1080/09296174.2020.1865668\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Quantitative Linguistics\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://doi.org/10.1080/09296174.2020.1865668\",\"RegionNum\":2,\"RegionCategory\":\"文学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"0\",\"JCRName\":\"LANGUAGE & LINGUISTICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Quantitative Linguistics","FirstCategoryId":"98","ListUrlMain":"https://doi.org/10.1080/09296174.2020.1865668","RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"LANGUAGE & LINGUISTICS","Score":null,"Total":0}
Revisiting Keyword Analysis in a Specialized Corpus: Religious Terminology Extraction
ABSTRACT This study investigates keyword extraction using a compiled Buddhist corpus. It sets out the fundamental mode of generation and refinement of keywords with statistical measures and manual screening with specific criteria. The Buddhist Word List contains 1244 keywords with 375 Pali words in Buddhist literacy. We compared the results of applying occurring frequency, log-likelihood (LL), and odds ratio (OR) in keyword analyses, each of which resulted in different keyword rankings. Our results show that statistical measures are useful for the identification of particular keywords in specific fields and OR is more effective in identifying technical terms. We demonstrate that multilevel keyword analysis is more effective at the identification of high-frequency technical words than either of these methods used alone. Multilevel methods are recommended for the creation of future domain-specific vocabulary lists to overcome the inherent flaws of individual analytic methods.
期刊介绍:
The Journal of Quantitative Linguistics is an international forum for the publication and discussion of research on the quantitative characteristics of language and text in an exact mathematical form. This approach, which is of growing interest, opens up important and exciting theoretical perspectives, as well as solutions for a wide range of practical problems such as machine learning or statistical parsing, by introducing into linguistics the methods and models of advanced scientific disciplines such as the natural sciences, economics, and psychology.