从特殊领域语料库中自动提取阿拉伯语术语

A. Al-Thubaity, Marwa Khan, Saad Alotaibi, Badriyya Alonazi
{"title":"从特殊领域语料库中自动提取阿拉伯语术语","authors":"A. Al-Thubaity, Marwa Khan, Saad Alotaibi, Badriyya Alonazi","doi":"10.1109/IALP.2014.6973468","DOIUrl":null,"url":null,"abstract":"The availability of machine-readable Arabic special domain text in digital libraries, websites of Arabic university publications, and refereed journals fosters numerous interesting studies and applications. Among these applications is automatic term extraction from special domain corpora. These extracted terms can serve as a foundation for other applications and research, such as special domain dictionary building, terminology resource creation, and special domain ontology construction. Our literature survey shows a lack of such studies for Arabic special domain text; moreover, the few studies that have been identified use complex and computationally expensive methods. In this study, we use two basic methods to automatically extract terms from Arabic special domain corpora. Our methods are based on two simple heuristics. The most frequent words and n-grams in special domain corpora are typically terms, which themselves are typically bounded by functional words. We applied our methods on a corpus of applied Arabic linguistics. We obtained results comparable to those of other Arabic term extraction studies in that they exhibited 87% accuracy when only terms strictly pertaining to the field of applied Arabic linguistics were considered, and 93.7% when related terms were included.","PeriodicalId":117334,"journal":{"name":"2014 International Conference on Asian Language Processing (IALP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Automatic Arabic term extraction from special domain corpora\",\"authors\":\"A. Al-Thubaity, Marwa Khan, Saad Alotaibi, Badriyya Alonazi\",\"doi\":\"10.1109/IALP.2014.6973468\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The availability of machine-readable Arabic special domain text in digital libraries, websites of Arabic university publications, and refereed journals fosters numerous interesting studies and applications. Among these applications is automatic term extraction from special domain corpora. These extracted terms can serve as a foundation for other applications and research, such as special domain dictionary building, terminology resource creation, and special domain ontology construction. Our literature survey shows a lack of such studies for Arabic special domain text; moreover, the few studies that have been identified use complex and computationally expensive methods. In this study, we use two basic methods to automatically extract terms from Arabic special domain corpora. Our methods are based on two simple heuristics. The most frequent words and n-grams in special domain corpora are typically terms, which themselves are typically bounded by functional words. We applied our methods on a corpus of applied Arabic linguistics. We obtained results comparable to those of other Arabic term extraction studies in that they exhibited 87% accuracy when only terms strictly pertaining to the field of applied Arabic linguistics were considered, and 93.7% when related terms were included.\",\"PeriodicalId\":117334,\"journal\":{\"name\":\"2014 International Conference on Asian Language Processing (IALP)\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 International Conference on Asian Language Processing (IALP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IALP.2014.6973468\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Asian Language Processing (IALP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IALP.2014.6973468","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

摘要

在数字图书馆、阿拉伯大学出版物网站和评审期刊中,机器可读的阿拉伯语特殊领域文本的可用性促进了许多有趣的研究和应用。这些应用包括从特殊领域语料库中自动提取术语。这些提取的术语可以作为其他应用和研究的基础,如特殊领域词典的构建、术语资源的创建和特殊领域本体的构建。文献调查显示,对阿拉伯语特殊领域文本的研究缺乏;此外,已经确定的少数研究使用复杂和计算昂贵的方法。在本研究中,我们使用两种基本的方法从阿拉伯语特殊领域语料库中自动提取术语。我们的方法基于两个简单的启发式。特殊领域语料库中出现频率最高的词和n-gram通常是术语,它们本身通常被功能词所限制。我们把我们的方法应用在一个应用阿拉伯语言学的语料库上。我们获得的结果与其他阿拉伯语术语提取研究相当,当只考虑与应用阿拉伯语言学领域严格相关的术语时,他们显示出87%的准确性,当包括相关术语时,准确度为93.7%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Automatic Arabic term extraction from special domain corpora
The availability of machine-readable Arabic special domain text in digital libraries, websites of Arabic university publications, and refereed journals fosters numerous interesting studies and applications. Among these applications is automatic term extraction from special domain corpora. These extracted terms can serve as a foundation for other applications and research, such as special domain dictionary building, terminology resource creation, and special domain ontology construction. Our literature survey shows a lack of such studies for Arabic special domain text; moreover, the few studies that have been identified use complex and computationally expensive methods. In this study, we use two basic methods to automatically extract terms from Arabic special domain corpora. Our methods are based on two simple heuristics. The most frequent words and n-grams in special domain corpora are typically terms, which themselves are typically bounded by functional words. We applied our methods on a corpus of applied Arabic linguistics. We obtained results comparable to those of other Arabic term extraction studies in that they exhibited 87% accuracy when only terms strictly pertaining to the field of applied Arabic linguistics were considered, and 93.7% when related terms were included.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信