Automatic Arabic term extraction from special domain corpora

2014 International Conference on Asian Language Processing (IALP). Pub Date: 2014-12-04. DOI: 10.1109/IALP.2014.6973468

A. Al-Thubaity, Marwa Khan, Saad Alotaibi, Badriyya Alonazi


Citations: 4

Abstract

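The availability of machine-readable Arabic special domain text in digital libraries, websites of Arabic university publications, and refereed journals fosters numerous interesting studies and applications. Among these applications is automatic term extraction from special domain corpora. The extracted terms can serve as a foundation for other applications and research, such as special domain dictionary building, terminology resource creation, and special domain ontology construction. Our literature survey shows a lack of such studies for Arabic special domain text; moreover, the few studies that have been identified use complex and computationally expensive methods. In this study, we use two basic methods to automatically extract terms from Arabic special domain corpora. Our methods are based on two simple heuristics: the most frequent words and n-grams in special domain corpora are typically terms, and these terms are typically bounded by functional words. We applied our methods to a corpus of applied Arabic linguistics. We obtained results comparable to those of other Arabic term extraction studies: 87% accuracy when only terms strictly pertaining to the field of applied Arabic linguistics were considered, and 93.7% when related terms were included.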
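The two heuristics can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract does not describe in detail. It assumes a pre-tokenized corpus, and the `FUNCTION_WORDS` list, the function name `candidate_terms`, and the `max_n` parameter are all hypothetical choices made for illustration. The sketch splits the token stream at function words and counts the n-grams inside each chunk, so that frequent chunks surface as term candidates.

```python
from collections import Counter

# Hypothetical, tiny function-word list used only for this illustration;
# a real list would be far larger.
FUNCTION_WORDS = {"في", "من", "على", "إلى", "عن"}


def candidate_terms(tokens, function_words, max_n=3):
    """Count the n-grams (up to max_n tokens) that occur between function words."""
    counts = Counter()
    chunk = []
    # The trailing None is a sentinel that flushes the final chunk.
    for tok in list(tokens) + [None]:
        if tok is None or tok in function_words:
            # A boundary was reached: count every n-gram inside the chunk.
            for n in range(1, max_n + 1):
                for i in range(len(chunk) - n + 1):
                    counts[" ".join(chunk[i:i + n])] += 1
            chunk = []
        else:
            chunk.append(tok)
    return counts


# Toy input: "teaching the Arabic language" appears twice, separated by function words.
tokens = "تعليم اللغة العربية في المدارس عن تعليم اللغة العربية".split()
counts = candidate_terms(tokens, FUNCTION_WORDS)

# Rank candidates by frequency; the most frequent ones are the proposed terms.
for term, freq in counts.most_common(5):
    print(freq, term)
```

Because n-grams are only formed inside a chunk, no candidate ever spans a function word. In practice the ranked list would still need manual or reference-based validation, which is how accuracy figures such as those reported above are normally obtained.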


