阿拉伯语口语文本的句子边界检测:初步结果

2011 International Conference on Asian Language Processing Pub Date : 2011-11-15 DOI:10.1109/IALP.2011.38

A. Al-Subaihin, Hend Suliman Al-Khalifa, A. Al-Salman

{"title":"阿拉伯语口语文本的句子边界检测:初步结果","authors":"A. Al-Subaihin, Hend Suliman Al-Khalifa, A. Al-Salman","doi":"10.1109/IALP.2011.38","DOIUrl":null,"url":null,"abstract":"Recently, natural language processing tasks are more frequently conducted over online content. This poses a special problem for applications over Arabic language. Online Arabic content is usually written in informal colloquial Arabic, which is characterized to be ill-structured and lacks specific linguistic standardization. In this paper, we investigate a preliminary step to conduct successful NLP processing which is the problem of sentence boundary detection. As informal Arabic lacks basic linguistic rules, we establish a list of commonly used punctuation marks after extensively studying a large amount of informal Arabic text. Moreover, we evaluated the correct usage of these punctuation marks as sentence delimiters; the result yielded a preliminary accuracy of 70%.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Sentence Boundary Detection in Colloquial Arabic Text: A Preliminary Result\",\"authors\":\"A. Al-Subaihin, Hend Suliman Al-Khalifa, A. Al-Salman\",\"doi\":\"10.1109/IALP.2011.38\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Recently, natural language processing tasks are more frequently conducted over online content. This poses a special problem for applications over Arabic language. Online Arabic content is usually written in informal colloquial Arabic, which is characterized to be ill-structured and lacks specific linguistic standardization. In this paper, we investigate a preliminary step to conduct successful NLP processing which is the problem of sentence boundary detection. As informal Arabic lacks basic linguistic rules, we establish a list of commonly used punctuation marks after extensively studying a large amount of informal Arabic text. Moreover, we evaluated the correct usage of these punctuation marks as sentence delimiters; the result yielded a preliminary accuracy of 70%.\",\"PeriodicalId\":297167,\"journal\":{\"name\":\"2011 International Conference on Asian Language Processing\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-11-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 International Conference on Asian Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IALP.2011.38\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 International Conference on Asian Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IALP.2011.38","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

最近，自然语言处理任务更频繁地在在线内容上进行。这对阿拉伯语的应用程序提出了一个特殊的问题。在线阿拉伯语内容通常以非正式的阿拉伯语口语书写，其特点是结构不良，缺乏具体的语言标准化。在本文中，我们研究了成功进行自然语言处理的第一步，即句子边界检测问题。由于非正式阿拉伯语缺乏基本的语言规则，我们通过对大量非正式阿拉伯语文本的广泛研究，建立了一份常用标点符号列表。此外，我们评估了这些标点符号作为句子分隔符的正确用法;结果产生了70%的初步准确度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Sentence Boundary Detection in Colloquial Arabic Text: A Preliminary Result

Recently, natural language processing tasks are more frequently conducted over online content. This poses a special problem for applications over Arabic language. Online Arabic content is usually written in informal colloquial Arabic, which is characterized to be ill-structured and lacks specific linguistic standardization. In this paper, we investigate a preliminary step to conduct successful NLP processing which is the problem of sentence boundary detection. As informal Arabic lacks basic linguistic rules, we establish a list of commonly used punctuation marks after extensively studying a large amount of informal Arabic text. Moreover, we evaluated the correct usage of these punctuation marks as sentence delimiters; the result yielded a preliminary accuracy of 70%.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 International Conference on Asian Language Processing

自引率

0.00%

发文量