Enhancement tools for Arabic web search

2011 International Conference on Innovations in Information Technology. Pub Date: 2011-04-25. DOI: 10.1109/INNOVATIONS.2011.5893871

A. Yahya, A. Salhi


Citations: 3

Abstract



Arabic web content is growing rapidly, and the need to manage it efficiently is gaining importance. The morphological complexity of Arabic raises many challenges in this regard. This paper reports on some of our work on designing text mining and query pre-processing tools that can efficiently process and search large quantities of Arabic web data. In our research we try to address the challenges Arabic poses for natural language processing (NLP) and information retrieval: root extraction, language detection, and Arabic query correction, suggestion and expansion. Although not reported in detail here, we are also developing tools for automatic Arabic document categorization. Throughout, we employ a statistical, corpus-based approach that draws on data obtained from a variety of sources. From corpus statistics we constructed databases of words and their frequencies as single, double and triple expressions, and used these as the infrastructure for well-structured search aid tools that can handle the sophisticated nature of Arabic and can be integrated into existing web search engines and document processing systems. We also use context analysis and spellchecking of user queries to enable more complete and efficient search. While the results reported here are promising, they must be viewed as work in progress, still in need of testing, refinement, integration and deployment in real-life settings.
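The abstract gives no implementation details, so the following is only a minimal sketch of how frequency tables for single, double and triple word expressions might be built from a corpus and then used to suggest query continuations. It assumes whitespace tokenization and in-memory tables. The names `build_ngram_counts` and `suggest_next` and the three-sentence toy corpus are hypothetical illustrations, not the authors' tools or data.

```python
from collections import Counter


def build_ngram_counts(sentences, max_n=3):
    """Count word n-grams (n = 1..max_n) over an iterable of sentences.

    Assumption: plain whitespace tokenization; a real Arabic pipeline
    would likely normalize and stem or extract roots first.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def suggest_next(counts, query_tokens, k=3):
    """Rank words that followed the last one or two query words in the corpus.

    Tries the two-word context first (triple expressions), then backs off
    to the one-word context (double expressions).
    """
    context = tuple(query_tokens[-2:])
    while context:
        followers = Counter()
        for gram, freq in counts.get(len(context) + 1, {}).items():
            if gram[:-1] == context:
                followers[gram[-1]] += freq
        if followers:
            return [word for word, _ in followers.most_common(k)]
        context = context[1:]  # back off to a shorter context
    return []


# Hypothetical toy corpus, for illustration only.
corpus = [
    "جامعة بيرزيت في فلسطين",
    "جامعة بيرزيت في رام الله",
    "جامعة القدس في فلسطين",
]
counts = build_ngram_counts(corpus)
print(suggest_next(counts, ["جامعة"]))
```

Keeping the three table sizes separate makes the back-off explicit: a suggestion comes from the longest context that was actually seen in the corpus. A deployed system would presumably store the tables in a database rather than scanning a `Counter` on every query.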
