Information retrieval: A new multilingual stemmer based on a statistical approach

2015 3rd International Conference on Control, Engineering & Information Technology (CEIT). Publication date: 2015-05-25. DOI: 10.1109/CEIT.2015.7233113

Said Gadri, A. Moussaoui

Citations: 13

Abstract

Stemming is a technique that reduces inflected and derived words to their base forms (stem or root). It is an important pre-processing step in text mining and is widely used in research areas such as Natural Language Processing (NLP), Text Categorization (TC), Text Summarization (TS), Information Retrieval (IR), and other text-mining tasks. In text categorization, stemming reduces the size of the term vocabulary; in information retrieval, it improves search effectiveness and thus yields more relevant results. In this paper, we propose a new multilingual stemmer based on word-root extraction using the n-gram technique. We validated our stemmer on three languages: Arabic, French, and English.
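The abstract does not say exactly how the n-grams are used, so the Python sketch below illustrates one common n-gram technique and should not be read as the authors' method. Each word is mapped to the candidate root whose character bigrams overlap most with its own, scored by the Dice coefficient. The root list `ROOTS`, the helper names `char_ngrams`, `dice`, and `ngram_root`, and the 0.3 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical root lexicon (assumption, not from the paper):
# Latin-script entries usable for English/French, plus one Arabic root (k-t-b).
ROOTS = ["connect", "compute", "retriev", "inform", "كتب"]


def char_ngrams(word: str, n: int = 2) -> Counter:
    """Multiset of the character n-grams of `word` (lower-cased)."""
    word = word.lower()
    return Counter(word[i:i + n] for i in range(len(word) - n + 1))


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient: 2 * |shared n-grams| / (|n-grams(a)| + |n-grams(b)|)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        return 0.0
    shared = sum((ga & gb).values())  # multiset intersection of n-grams
    return 2 * shared / total


def ngram_root(word: str, roots: list[str], n: int = 2, threshold: float = 0.3) -> str:
    """Return the most similar root, or `word` unchanged if no root scores
    above the (hypothetical) `threshold`. Ties keep the earlier root."""
    best, best_score = word, threshold
    for root in roots:
        score = dice(word, root, n)
        if score > best_score:
            best, best_score = root, score
    return best


if __name__ == "__main__":
    words = ["connection", "connected", "connects", "retrieval", "retrieving"]
    stems = [ngram_root(w, ROOTS) for w in words]
    print(stems)  # ['connect', 'connect', 'connect', 'retriev', 'retriev']
    print(len(set(words)), "->", len(set(stems)))  # 5 -> 2
    print(ngram_root("مكتبة", ROOTS))  # كتب
    print(ngram_root("informatique", ROOTS))  # inform
```

Because the comparison works on raw character n-grams, the same code handles Arabic script and Latin-script French and English without language-specific affix rules. The trade-off is that it depends on a root lexicon, which a real system would have to build or learn. Mapping five surface forms onto two roots also shows the vocabulary reduction that the abstract cites as a benefit for text categorization.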
