Information retrieval: A new multilingual stemmer based on a statistical approach

Said Gadri, A. Moussaoui
Published in: 2015 3rd International Conference on Control, Engineering & Information Technology (CEIT), 2015-05-25
DOI: 10.1109/CEIT.2015.7233113 (https://doi.org/10.1109/CEIT.2015.7233113)
Citations: 13

Abstract

Stemming is a technique that reduces inflected and derived words to their base form (stem or root). It is an important pre-processing step in text mining and is widely used in research areas such as Natural Language Processing (NLP), Text Categorization (TC), Text Summarization (TS), and Information Retrieval (IR). In text categorization, stemming reduces the size of the term vocabulary; in information retrieval, it improves search effectiveness and so yields more relevant results. In this paper, we propose a new multilingual stemmer that extracts word roots using n-grams. We validated the stemmer on three languages: Arabic, French, and English.
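
The abstract does not spell out how n-grams are used to extract a root, so the following is only an illustrative sketch of one common n-gram approach, not the authors' algorithm. It assumes character bigrams, the Dice coefficient as the similarity measure, and a pre-existing list of candidate roots; the function names `char_ngrams`, `dice`, and `nearest_root` are hypothetical.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word (empty if the word is shorter than n)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient between the n-gram sets of two words, in [0, 1]."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def nearest_root(word, roots, n=2):
    """Pick the candidate root whose n-gram set is most similar to the word's.

    Assumption: `roots` is a non-empty list of known roots for the language.
    """
    return max(roots, key=lambda root: dice(word, root, n))


if __name__ == "__main__":
    candidates = ["connect", "collect", "correct"]
    print(nearest_root("connection", candidates))
```

Because the comparison operates on raw character sequences rather than language-specific affix rules, the same functions accept Arabic, French, or English strings unchanged, which is one reason n-gram methods are attractive for multilingual stemming.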