Developing a Multilingual Stemmer for the Requirement of Text Categorization and Information Retrieval

Q2 Engineering
Said Gadri, E. Neuhold
Journal: International Journal on Electrical Engineering and Informatics
Publication date: 2022-06-30
Publication type: Journal Article
DOI: 10.15676/ijeei.2022.14.2.3 (https://doi.org/10.15676/ijeei.2022.14.2.3)
Citations: 0

Abstract

Information retrieval (IR) is the process of finding information (generally documents) that matches the needs of the user. One way to improve search effectiveness, as well as the quality of text categorization (TC), is to build an effective stemmer that helps match users' queries with relevant documents in IR and reduces the size of the textual representation space in TC. This has long been an active research topic in both IR and TC. Stemming can be defined as the process of reducing inflected and derived words to their reduced forms (stems or roots). Many stemmers have been developed for different languages, but they still suffer from many weaknesses and problems. In the present work, we have developed a multilingual stemming approach that is based on extracting the word root and that exploits character n-grams. Our experiments were carried out on three languages: Arabic, English, and French.
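The abstract does not spell out how the character n-grams are used, so the following is only a minimal sketch of one common way to combine the two ideas: a word is mapped to the candidate root whose set of character bigrams is most similar to its own, using the Dice coefficient. The function names, the Dice measure, and the toy root list `ROOTS` are all assumptions made for illustration and are not taken from the paper.

```python
from __future__ import annotations


def char_ngrams(word: str, n: int = 2) -> set[str]:
    """Return the set of contiguous character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient between two n-gram sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def nearest_root(word: str, roots: list[str], n: int = 2) -> str:
    """Pick the candidate root whose n-gram set is most similar to the word's."""
    grams = char_ngrams(word.lower(), n)
    return max(roots, key=lambda r: dice(grams, char_ngrams(r.lower(), n)))


# Hypothetical toy root lexicon, for illustration only.
ROOTS = ["connect", "compute", "retrieve"]

if __name__ == "__main__":
    for w in ["connection", "connected", "computing", "retrieval"]:
        print(w, "->", nearest_root(w, ROOTS))
```

Because the comparison works on raw character sequences, the same functions can in principle be applied to any language whose roots are listed in the lexicon, which is why n-gram matching is attractive for a multilingual setting. In IR, a query term and a document term that map to the same root (here, "connection" and "connected") would be treated as a match; in TC, the mapping shrinks the vocabulary used to represent each text.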
Source journal
CiteScore: 2.70
Self-citation rate: 0.00%
Articles published: 31
Review time: 20 weeks
Journal description: International Journal on Electrical Engineering and Informatics is a peer-reviewed journal in the field of electrical engineering and informatics. The journal is published quarterly by the School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia. All papers are blind reviewed. Accepted papers are available online (free access) and in a printed version. There is no publication fee. The journal publishes original papers in the field of electrical engineering and informatics, covering, but not limited to, the following scope:
Power Engineering: Electric Power Generation, Transmission and Distribution, Power Electronics, Power Quality, Power Economic, FACTS, Renewable Energy, Electric Traction, Electromagnetic Compatibility, Electrical Engineering Materials, High Voltage Insulation Technologies, High Voltage Apparatuses, Lightning Detection and Protection, Power System Analysis, SCADA, Electrical Measurements.
Telecommunication Engineering: Antenna and Wave Propagation, Modulation and Signal Processing for Telecommunication, Wireless and Mobile Communications, Information Theory and Coding, Communication Electronics and Microwave, Radar Imaging, Distributed Platform, Communication Network and Systems, Telematics Services, Security Network, and Radio Communication.
Computer Engineering: Computer Architecture, Parallel and Distributed Computer, Pervasive Computing, Computer Network, Embedded System, Human-Computer Interaction, Virtual/Augmented Reality, Computer Security, VLSI Design, Network Traffic Modeling, Performance Modeling, Dependable Computing, High Performance Computing.