Developing a Multilingual Stemmer for the Requirement of Text Categorization and Information Retrieval
Authors: Said Gadri, E. Neuhold
Journal: International Journal on Electrical Engineering and Informatics
Published: 2022-06-30
DOI: https://doi.org/10.15676/ijeei.2022.14.2.3
Abstract: Information retrieval (IR) is the process of finding information (generally documents) that matches the needs of the user. One way to improve search effectiveness, as well as the quality of text categorization (TC), is to build an effective stemmer that helps match users' queries with relevant documents in IR and reduces the size of the textual representation space in TC. This has always been an active research topic in both IR and TC. Stemming can be defined as the process of reducing inflected and derived words to their reduced forms (stems or roots). Many stemmers have been developed for different languages, but they still have many weaknesses and problems. In the present work, we have developed a multilingual stemming approach that is based on extracting the word root and that exploits character n-grams. Our experiments were carried out on three languages: Arabic, English, and French.
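To make the idea of character n-grams concrete, here is a minimal Python sketch. It is not the authors' algorithm, which the abstract does not describe in detail. It assumes one common way of using n-grams for root matching: split words into character bigrams and pick the candidate root with the highest Dice similarity. The function names (`char_ngrams`, `dice`, `nearest_root`) and the root list are hypothetical, chosen only for illustration.

```python
def char_ngrams(word, n=2):
    """Return the set of contiguous character n-grams of `word`."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b):
    """Dice coefficient between two n-gram sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def nearest_root(word, roots, n=2):
    """Return the candidate root whose n-gram set is most similar to the word's."""
    grams = char_ngrams(word, n)
    return max(roots, key=lambda r: dice(grams, char_ngrams(r, n)))


# Hypothetical root list, for illustration only (not taken from the paper).
roots = ["connect", "compute", "relate"]
print(nearest_root("connection", roots))
```

Because the functions operate on Python strings character by character, the same sketch applies unchanged to Arabic or French input, although a real stemmer would likely need language-specific normalization first.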
About the journal:
International Journal on Electrical Engineering and Informatics is a peer-reviewed journal in the field of electrical engineering and informatics. It is published quarterly by the School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Indonesia. All papers are blind reviewed. Accepted papers are available online (free access) and in a printed version. There is no publication fee. The journal publishes original papers in electrical engineering and informatics, covering, but not limited to, the following scope:
Power Engineering: Electric Power Generation, Transmission and Distribution, Power Electronics, Power Quality, Power Economic, FACTS, Renewable Energy, Electric Traction, Electromagnetic Compatibility, Electrical Engineering Materials, High Voltage Insulation Technologies, High Voltage Apparatuses, Lightning Detection and Protection, Power System Analysis, SCADA, Electrical Measurements.
Telecommunication Engineering: Antenna and Wave Propagation, Modulation and Signal Processing for Telecommunication, Wireless and Mobile Communications, Information Theory and Coding, Communication Electronics and Microwave, Radar Imaging, Distributed Platform, Communication Network and Systems, Telematics Services, Security Network, and Radio Communication.
Computer Engineering: Computer Architecture, Parallel and Distributed Computer, Pervasive Computing, Computer Network, Embedded System, Human-Computer Interaction, Virtual/Augmented Reality, Computer Security, VLSI Design, Network Traffic Modeling, Performance Modeling, Dependable Computing, High Performance Computing.