Creating a tokenization algorithm based on the knowledge base for the Uzbek language
I. Bakaev
2022 International Conference on Information Science and Communications Technologies (ICISCT), 2022-09-28
DOI: https://doi.org/10.1109/ICISCT55600.2022.10146893
Correctly identifying tokens in incoming text is an important problem in areas such as machine translation, information retrieval, information extraction from text, and information security. The process of extracting tokens from text is called tokenization. This study develops a tokenization algorithm that uses a knowledge base to extract lexemes from a text.
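The abstract does not describe the algorithm's internals, so the following is only a minimal sketch of what knowledge-base-driven tokenization could look like, not the author's method. It assumes the knowledge base is a set of multi-word lexemes and that lookup is greedy longest-match; the names `KNOWLEDGE_BASE`, `WORD_RE`, and `tokenize`, and the sample entries, are hypothetical.

```python
import re

# Hypothetical knowledge base: multi-word lexemes stored as tuples of
# lowercase words. The entries are illustrative, not taken from the paper.
KNOWLEDGE_BASE = {
    ("sotib", "oldi"),
    ("toshkent", "shahri"),
}
MAX_LEN = max(len(entry) for entry in KNOWLEDGE_BASE)

# A word is a run of letters that may contain apostrophes (as in "o'zbek").
# Digits and single punctuation marks become separate tokens. This is a
# simplification: quotation marks that touch a word would need extra handling.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]*)*|\d+|[^\w\s]")


def tokenize(text):
    """Split text into words, then merge word runs found in the knowledge base."""
    words = WORD_RE.findall(text)
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate first so longer lexemes win over shorter ones.
        for n in range(min(MAX_LEN, len(words) - i), 1, -1):
            candidate = tuple(w.lower() for w in words[i:i + n])
            if candidate in KNOWLEDGE_BASE:
                tokens.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No multi-word lexeme starts here; keep the single word.
            tokens.append(words[i])
            i += 1
    return tokens


print(tokenize("U kitob sotib oldi."))
```

In this sketch, a whitespace-only split would return "sotib" and "oldi" as two tokens, while the knowledge-base lookup keeps them together as one lexeme. A real knowledge base would likely also hold morphological information such as stems and affixes, which this example leaves out.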