Creating a tokenization algorithm based on the knowledge base for the Uzbek language*

2022 International Conference on Information Science and Communications Technologies (ICISCT), Pub Date: 2022-09-28, DOI: 10.1109/ICISCT55600.2022.10146893

I. Bakaev


Citations: 0

Abstract

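Correctly extracting tokens from input text is currently an important problem in areas such as machine translation, information retrieval, information extraction from text, and information security. Algorithms that extract tokens from text are called tokenization algorithms. In this study, a tokenization algorithm was developed that uses a knowledge base to extract lexemes from a text.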
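The abstract does not describe how the knowledge base is structured or how it is consulted during tokenization. Purely as an illustration, the Python sketch below assumes the knowledge base is a lexicon of multi-token lexemes (abbreviations such as "h.k.", compounds, proper names) and combines a regex pre-tokenizer with greedy longest matching against that lexicon. The names `KnowledgeBase`, `raw_tokens` and `tokenize`, the apostrophe set, and the sample entries are all hypothetical and are not the author's method.

```python
import re

# Hypothetical: assumed set of apostrophe variants seen in Uzbek Latin text (for oʻ, gʻ
# and the tutuq belgisi ʼ), all normalised to ASCII "'". Each maps one character to
# one character, so offsets in the normalised string line up with the original text.
APOSTROPHE_MAP = str.maketrans({c: "'" for c in "\u2018\u2019\u02bb\u02bc`"})

# A word is a run of letters/digits optionally joined by apostrophes or hyphens;
# any other non-space character becomes a one-character token.
TOKEN_RE = re.compile(r"[^\W_]+(?:['-][^\W_]+)*|\S")


def raw_tokens(text):
    """Pre-tokenise: return (lowercased token, start, end) triples."""
    norm = text.translate(APOSTROPHE_MAP)
    return [(m.group().lower(), m.start(), m.end()) for m in TOKEN_RE.finditer(norm)]


class KnowledgeBase:
    """Hypothetical lexicon of multi-token lexemes (abbreviations, compounds, names)."""

    def __init__(self, entries):
        self.keys = set()
        self.max_len = 1
        for entry in entries:
            key = tuple(tok for tok, _, _ in raw_tokens(entry))
            self.keys.add(key)
            self.max_len = max(self.max_len, len(key))


def tokenize(text, kb):
    """Greedy longest match: at each position take the longest lexeme found in kb."""
    toks = raw_tokens(text)
    out, i = [], 0
    while i < len(toks):
        size = 1  # fall back to a single pre-token
        for n in range(min(kb.max_len, len(toks) - i), 1, -1):
            if tuple(tok for tok, _, _ in toks[i:i + n]) in kb.keys:
                size = n
                break
        start, end = toks[i][1], toks[i + size - 1][2]
        out.append(text[start:end])  # keep the original surface form
        i += size
    return out


kb = KnowledgeBase(["h.k.", "Toshkent shahri", "Oʻzbekiston Respublikasi"])
tokens = tokenize("Oʻzbekiston Respublikasi poytaxti Toshkent shahri, taʼlim va h.k.", kb)
print(tokens)
# ['Oʻzbekiston Respublikasi', 'poytaxti', 'Toshkent shahri', ',', 'taʼlim', 'va', 'h.k.']
```

Greedy longest matching is a common choice for lexicon-driven segmentation because it is simple and needs no training data. However, it can mis-segment when a shorter lexeme followed by another lexeme is the correct reading. In a fuller system, richer knowledge (for example morphological or part-of-speech rules) would be needed to resolve such cases.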





