Authors: A. Aitim, R. Satybaldiyeva, W. Wójcik
Published in: Proceedings of the 6th International Conference on Engineering & MIS 2020 (2020-09-14)
DOI: 10.1145/3410352.3410789 (https://doi.org/10.1145/3410352.3410789)
The construction of the Kazakh language thesauri in automatic word processing system
This paper presents an overview of existing electronic Kazakh-language thesauri and of automatic methods for their construction and application. The authors analyze the main characteristics of open-access thesauri for scientific research and evaluate the dynamics of their development and their effectiveness in solving natural language processing problems. Statistical and linguistic methods of thesaurus construction are studied; these make it possible to automate development and reduce the labor costs of expert linguists. Algorithms for selecting key terms from texts and for extracting semantic thesaurus links of all types are considered, as well as the quality of the resulting thesauri in application. To illustrate the features of various methods of building thesaurus links, a combined method was developed that generates a specialized thesaurus fully automatically from a corpus of domain texts and several existing linguistic resources.
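As a rough illustration of what a statistical approach to key-term selection and link building can look like, the sketch below ranks terms by TF-IDF and links terms that co-occur in several documents. It is not the combined method from the paper, whose details the abstract does not give. The function names `key_terms` and `associative_links`, the thresholds, and the toy corpus of placeholder transliterated tokens are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def key_terms(docs, top_k=3):
    """Rank terms by TF-IDF summed over all tokenized documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        term_freq = Counter(doc)
        for term, count in term_freq.items():
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += (count / len(doc)) * idf
    return [term for term, _ in scores.most_common(top_k)]


def associative_links(docs, terms, min_cooccur=2):
    """Link pairs of terms that share at least min_cooccur documents."""
    terms = set(terms)
    pair_counts = Counter()
    for doc in docs:
        present = sorted(terms & set(doc))
        pair_counts.update(combinations(present, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_cooccur}


# Hypothetical toy corpus: each document is a list of already tokenized,
# lemmatized placeholder tokens (not real corpus data).
docs = [
    ["til", "sozdik", "jane"],
    ["til", "sozdik", "jane"],
    ["korpus", "jane", "matin"],
    ["korpus", "jane", "til"],
]

terms = key_terms(docs, top_k=4)
links = associative_links(docs, terms, min_cooccur=2)
print(sorted(terms))
print(links)
```

The token that occurs in every document receives an IDF of zero and drops out of the ranking, which is the usual reason for preferring TF-IDF over raw frequency here. A linguistic method would add steps this sketch omits, such as morphological analysis, which matters for an agglutinative language like Kazakh.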