Application of machine learning methods for filling and updating nuclear knowledge bases

Nuclear Energy and Technology, Pub Date: 2023-06-20, DOI: 10.3897/nucet.9.106759

V. Telnov, Y. Korovin

Citations: 1

Abstract

The paper addresses the design and creation of knowledge bases in nuclear science and technology. The authors present the results of searching for and testing optimal classification and semantic annotation algorithms for textual network content. These algorithms support computer-aided filling and updating of scalable semantic repositories (knowledge bases) in nuclear physics and nuclear power engineering, in both Russian and English, and are intended for later extension to other subject areas. The proposed algorithms will provide a methodological and technological basis for creating problem-oriented knowledge bases as artificial intelligence systems, as well as prerequisites for developing semantic technologies that acquire new knowledge on the Internet without direct human participation. The studied machine learning algorithms are tested by cross-validation on corpora of specialized texts. The novelty of the study lies in applying the Pareto optimality principle to multi-criteria evaluation and ranking of the studied algorithms when no a priori information about the relative significance of the criteria is available. The project is implemented in accordance with the Semantic Web standards (RDF, OWL, SPARQL, etc.). There are no technological restrictions on integrating the created knowledge bases with third-party data repositories or with metasearch, library, reference, information and question-answering systems. The proposed software solutions are based on cloud computing and use the DBaaS and PaaS service models to ensure the scalability of data warehouses and network services. The created software is in the public domain and can be freely replicated.
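
The Pareto-based ranking mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the algorithm names, the two criteria (precision and recall) and all score values below are hypothetical, and the sketch assumes that higher is better on every criterion. Algorithms are grouped into successive Pareto layers, so no weights have to be assigned to the criteria.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, Tuple[float, ...]]


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Return True if a is no worse than b on every criterion
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_layers(scores: Scores) -> List[List[str]]:
    """Rank candidates by repeatedly peeling off the non-dominated set.

    Layer 0 is the Pareto front; layer 1 is the front of what remains, and so on.
    """
    remaining = dict(scores)
    layers: List[List[str]] = []
    while remaining:
        front = sorted(
            name
            for name, s in remaining.items()
            if not any(dominates(t, s) for other, t in remaining.items() if other != name)
        )
        layers.append(front)
        for name in front:
            del remaining[name]
    return layers


# Hypothetical cross-validation results as (precision, recall); not data from the paper.
scores: Scores = {
    "algo_a": (0.90, 0.70),
    "algo_b": (0.80, 0.85),
    "algo_c": (0.75, 0.65),
    "algo_d": (0.90, 0.60),
}

layers = pareto_layers(scores)
for rank, layer in enumerate(layers):
    print(rank, layer)
```

With these illustrative scores, `algo_a` and `algo_b` form the first layer because neither dominates the other, while `algo_c` and `algo_d` are each dominated by `algo_a` and fall into the second layer.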
