Investigating the Capabilities of Recurrent Neural Networks for Solving the Problem of Classifying Poorly Structured Information on the Example of Bibliographic Data

Q4 Engineering

Russian Microelectronics. Publication date: 2024-02-15. DOI: 10.1134/s1063739723070120

E. N. Petrov, E. M. Portnov

Citations: 0

Abstract

With the development of information technology, automatic data processing is extending to new fields, including bibliographic data. When information is collected from different sources and consists of nonuniformly structured bibliographic records containing formatting mistakes, transferring the data into a summary table takes considerable time and effort, and the result is prone to human error. Automatic processing of bibliographic data is therefore relevant and in demand. This paper investigates the capabilities of recurrent neural networks (RNNs) for classifying poorly structured bibliographic information. It is shown that using an RNN requires moving from the natural representation of the collected bibliographic data to a feature-based one, i.e., representing the data as a set of features. Selecting such a feature set is a separate, complex problem. The developed RNN structure is implemented in Python. To evaluate the performance of the developed software module, a test set was formed from the list of publications of the Institute of System and Software Engineering and Information Technology of the National Research University of Electronic Technology (MIET) over the past five years. The module attains an accuracy of 86%, which is 11% higher than the result obtained with a feed-forward neural network. The developed feature set and RNN structure make it possible to automate bibliographic data processing, although the results must still be corrected by an operator.
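The key step described above, moving from the raw text of a record to a feature-based representation and feeding the resulting sequence to a recurrent network, can be illustrated with a minimal pure-Python sketch. The abstract does not list the paper's features, class labels, or network structure, so everything below is an assumption made for illustration: the label set `LABELS`, the hand-picked features in `token_features`, naive whitespace tokenization, and a single-layer Elman RNN (`ElmanRNN`) with random, untrained weights. It is not the authors' implementation.

```python
import math
import random
import re

# Hypothetical token classes for a bibliographic record (an assumption,
# not the label set used in the paper).
LABELS = ["author", "title", "venue", "year", "pages", "other"]


def token_features(token: str) -> list[float]:
    """Turn one raw token into a small, hand-picked feature vector (illustrative)."""
    core = token.strip(".,;:()[]")  # drop surrounding punctuation for pattern checks
    return [
        1.0 if token[:1].isupper() else 0.0,                      # capitalized word
        1.0 if re.fullmatch(r"\d{4}", core) else 0.0,             # looks like a year
        1.0 if re.fullmatch(r"\d+[-\u2013]\d+", core) else 0.0,   # looks like a page range
        1.0 if re.fullmatch(r"[A-ZА-ЯЁ]\.,?", token) else 0.0,    # author initial, e.g. "E."
        1.0 if token.endswith(",") else 0.0,                      # trailing comma
        1.0 if token.endswith(".") else 0.0,                      # trailing period
        min(len(token), 20) / 20.0,                               # normalized length
    ]


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: list[float]) -> list[float]:
    """Numerically stable softmax."""
    top = max(z)
    e = [math.exp(v - top) for v in z]
    s = sum(e)
    return [v / s for v in e]


class ElmanRNN:
    """h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h);  y_t = softmax(W_hy h_t + b_y)."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> list[list[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.W_xh = rand(n_hidden, n_in)
        self.W_hh = rand(n_hidden, n_hidden)
        self.W_hy = rand(n_out, n_hidden)
        self.b_h = [0.0] * n_hidden
        self.b_y = [0.0] * n_out

    def forward(self, xs: list[list[float]]) -> list[list[float]]:
        """Return one class-probability vector per token (time step)."""
        h = [0.0] * len(self.b_h)  # initial hidden state
        outputs = []
        for x in xs:
            pre = zip(matvec(self.W_xh, x), matvec(self.W_hh, h), self.b_h)
            h = [math.tanh(a + b + c) for a, b, c in pre]  # state carries left context
            logits = [o + b for o, b in zip(matvec(self.W_hy, h), self.b_y)]
            outputs.append(softmax(logits))
        return outputs


if __name__ == "__main__":
    # Shortened record built from this paper's own metadata.
    record = ("E. N. Petrov, E. M. Portnov. Investigating the Capabilities of "
              "Recurrent Neural Networks // Russian Microelectronics. 2024.")
    tokens = record.split()
    xs = [token_features(t) for t in tokens]
    rnn = ElmanRNN(n_in=len(xs[0]), n_hidden=8, n_out=len(LABELS))
    # Weights are random (no training step here), so the printed labels are arbitrary.
    for tok, p in zip(tokens, rnn.forward(xs)):
        print(f"{tok:15s} {LABELS[p.index(max(p))]}")
```

The recurrent state `h` lets the prediction for each token depend on the tokens before it; a feed-forward baseline of the kind the abstract compares against would instead classify each feature vector independently. A usable version would add a training step (e.g., backpropagation through time on labeled records) or use a framework layer such as PyTorch's `torch.nn.RNN`.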

Source journal

Russian Microelectronics (Materials Science: Materials Chemistry)

CiteScore: 0.70

Self-citation rate: 0.00%

Journal description: Russian Microelectronics covers physical, technological, and some VLSI and ULSI circuit-technical aspects of microelectronics and nanoelectronics; it informs the reader of new trends in submicron optical, X-ray, electron, and ion-beam lithography technology; dry processing techniques, etching, and doping; and deposition and planarization technology. Significant space is devoted to problems arising in the application of proton, electron, and ion beams, plasma, etc. Consideration is given to new equipment, including cluster tools, in situ control, and submicron CMOS, bipolar, and BiCMOS technologies. The journal publishes papers addressing problems of molecular beam epitaxy and related processes; heterojunction devices and integrated circuits; the technology and devices of nanoelectronics; and the fabrication of nanometer-scale devices, including new device structures, quantum-effect devices, and superconducting devices. The reader will also find papers on the diagnostics of surfaces and microelectronic structures and on the modeling of technological processes and devices in micro- and nanoelectronics, including nanotransistors and solid-state qubits.