Paragraph-oriented methods for determining the coherence and thematic unity of scientific and technical texts

JCR quartile: Q3 (Computer Science)
Ihor Shevchenko, Pavlo Andreev, Maiia Dernova, Olena Poddubei
Radioelectronic and Computer Systems, published 2023-05-25 (journal article)
DOI: 10.32620/reks.2023.2.03 (https://doi.org/10.32620/reks.2023.2.03)
Citations: 0

Abstract

The subject of the article is determining the degree of connectedness of scientific and technical texts by means of statistical calculations. The aim of the study is to investigate whether the coherence of fluctuations in the relative frequencies of keywords in paragraphs can be used to determine the lexical coherence and thematic unity of scientific and technical texts. The tasks are to develop a method for determining the thematic unity of a text at the level of its set of paragraphs, to develop a method for determining the coherence of a text at the same level, and to test the developed methods on a collection of documents. The methods used are statistical analysis and computational experiment.

The following results were obtained. The study has shown that, to determine the degree of coherence of a scientific and technical text at the paragraph level, it is advisable to cluster paragraphs as points in the keyword space. This makes it possible to calculate the degree of thematic unity both within the clusters and across the entire text. The coherence of individual text fragments and of the whole text is determined by analyzing the sequence of paragraph numbers in the clusters. This makes it possible to assess formally the quality of the material presented in a scientific and technical article or a textbook.

Conclusions. The scientific novelty of the study is as follows: the method for determining the degree of connectedness (coherence and thematic unity) of scientific and technical texts at the paragraph level was refined by clustering paragraphs in the keyword space, by calculating the degree of thematic unity within the clusters and in the text as a whole, and by analyzing the sequence of paragraph numbers in the clusters to determine the coherence of text fragments and of the whole text. The methods are language-independent, based on clear hypotheses, and complement each other.
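The abstract does not give the paper's formulas, so the following is only a minimal Python sketch of the clustering step under stated assumptions. Each paragraph is represented by the relative frequencies of a hypothetical, hand-picked keyword list; paragraphs are grouped with plain k-means, a stand-in because the clustering algorithm the authors use is not stated here; and "thematic unity" is approximated, as an assumption, by the mean cosine similarity of paragraph vectors to their centroid. The helper names (`relative_frequencies`, `kmeans`, `thematic_unity`) are illustrative, not the authors'.

```python
import math
import re
from collections import Counter


def relative_frequencies(paragraph, keywords):
    """Relative frequency of each keyword among all word tokens of a paragraph."""
    tokens = re.findall(r"\w+", paragraph.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty paragraphs
    return [counts[k] / total for k in keywords]


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kmeans(points, k, iterations=20):
    """Plain k-means in keyword space; the first k paragraphs seed the centres."""
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every paragraph to its nearest centre (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Move each centre to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centres[c] = centroid(members)
    return labels


def thematic_unity(points):
    """Assumed measure: mean cosine similarity of paragraph vectors to their centroid."""
    centre = centroid(points)
    return sum(cosine(p, centre) for p in points) / len(points)


# Toy usage with a hypothetical keyword list; paragraphs are in reading order.
keywords = ["keyword", "frequency", "cluster", "centroid"]
paragraphs = [
    "Keyword frequency depends on the keyword.",
    "A cluster has one centroid per cluster.",
    "Count the frequency of each keyword.",
    "Each cluster is summarised by its centroid.",
]
points = [relative_frequencies(p, keywords) for p in paragraphs]
labels = kmeans(points, k=2)
cluster_unity = {
    c: thematic_unity([p for p, label in zip(points, labels) if label == c])
    for c in sorted(set(labels))
}
text_unity = thematic_unity(points)
print(labels)  # → [0, 1, 0, 1]
print(cluster_unity, round(text_unity, 3))
```

On these toy paragraphs the two topics fall into separate clusters, and each within-cluster unity is higher than the whole-text unity, which is the contrast this sketch is meant to expose. Seeding k-means with the first k paragraphs keeps the example deterministic; a real implementation would need a more robust initialisation and a principled way to choose k.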
The methods include an adjustable element that can be used to adapt them to different thematic and stylistic domains. It has been experimentally demonstrated that the proposed methods for determining the connectedness of scientific and technical texts are effective and can serve as the basis for an information technology for content analysis of scientific and technical texts. Because the proposed methods do not rely on web resources for syntactic and semantic analysis, they can be used autonomously.
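The abstract states that coherence is determined from the sequence of paragraph numbers in each cluster and that the methods have an adjustable element, but it gives no formulas. The Python sketch below is therefore a hypothetical illustration, not the authors' measure: a topic presented in one contiguous stretch of paragraphs scores 1.0, a topic scattered over k separate runs scores 1/k, and a hypothetical `max_gap` parameter plays the role of the adjustable element by deciding how large a jump in paragraph numbers still counts as continuous. The names `paragraph_runs` and `coherence` are illustrative.

```python
from collections import defaultdict


def paragraph_runs(numbers, max_gap=1):
    """Split a cluster's paragraph numbers into runs whose successive gaps are <= max_gap."""
    runs = []
    for n in sorted(numbers):
        if runs and n - runs[-1][-1] <= max_gap:
            runs[-1].append(n)  # continues the current contiguous stretch
        else:
            runs.append([n])    # topic resumes after a gap: start a new run
    return runs


def coherence(labels, max_gap=1):
    """Assumed measures: per-cluster coherence = 1 / number of runs,
    text coherence = cluster-size-weighted mean of the per-cluster values.

    labels[i] is the cluster of paragraph i, with paragraphs numbered in reading order.
    max_gap is a hypothetical tuning parameter standing in for the adjustable element.
    """
    clusters = defaultdict(list)
    for number, label in enumerate(labels):
        clusters[label].append(number)
    per_cluster = {
        label: 1 / len(paragraph_runs(numbers, max_gap))
        for label, numbers in clusters.items()
    }
    weighted = sum(len(clusters[label]) * score for label, score in per_cluster.items())
    return per_cluster, weighted / (len(labels) or 1)


# Toy cluster labels for seven paragraphs: topic 0 returns at paragraph 4.
labels = [0, 0, 1, 1, 0, 2, 2]
print(coherence(labels))             # strict: only adjacent paragraphs form a run
print(coherence(labels, max_gap=3))  # tolerant: a short digression is allowed
```

With the strict setting, topic 0 splits into two runs ([0, 1] and [4]) and scores 0.5, giving a text coherence of 5.5/7 ≈ 0.79; with `max_gap=3` every topic forms a single run and the text scores 1.0. Tuning such a threshold per genre is one plausible reading of how an adjustable element could adapt the method to different thematic and stylistic areas.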
Source journal: Radioelectronic and Computer Systems
Subject area: Computer Science (Computer Graphics and Computer-Aided Design)
CiteScore: 3.60
Self-citation rate: 0.00%
Articles published: 50
Review time: 2 weeks