Finding the presence of borrowings in scientific works based on Markov chains
Authors: Rustam Saakyan, I. Shpekht, Gevorg A. Petrosyan
Journal: Vestnik Sankt-Peterburgskogo Universiteta Seriya 10 Prikladnaya Matematika Informatika Protsessy Upravleniya
Published: 2023-01-01
DOI: https://doi.org/10.21638/11701/spbu10.2023.104
The study aims to develop optimal approaches to detecting borrowings in scientific works. The article discusses the stages of borrowing detection: preprocessing, rough filtering of texts, searching for similar texts, and searching for the borrowings themselves. The main focus is on approaches and techniques that can be implemented effectively at each stage. Preprocessing may include converting uppercase characters to lowercase, removing punctuation marks, and removing stop words. Rough filtering relies on filters by topic and word frequency. Finding similar texts may involve calculating the importance of words in the context of the text and representing each word as a vector in a multidimensional space to determine a proximity measure. Finally, finding borrowings covers searching for exact matches and paraphrases and measuring the similarity of expressions. The scientific novelty lies in using Markov chains to assess text similarity at the second and third stages of the borrowing search proposed by the authors. An example demonstrates the technique of using Markov chains for text representation, finding the most frequently occurring words, and building a graph of a Markov chain of words, and it outlines the prospects for using Markov chains of texts for rough filtering and searching for similar texts.
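The preprocessing and Markov-chain steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how chains are compared, so the Jaccard overlap of transitions used here is an assumption, and the function names (`preprocess`, `markov_chain`, `transition_overlap`) and the tiny `STOP_WORDS` set are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and", "is", "are"}


def preprocess(text):
    """Lowercase the text, drop punctuation, and remove stop words."""
    # \w+ keeps runs of letters, digits, and underscores, so punctuation is discarded.
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def markov_chain(tokens):
    """First-order word chain: estimate P(next word | current word) from bigram counts."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    chain = {}
    for cur, followers in counts.items():
        total = sum(followers.values())
        chain[cur] = {nxt: n / total for nxt, n in followers.items()}
    return chain


def transition_overlap(chain_a, chain_b):
    """Jaccard index of the two chains' edge sets (an assumed similarity measure)."""
    edges_a = {(cur, nxt) for cur, f in chain_a.items() for nxt in f}
    edges_b = {(cur, nxt) for cur, f in chain_b.items() for nxt in f}
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


doc_a = "Markov chains model the text. The text is a chain of words."
doc_b = "A chain of words can model the text."

tokens_a = preprocess(doc_a)
tokens_b = preprocess(doc_b)
chain_a = markov_chain(tokens_a)
chain_b = markov_chain(tokens_b)

most_frequent = Counter(tokens_a).most_common(1)       # [('text', 2)]
similarity = transition_overlap(chain_a, chain_b)      # 0.25
```

The dictionary returned by `markov_chain` is effectively the adjacency list of a weighted directed graph, so it could be passed to a graph library for visualization. A score such as `similarity` could serve as a cheap rough filter before more expensive comparisons, although thresholds would have to be chosen empirically.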
About the journal:
The journal is the prime outlet for the findings of scientists from the Faculty of Applied Mathematics and Control Processes of St. Petersburg State University. Vestnik St. Petersburg University: Applied Mathematics. Computer Science. Control Processes publishes original contributions in all areas of applied mathematics, computer science and control.