{"title":"Stack More LLM’s: Efficient Detection of Machine-Generated Texts via Perplexity Approximation","authors":"G. M. Gritsai, I. A. Khabutdinov, A. V. Grabovoy","doi":"10.1134/S1064562424602075","DOIUrl":"10.1134/S1064562424602075","url":null,"abstract":"<p>The development of large language models (LLMs) is currently receiving a great amount of interest, but an update of text generation methods should entail a continuous update of methods for detecting machine-generated texts. Earlier, it has been highlighted that values of perplexity and log-probability are able to capture a measure of the difference between artificial and human-written texts. Using this observation, we define a new criterion based on these two values to judge whether a passage is generated from a given LLM. In this paper, we propose a novel efficient method that enables the detection of machine-generated fragments using an approximation of the LLM perplexity value based on pre-collected statistical language models. Approximation lends a hand in achieving high performance and quality metrics also on fragments from weights-closed LLMs. A large number of pre-collected statistical dictionaries results in an increased generalisation ability and the possibility to cover text sequences from the wild. Such approach is easy to update by only adding a new dictionary with latest model text outputs. The presented method has a high performance and achieves quality with an average of 94% recall in detecting generated fragments among texts from various open-source LLMs. In addition, the method is able to perform in milliseconds, which outperforms state-of-the-art models by a factor of thousands.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 1 supplement","pages":"S203 - S211"},"PeriodicalIF":0.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1134/S1064562424602075.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Empirical Scrutinization of Four Crisp Clustering Methods with Four Distance Metrics and One Straightforward Interpretation Rule","authors":"T. A. Alvandyan, S. Shalileh","doi":"10.1134/S1064562424602002","DOIUrl":"10.1134/S1064562424602002","url":null,"abstract":"<p>Clustering has always been in great demand by scientific and industrial communities. However, due to the lack of ground truth, interpreting its obtained results can be debatable. The current research provides an empirical benchmark on the efficiency of three popular and one recently proposed crisp clustering methods. To this end, we extensively analyzed these (four) methods by applying them to nine real-world and 420 synthetic datasets using four different values of <i>p</i> in Minkowski distance. Furthermore, we validated a previously proposed yet not well-known straightforward rule to interpret the recovered clusters. Our computations showed (i) Nesterov gradient descent clustering is the most effective clustering method using our real-world data, while K-Means had edge over it using our synthetic data; (ii) Minkowski distance with <i>p</i> = 1 is the most effective distance function, (iii) the investigated cluster interpretation rule is intuitive and valid.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 1 supplement","pages":"S236 - S250"},"PeriodicalIF":0.5,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1134/S1064562424602002.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"P-Factor Interpolation of Solutions of an Equation with a Degenerate Function","authors":"Yu. G. Evtushenko, A. A. Tret’yakov","doi":"10.1134/S1064562424601689","DOIUrl":"10.1134/S1064562424601689","url":null,"abstract":"<p>The paper considers a new method for interpolation of nonlinear functions on an interval, the so-called <i>p</i>-factor interpolation method. By using a Newton interpolation polynomial as an example, it is shown that, in the case of degeneration of the approximated function <span>\\(f(x)\\)</span> in the solution, classical interpolation does not provide the necessary accuracy for finding an approximate solution to the equation <span>\\(f(x) = 0\\)</span>, in contrast to the nondegenerate regular case. In turn, the use of <i>p</i>-factor interpolation polynomials for approximating functions in order to obtain the desired approximate solution to the equation provides the necessary order of accuracy in the argument during the calculations. The results are based on constructions of <i>p</i>-regularity theory and the apparatus of <i>p</i>-factor operators, which are effectively used in the study of degenerate mappings.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 3","pages":"451 - 456"},"PeriodicalIF":0.5,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143513281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI-Based Ethics Index of Russian Banks","authors":"M. Storchevoy, P. Parshakov, S. Paklina, A. Buzmakov, V. Krakovich","doi":"10.1134/S1064562424702338","DOIUrl":"10.1134/S1064562424702338","url":null,"abstract":"<p>Measuring a company’s ethics is an important element in the mechanism of regulating the behavior of market participants, as it allows consumers and regulators to make better decisions, which has a disciplining effect on companies. We tested various methods for machine analysis of feedback from Russian bank consumers and developed an ethics index that allows us to calculate a quantitative assessment of the ethics of three hundred Russian banks based on consumer feedback for different time periods from 2005 to 2022. We used a bag-of-words method based on the Moral Foundations Dictionary (MFD) and BERT model training based on a 3000- and 10 000-sentence sample marked up by experts. The resulting index was validated based on the number of arbitration cases from 2005 to 2022 (more ethical companies are involved in fewer arbitration cases as a defendant). As a result, only the BERT model was validated, whereas the MFD-based model was not. The ethics index would be useful as a metric alternative to popular ESG ratings for both theoretical research on company behavior and practical tasks of managing company reputation and forming policies of regulating the behavior of market participants.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 3","pages":"511 - 520"},"PeriodicalIF":0.5,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143513286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ontologies As a Foundation for Formalization of Scientific Information and Extraction of New Knowledge","authors":"A. S. Bubnov, N. I. Gallini, I. Yu. Grishin, I. M. Kobozeva, N. V. Loukachevitch, M. B. Panich, E. N. Raevsky, F. A. Sadkovsky, R. R. Timirgaleeva","doi":"10.1134/S106456242470234X","DOIUrl":"10.1134/S106456242470234X","url":null,"abstract":"<p>“Ark of Knowledge” is a digital project developed by Lomonosov Moscow State University. It provides access to fundamental knowledge in Russian and should play a key role in the preservation and dissemination of Russia’s cultural and scientific heritage. “Ark of Knowledge” is an ontological information system. The article discusses modern ideas about ontology, the stages of creation and ontological features of the Great Russian Encyclopedia and Wikidata, as well as the design of an information system and its use for language models training. The initial working prototype of this information system is briefly described. Work on creating the system is being carried out by researchers and programmers from the Knowledge Engineering Laboratory of the Institute for Mathematical Research of Complex Systems of Lomonosov Moscow State University, as well as researchers from the Faculties of Philology, Mechanics and Mathematics, Computational Mathematics and Cybernetics, and the Branch of Moscow State University in Sevastopol.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 3","pages":"521 - 527"},"PeriodicalIF":0.5,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143513287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boundary Value Problems for Ordinary Differential Equations with Linear Dependence on the Spectral Parameter","authors":"V. S. Kobenko, A. A. Shkalikov","doi":"10.1134/S1064562424602427","DOIUrl":"10.1134/S1064562424602427","url":null,"abstract":"<p>The paper considers boundary value problems generated by an ordinary differential expression of the <i>n</i>th order and arbitrary boundary conditions with linear dependence on the spectral parameter both in the equation and the boundary conditions. Classes of problems are defined, which are called regular and strongly regular. Linear operators in the space <span>\\(H = L_{2}[0,1] \\oplus \\mathbb{C}^{m},\\; m \\leqslant n,\\)</span> are assigned to these problems, and the corresponding adjoint operators are constructed in explicit form. In the general form, we solve the problem of selecting “superfluous” eigenfunctions, which was previously studied only for the special cases of second- and fourth-order equations. Namely, a criterion is found for selecting <i>m</i> eigen- or associated (root) functions of a regular problem so that the remaining system of root functions forms a Riesz basis or a Riesz basis with parenthesis in the original space <span>\\(L_{2}[0,1]\\)</span>.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 3","pages":"506 - 510"},"PeriodicalIF":0.5,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143513323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tunnel Clustering Method","authors":"F. T. Aleskerov, A. L. Myachin, V. I. Yakuba","doi":"10.1134/S1064562424702314","DOIUrl":"10.1134/S1064562424702314","url":null,"abstract":"<p>We propose a novel method for rapid pattern analysis of high-dimensional numerical data, termed tunnel clustering. The main advantages of the method are its relatively low computational complexity, endogenous determination of cluster composition and number, and a high degree of interpretability of final results. We present descriptions of three different variations: one with fixed hyperparameters, an adaptive version, and a combined approach. Three fundamental properties of tunnel clustering are examined. Practical applications are demonstrated on both synthetic datasets containing 100 000 objects and on classical benchmark datasets.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 3","pages":"474 - 479"},"PeriodicalIF":0.5,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143513279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On an Approximation by Band-Limited Functions","authors":"Yu. A. Kriksin, V. F. Tishkin","doi":"10.1134/S1064562424602312","DOIUrl":"10.1134/S1064562424602312","url":null,"abstract":"<p>The problem of approximating a continuous real function of one real variable defined on an interval using a band-limited function based on Tikhonov’s regularization method is considered. Numerical estimates of the accuracy of such approximations are calculated for a model trigonometric function. We analyze why a theoretical estimate for the approximation accuracy of a continuous function by band-limited functions is difficult to achieve numerically. The problem of estimating the spectrum of a signal defined on a finite interval is discussed.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 3","pages":"500 - 505"},"PeriodicalIF":0.5,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143513275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lattice Boltzmann Model for Nonlinear Anisotropic Diffusion with Applications to Image Processing","authors":"O. V. Ilyin","doi":"10.1134/S1064562424601288","DOIUrl":"10.1134/S1064562424601288","url":null,"abstract":"<p>It is shown that the multiple nonconstant relaxation time lattice Boltzmann equation for five discrete velocities is equivalent in the diffusion limit to a nonlinear anisotropic diffusion equation. The proposed model is applied to speckle and Gaussian noise removal problem.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 3","pages":"464 - 468"},"PeriodicalIF":0.5,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143513278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Three-Dimensional Grid-Characteristic Schemes of High Order of Approximation","authors":"I. B. Petrov, V. I. Golubev, A. V. Shevchenko, A. Sharma","doi":"10.1134/S1064562424601343","DOIUrl":"10.1134/S1064562424601343","url":null,"abstract":"<p>This paper examines seismic wave propagation in a full three-dimensional case. In practice, the stress-strain state of a geological medium during seismic exploration is frequently described using acoustic and linear elastic models. The governing systems of partial differential equations of both models are linear hyperbolic. A computational algorithm for them can be constructed by applying a grid-characteristic approach. In the case of multidimensional problems, an important role is played by dimensional splitting. However, the final three-dimensional scheme fails to preserve the achieved high order even in the case of extended spatial stencils used to solve the resulting one-dimensional problems. In this paper, we propose an approach based on multistage operator splitting schemes, which made it possible to construct a three-dimensional grid-characteristic scheme of the third order. Several test problems are solved numerically.</p>","PeriodicalId":531,"journal":{"name":"Doklady Mathematics","volume":"110 3","pages":"457 - 463"},"PeriodicalIF":0.5,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143513280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}