{"title":"Critical Phase Transition in a Large Language Model","authors":"Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima","doi":"arxiv-2406.05335","DOIUrl":"https://doi.org/arxiv-2406.05335","url":null,"abstract":"The performance of large language models (LLMs) strongly depends on the \\textit{temperature} parameter. Empirically, at very low temperatures, LLMs generate sentences with clear repetitive structures, while at very high temperatures, generated sentences are often incomprehensible. In this study, using GPT-2, we numerically demonstrate that the difference between the two regimes is not just a smooth change but a phase transition with singular, divergent statistical quantities. Our extensive analysis shows that critical behaviors, such as a power-law decay of correlation in a text, emerge in the LLM at the transition temperature as well as in a natural language dataset. We also discuss that several statistical quantities characterizing the criticality should be useful to evaluate the performance of LLMs.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141518825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Highly Versatile FPGA-Implemented Cyber Coherent Ising Machine","authors":"Toru Aonishi, Tatsuya Nagasawa, Toshiyuki Koizumi, Mastiyage Don Sudeera Hasaranga Gunathilaka, Kazushi Mimura, Masato Okada, Satoshi Kako, Yoshihisa Yamamoto","doi":"arxiv-2406.05377","DOIUrl":"https://doi.org/arxiv-2406.05377","url":null,"abstract":"In recent years, quantum Ising machines have drawn a lot of attention, but due to physical implementation constraints, it has been difficult to achieve dense coupling, such as full coupling with sufficient spins to handle practical large-scale applications. Consequently, classically computable equations have been derived from quantum master equations for these quantum Ising machines. Parallel implementations of these algorithms using FPGAs have been used to rapidly find solutions to these problems on a scale that is difficult to achieve in physical systems. We have developed an FPGA-implemented cyber coherent Ising machine (cyber CIM) that is much more versatile than previous implementations using FPGAs. Our architecture is versatile since it can be applied to the open-loop CIM, which was proposed when CIM research began, to the closed-loop CIM, which has been used recently, as well as to the Jacobi successive over-relaxation method. By modifying the sequence control code for the calculation control module, other algorithms such as Simulated Bifurcation (SB) can also be implemented. Earlier research on large-scale FPGA implementations of SB and CIM used binary or ternary discrete values for connections, whereas the cyber CIM used FP32 values. Also, the cyber CIM utilized Zeeman terms that were represented as FP32, which were not present in other large-scale FPGA systems. Our implementation with continuous interaction realizes N=4096 on a single FPGA, comparable to the single-FPGA implementation of SB with binary interactions, with N=4096. 
The cyber CIM enables applications such as CDMA multi-user detector and L0 compressed sensing which were not possible with earlier FPGA systems, while enabling superior calculation speeds, more than ten times faster than a GPU implementation. The calculation speed can be further improved by increasing parallelism, such as through clustering.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":"354 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141518826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconsideration of optimization for reduction of traffic congestion","authors":"Masayuki Ohzeki","doi":"arxiv-2406.05448","DOIUrl":"https://doi.org/arxiv-2406.05448","url":null,"abstract":"One of the most impressive applications of a quantum annealer was optimizing a group of Volkswagen to reduce traffic congestion using a D-Wave system. A simple formulation of a quadratic term was proposed to reduce traffic congestion. This quadratic term was useful for determining the shortest routes among several candidates. The original formulation produced decreases in the total lengths of car tours and traffic congestion. In this study, we reformulated the cost function with the sole focus on reducing traffic congestion. We then found a unique cost function for expressing the quadratic function with a dead zone and an inequality constraint.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141518882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unified one-parameter scaling function for Anderson localization transitions in non-reciprocal non-Hermitian systems","authors":"C. Wang, Wenxue He, X. R. Wang, Hechen Ren","doi":"arxiv-2406.01984","DOIUrl":"https://doi.org/arxiv-2406.01984","url":null,"abstract":"By using dimensionless conductances as scaling variables, the conventional one-parameter scaling theory of localization fails for non-reciprocal non-Hermitian systems such as the Hatano-Nelson model. Here, we propose a one-parameter scaling function using the participation ratio as the scaling variable. Employing a highly accurate numerical procedure based on exact diagonalization, we demonstrate that this one-parameter scaling function can describe Anderson localization transitions of non-reciprocal non-Hermitian systems in one and two dimensions of symmetry classes AI and A. The critical exponents of correlation lengths depend on symmetries and dimensionality only, a typical feature of universality. Moreover, we derive a complex-gap equation based on the self-consistent Born approximation that can determine the disorder at which the point gap closes. The obtained disorders match perfectly the critical disorders of Anderson localization transitions from the one-parameter scaling function. 
Finally, we show that the one-parameter scaling function is also valid for Anderson localization transitions in reciprocal non-Hermitian systems such as two-dimensional class AII$^\\dagger$ and can, thus, serve as a unified scaling function for disordered non-Hermitian systems.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":"28 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141257803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prototype Analysis in Hopfield Networks with Hebbian Learning","authors":"Hayden McAlister, Anthony Robins, Lech Szymanski","doi":"arxiv-2407.03342","DOIUrl":"https://doi.org/arxiv-2407.03342","url":null,"abstract":"We discuss prototype formation in the Hopfield network. Typically, Hebbian learning with highly correlated states leads to degraded memory performance. We show this type of learning can lead to prototype formation, where unlearned states emerge as representatives of large correlated subsets of states, alleviating capacity woes. This process has similarities to prototype learning in human cognition. We provide a substantial literature review of prototype learning in associative memories, covering contributions from psychology, statistical physics, and computer science. We analyze prototype formation from a theoretical perspective and derive a stability condition for these states based on the number of examples of the prototype presented for learning, the noise in those examples, and the number of non-example states presented. The stability condition is used to construct a probability of stability for a prototype state as the factors of stability change. We also note similarities to traditional network analysis, allowing us to find a prototype capacity. We corroborate these expectations of prototype formation with experiments using a simple Hopfield network with standard Hebbian learning. We extend our experiments to a Hopfield network trained on data with multiple prototypes and find the network is capable of stabilizing multiple prototypes concurrently. We measure the basins of attraction of the multiple prototype states, finding attractor strength grows with the number of examples and the agreement of examples. 
We link the stability and dominance of prototype states to the energy profile of these states, particularly when comparing the profile shape to target states or other spurious states.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":"364 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141569179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a theory of how the structure of language is acquired by deep neural networks","authors":"Francesco Cagnetta, Matthieu Wyart","doi":"arxiv-2406.00048","DOIUrl":"https://doi.org/arxiv-2406.00048","url":null,"abstract":"How much data is required to learn the structure of a language via next-token prediction? We study this question for synthetic datasets generated via a Probabilistic Context-Free Grammar (PCFG) -- a hierarchical generative model that captures the tree-like structure of natural languages. We determine token-token correlations analytically in our model and show that they can be used to build a representation of the grammar's hidden variables, the longer the range the deeper the variable. In addition, a finite training set limits the resolution of correlations to an effective range, whose size grows with that of the training set. As a result, a Language Model trained with increasingly many examples can build a deeper representation of the grammar's structure, thus reaching good performance despite the high dimensionality of the problem. We conjecture that the relationship between training set size and effective range of correlations holds beyond our synthetic datasets. 
In particular, our conjecture predicts how the scaling law for the test loss behaviour with training set size depends on the length of the context window, which we confirm empirically for a collection of lines from Shakespeare's plays.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141257270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fundamental limits of weak learnability in high-dimensional multi-index models","authors":"Emanuele Troiani, Yatin Dandi, Leonardo Defilippis, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala","doi":"arxiv-2405.15480","DOIUrl":"https://doi.org/arxiv-2405.15480","url":null,"abstract":"Multi-index models -- functions which only depend on the covariates through a non-linear transformation of their projection on a subspace -- are a useful benchmark for investigating feature learning with neural networks. This paper examines the theoretical boundaries of learnability in this hypothesis class, focusing particularly on the minimum sample complexity required for weakly recovering their low-dimensional structure with first-order iterative algorithms, in the high-dimensional regime where the number of samples $n=\\alpha d$ is proportional to the covariate dimension $d$. Our findings unfold in three parts: (i) first, we identify under which conditions a \\textit{trivial subspace} can be learned with a single step of a first-order algorithm for any $\\alpha\\!>\\!0$; (ii) second, in the case where the trivial subspace is empty, we provide necessary and sufficient conditions for the existence of an {\\it easy subspace} consisting of directions that can be learned only above a certain sample complexity $\\alpha\\!>\\!\\alpha_c$. The critical threshold $\\alpha_{c}$ marks the presence of a computational phase transition, in the sense that no efficient iterative algorithm can succeed for $\\alpha\\!<\\!\\alpha_c$. In a limited but interesting set of really hard directions -- akin to the parity problem -- $\\alpha_c$ is found to diverge. Finally, (iii) we demonstrate that interactions between different directions can result in an intricate hierarchical learning phenomenon, where some directions can be learned sequentially when coupled to easier ones. 
Our analytical approach is built on the optimality of approximate message-passing algorithms among first-order iterative methods, delineating the fundamental learnability limit across a broad spectrum of algorithms, including neural networks trained with gradient descent.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":"62 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141167757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid scaling theory of localization transition in a non-Hermitian disorder Aubry-André model","authors":"Yue-Mei Sun, Xin-Yu Wang, Zi-Kang Wang, Liang-Jun Zhai","doi":"arxiv-2405.15220","DOIUrl":"https://doi.org/arxiv-2405.15220","url":null,"abstract":"In this paper, we study the critical behaviors in the non-Hermitian disorder Aubry-André (DAA) model, and we assume the non-Hermiticity is introduced by the nonreciprocal hopping. We employ the localization length $\\xi$, the inverse participation ratio ($\\rm IPR$), and the real part of the energy gap between the first excited state and the ground state $\\Delta E$ as the characteristic quantities to describe the critical properties of the localization transition. By performing the scaling analysis, the critical exponents of the non-Hermitian Anderson model and the non-Hermitian DAA model are obtained, and these critical exponents are different from their Hermitian counterparts, indicating the Hermitian and non-Hermitian disorder and DAA models belong to different universality classes. The critical exponents of the non-Hermitian DAA model are remarkably different from both the pure non-Hermitian AA model and the non-Hermitian Anderson model, showing that disorder is an independent relevant direction at the non-Hermitian AA model. 
We further propose a hybrid scaling theory to describe the critical behavior in the overlapping critical region constituted by the critical regions of non-Hermitian DAA model and the non-Hermitian Anderson localization transition.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":"36 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141167690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantum criticality of generalized Aubry-André models with exact mobility edges using fidelity susceptibility","authors":"Yu-Bin Liu, Wen-Yi Zhang, Tian-Cheng Yi, Liangsheng Li, Maoxin Liu, Wen-Long You","doi":"arxiv-2405.13282","DOIUrl":"https://doi.org/arxiv-2405.13282","url":null,"abstract":"In this study, we explore the quantum critical phenomena in generalized Aubry-André models, with a particular focus on the scaling behavior at various filling states. Our approach involves using quantum fidelity susceptibility to precisely identify the mobility edges in these systems. Through a finite-size scaling analysis of the fidelity susceptibility, we are able to determine both the correlation-length critical exponent and the dynamical critical exponent at the critical point of the generalized Aubry-André model. Based on the Diophantine equation conjecture, we can determine the number of subsequences of the Fibonacci sequence and the corresponding scaling functions for a specific filling fraction, as well as the universality class. Our findings demonstrate the effectiveness of employing the generalized fidelity susceptibility for the analysis of unconventional quantum criticality and the associated universal information of quasiperiodic systems in cutting-edge quantum simulation experiments.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141150701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Non-Hermitian diluted banded random matrices: Scaling of eigenfunction and spectral properties","authors":"M. Hernández-Sánchez, G. Tapia-Labra, J. A. Mendez-Bermudez","doi":"arxiv-2406.15426","DOIUrl":"https://doi.org/arxiv-2406.15426","url":null,"abstract":"Here we introduce the non-Hermitian diluted banded random matrix (nHdBRM) ensemble as the set of $N\\times N$ real non-symmetric matrices whose entries are independent Gaussian random variables with zero mean and variance one if $|i-j|<b$ and zero otherwise, moreover off-diagonal matrix elements within the bandwidth $b$ are randomly set to zero such that the sparsity $\\alpha$ is defined as the fraction of the $N(b-1)/2$ independent non-vanishing off-diagonal matrix elements. By means of a detailed numerical study we demonstrate that the eigenfunction and spectral properties of the nHdBRM ensemble scale with the parameter $x=\\gamma[(b\\alpha)^2/N]^\\delta$, where $\\gamma,\\delta\\sim 1$. Moreover, the normalized localization length $\\beta$ of the eigenfunctions follows a simple scaling law: $\\beta = x/(1 + x)$. For comparison purposes, we also report eigenfunction and spectral properties of the Hermitian diluted banded random matrix ensemble.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141531235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}