arXiv (Cornell University) latest papers, page 9

Feature emergence via margin maximization: case studies in algebraic tasks

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07568

Morwani, Depen, Edelman, Benjamin L., Oncescu, Costin-Andrei, Zhao, Rosie, Kakade, Sham

Abstract: Understanding the internal representations learned by neural networks is a cornerstone challenge in the science of machine learning. While there have been significant recent strides in some cases towards understanding how neural networks implement specific target functions, this paper explores a complementary question -- why do networks arrive at particular computational strategies? Our inquiry focuses on the algebraic learning tasks of modular addition, sparse parities, and finite group operations. Our primary theoretical findings analytically characterize the features learned by stylized neural networks for these algebraic tasks. Notably, our main technique demonstrates how the principle of margin maximization alone can be used to fully specify the features learned by the network. Specifically, we prove that the trained networks utilize Fourier features to perform modular addition and employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups, aligning closely with the empirical observations of Nanda et al. and Chughtai et al. More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies.

Citations: 1

Exploring the Dialogue Comprehension Ability of Large Language Models

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07194

She, Shuaijie, Huang, Shujian, Wang, Xingyun, Zhou, Yanke, Chen, Jiajun

Abstract: The recent emergence of large language models (LLMs) has attracted considerable attention. LLMs may interact with users in the form of dialogue and generate responses following their instructions, which naturally requires dialogue comprehension abilities. Without correct comprehension of the dialogue, the model may inevitably generate incorrect responses. However, dialogue comprehension is a general language ability which is hard to evaluate directly. In this work, we propose to perform the evaluation with the help of the dialogue summarization task. Besides evaluating and analyzing the dialogue summarization performance (DIAC-Sum), we also derive factual questions from the generated summaries and use them as a more flexible measurement of dialogue comprehension (DIAC-FactQA). Our evaluation shows that, on average, 27% of the summaries generated by LLMs contain factual inconsistencies. Even ChatGPT, the strongest evaluated model, has such errors in 16% of its summaries. For answering the factual questions, which is more challenging, the average accuracy of all evaluated LLMs is only 62.8%. Both results indicate serious deficiencies. Detailed analysis shows that the understanding of the subject/object of the conversation is still the most challenging problem for LLMs. Furthermore, to stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with auto-constructed multi-task data. The experimental results demonstrate that our method achieved an accuracy improvement of 8.9% on DIAC-FactQA.

Citations: 0

Slow Passage through a Saddle-Node Bifurcation in Discrete Dynamical Systems

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07242

Chu, Jay, Lin, Jun-Jie, Tsai, Je-Chiang

Abstract: We study a discrete non-autonomous system whose autonomous counterpart (with the frozen bifurcation parameter) admits a saddle-node bifurcation, and in which the bifurcation parameter slowly changes in time and is characterized by a sweep rate constant $\epsilon$. The discrete system is more appropriate for modeling realistic systems since only time series data is available. We show that in contrast to its autonomous counterpart, when the time mesh size $\Delta t$ is less than the order $O(\epsilon)$, there is a bifurcation delay as the bifurcation time-varying parameter is varied through the bifurcation point, and the delay is proportional to the two-thirds power of the sweep rate constant $\epsilon$. This bifurcation delay is significant in various realistic systems since it allows one to take necessary action promptly before a sudden collapse or shift to different states. On the other hand, when the time mesh size $\Delta t$ is larger than the order $o(\epsilon)$, the dynamical behavior of the solution is dramatically changed before the bifurcation point. This behavior is not observed in the autonomous counterpart. Therefore, the dynamical behavior of the system strongly depends on the time mesh size. Finally, due to the very discrete feature of the system, there are no efficient tools for the analytical study of the system. Our approach is elementary and analytical.
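The two-thirds power law mentioned in the abstract can be explored numerically with a toy model. The sketch below is an illustration only and is not the system analyzed in the paper: it assumes the textbook saddle-node normal form dx/dt = mu + x^2 with a linearly swept parameter dmu/dt = eps, discretized by a forward Euler step. The function name `escape_parameter`, the starting value `mu0`, the escape threshold `x_escape`, and the step size `dt` are all hypothetical choices made for this example.

```python
import math


def escape_parameter(eps, dt=1e-3, mu0=-0.25, x_escape=10.0, max_steps=50_000_000):
    """Return the value of mu at which x first exceeds x_escape.

    Assumed toy map (not taken from the paper):
        x_{n+1}  = x_n + dt * (mu_n + x_n**2)
        mu_{n+1} = mu_n + dt * eps
    The static bifurcation sits at mu = 0, so a positive return value
    measures how far past the bifurcation point the orbit lingers.
    """
    mu = mu0
    x = -math.sqrt(-mu0)  # start on the stable branch x = -sqrt(-mu)
    for _ in range(max_steps):
        x_next = x + dt * (mu + x * x)  # Euler step for x using the current mu
        mu += dt * eps                  # slow sweep of the parameter
        x = x_next
        if x > x_escape:
            return mu
    raise RuntimeError("orbit did not escape within max_steps")


if __name__ == "__main__":
    delay_small = escape_parameter(1e-3)
    delay_large = escape_parameter(8e-3)
    print("delay for eps = 1e-3:", delay_small)
    print("delay for eps = 8e-3:", delay_large)
    print("ratio of delays:", delay_large / delay_small)
```

Since 8^(2/3) = 4, a delay that scales with the two-thirds power of the sweep rate would grow roughly fourfold when eps is increased eightfold, which is what the printed ratio lets one check for this assumed model. Changing `dt` relative to `eps` is a simple way to probe the dependence on the time mesh size that the abstract describes, with the caveat that the paper's precise scaling conditions refer to its own system.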

Citations: 0

Bio-Inspired Grasping Controller for Sensorized 2-DoF Grippers

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07257

Lach, Luca, Lemaignan, Séverin, Ferro, Francesco, Ritter, Helge, Haschke, Robert

Citations: 0

On the asymptotic of lottery numbers

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07406

Sidorenko, Alexander

Citations: 0

Towards a covariant framework for post-Newtonian expansions for radiative sources

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07546

Hartong, Jelle, Musaeus, Jørgen

Abstract: We consider the classic problem of a compact fluid source that behaves non-relativistically and that radiates gravitational waves. The problem consists of determining the metric close to the source as well as far away from it. The non-relativistic nature of the source leads to a separation of scales resulting in an overlap region where both the $1/c$ and (multipolar) $G$-expansions are valid. Standard approaches to this problem (the Blanchet--Damour and the DIRE approach) use the harmonic gauge. We define a `post-Newtonian' class of gauges that admit a Newtonian regime in inertial coordinates. In this paper we set up a formalism to solve for the metric for any post-Newtonian gauge choice. Our methods are based on previous work on the covariant theory of non-relativistic gravity (a $1/c$-expansion of general relativity that uses post-Newton-Cartan variables). At the order of interest in the $1/c$ and $G$-expansions we split the variables into two sets: transverse and longitudinal. We show that for the transverse variables the problem can be reduced to inverting Laplacian and d'Alembertian operators on their respective domains subject to appropriate boundary conditions. The latter are regularity in the interior and asymptotic flatness with a Sommerfeld no-incoming radiation condition imposed at past null infinity. The longitudinal variables follow from the gauge choice. The full solution is then obtained by the method of matched asymptotic expansion. We show that our methods reproduce existing results in harmonic gauge to 2.5PN order.

Citations: 0

On Elastic Language Models

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07204

Zhang, Chen, Wang, Benyou, Song, Dawei

Abstract: Large-scale pretrained language models have achieved compelling performance in a wide range of language understanding and information retrieval tasks. Knowledge distillation offers an opportunity to compress a large language model to a small one, in order to reach a reasonable latency-performance tradeoff. However, for scenarios where the number of requests (e.g., queries submitted to a search engine) is highly variant, the static tradeoff attained by the compressed language model might not always fit. Once a model is assigned a static tradeoff, it could be inadequate in that the latency is too high when the number of requests is large or the performance is too low when the number of requests is small. To this end, we propose an elastic language model (ElasticLM) that elastically adjusts the tradeoff according to the request stream. The basic idea is to introduce a compute elasticity to the compressed language model, so that the tradeoff could vary on-the-fly along scalable and controllable compute. Specifically, we impose an elastic structure to enable ElasticLM with compute elasticity and design an elastic optimization to learn ElasticLM under compute elasticity. To serve ElasticLM, we apply an elastic schedule. Considering the specificity of information retrieval, we adapt ElasticLM to dense retrieval and reranking and present ElasticDenser and ElasticRanker respectively. Offline evaluation is conducted on the language understanding benchmark GLUE and on several information retrieval tasks, including Natural Question, Trivia QA, and MS MARCO. The results show that ElasticLM along with ElasticDenser and ElasticRanker can perform correctly and competitively compared with an array of static baselines. Furthermore, online simulation with concurrency is also carried out. The results demonstrate that ElasticLM can provide elastic tradeoffs with respect to a varying request stream.

Citations: 0

SL(2, $\mathbb{C}$) quartic vertex for closed string field theory

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07367

Erbin, Harold, Majumder, Suvajit

Citations: 0

Diaconis-Ylvisaker prior penalized likelihood for $p/n \to \kappa \in (0,1)$ logistic regression

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07419

Sterzinger, Philipp, Kosmidis, Ioannis

Abstract: We characterise the behaviour of the maximum Diaconis-Ylvisaker prior penalized likelihood estimator in high-dimensional logistic regression, where the number of covariates is a fraction $\kappa \in (0,1)$ of the number of observations $n$, as $n \to \infty$. We derive the estimator's aggregate asymptotic behaviour when covariates are independent normal random variables with mean zero and variance $1/n$, and the vector of regression coefficients has length $\gamma \sqrt{n}$, asymptotically. From this foundation, we devise adjusted $Z$-statistics, penalized likelihood ratio statistics, and aggregate asymptotic results with arbitrary covariate covariance. In the process, we fill in gaps in previous literature by formulating a Lipschitz-smooth approximate message passing recursion, to formally transfer the asymptotic results from approximate message passing to logistic regression. While the maximum likelihood estimate asymptotically exists only for a narrow range of $(\kappa, \gamma)$ values, the maximum Diaconis-Ylvisaker prior penalized likelihood estimate not only exists always but is also directly computable using maximum likelihood routines. Thus, our asymptotic results also hold for $(\kappa, \gamma)$ values where results for maximum likelihood are not attainable, with no overhead in implementation or computation. We study the estimator's shrinkage properties and compare it to logistic ridge regression and demonstrate our theoretical findings with simulations.

Citations: 0

Towards Automatic Honey Bee Flower-Patch Assays with Paint Marking Re-Identification

arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07407

Meyers, Luke, Cordero, Josué Rodríguez, Bravo, Carlos Corrada, Noel, Fanfan, Agosto-Rivera, José, Giray, Tugrul, Mégret, Rémi

Citations: 0