arXiv (Cornell University): Latest Papers

Feature emergence via margin maximization: case studies in algebraic tasks
arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07568
Morwani, Depen, Edelman, Benjamin L., Oncescu, Costin-Andrei, Zhao, Rosie, Kakade, Sham
Abstract: Understanding the internal representations learned by neural networks is a cornerstone challenge in the science of machine learning. While there have been significant recent strides in some cases towards understanding how neural networks implement specific target functions, this paper explores a complementary question: why do networks arrive at particular computational strategies? Our inquiry focuses on the algebraic learning tasks of modular addition, sparse parities, and finite group operations. Our primary theoretical findings analytically characterize the features learned by stylized neural networks for these algebraic tasks. Notably, our main technique demonstrates how the principle of margin maximization alone can be used to fully specify the features learned by the network. Specifically, we prove that the trained networks utilize Fourier features to perform modular addition and employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups, aligning closely with the empirical observations of Nanda et al. and Chughtai et al. More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies.
Citations: 1
Exploring the Dialogue Comprehension Ability of Large Language Models
arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07194
She, Shuaijie, Huang, Shujian, Wang, Xingyun, Zhou, Yanke, Chen, Jiajun
Abstract: The recent emergence of large language models (LLMs) has attracted considerable attention. LLMs may interact with users in the form of dialogue and generate responses following their instructions, which naturally requires dialogue comprehension abilities. Without correct comprehension of the dialogue, the model may inevitably generate incorrect responses. However, dialogue comprehension is a general language ability that is hard to evaluate directly. In this work, we propose to perform the evaluation with the help of the dialogue summarization task. Besides evaluating and analyzing the dialogue summarization performance (DIAC-Sum), we also derive factual questions from the generated summaries and use them as a more flexible measurement of dialogue comprehension (DIAC-FactQA). Our evaluation shows that, on average, 27% of the summaries generated by LLMs contain factual inconsistencies. Even ChatGPT, the strongest evaluated model, has such errors in 16% of its summaries. For answering the factual questions, which is more challenging, the average accuracy of all evaluated LLMs is only 62.8%. Both results indicate serious deficiencies. Detailed analysis shows that understanding the subject/object of the conversation is still the most challenging problem for LLMs. Furthermore, to stimulate and enhance the dialogue comprehension ability of LLMs, we propose a fine-tuning paradigm with auto-constructed multi-task data. The experimental results demonstrate that our method achieved an accuracy improvement of 8.9% on DIAC-FactQA.
Citations: 0
Slow Passage through a Saddle-Node Bifurcation in Discrete Dynamical Systems
arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07242
Chu, Jay, Lin, Jun-Jie, Tsai, Je-Chiang
Abstract: We study a discrete non-autonomous system whose autonomous counterpart (with the frozen bifurcation parameter) admits a saddle-node bifurcation, and in which the bifurcation parameter slowly changes in time and is characterized by a sweep rate constant $\epsilon$. The discrete system is more appropriate for modeling realistic systems since only time series data is available. We show that in contrast to its autonomous counterpart, when the time mesh size $\Delta t$ is less than the order $O(\epsilon)$, there is a bifurcation delay as the time-varying bifurcation parameter is varied through the bifurcation point, and the delay is proportional to the two-thirds power of the sweep rate constant $\epsilon$. This bifurcation delay is significant in various realistic systems since it allows one to take necessary action promptly before a sudden collapse or shift to different states. On the other hand, when the time mesh size $\Delta t$ is larger than the order $o(\epsilon)$, the dynamical behavior of the solution is dramatically changed before the bifurcation point. This behavior is not observed in the autonomous counterpart. Therefore, the dynamical behavior of the system strongly depends on the time mesh size. Finally, due to the very discrete feature of the system, there are no efficient tools for the analytical study of the system. Our approach is elementary and analytical.
Citations: 0
Bio-Inspired Grasping Controller for Sensorized 2-DoF Grippers
arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07257
Lach, Luca, Lemaignan, Séverin, Ferro, Francesco, Ritter, Helge, Haschke, Robert
Abstract: We present a holistic grasping controller, combining free-space position control and in-contact force-control for reliable grasping given uncertain object pose estimates. Employing tactile fingertip sensors, undesired object displacement during grasping is minimized by pausing the finger closing motion for individual joints on first contact until force-closure is established. While holding an object, the controller is compliant with external forces to avoid high internal object forces and prevent object damage. Gravity as an external force is explicitly considered and compensated for, thus preventing gravity-induced object drift. We evaluate the controller in two experiments on the TIAGo robot and its parallel-jaw gripper, proving the effectiveness of the approach for robust grasping and minimizing object displacement. In a series of ablation studies, we demonstrate the utility of the individual controller components.
Citations: 0
On the asymptotic of lottery numbers
arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07406
Sidorenko, Alexander
Abstract: Let $L(n,k,r,p)$ denote the minimum number of $k$-subsets of an $n$-set such that all the $\binom{n}{p}$ $p$-subsets are intersected by one of them in at least $r$ elements. The case $p=r$ corresponds to the covering numbers, while the case $k=r$ corresponds to the Turán numbers. In both cases, there exists a limit of $L(n,k,r,p) / \binom{n}{r}$ as $n \to \infty$. We prove the existence of this limit in the general case.
Citations: 0
Towards a covariant framework for post-Newtonian expansions for radiative sources
arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07546
Hartong, Jelle, Musaeus, Jørgen
Abstract: We consider the classic problem of a compact fluid source that behaves non-relativistically and that radiates gravitational waves. The problem consists of determining the metric close to the source as well as far away from it. The non-relativistic nature of the source leads to a separation of scales resulting in an overlap region where both the $1/c$ and (multipolar) $G$-expansions are valid. Standard approaches to this problem (the Blanchet-Damour and the DIRE approach) use the harmonic gauge. We define a 'post-Newtonian' class of gauges that admit a Newtonian regime in inertial coordinates. In this paper we set up a formalism to solve for the metric for any post-Newtonian gauge choice. Our methods are based on previous work on the covariant theory of non-relativistic gravity (a $1/c$-expansion of general relativity that uses post-Newton-Cartan variables). At the order of interest in the $1/c$ and $G$-expansions we split the variables into two sets: transverse and longitudinal. We show that for the transverse variables the problem can be reduced to inverting Laplacian and d'Alembertian operators on their respective domains subject to appropriate boundary conditions. The latter are regularity in the interior and asymptotic flatness with a Sommerfeld no-incoming radiation condition imposed at past null infinity. The longitudinal variables follow from the gauge choice. The full solution is then obtained by the method of matched asymptotic expansion. We show that our methods reproduce existing results in harmonic gauge to 2.5PN order.
Citations: 0
On Elastic Language Models
arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07204
Zhang, Chen, Wang, Benyou, Song, Dawei
Abstract: Large-scale pretrained language models have achieved compelling performance in a wide range of language understanding and information retrieval tasks. Knowledge distillation offers an opportunity to compress a large language model to a small one, in order to reach a reasonable latency-performance tradeoff. However, for scenarios where the number of requests (e.g., queries submitted to a search engine) is highly variable, the static tradeoff attained by the compressed language model might not always fit. Once a model is assigned a static tradeoff, it could be inadequate in that the latency is too high when the number of requests is large or the performance is too low when the number of requests is small. To this end, we propose an elastic language model (ElasticLM) that elastically adjusts the tradeoff according to the request stream. The basic idea is to introduce a compute elasticity to the compressed language model, so that the tradeoff could vary on-the-fly along scalable and controllable compute. Specifically, we impose an elastic structure to equip ElasticLM with compute elasticity and design an elastic optimization to learn ElasticLM under compute elasticity. To serve ElasticLM, we apply an elastic schedule. Considering the specificity of information retrieval, we adapt ElasticLM to dense retrieval and reranking and present ElasticDenser and ElasticRanker respectively. Offline evaluation is conducted on the language understanding benchmark GLUE and on several information retrieval tasks, including Natural Question, Trivia QA, and MS MARCO. The results show that ElasticLM along with ElasticDenser and ElasticRanker can perform correctly and competitively compared with an array of static baselines. Furthermore, online simulation with concurrency is also carried out. The results demonstrate that ElasticLM can provide elastic tradeoffs with respect to a varying request stream.
Citations: 0
SL(2, $\mathbb{C}$) quartic vertex for closed string field theory
arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07367
Erbin, Harold, Majumder, Suvajit
Abstract: We construct the $\mathrm{SL}(2, \mathbb{C})$ quartic vertex with a generic stub parameter for the bosonic closed string field theory by characterizing the vertex region in the moduli space of the 4-punctured sphere, and providing the necessary and sufficient constraints for the local coordinate maps. While $\mathrm{SL}(2, \mathbb{C})$ vertices are not known to have a nice geometric recursive construction like the minimal area or hyperbolic vertices, they can be studied analytically, which makes them more convenient for simple computations. In particular, we obtain exact formulas for the parametrization and volume of the vertex region as a function of the stub parameter. The main objective of having an explicit quartic vertex is to later study its decomposition using auxiliary fields.
Citations: 0
Diaconis-Ylvisaker prior penalized likelihood for $p/n \to \kappa \in (0,1)$ logistic regression
arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07419
Sterzinger, Philipp, Kosmidis, Ioannis
Abstract: We characterise the behaviour of the maximum Diaconis-Ylvisaker prior penalized likelihood estimator in high-dimensional logistic regression, where the number of covariates is a fraction $\kappa \in (0,1)$ of the number of observations $n$, as $n \to \infty$. We derive the estimator's aggregate asymptotic behaviour when covariates are independent normal random variables with mean zero and variance $1/n$, and the vector of regression coefficients has length $\gamma \sqrt{n}$, asymptotically. From this foundation, we devise adjusted $Z$-statistics, penalized likelihood ratio statistics, and aggregate asymptotic results with arbitrary covariate covariance. In the process, we fill in gaps in previous literature by formulating a Lipschitz-smooth approximate message passing recursion, to formally transfer the asymptotic results from approximate message passing to logistic regression. While the maximum likelihood estimate asymptotically exists only for a narrow range of $(\kappa, \gamma)$ values, the maximum Diaconis-Ylvisaker prior penalized likelihood estimate not only exists always but is also directly computable using maximum likelihood routines. Thus, our asymptotic results also hold for $(\kappa, \gamma)$ values where results for maximum likelihood are not attainable, with no overhead in implementation or computation. We study the estimator's shrinkage properties, compare it to logistic ridge regression, and demonstrate our theoretical findings with simulations.
Citations: 0
Towards Automatic Honey Bee Flower-Patch Assays with Paint Marking Re-Identification
arXiv (Cornell University) Pub Date : 2023-11-13 DOI: 10.48550/arxiv.2311.07407
Meyers, Luke, Cordero, Josué Rodríguez, Bravo, Carlos Corrada, Noel, Fanfan, Agosto-Rivera, José, Giray, Tugrul, Mégret, Rémi
Abstract: In this paper, we show that paint markings are a feasible approach to automate the analysis of behavioral assays involving honey bees in the field, where marking has to be as lightweight as possible. We contribute a novel dataset for bee re-identification with paint markings, comprising 4392 images and 27 identities. Contrastive learning with a ResNet backbone and triplet loss led to identity representation features with almost perfect recognition in a closed setting where identities are known in advance. Diverse experiments evaluate the capability to generalize to separate IDs, and show the impact of using different body parts for identification, such as using the unmarked abdomen only. In addition, we show the potential to fully automate the visit detection and provide preliminary results of compute time for future real-time deployment in the field on an edge device.
Citations: 0