Sparse Network Asymptotics for Logistic Regression Under Possible Misspecification

IF 6.6; Zone 1, Economics; JCR Q1 ECONOMICS
Econometrica, Pub Date: 2024-11-21, DOI: 10.3982/ECTA19051
Bryan S. Graham
Econometrica, Volume 92, Issue 6, pp. 1837-1868. Journal Article. https://onlinelibrary.wiley.com/doi/10.3982/ECTA19051
Citations: 0

Abstract

Sparse Network Asymptotics for Logistic Regression Under Possible Misspecification

Consider a bipartite network where N consumers choose to buy or not to buy M different products. This paper considers the properties of the logit fit of the N × M array of “i-buys-j” purchase decisions, Y, onto a vector of known functions of consumer and product attributes under asymptotic sequences where (i) both N and M grow large, (ii) the average number of products purchased per consumer is finite in the limit, (iii) there exists dependence across elements in the same row or same column of Y (i.e., dyadic dependence), and (iv) the true conditional probability of making a purchase may, or may not, take the assumed logit form. Condition (ii) implies that the limiting network of purchases is sparse: only a vanishing fraction of all possible purchases are actually made. Under sparse network asymptotics, I show that the parameter indexing the logit approximation solves a particular Kullback–Leibler Information Criterion (KLIC) minimization problem (defined with respect to a certain Poisson population). This finding provides a simple characterization of the logit pseudo-true parameter under general misspecification (analogous to a (mean squared error (MSE) minimizing) linear predictor approximation of a general conditional expectation function (CEF)). With respect to sampling theory, sparseness implies that the first and last terms in an extended Hoeffding-type variance decomposition of the score of the logit pseudo composite log-likelihood are of equal order. In contrast, under dense network asymptotics, the last term is asymptotically negligible. Asymptotic normality of the logistic regression coefficients is shown using a martingale central limit theorem (CLT) for triangular arrays. Unlike in the dense case, the normality result derived here also holds under degeneracy of the network graphon. Relatedly, when there “happens to be” no dyadic dependence in the data set in hand, it specializes to recently derived results on the behavior of logistic regression with rare events and i.i.d. data. Simulation results suggest that sparse network asymptotics better approximate the finite network distribution of the logit estimator. A short empirical illustration, and additional calibrated Monte Carlo experiments, further illustrate the main theoretical ideas.
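
As a rough illustration of the setting in conditions (i) to (iv), the Python sketch below simulates a sparse N × M purchase array with row and column dependence and fits a pooled logit to it. Everything specific here is an assumption made for illustration and is not taken from the paper: the data-generating process, the parameters `lam` and `beta`, the attribute draws, and the helper names `simulate_purchases` and `fit_logit`. The intercept is set to drift like log(lam / M), so purchase probabilities shrink at rate 1/M and the average number of purchases per consumer stays bounded as M grows. The unobserved effects `a` and `b` are omitted from the fitted model, so the logit is misspecified by construction.

```python
import numpy as np


def simulate_purchases(N, M, lam=3.0, beta=0.5, seed=0):
    """Simulate an N x M 0/1 purchase array under a hypothetical sparse design."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=N)             # observed consumer attribute (assumed)
    x = rng.normal(size=M)             # observed product attribute (assumed)
    a = rng.normal(scale=0.5, size=N)  # unobserved consumer effect: row dependence
    b = rng.normal(scale=0.5, size=M)  # unobserved product effect: column dependence
    r = w[:, None] * x[None, :]        # dyad-level regressor built from attributes
    # The intercept falls like log(lam / M), so probabilities are of order 1/M.
    index = np.log(lam / M) + beta * r + a[:, None] + b[None, :]
    p = 1.0 / (1.0 + np.exp(-index))
    Y = (rng.random((N, M)) < p).astype(float)
    return Y, r


def fit_logit(Y, r, iters=25, tol=1e-10):
    """Pooled logit of Y_ij on (1, r_ij), fitted by Newton-Raphson."""
    y = Y.ravel()
    X = np.column_stack([np.ones(y.size), r.ravel()])
    ybar = y.mean()
    # Start from the intercept-only fit with a zero slope.
    theta = np.array([np.log(ybar / (1.0 - ybar)), 0.0])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (y - p)                        # score of the pooled log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X  # negative Hessian
        step = np.linalg.solve(hess, grad)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


if __name__ == "__main__":
    for size in (100, 400):
        Y, r = simulate_purchases(size, size)
        theta = fit_logit(Y, r)
        print(size, "density:", Y.mean(),
              "mean purchases per consumer:", Y.sum(axis=1).mean(),
              "logit coefficients:", theta)
```

Running the script for the two network sizes should show the density of purchases falling as M grows while the mean number of purchases per consumer stays of the same order, which is the sense of "sparse" in condition (ii). The fitted coefficients are pseudo-true-parameter estimates for this toy design only; the sketch does not compute the dyadic-robust standard errors that the paper's sampling theory concerns.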

Source journal
Econometrica (Social Sciences: Mathematics, Interdisciplinary Applications)
CiteScore: 11.00
Self-citation rate: 3.30%
Articles published: 75
Review time: 6-12 weeks
About the journal: Econometrica publishes original articles in all branches of economics - theoretical and empirical, abstract and applied, providing wide-ranging coverage across the subject area. It promotes studies that aim at the unification of the theoretical-quantitative and the empirical-quantitative approach to economic problems and that are penetrated by constructive and rigorous thinking. It explores a unique range of topics each year - from the frontier of theoretical developments in many new and important areas, to research on current and applied economic problems, to methodologically innovative, theoretical and applied studies in econometrics. Econometrica maintains a long tradition that submitted articles are refereed carefully and that detailed and thoughtful referee reports are provided to the author as an aid to scientific research, thus ensuring the high calibre of papers found in Econometrica. An international board of editors, together with the referees it has selected, has succeeded in substantially reducing editorial turnaround time, thereby encouraging submissions of the highest quality. We strongly encourage recent Ph.D. graduates to submit their work to Econometrica. Our policy is to take into account the fact that recent graduates are less experienced in the process of writing and submitting papers.