用词嵌入的加权平均值测量文档相似度

IF 2.6 1区 历史学 Q1 ECONOMICS
Bryan Seegmiller , Dimitris Papanikolaou , Lawrence D.W. Schmidt
{"title":"用词嵌入的加权平均值测量文档相似度","authors":"Bryan Seegmiller ,&nbsp;Dimitris Papanikolaou ,&nbsp;Lawrence D.W. Schmidt","doi":"10.1016/j.eeh.2022.101494","DOIUrl":null,"url":null,"abstract":"<div><p>We detail a methodology for estimating the textual similarity between two documents while accounting for the possibility that two different words can have a similar meaning. We illustrate the method’s usefulness in facilitating comparisons between documents with very different formats and vocabularies by textually linking occupation task and industry output descriptions with related technologies as described in patent texts; we also examine economic applications of the resultant document similarity measures. In a final application we demonstrate that the method also works well relative to alternatives for comparing documents within the same domain by showing that pairwise textual similarity between occupations’ task descriptions strongly predicts the probability that a given worker will transition from one occupation to another. Finally, we offer some suggestions on other potential uses and guidance in implementing the method.</p></div>","PeriodicalId":47413,"journal":{"name":"Explorations in Economic History","volume":null,"pages":null},"PeriodicalIF":2.6000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Measuring document similarity with weighted averages of word embeddings\",\"authors\":\"Bryan Seegmiller ,&nbsp;Dimitris Papanikolaou ,&nbsp;Lawrence D.W. Schmidt\",\"doi\":\"10.1016/j.eeh.2022.101494\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>We detail a methodology for estimating the textual similarity between two documents while accounting for the possibility that two different words can have a similar meaning. We illustrate the method’s usefulness in facilitating comparisons between documents with very different formats and vocabularies by textually linking occupation task and industry output descriptions with related technologies as described in patent texts; we also examine economic applications of the resultant document similarity measures. In a final application we demonstrate that the method also works well relative to alternatives for comparing documents within the same domain by showing that pairwise textual similarity between occupations’ task descriptions strongly predicts the probability that a given worker will transition from one occupation to another. Finally, we offer some suggestions on other potential uses and guidance in implementing the method.</p></div>\",\"PeriodicalId\":47413,\"journal\":{\"name\":\"Explorations in Economic History\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Explorations in Economic History\",\"FirstCategoryId\":\"98\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0014498322000729\",\"RegionNum\":1,\"RegionCategory\":\"历史学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ECONOMICS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Explorations in Economic History","FirstCategoryId":"98","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0014498322000729","RegionNum":1,"RegionCategory":"历史学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ECONOMICS","Score":null,"Total":0}
引用次数: 1

摘要

我们详细介绍了一种方法,用于估计两个文档之间的文本相似性,同时考虑到两个不同的单词可能具有相似的含义。我们通过将职业任务和行业输出描述与专利文本中描述的相关技术在文本上联系起来,说明了该方法在促进具有不同格式和词汇表的文档之间的比较方面的有用性;我们还研究了由此产生的文件相似度度量的经济应用。在最后一个应用程序中,我们通过显示职业任务描述之间的成对文本相似性强烈地预测了给定工人从一种职业转换到另一种职业的概率,证明了该方法相对于比较同一领域内文档的替代方法也很有效。最后,对该方法的其他潜在用途和实施指导提出了一些建议。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Measuring document similarity with weighted averages of word embeddings

We detail a methodology for estimating the textual similarity between two documents while accounting for the possibility that two different words can have a similar meaning. We illustrate the method’s usefulness in facilitating comparisons between documents with very different formats and vocabularies by textually linking occupation task and industry output descriptions with related technologies as described in patent texts; we also examine economic applications of the resultant document similarity measures. In a final application we demonstrate that the method also works well relative to alternatives for comparing documents within the same domain by showing that pairwise textual similarity between occupations’ task descriptions strongly predicts the probability that a given worker will transition from one occupation to another. Finally, we offer some suggestions on other potential uses and guidance in implementing the method.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
2.50
自引率
8.70%
发文量
27
期刊介绍: Explorations in Economic History provides broad coverage of the application of economic analysis to historical episodes. The journal has a tradition of innovative applications of theory and quantitative techniques, and it explores all aspects of economic change, all historical periods, all geographical locations, and all political and social systems. The journal includes papers by economists, economic historians, demographers, geographers, and sociologists. Explorations in Economic History is the only journal where you will find "Essays in Exploration." This unique department alerts economic historians to the potential in a new area of research, surveying the recent literature and then identifying the most promising issues to pursue.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信