Measuring document similarity with weighted averages of word embeddings

IF 1.7 | Zone 1, History | Q1 Economics

Explorations in Economic History | Pub Date: 2023-01-01 | Epub Date: 2022-12-15 | DOI: 10.1016/j.eeh.2022.101494

Bryan Seegmiller, Dimitris Papanikolaou, Lawrence D.W. Schmidt


Citations: 1

Abstract

We detail a methodology for estimating the textual similarity between two documents while accounting for the possibility that two different words can have a similar meaning. We illustrate the method’s usefulness in facilitating comparisons between documents with very different formats and vocabularies by textually linking occupation task and industry output descriptions with related technologies as described in patent texts; we also examine economic applications of the resultant document similarity measures. In a final application we demonstrate that the method also works well relative to alternatives for comparing documents within the same domain by showing that pairwise textual similarity between occupations’ task descriptions strongly predicts the probability that a given worker will transition from one occupation to another. Finally, we offer some suggestions on other potential uses and guidance in implementing the method.
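The general idea named in the title can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the weighting scheme or the embedding model, so the sketch assumes smoothed TF-IDF weights and cosine similarity, and the three-dimensional `EMBEDDINGS` table, the toy documents, and all function names are invented for illustration.

```python
import math
from collections import Counter

# Toy 3-dimensional embeddings, invented for illustration only.
# A real application would load pretrained vectors instead.
EMBEDDINGS = {
    "doctor": [0.9, 0.1, 0.0],
    "physician": [0.85, 0.15, 0.05],
    "treats": [0.2, 0.8, 0.1],
    "heals": [0.25, 0.75, 0.1],
    "patients": [0.7, 0.3, 0.2],
    "engine": [0.0, 0.1, 0.95],
    "repairs": [0.1, 0.6, 0.6],
    "mechanic": [0.1, 0.2, 0.9],
}


def tfidf_weights(doc, corpus):
    """Return TF-IDF weights for the in-vocabulary tokens of one document.

    The weighting scheme is an assumption of this sketch.
    """
    tokens = [t for t in doc if t in EMBEDDINGS]
    counts = Counter(tokens)
    n_docs = len(corpus)
    weights = {}
    for term, count in counts.items():
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF
        weights[term] = (count / len(tokens)) * idf
    return weights


def document_vector(doc, corpus):
    """Weighted average of the word embeddings of a tokenized document."""
    weights = tfidf_weights(doc, corpus)
    total = sum(weights.values())
    dim = len(next(iter(EMBEDDINGS.values())))
    vec = [0.0] * dim
    if total == 0:
        return vec  # no known words: return the zero vector
    for term, w in weights.items():
        for i, x in enumerate(EMBEDDINGS[term]):
            vec[i] += (w / total) * x
    return vec


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Two documents that share only one word but use near-synonyms,
# plus one unrelated document.
doc_a = ["doctor", "treats", "patients"]
doc_b = ["physician", "heals", "patients"]
doc_c = ["mechanic", "repairs", "engine"]
corpus = [doc_a, doc_b, doc_c]

sim_ab = cosine_similarity(document_vector(doc_a, corpus),
                           document_vector(doc_b, corpus))
sim_ac = cosine_similarity(document_vector(doc_a, corpus),
                           document_vector(doc_c, corpus))
print(f"similarity(a, b) = {sim_ab:.3f}")
print(f"similarity(a, c) = {sim_ac:.3f}")
```

In this toy setup, `doc_a` and `doc_b` share only the word "patients", yet their averaged vectors are close because "doctor"/"physician" and "treats"/"heals" have nearby embeddings. This is the property the abstract points to when it mentions accounting for different words with similar meanings.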


Source journal

Explorations in Economic History Multiple-

CiteScore

2.50

Self-citation rate

8.70%

Publication count

Journal description: Explorations in Economic History provides broad coverage of the application of economic analysis to historical episodes. The journal has a tradition of innovative applications of theory and quantitative techniques, and it explores all aspects of economic change, all historical periods, all geographical locations, and all political and social systems. The journal includes papers by economists, economic historians, demographers, geographers, and sociologists. Explorations in Economic History is the only journal where you will find "Essays in Exploration." This unique department alerts economic historians to the potential in a new area of research, surveying the recent literature and then identifying the most promising issues to pursue.