From scattered sources to comprehensive technology landscape: A recommendation-based retrieval approach

Impact factor 2.2 · JCR Q2 · Information Science & Library Science
Chi Thang Duong, Dimitri Perica David, Ljiljana Dolamic, Alain Mermoud, Vincent Lenders, Karl Aberer
World Patent Information, Vol. 73, Article 102198 (published 2023-06-01). DOI: 10.1016/j.wpi.2023.102198. https://www.sciencedirect.com/science/article/pii/S0172219023000285
Citations: 1

Abstract

Mapping the technology landscape is crucial for market actors to make informed investment decisions. However, given the large amount of data on the Web and the resulting information overload, retrieving information manually appears to be an ineffective and incomplete approach. In this work, we propose an end-to-end, recommendation-based retrieval approach to support the automatic retrieval of technologies and their associated companies from raw Web data. The setup involves two tasks: (i) technology classification of entities extracted from a company corpus, and (ii) technology and company retrieval based on the classified technologies. Our proposed framework approaches the first task by leveraging DistilBERT, a state-of-the-art language model. For the retrieval task, we introduce a recommendation-based retrieval technique that simultaneously supports retrieving related companies, technologies related to a specific company, and companies relevant to a technology. To evaluate these tasks, we also construct a data set that includes company documents, entities extracted from these documents, company categories, and technology labels. Experiments show that our approach returns 4 times more relevant companies while outperforming a traditional retrieval baseline in retrieving technologies.
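The abstract does not say how the recommendation-based retrieval is computed. As a minimal sketch, assuming step (i) has already mapped each company to a set of technology labels, one common way to answer all three query types from a single model is user-based collaborative filtering over that company-to-technology mapping. The class `TechCompanyRecommender`, the cosine weighting, the scoring rule, and the toy companies are all illustrative assumptions, not the authors' method; the DistilBERT classification step is omitted.

```python
import math


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity between two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


class TechCompanyRecommender:
    """Hypothetical sketch: user-based collaborative filtering over a
    company -> technology-label mapping (assumed output of step (i))."""

    def __init__(self, links: dict[str, set[str]]) -> None:
        self.links = {c: set(ts) for c, ts in links.items()}
        self.techs = sorted(set().union(*self.links.values()))

    @staticmethod
    def _top(scored: list[tuple[str, float]], k: int) -> list[tuple[str, float]]:
        # Keep positive scores, rank by score (descending), break ties by name.
        kept = [p for p in scored if p[1] > 0]
        kept.sort(key=lambda p: (-p[1], p[0]))
        return kept[:k]

    def related_companies(self, company: str, k: int = 3) -> list[tuple[str, float]]:
        # Query type 1: companies whose technology sets overlap the most.
        own = self.links[company]
        return self._top(
            [(o, cosine(own, ts)) for o, ts in self.links.items() if o != company], k
        )

    def score(self, company: str, tech: str) -> float:
        # An existing link scores 1.0; otherwise, return the similarity-weighted
        # share of the other companies that are linked to `tech`.
        own = self.links[company]
        if tech in own:
            return 1.0
        num = den = 0.0
        for other, ts in self.links.items():
            if other == company:
                continue
            sim = cosine(own, ts)
            den += sim
            if tech in ts:
                num += sim
        return num / den if den else 0.0

    def technologies_for_company(self, company: str, k: int = 3) -> list[tuple[str, float]]:
        # Query type 2: rank every known technology for one company.
        return self._top([(t, self.score(company, t)) for t in self.techs], k)

    def companies_for_technology(self, tech: str, k: int = 3) -> list[tuple[str, float]]:
        # Query type 3: rank every company for one technology.
        return self._top([(c, self.score(c, tech)) for c in self.links], k)


if __name__ == "__main__":
    # Toy data invented for illustration only.
    rec = TechCompanyRecommender({
        "AcmeSec": {"intrusion detection", "malware analysis"},
        "BetaCyber": {"intrusion detection", "threat intelligence"},
        "GammaQ": {"quantum key distribution"},
        "DeltaNet": {"intrusion detection", "malware analysis", "threat intelligence"},
        "EpsilonAI": {"malware analysis", "computer vision"},
    })
    print(rec.related_companies("AcmeSec"))  # DeltaNet, BetaCyber, EpsilonAI
    print(rec.technologies_for_company("AcmeSec"))  # ..., threat intelligence
    print(rec.companies_for_technology("threat intelligence"))  # BetaCyber, DeltaNet, AcmeSec
```

Routing all three queries through one `score(company, tech)` function is what lets a single model serve them "simultaneously" in this sketch; a learned embedding or matrix-factorization model could replace the cosine-weighted neighbour vote without changing the query interface.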

Source journal: World Patent Information (Information Science & Library Science)
CiteScore: 3.50
Self-citation rate: 18.50%
Articles published: 40
Journal description: The aim of World Patent Information is to provide a worldwide forum for the exchange of information between people working professionally in the field of Industrial Property information and documentation and to promote the widest possible use of the associated literature. Regular features include: papers concerned with all aspects of Industrial Property information and documentation; new regulations pertinent to Industrial Property information and documentation; short reports on relevant meetings and conferences; bibliographies, together with book and literature reviews.