A dataset of venture capitalist types in China (1978-2021): A machine-human hybrid approach.

Impact factor 5.8 | Zone 2 (multidisciplinary journals) | JCR Q1, Multidisciplinary Sciences
Jin Chen, Ruining Cao, Yifei Song, Anan Hu, Ying Ding
Scientific Data, 11(1): 1255. Published 2024-11-20. DOI: 10.1038/s41597-024-04108-z (https://doi.org/10.1038/s41597-024-04108-z). PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11579325/pdf/
Citations: 0

Abstract

Despite escalating interest in distinguishing among types of venture capitalists (VCs) and their roles in shaping entrepreneurship and innovation, such research remains sparse for China, the world's second-largest VC market. To address this gap, we devised a machine-human hybrid approach to classify VC types. Specifically, we compiled from the CVSource database a list of 49,187 VCs that invested in China before 2021, collected VC ownership information from other public sources, developed machine-learning algorithms to predict VC types, and used human coders whenever machine learning failed to produce a prediction. Using this hybrid approach, we classified each VC into one of the following types: GVC (public agency-affiliated or state-owned enterprise-affiliated VC), CVC (corporate VC), IVC (independent VC), BVC (bank-affiliated VC), FVC (financial/non-bank-affiliated VC), UVC (university-affiliated VC), and PenVC (pension-fund-affiliated VC). We not only provide the most up-to-date database of VC types in the Chinese setting but also demonstrate how machine-learning algorithms can be leveraged to build a transparent coding approach for VC-type classification.
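The abstract describes the machine-human routing only at a high level: a model predicts each VC's type, and human coders step in when it fails to produce a prediction. The sketch below is a minimal, hypothetical Python illustration of that routing, not the authors' pipeline. It assumes a predictor that either abstains (returns `None`) or returns a type with a confidence score, treats predictions below an arbitrary threshold (0.9 here) as failures, and records whether each label came from the machine or a human. `VCRecord`, `toy_predict`, `toy_human`, and the example names are hypothetical placeholders; the keyword rules merely stand in for a trained model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class VCType(Enum):
    """The seven VC types named in the abstract."""
    GVC = "public agency- or state-owned enterprise-affiliated VC"
    CVC = "corporate VC"
    IVC = "independent VC"
    BVC = "bank-affiliated VC"
    FVC = "financial/non-bank-affiliated VC"
    UVC = "university-affiliated VC"
    PenVC = "pension-fund-affiliated VC"


@dataclass
class VCRecord:
    # Hypothetical input record: a VC name plus the names of its owners.
    name: str
    owners: List[str]


@dataclass
class CodedVC:
    name: str
    vc_type: VCType
    source: str  # "machine" or "human", so each label stays auditable


# A predictor returns (type, confidence), or None when it cannot predict.
Predictor = Callable[[VCRecord], Optional[Tuple[VCType, float]]]
HumanCoder = Callable[[VCRecord], VCType]


def hybrid_classify(records: List[VCRecord], predict: Predictor,
                    human_code: HumanCoder,
                    threshold: float = 0.9) -> List[CodedVC]:
    """Use the machine label when confident; otherwise ask a human coder."""
    coded = []
    for rec in records:
        pred = predict(rec)
        if pred is not None and pred[1] >= threshold:
            coded.append(CodedVC(rec.name, pred[0], "machine"))
        else:
            # Model abstained or was below the (assumed) confidence threshold.
            coded.append(CodedVC(rec.name, human_code(rec), "human"))
    return coded


def toy_predict(rec: VCRecord) -> Optional[Tuple[VCType, float]]:
    # Hypothetical keyword rules standing in for a trained classifier.
    text = " ".join(rec.owners).lower()
    if "university" in text:
        return VCType.UVC, 0.95
    if "bank" in text:
        return VCType.BVC, 0.95
    return None


def toy_human(rec: VCRecord) -> VCType:
    # Stand-in for manual coding; a real coder would inspect ownership records.
    return VCType.IVC


if __name__ == "__main__":
    records = [
        VCRecord("Example University Capital", ["Example University"]),
        VCRecord("Example Partners", ["Individual founder A"]),
    ]
    for c in hybrid_classify(records, toy_predict, toy_human):
        print(c.name, c.vc_type.name, c.source)
```

Storing the label source next to the type is one way to keep such a coding approach transparent: users of the resulting data can audit or filter the human-coded entries separately from the machine-coded ones.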

Source journal: Scientific Data
Subject category: Social Sciences-Education
CiteScore: 11.20
Self-citation rate: 4.10%
Articles published: 689
Review time: 16 weeks
About the journal: Scientific Data is an open-access journal focused on data, publishing descriptions of research datasets and articles on data sharing across the natural sciences, medicine, engineering, and social sciences. Its goal is to enhance the sharing and reuse of scientific data, encourage broader data sharing, and give credit to those who share their data. The journal primarily publishes Data Descriptors, which offer detailed descriptions of research datasets, including data collection methods and technical analyses validating data quality. These descriptors aim to facilitate data reuse rather than to test hypotheses or present new interpretations, methods, or in-depth analyses.