RD2Bench: Toward Data-Centric Automatic R&D
Authors: Haotian Chen, Xinjie Shen, Zeqi Ye, Xiao Yang, Xu Yang, Weiqing Liu, Jiang Bian
Journal: arXiv - QuantFin - General Finance
Publication date: 2024-04-17
DOI: arxiv-2404.11276 (https://doi.org/arxiv-2404.11276)
Citations: 0
Abstract
Human progress is driven by successful discoveries, each accompanied by
countless failed experiments. Researchers often identify potential research
directions by reading the literature and then verify them through
experiments. This process imposes a significant burden on researchers. In the
past decade, data-driven black-box deep learning methods have demonstrated their
effectiveness in a wide range of real-world scenarios, which exacerbates
researchers' experimental burden and thus obscures potentially successful
discoveries. Automating this research and development (R&D) process is
therefore an urgent need. In this paper, we make the first effort to
formalize the goal by proposing a Real-world Data-centric automatic R&D
Benchmark, namely RD2Bench. RD2Bench benchmarks all the operations in
data-centric automatic R&D (D-CARD) as a whole to navigate future work toward
our goal directly. We focus on evaluating the interaction and synergistic
effects of various model capabilities and on helping to select well-performing,
trustworthy models. Although RD2Bench is very challenging for the
state-of-the-art (SOTA) large language model (LLM) GPT-4, indicating ample
research opportunities and the need for further research effort, LLMs show
promising potential to advance D-CARD: they are able to implement some simple
methods without adopting any additional techniques. We appeal to future work
to consider developing techniques for tackling automatic R&D, thus opening the
opportunity for a potentially revolutionary upgrade to human productivity.