Identifying multidisciplinary problems from scientific publications based on a text generation method

IF 1.5 · Zone 3 (Management) · JCR Q2 · INFORMATION SCIENCE & LIBRARY SCIENCE

Journal of Data and Information Science · Pub Date: 2024-07-25 · DOI: 10.2478/jdis-2024-0021

Ziyan Xu, Hongqi Han, Linna Li, Junsheng Zhang, Zexu Zhou

Citations: 0

Abstract

Purpose: This paper proposes a text-generation-based method for identifying multidisciplinary problems that does not rely on a large amount of data annotation.

Design/methodology/approach: The proposed method has five steps. First, it identifies the research objective types and disciplinary labels of papers using a text classification technique. Second, it generates an abstractive title for each paper from its abstract and research objective type using a generative pre-trained language model. Third, it extracts problem phrases from the generated titles according to regular expression rules. Fourth, it builds problem relation networks and identifies identical problems using a weighted community detection algorithm. Finally, it identifies multidisciplinary problems based on the disciplinary labels of the papers.

Findings: Experiments in the “Carbon Peaking and Carbon Neutrality” field show that the proposed method can effectively identify multidisciplinary research problems. The disciplinary distribution of the identified problems is consistent with our understanding of multidisciplinary collaboration in the field.

Research limitations: The method still needs to be applied in other multidisciplinary fields to validate its effectiveness.

Practical implications: Multidisciplinary problem identification helps governments gather multidisciplinary forces to solve complex real-world problems, helps research management authorities fund valuable multidisciplinary problems, and helps researchers borrow ideas from other disciplines.

Originality/value: The paper proposes a novel multidisciplinary problem identification method based on text generation, which identifies multidisciplinary problems from generated abstractive titles of papers without the data annotation required by standard sequence labeling techniques.
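Steps three to five of the method described above (regex extraction of problem phrases, grouping of identical problems, and the discipline check) can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the abstract does not give the regular expression rules, the edge weighting, or the community detection algorithm, so the sketch uses a hypothetical "... for <problem> ..." title pattern, Jaccard token overlap as the edge weight, and connected components over thresholded edges as a simple stand-in for weighted community detection. The titles and discipline labels are toy data, not results from the paper.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical rule: the problem phrase follows "for" and ends at a method
# marker ("using", "via", "based on") or at the end of the title.
PROBLEM_RE = re.compile(r"\bfor\s+(.+?)(?:\s+(?:using|via|based on)\b|$)", re.IGNORECASE)


def extract_problem(title):
    """Return the lower-cased problem phrase of a generated title, or None."""
    match = PROBLEM_RE.search(title)
    return match.group(1).lower() if match else None


def jaccard(a, b):
    """Edge weight between two phrases: token-set overlap (an assumption)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)


def group_problems(phrases, threshold=0.5):
    """Group phrases linked by edges with weight >= threshold (union-find).

    This is a stand-in for weighted community detection, not the paper's algorithm.
    """
    parent = {p: p for p in phrases}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        if jaccard(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for p in phrases:
        groups[find(p)].add(p)
    return list(groups.values())


def multidisciplinary_problems(papers, threshold=0.5, min_disciplines=2):
    """Return (phrases, disciplines) for problem groups spanning several disciplines."""
    disciplines_by_phrase = defaultdict(set)
    for title, discipline in papers:
        phrase = extract_problem(title)
        if phrase:
            disciplines_by_phrase[phrase].add(discipline)

    result = []
    for group in group_problems(sorted(disciplines_by_phrase), threshold):
        disciplines = set().union(*(disciplines_by_phrase[p] for p in group))
        if len(disciplines) >= min_disciplines:
            result.append((sorted(group), sorted(disciplines)))
    return result


# Toy (generated title, discipline label) pairs, invented for this example.
PAPERS = [
    ("A deep learning model for carbon emission forecasting using satellite data", "Computer Science"),
    ("Panel regression for forecasting carbon emission", "Economics"),
    ("Catalyst design for carbon dioxide capture via adsorption", "Chemistry"),
    ("Porous materials for carbon dioxide capture and storage", "Chemistry"),
]

if __name__ == "__main__":
    for phrases, disciplines in multidisciplinary_problems(PAPERS):
        print(phrases, disciplines)
```

In this toy run only the carbon emission forecasting group is reported, because its papers carry two different discipline labels, while both carbon capture titles come from a single discipline. A real pipeline would replace the threshold grouping with an actual weighted community detection algorithm and would take its titles from the generation step.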

Source journal

Journal of Data and Information Science (INFORMATION SCIENCE & LIBRARY SCIENCE)

CiteScore: 3.50

Self-citation rate: 6.70%

Publication count: 495

Journal description: JDIS is devoted to the study and application of theories, methods, techniques, services, and infrastructural facilities that use big data to support knowledge discovery for decision and policy making. Its basic emphasis is big-data-based, analytics-centered, knowledge-discovery-driven, and supportive of decision making. Special effort is directed at knowledge discovery that detects and predicts structures, trends, behaviors, relations, evolutions, and disruptions in research, innovation, business, politics, security, media and communications, and social development, where the big data may include metadata or full content data, textual or non-textual data, structured or unstructured data, domain-specific or cross-domain data, and dynamic or interactive data.

The main areas of interest are:

(1) New theories, methods, and techniques of big-data-based data mining, knowledge discovery, and informatics, including but not limited to scientometrics, communication analysis, social network analysis, tech & industry analysis, competitive intelligence, knowledge mapping, evidence-based policy analysis, and predictive analysis.

(2) New methods, architectures, and facilities to develop or improve knowledge infrastructure capable of supporting knowledge organization and sophisticated analytics, including but not limited to ontology construction, knowledge organization, semantic linked data, knowledge integration and fusion, semantic retrieval, domain-specific knowledge infrastructure, and semantic sciences.

(3) New mechanisms, methods, and tools to embed knowledge analytics and knowledge discovery into actual operation, service, or managerial processes, including but not limited to knowledge-assisted scientific discovery and data-mining-driven intelligent workflows in learning, communications, and management.

Specific topic areas may include: Knowledge organization; Knowledge discovery and data mining; Knowledge integration and fusion; Semantic Web metrics; Scientometrics; Analytic and diagnostic informetrics; Competitive intelligence; Predictive analysis; Social network analysis and metrics; Semantic and interactively analytic retrieval; Evidence-based policy analysis; Intelligent knowledge production; Knowledge-driven workflow management and decision-making; Knowledge-driven collaboration and its management; Domain knowledge infrastructure with knowledge fusion and analytics; Development of data and information services.