Exploring motivations for algorithm mention in the domain of natural language processing: A deep learning approach

IF 3.4; CAS Zone 2 (Management Science); JCR Q2 (Computer Science, Interdisciplinary Applications)

Journal of Informetrics, Pub Date: 2024-06-08, DOI: 10.1016/j.joi.2024.101550

Yuzhuo Wang, Yi Xiang, Chengzhi Zhang

Journal of Informetrics, Volume 18 (4), Article 101550. Publication type: Journal Article. https://www.sciencedirect.com/science/article/pii/S1751157724000634

Citations: 0

Abstract

With the emergence of the fourth paradigm of scientific research, algorithms have become increasingly important to scientific work. In academic papers, scholars mention algorithms with various motivations, such as using, comparing, or improving them to solve complex research tasks. Identifying these motivations can help scholars discover relationships between algorithms and better assess their roles and value. Taking the field of natural language processing (NLP) as an example, this article therefore proposes a complete method for identifying motivations for mentioning algorithms at the sentence level and for analyzing their distribution and evolution. Specifically, using manual annotation and machine learning methods, we identify algorithm entities and the sentences containing them in the full text of papers, classify the motivations for mentioning algorithms using pre-trained models and data augmentation techniques, and finally analyze the distribution and evolution of motivations. The results show that deep learning models trained on the augmented data outperform traditional machine learning models in the classification task. Across academic papers, more than half of the sentences show direct use of algorithms, improving algorithms is the least frequent motivation, and the diversity of motivations has increased over time. For specific algorithms, grammatical algorithms are mentioned more often with the motivation of “description,” while “use” is more common in the machine learning algorithms category. Over time, “use” motivations gradually replaced “description” motivations for different algorithms, and the number of motivation types decreased significantly. Our research explores the identification, distribution, and evolution of authors’ motivations for mentioning algorithm entities, which could provide a basis for future motivation-based identification of algorithm relationships and evaluation of algorithm influence.
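The distribution and evolution analysis described in the abstract can be illustrated with a minimal sketch. It assumes each algorithm-mention sentence has already been labeled with a publication year and one motivation. The label names, the `RECORDS` list, and both function names are hypothetical; the records are made-up toy values, not data or results from the paper, and the authors' actual procedure may differ.

```python
from collections import Counter, defaultdict

# Toy, made-up records: (publication year, motivation label).
# Hypothetical values for illustration only, not data from the paper.
RECORDS = [
    (2000, "description"), (2000, "description"), (2000, "use"),
    (2010, "use"), (2010, "use"), (2010, "comparison"), (2010, "description"),
    (2020, "use"), (2020, "use"), (2020, "use"), (2020, "improvement"),
]


def motivation_shares(records):
    """Return {year: {label: share}}; shares within one year sum to 1."""
    counts = defaultdict(Counter)
    for year, label in records:
        counts[year][label] += 1
    shares = {}
    for year, counter in sorted(counts.items()):
        total = sum(counter.values())
        shares[year] = {label: n / total for label, n in counter.items()}
    return shares


def motivation_diversity(records):
    """Return {year: number of distinct motivation labels seen that year}."""
    seen = defaultdict(set)
    for year, label in records:
        seen[year].add(label)
    return {year: len(labels) for year, labels in sorted(seen.items())}


if __name__ == "__main__":
    diversity = motivation_diversity(RECORDS)
    for year, dist in motivation_shares(RECORDS).items():
        summary = ", ".join(f"{k}={v:.2f}" for k, v in sorted(dist.items()))
        print(year, f"types={diversity[year]}", summary)
```

Counting distinct labels per year is only one possible reading of "diversity"; an entropy-based measure would be an equally reasonable assumption.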

Source journal

Journal of Informetrics (Social Sciences: Library and Information Sciences)

CiteScore: 6.40

Self-citation rate: 16.20%

Publication volume

About the journal: Journal of Informetrics (JOI) publishes rigorous high-quality research on quantitative aspects of information science. The main focus of the journal is on topics in bibliometrics, scientometrics, webometrics, patentometrics, altmetrics and research evaluation. Contributions studying informetric problems using methods from other quantitative fields, such as mathematics, statistics, computer science, economics and econometrics, and network science, are especially encouraged. JOI publishes both theoretical and empirical work. In general, case studies, for instance a bibliometric analysis focusing on a specific research field or a specific country, are not considered suitable for publication in JOI, unless they contain innovative methodological elements.