Authors: Yuzhuo Wang, Yi Xiang, Chengzhi Zhang
Journal: Journal of Informetrics, vol. 18, no. 4, Article 101550
Published: 2024-06-08
DOI: 10.1016/j.joi.2024.101550
URL: https://www.sciencedirect.com/science/article/pii/S1751157724000634
Exploring motivations for algorithm mention in the domain of natural language processing: A deep learning approach
With the emergence of the fourth paradigm of scientific research, algorithms have become increasingly important to science. In academic papers, scholars mention algorithms with various motivations, such as using, comparing, or improving them to solve complex research tasks. Identifying these motivations can help scholars discover the relationships between algorithms and further assess their roles and values. Therefore, taking the field of natural language processing (NLP) as an example, this article proposes a complete method for identifying motivations for mentioning algorithms at the sentence level and for analyzing their distribution and evolution. Specifically, using manual annotation and machine learning methods, we identify algorithm entities and the sentences that mention them in the full text of papers, classify the motivations for mentioning algorithms using pre-trained models and data augmentation techniques, and finally analyze the distribution and evolution of motivations. The results show that deep learning models trained with the augmented data outperform traditional machine learning models in the classification task. In academic papers, more than half of the sentences show the direct use of algorithms, while improving algorithms is the least common motivation, and the diversity of motivations has been increasing with time. For specific algorithms, grammatical algorithms are more often mentioned with the motivation of “description,” while the motivation of “use” is more common in the machine learning algorithms category. As time passed, the “use” motivations gradually replaced the “description” motivations for different algorithms, and the number of motivation types decreased significantly. Our research explores the identification, distribution, and evolution of authors’ motivations for mentioning algorithm entities, which could provide a basis for future motivation-based identification of relationships between algorithms and evaluation of their influence.
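The distribution-and-evolution step described in the abstract can be pictured with a small sketch. It assumes each sentence has already been assigned a publication year and a motivation label; the label names ("use", "description", "comparison", "improvement"), the function names, and the toy records below are illustrative assumptions, not the authors' annotation scheme, code, or data.

```python
from collections import Counter, defaultdict


def motivation_distribution(records):
    """Return {year: {motivation: share}} from (year, motivation) pairs.

    The share is the fraction of that year's labeled sentences carrying
    the given motivation label.
    """
    counts = defaultdict(Counter)
    for year, motivation in records:
        counts[year][motivation] += 1
    return {
        year: {m: n / sum(c.values()) for m, n in c.items()}
        for year, c in sorted(counts.items())
    }


def motivation_type_count(records):
    """Return {year: number of distinct motivation labels seen that year}."""
    seen = defaultdict(set)
    for year, motivation in records:
        seen[year].add(motivation)
    return {year: len(labels) for year, labels in sorted(seen.items())}


# Made-up toy records (hypothetical labels, not results from the paper).
toy_records = [
    (2000, "description"),
    (2000, "description"),
    (2000, "use"),
    (2010, "use"),
    (2010, "use"),
    (2010, "comparison"),
    (2010, "improvement"),
]

distribution = motivation_distribution(toy_records)
type_counts = motivation_type_count(toy_records)
```

Comparing the per-year shares shows whether one motivation displaces another over time, and the per-year count of distinct labels is one simple proxy for the diversity of motivations; other measures, such as entropy, would serve equally well.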
About the journal:
Journal of Informetrics (JOI) publishes rigorous high-quality research on quantitative aspects of information science. The main focus of the journal is on topics in bibliometrics, scientometrics, webometrics, patentometrics, altmetrics and research evaluation. Contributions studying informetric problems using methods from other quantitative fields, such as mathematics, statistics, computer science, economics and econometrics, and network science, are especially encouraged. JOI publishes both theoretical and empirical work. In general, case studies, for instance a bibliometric analysis focusing on a specific research field or a specific country, are not considered suitable for publication in JOI, unless they contain innovative methodological elements.