Comparing Research Topics through Metatags Analysis: A Multi-module Machine Algorithm Approaches Using Real World Data on Digital Humanities
Bhaskar Mukherjee, Debasis Majhi, Priya Tiwari, Saloni Chaudhary
Journal of Scientometric Research, 2024-04-15. DOI: 10.5530/jscires.13.1.5
The present study extracts, maps, and compares the lexical and semantic similarity between author-provided keywords and the terms and topics that machines extract from the titles and abstracts of an interdisciplinary field, ‘digital humanities’. Author-provided terms (keywords) were first extracted and mapped with visualization software such as Gephi. They were then compared with terms extracted from the titles and abstracts of the research articles by NLP-based statistical modules. In addition, the interdisciplinarity of significant topics was measured with the Brillouin index. The data set comprised 7483 articles on digital humanities and its associated fields, downloaded from the Scopus database. We observed that research on digital humanities spans a considerable number of concepts, such as ‘Industry 4.0’, ‘topic modelling’, and ‘open science’. Further, the machine algorithm-based comparison found greater lexical similarity between author-provided keywords and title-extracted keywords than between author-provided keywords and abstract-extracted keywords. The Jaccard similarity of all author keywords with machine-extracted title keywords was 0.83, and the SBERT BiEncoder_score was 0.7374. The top research areas extracted from titles through an unsupervised term-extraction approach included topics such as ‘digital humanities approach’ and ‘digital humanities visualization’, indicating a strong connection to the discipline of digital humanities. The average interdisciplinarity index of the top significant topics ranged from 1.217 to 1.284, with the highest value for ‘computational digital humanities’. Because this study is based on real-world data, it helps show how far machine algorithm-based text extraction can support the information retrieval process.
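The abstract does not describe how the author keywords were prepared for Gephi. One common route, sketched here under that assumption, is to count how often two keywords are assigned to the same article and export the counts as an edge table whose `Source`, `Target`, and `Weight` columns Gephi's spreadsheet import recognizes. The function names `cooccurrence_edges` and `to_gephi_csv` and the toy keyword lists are hypothetical.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def cooccurrence_edges(keyword_lists):
    """Count how often each pair of author keywords appears on the same article."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalise case/whitespace and de-duplicate within one article.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1  # pairs are alphabetically ordered, so edges are undirected
    return counts


def to_gephi_csv(counts):
    """Render pair counts as a Source,Target,Weight edge table (CSV text)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(counts.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()


# Hypothetical keyword lists for three articles.
articles = [
    ["Digital Humanities", "Topic Modelling", "Open Science"],
    ["digital humanities", "topic modelling"],
    ["Industry 4.0", "Digital Humanities"],
]
edges = cooccurrence_edges(articles)
print(to_gephi_csv(edges))
```

Lower-casing and de-duplicating keywords per article is a design choice that keeps ‘Digital Humanities’ and ‘digital humanities’ from becoming separate nodes; in this toy input the pair ‘digital humanities’/‘topic modelling’ gets weight 2.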
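The Jaccard similarity reported in the abstract compares two sets of terms: the size of their intersection divided by the size of their union. The sketch below applies that standard definition to keyword lists. The lower-case normalisation and the convention that two empty sets score 1.0 are assumptions, since the abstract does not say how the 0.83 figure was normalised or aggregated.

```python
def normalise(terms):
    """Lower-case and trim keyword strings so trivially different spellings match."""
    return {t.strip().lower() for t in terms if t.strip()}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = normalise(a), normalise(b)
    if not a and not b:
        return 1.0  # convention: two empty sets are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical author keywords and title-extracted keywords for one article.
author_kw = ["Digital Humanities", "Topic Modelling", "Open Science"]
title_kw = ["digital humanities", "topic modelling", "visualization"]
print(jaccard(author_kw, title_kw))  # 0.5
```

The two toy lists share 2 of 4 distinct terms, hence 0.5. Because the measure is purely lexical, ‘topic modelling’ and ‘topic modeling’ would count as different terms.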
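A bi-encoder such as SBERT embeds each term independently and then scores pairs by cosine similarity. The sketch below shows only that scoring step, on toy vectors. In practice the vectors would come from a sentence-embedding model (for example `SentenceTransformer.encode` in the `sentence-transformers` library). The abstract does not state how pairwise scores were aggregated into a single BiEncoder_score, so the mean-of-best-match rule in `best_match_score` is a hypothetical choice.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0  # zero vectors get similarity 0


def best_match_score(source_vecs, target_vecs):
    """Hypothetical aggregate: mean over source terms of their best cosine match."""
    if not source_vecs or not target_vecs:
        return 0.0
    best = [max(cosine(s, t) for t in target_vecs) for s in source_vecs]
    return sum(best) / len(best)


# Toy 3-d vectors standing in for real SBERT embeddings.
author_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
title_vecs = [[1.0, 0.0, 0.0], [0.0, 3.0, 4.0]]
print(best_match_score(author_vecs, title_vecs))  # about 0.8
```

The first author vector matches a title vector exactly (1.0) and the second reaches 0.6, so the mean is about 0.8. Unlike the Jaccard score, this rewards near-synonyms whose embeddings point in similar directions.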
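The abstract names an unsupervised term-extraction approach for titles but does not specify the algorithm. As a stand-in, the sketch below ranks multi-word n-grams by the number of titles that contain them and discards candidates that begin or end with a stop word. The stop-word list, the n-gram range, and the sample titles are assumptions for illustration, not the study's configuration.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not the study's list).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "using"}


def candidate_ngrams(title, n_min=2, n_max=3):
    """Yield contiguous n-grams that neither start nor end with a stop word."""
    tokens = re.findall(r"[a-z0-9]+(?:\.[0-9]+)?", title.lower())
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def top_terms(titles, k=5, min_count=2):
    """Rank candidate terms by document frequency (number of titles containing them)."""
    df = Counter()
    for title in titles:
        df.update(set(candidate_ngrams(title)))  # count each term once per title
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(df.items(), key=lambda item: (-item[1], item[0]))
    return [(term, count) for term, count in ranked if count >= min_count][:k]


# Hypothetical titles.
titles = [
    "A digital humanities approach to archives",
    "Digital humanities visualization of letters",
    "Teaching a digital humanities approach",
    "Network analysis and digital humanities visualization",
]
print(top_terms(titles, k=3))
```

On these toy titles the top three candidates are ‘digital humanities’ (4 titles), ‘digital humanities approach’ (2), and ‘digital humanities visualization’ (2). Nested fragments such as ‘humanities approach’ also score 2. Methods like C-value are designed to discount such nested terms.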
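The Brillouin index measures diversity from counts n_i across categories, with N = Σ n_i: H = (ln N! − Σ ln n_i!) / N. The abstract does not say which categories were counted for each topic (for instance, the subject areas of the articles assigned to it), so the counts below are hypothetical. The sketch uses `math.lgamma(k + 1)` to compute ln k! without overflowing for large N.

```python
import math


def ln_factorial(k):
    """Natural log of k!, via the log-gamma function: ln k! = lgamma(k + 1)."""
    return math.lgamma(k + 1)


def brillouin(counts):
    """Brillouin index H = (ln N! - sum_i ln n_i!) / N over positive category counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0  # no observations: define diversity as zero
    return (ln_factorial(total) - sum(ln_factorial(c) for c in counts)) / total


# Hypothetical: six articles on one topic spread over three subject categories.
print(round(brillouin([3, 2, 1]), 3))  # 0.682
```

A topic whose articles all fall in a single category scores 0. The value rises as articles spread over more categories and more evenly across them, which is why a higher index is read as greater interdisciplinarity.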