A Hybrid Approach to Interpretable Analysis of Research Paper Collections

B. Mirkin, Dmitry Frolov, Alex Vlasov, Susana Nascimento, T. Fenner
Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics. Published 2020-06-30.
DOI: 10.1145/3405962.3405976 (https://doi.org/10.1145/3405962.3405976)

Abstract

We define and compute a most specific generalization of a fuzzy set of topics assigned to the leaves of the rooted tree of a taxonomy. This generalization lifts the set to a "head subject" in the higher ranks of the taxonomy that is supposed to cover the query set "tightly", possibly at the cost of some errors, both "gaps" and "offshoots". Our method involves two further automated analysis techniques: FADDIS, a fuzzy clustering method that uses both additive and spectral properties, and a purely structural string-to-text relevance measure based on suffix trees annotated with frequencies. We apply the approach to extract research tendencies from two collections of research papers: (a) about 18,000 papers on data science published in Springer journals over 20 years, and (b) about 27,000 papers retrieved from Springer and Elsevier journals in response to data-science-related queries. We use a taxonomy of Data Science based on the ACM Computing Classification System (ACM-CCS 2012). Our findings allow us to comment on research tendencies that cannot be derived using more conventional techniques.
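To make the idea of lifting concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a toy taxonomy with hypothetical node names, an assumed rule that an internal node's membership is the Euclidean norm of its children's memberships, and assumed penalty weights `lam` (per gap, weighted by the membership of the gap's parent) and `gamma` (per offshoot, meaning a positive-membership leaf left uncovered by any head subject). At each node it compares the cost of making that node a head subject against the cost of covering its children separately.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    u: float = 0.0  # fuzzy membership: given for leaves, derived for internal nodes


def set_memberships(node):
    """Propagate leaf memberships upward (assumed rule: Euclidean norm of children)."""
    if node.children:
        node.u = sqrt(sum(set_memberships(c) ** 2 for c in node.children))
    return node.u


def lift(node, lam, gamma):
    """Return (gaps, best) for the subtree rooted at `node`.

    gaps: [(name, weight)] incurred if `node` or one of its ancestors is a head subject.
    best: cheapest scenario for this subtree when no ancestor is a head subject.
    """
    if not node.children:
        # A positive-membership leaf covered by no head subject counts as an offshoot.
        return [], {"heads": [], "gaps": [], "offshoots": [node.name],
                    "penalty": gamma * node.u}
    gaps = []
    split = {"heads": [], "gaps": [], "offshoots": [], "penalty": 0.0}
    for child in node.children:
        if child.u == 0:
            gaps.append((child.name, node.u))  # maximal irrelevant node: a potential gap
            continue
        child_gaps, child_best = lift(child, lam, gamma)
        gaps += child_gaps
        for key in ("heads", "gaps", "offshoots"):
            split[key] += child_best[key]
        split["penalty"] += child_best["penalty"]
    as_head = {"heads": [node.name], "gaps": [name for name, _ in gaps], "offshoots": [],
               "penalty": node.u + lam * sum(w for _, w in gaps)}
    return gaps, min(as_head, split, key=lambda s: s["penalty"])


# Hypothetical toy taxonomy and memberships (not taken from ACM-CCS or the paper).
root = Node("root", [
    Node("A", [Node("a1", u=0.6), Node("a2", u=0.6), Node("a3")]),
    Node("B", [Node("b1", u=0.1), Node("b2"), Node("b3")]),
    Node("C", [Node("c1"), Node("c2")]),
])
set_memberships(root)
_, result = lift(root, lam=0.2, gamma=0.9)
print(result)
```

With these toy values the cheapest scenario lifts the query set to head subject A, accepting a3 as a gap and leaving b1 as an offshoot. Lifting all the way to the root would cost more because C and most of B would become gaps.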