论文质量评价的模块化层次模型

IF 1.7 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xi Deng, Shasha Li, Jie Yu, Jun Ma, Bing Ji, Wuhang Lin, Shezheng Song, Zibo Yi
{"title":"论文质量评价的模块化层次模型","authors":"Xi Deng, Shasha Li, Jie Yu, Jun Ma, Bing Ji, Wuhang Lin, Shezheng Song, Zibo Yi","doi":"10.5121/csit.2023.130702","DOIUrl":null,"url":null,"abstract":"Paper quality evaluation is of great significance as it helps to select high quality papers from the massive amount of academic papers. However, existing models needs improvement on the interaction and aggregation of the hierarchical structure. These models also ignore the guiding role of the title and abstract in the paper text. To address above two issues, we propose a well-designed modular hierarchical model (MHM) for paper quality evaluation. Firstly, the input to our model is most of the paper text, and no additional information is needed. Secondly, we fully exploit the inherent hierarchy of the text with three encoders with attention mechanisms: a word-to-sentence(WtoS) encoder, a sentence-to-paragraph(StoP) encoder, and a paper encoder. Specifically, the WtoS encoder uses the pre-trained language model SciBERT to obtain the sentence representation from the word representation. The StoP encoder lets sentences in the same paragraph interact and aggregates them to get paragraph embeddings based on importance scores. The paper encoder does interaction among different hierarchical structures of three modules of a paper text: the paper title, abstract sentences, and body paragraphs. Then this encoder aggregates new representations generated into a compact vector. In addition, the paper encoder models the guiding role of the title and abstract, respectively, generating another two compact vectors. We concatenate the above three compact vectors and additional four manual features to obtain the paper representation. This representation is then fed into a classifier to obtain the acceptance decision, which is a proxy for papers’ quality. Experimental results on a large-scale dataset built by ourselves show that our model consistently outperforms the previous strong baselines in four evaluation metrics. Quantitative and qualitative analyses further validate the superiority of our model.","PeriodicalId":42597,"journal":{"name":"ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal","volume":null,"pages":null},"PeriodicalIF":1.7000,"publicationDate":"2023-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Modular Hierarchical Model for Paper Quality Evaluation\",\"authors\":\"Xi Deng, Shasha Li, Jie Yu, Jun Ma, Bing Ji, Wuhang Lin, Shezheng Song, Zibo Yi\",\"doi\":\"10.5121/csit.2023.130702\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Paper quality evaluation is of great significance as it helps to select high quality papers from the massive amount of academic papers. However, existing models needs improvement on the interaction and aggregation of the hierarchical structure. These models also ignore the guiding role of the title and abstract in the paper text. To address above two issues, we propose a well-designed modular hierarchical model (MHM) for paper quality evaluation. Firstly, the input to our model is most of the paper text, and no additional information is needed. Secondly, we fully exploit the inherent hierarchy of the text with three encoders with attention mechanisms: a word-to-sentence(WtoS) encoder, a sentence-to-paragraph(StoP) encoder, and a paper encoder. Specifically, the WtoS encoder uses the pre-trained language model SciBERT to obtain the sentence representation from the word representation. The StoP encoder lets sentences in the same paragraph interact and aggregates them to get paragraph embeddings based on importance scores. The paper encoder does interaction among different hierarchical structures of three modules of a paper text: the paper title, abstract sentences, and body paragraphs. Then this encoder aggregates new representations generated into a compact vector. In addition, the paper encoder models the guiding role of the title and abstract, respectively, generating another two compact vectors. We concatenate the above three compact vectors and additional four manual features to obtain the paper representation. This representation is then fed into a classifier to obtain the acceptance decision, which is a proxy for papers’ quality. Experimental results on a large-scale dataset built by ourselves show that our model consistently outperforms the previous strong baselines in four evaluation metrics. Quantitative and qualitative analyses further validate the superiority of our model.\",\"PeriodicalId\":42597,\"journal\":{\"name\":\"ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2023-04-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5121/csit.2023.130702\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5121/csit.2023.130702","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

摘要

论文质量评价对于从海量的学术论文中选出高质量的论文具有重要意义。但是,现有的模型在层次结构的交互和聚合方面还有待改进。这些模式也忽略了题目和摘要在论文正文中的引导作用。为了解决以上两个问题,我们提出了一个设计良好的模块化层次模型(MHM)用于论文质量评估。首先,我们模型的输入是大部分的论文文本,不需要额外的信息。其次,我们利用三个具有注意机制的编码器充分利用了文本的内在层次结构:单词到句子(WtoS)编码器、句子到段落(StoP)编码器和论文编码器。具体来说,WtoS编码器使用预训练的语言模型SciBERT从单词表示中获得句子表示。StoP编码器允许同一段落中的句子相互作用,并根据重要性分数聚合它们以获得段落嵌入。论文编码器在论文文本的三个模块:论文标题、摘要句和主体段落之间进行不同层次结构的交互。然后,该编码器将生成的新表示聚合到一个紧凑的向量中。此外,本文编码器分别对标题和摘要的引导作用进行建模,生成另外两个紧凑向量。我们将上述三个紧凑向量和另外四个手动特征连接起来,以获得纸张表示。然后将此表示输入到分类器中以获得验收决定,这是papersâ 质量的代理。在我们自己构建的大规模数据集上的实验结果表明,我们的模型在四个评估指标上始终优于先前的强基线。定量和定性分析进一步验证了模型的优越性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Modular Hierarchical Model for Paper Quality Evaluation
Paper quality evaluation is of great significance as it helps to select high quality papers from the massive amount of academic papers. However, existing models needs improvement on the interaction and aggregation of the hierarchical structure. These models also ignore the guiding role of the title and abstract in the paper text. To address above two issues, we propose a well-designed modular hierarchical model (MHM) for paper quality evaluation. Firstly, the input to our model is most of the paper text, and no additional information is needed. Secondly, we fully exploit the inherent hierarchy of the text with three encoders with attention mechanisms: a word-to-sentence(WtoS) encoder, a sentence-to-paragraph(StoP) encoder, and a paper encoder. Specifically, the WtoS encoder uses the pre-trained language model SciBERT to obtain the sentence representation from the word representation. The StoP encoder lets sentences in the same paragraph interact and aggregates them to get paragraph embeddings based on importance scores. The paper encoder does interaction among different hierarchical structures of three modules of a paper text: the paper title, abstract sentences, and body paragraphs. Then this encoder aggregates new representations generated into a compact vector. In addition, the paper encoder models the guiding role of the title and abstract, respectively, generating another two compact vectors. We concatenate the above three compact vectors and additional four manual features to obtain the paper representation. This representation is then fed into a classifier to obtain the acceptance decision, which is a proxy for papers’ quality. Experimental results on a large-scale dataset built by ourselves show that our model consistently outperforms the previous strong baselines in four evaluation metrics. Quantitative and qualitative analyses further validate the superiority of our model.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
1.40
自引率
0.00%
发文量
22
审稿时长
4 weeks
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信