A Modular Hierarchical Model for Paper Quality Evaluation
Xi Deng, Shasha Li, Jie Yu, Jun Ma, Bing Ji, Wuhang Lin, Shezheng Song, Zibo Yi
ADCAIJ-Advances in Distributed Computing and Artificial Intelligence Journal, published 2023-04-29
DOI: 10.5121/csit.2023.130702
Citations: 0
Abstract
Paper quality evaluation is of great significance because it helps select high-quality papers from the massive volume of academic publications. However, existing models need improvement in how they handle interaction and aggregation across the hierarchical structure of a paper. They also ignore the guiding role that the title and abstract play in the paper text. To address these two issues, we propose a modular hierarchical model (MHM) for paper quality evaluation. First, our model takes most of the paper text as input and requires no additional information. Second, we exploit the inherent hierarchy of the text using three encoders equipped with attention mechanisms: a word-to-sentence (WtoS) encoder, a sentence-to-paragraph (StoP) encoder, and a paper encoder. Specifically, the WtoS encoder uses the pre-trained language model SciBERT to obtain sentence representations from word representations. The StoP encoder lets sentences in the same paragraph interact and aggregates them into paragraph embeddings according to importance scores. The paper encoder models interactions among the different hierarchical structures of the three modules of a paper text: the paper title, the abstract sentences, and the body paragraphs. It then aggregates the resulting representations into a compact vector. In addition, the paper encoder models the guiding roles of the title and of the abstract separately, generating two more compact vectors. We concatenate these three compact vectors with four additional manual features to obtain the paper representation. This representation is then fed into a classifier to predict the acceptance decision, which serves as a proxy for the paper's quality. Experimental results on a large-scale dataset that we built ourselves show that our model consistently outperforms previous strong baselines on four evaluation metrics. Quantitative and qualitative analyses further validate the superiority of our model.
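As a rough illustration of the aggregation steps described above, the following sketch pools a set of vectors by importance scores, concatenates three compact vectors with four manual features, and passes the result to a classifier. It is not the authors' implementation. The dot-product scoring against a query vector, the logistic classifier, and every function and variable name here are assumptions made for illustration, and toy two-dimensional vectors stand in for SciBERT sentence embeddings.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors: Sequence[Vector], query: Vector) -> Vector:
    """Aggregate vectors into one, weighted by importance scores.

    Assumption: the importance score of each vector is its dot product
    with a (normally learned) query vector. The abstract does not say
    how the scores are computed.
    """
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    weights = softmax(scores)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def paper_representation(
    hierarchy_vec: Vector,
    title_guided_vec: Vector,
    abstract_guided_vec: Vector,
    manual_features: Vector,
) -> Vector:
    """Concatenate the three compact vectors and the manual features."""
    return hierarchy_vec + title_guided_vec + abstract_guided_vec + manual_features


def accept_probability(representation: Vector, weights: Vector, bias: float) -> float:
    """Hypothetical logistic classifier over the paper representation."""
    z = sum(w * x for w, x in zip(weights, representation)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy usage: two "sentence embeddings" pooled into one paragraph embedding.
sentences = [[1.0, 0.0], [0.0, 1.0]]
paragraph = attention_pool(sentences, query=[0.0, 0.0])  # zero query: equal weights

# Placeholder values for the two guided vectors and the four manual features.
rep = paper_representation(paragraph, [0.2, 0.1], [0.3, 0.4], [1.0, 0.0, 3.0, 2.0])
prob = accept_probability(rep, weights=[0.0] * len(rep), bias=0.0)
```

With a zero query vector every sentence receives the same weight, so the pooled vector is the plain average. A trained query would shift weight toward the sentences that score as more important.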