Construction of Narrative Text Component Recognition Corpus

2022 IEEE 5th International Conference on Computer and Communication Engineering Technology (CCET), published 2022-08-19. DOI: 10.1109/CCET55412.2022.9906339

Feng Zhang, Yingqi Han, Jiong Wang, Jie Liu

Abstract

Textual structure analysis is an important part of Automatic Essay Scoring (AES) and one of the important research directions in Natural Language Processing. Research on the textual structure of narrative essays in China remains limited, and one of the main reasons is the lack of data available for research. To address this problem, this paper proposes and constructs a corpus for identifying the textual components of narrative essays. The paper defines a division of the text structure of narrative essays and uses it to build a corpus for narrative essay component identification. In total, 3,024 essays comprising 21,128 sentences were annotated. The corpus was built by combining manual annotation with automatic annotation by a model, and the paper reports statistical analyses of the distribution of the corpus content and of annotation consistency. Experiments show that text component recognition achieves an F1 score of 80.75%. This work provides foundational data for AES research.
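The abstract reports an F1 score for component recognition and an analysis of annotation consistency, but it does not say which variants were used. The following is a minimal sketch under stated assumptions: each sentence carries one component label, recognition is scored with macro-averaged F1 over labels, and agreement between two annotators is measured with Cohen's kappa. The label names ("opening", "body", "ending") are hypothetical and are not the paper's component scheme.

```python
from collections import Counter


def per_label_f1(gold, pred):
    """Return {label: F1} for two equal-length sequences of sentence labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = {}
    for lab in set(gold) | set(pred):
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        denom = 2 * tp + fp + fn
        # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
        scores[lab] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 (an assumed averaging choice)."""
    scores = per_label_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def cohen_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences (assumed agreement measure)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(ca[lab] * cb[lab] for lab in set(ca) | set(cb)) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label everywhere; kappa is undefined,
        # so treat it as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical labels for four sentences of one essay.
    gold = ["opening", "body", "body", "ending"]
    pred = ["opening", "body", "ending", "ending"]
    print(round(macro_f1(gold, pred), 4))     # 0.7778
    print(round(cohen_kappa(gold, pred), 4))  # 0.6364
```

Micro-averaged F1 or a multi-annotator measure such as Fleiss' kappa would be equally plausible readings of the abstract; the sketch only illustrates how such figures are computed.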