Research on Chinese Legal Document Reading Comprehension Based on Pre-Training Model

Proceedings of the 5th International Conference on Computer Science and Application Engineering. Pub Date: 2021-10-19. DOI: 10.1145/3487075.3487157

Lufeng Yuan, Wei Zhang, Xiaoxin Gao, Linlin Zhao, Bin Liu, Maokai Liu


Citations: 0

Abstract

We study machine reading comprehension of Chinese legal documents. We first analyze what makes the task difficult. The data are severely imbalanced across span-extraction, yes/no, and unanswerable queries: span-extraction queries account for more than 80% of the total. The task is also a typical long-text reading problem. We then propose a framework for reading and understanding Chinese legal documents. Built on the pre-trained BERT model, the framework fine-tunes on Chinese legal documents, adopts a variety of deep learning models, and uses data augmentation and an ensemble strategy. Finally, we test the framework on real legal documents, where the macro-averaged F-score reaches 82.773.
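The abstract names long documents as a core difficulty but does not say how the framework handles them. One common way to fit long texts into BERT-style models, whose input length is usually capped (often at 512 tokens), is to split the document into overlapping windows and map each predicted span back to document offsets. The Python sketch below illustrates that general idea only; it is not the authors' method. The function names `sliding_windows` and `to_document_span`, the window sizes, and the character-level tokenization are all assumptions made for illustration.

```python
from typing import List, Tuple


def sliding_windows(
    tokens: List[str], max_len: int = 512, overlap: int = 128
) -> List[Tuple[int, List[str]]]:
    """Split `tokens` into overlapping windows of at most `max_len` tokens.

    Returns (start_offset, window_tokens) pairs. Consecutive windows share
    `overlap` tokens, so an answer span cut by one window boundary can still
    appear whole in a neighbouring window (if it is short enough).
    Hypothetical helper, not taken from the paper.
    """
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("require max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return windows


def to_document_span(window_start: int, local_span: Tuple[int, int]) -> Tuple[int, int]:
    """Convert a (start, end) span inside a window to document-level offsets."""
    local_start, local_end = local_span
    return window_start + local_start, window_start + local_end


# Assumption: one token per character, a rough stand-in for how Chinese
# BERT vocabularies typically tokenize Chinese text.
document = list("abcdefghij")
windows = sliding_windows(document, max_len=4, overlap=2)
for start, window in windows:
    print(start, "".join(window))
```

In a full pipeline each window would be paired with the question and scored by the model, and the best-scoring span across windows would be kept; that selection step is omitted here because the abstract gives no detail on it.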
