A Feasible Chinese Text Data Preprocessing Strategy

Jingang Liu, Chunhe Xia, Haihua Yan, Jie Sun
Published in: 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON)
Publication date: 2020-10-28
DOI: 10.1109/UEMCON51285.2020.9298131 (https://doi.org/10.1109/UEMCON51285.2020.9298131)
Citations: 1

Abstract

With the rapid rise of artificial intelligence technologies such as machine learning and the rapid development of the big data industry, increasing attention is being paid to the use of data itself, especially Chinese text data, which is more complex in expression and richer in information. Processing raw Chinese text data is a necessary step before it can be used for specific application tasks. However, current data processing strategies are generally tailored to particular fields and specific application tasks. In this paper, to further improve the quality of Chinese data processing and realize the application value of Chinese data, we propose a general and feasible Chinese text preprocessing strategy, named the multi-level data preprocessing strategy (MLDPS). This strategy processes raw Chinese text data systematically through four stages. We believe that the proposed MLDPS has relatively strong practical significance and offers a better approach to preprocessing Chinese text data.