Development of a Text Processing Automation Method for Using Modern Korean-Chinese Mixed Text in General Education

Sun-young Kim
Korean Journal of General Education, published 2023-10-31. DOI: 10.46392/kjge.2023.17.5.41

Abstract

This study develops an automated text-processing method that makes modern Korean-Chinese mixed-script newspaper materials usable in liberal arts education. The article describes the process, its results, and the tasks that remain, with particular attention to batch processing of texts downloaded in large quantities. The main problem with existing digitized Korean-Chinese mixed texts offered through online services is that most of their archaic Korean characters were encoded with Private Use Area (PUA) code points, which have no standard assignment in Unicode. Before such texts can be processed or analyzed computationally, these characters must be converted to standard Unicode code points. Using the standardized data, archaic spellings were then replaced with current Korean orthography. Finally, the Hangul reading of the Chinese characters was added in parentheses in the text. A review of the results shows a significant improvement in readability.

Without a separate preprocessing step of this kind, modern Korean-Chinese mixed newspaper materials are difficult to use in liberal arts education because of their notation. The automated processing also allows instructors to extract articles on specific topics and to read these more readable versions with students.

The method still has limitations. For example, when a Chinese character has multiple readings, the reading assigned to it may not fit the Chinese-character word in which it appears and can hinder reading rather than help it. Such cases should be corrected before the materials are used for teaching. In addition, even when a mixed text is supplied with readings, students without knowledge of Classical Chinese grammar will still find it hard to read if it contains many Classical Chinese expressions. Ultimately, this is a problem to be solved by developing a morphological analyzer for Classical Chinese. Progress on such work, including the construction of a Classical Chinese corpus, would greatly help the development of teaching tools for history and liberal arts education.
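The abstract describes three processing stages (PUA conversion, spelling modernization, reading annotation) applied in batch, but publishes neither code nor conversion tables. As a rough illustration only, the Python sketch below chains those stages over a directory of text files. Everything in it is an assumption: the tables `PUA_MAP`, `SPELLING_MAP`, and `READINGS` hold a handful of hypothetical or illustrative entries, and all function names are invented for this example. Runs containing a character with several possible readings (such as 行 or 樂), or with no known reading, are flagged for manual correction, mirroring the limitation noted above.

```python
import re
from pathlib import Path

# --- Hypothetical tables (assumptions; the paper does not publish its own) ---
# PUA code point -> standard Unicode character. The real table depends on
# the font/encoding used by the source archive.
PUA_MAP = {"\ue000": "敎"}
PUA_TABLE = str.maketrans(PUA_MAP)

# Archaic spelling -> current orthography (illustrative entries only).
SPELLING_MAP = {"하엿다": "하였다", "되엿다": "되었다"}

# Hanja -> candidate Hangul readings; more than one candidate = ambiguous.
READINGS = {
    "敎": ["교"], "育": ["육"], "新": ["신"], "聞": ["문"],
    "發": ["발"], "行": ["행", "항"], "樂": ["락", "악", "요"],
}

# Runs of CJK Unified Ideographs (basic block only, for simplicity).
HANJA_RUN = re.compile(r"[\u4e00-\u9fff]+")


def find_unmapped_pua(text: str) -> list[str]:
    """Return BMP Private Use Area characters that PUA_MAP cannot convert."""
    return sorted({ch for ch in text if "\ue000" <= ch <= "\uf8ff" and ch not in PUA_MAP})


def convert_pua(text: str) -> str:
    """Replace mapped PUA code points; unmapped ones are left for review."""
    return text.translate(PUA_TABLE)


def modernize_spelling(text: str) -> str:
    """Replace archaic spellings with current Korean orthography."""
    for old, new in SPELLING_MAP.items():
        text = text.replace(old, new)
    return text


def annotate_readings(text: str) -> tuple[str, list[str]]:
    """Append each Hanja run's Hangul reading in parentheses.

    Uses the first candidate reading per character ("?" if unknown) and
    also returns the runs needing manual review (ambiguous or unknown).
    """
    flagged: list[str] = []

    def add_reading(match: re.Match) -> str:
        run = match.group(0)
        reading = "".join(READINGS.get(ch, ["?"])[0] for ch in run)
        if any(len(READINGS.get(ch, [])) != 1 for ch in run):
            flagged.append(run)
        return f"{run}({reading})"

    return HANJA_RUN.sub(add_reading, text), flagged


def process(text: str) -> tuple[str, list[str]]:
    """Run the three stages in order: PUA -> spelling -> readings."""
    return annotate_readings(modernize_spelling(convert_pua(text)))


def process_directory(src: Path, dst: Path) -> dict[str, list[str]]:
    """Batch-process every UTF-8 .txt file in src; return flagged runs per file."""
    dst.mkdir(parents=True, exist_ok=True)
    report: dict[str, list[str]] = {}
    for path in sorted(src.glob("*.txt")):
        annotated, flagged = process(path.read_text(encoding="utf-8"))
        (dst / path.name).write_text(annotated, encoding="utf-8")
        if flagged:
            report[path.name] = flagged
    return report


if __name__ == "__main__":
    annotated, flagged = process("\ue000育 新聞을 發行하엿다")
    print(annotated)  # 敎育(교육) 新聞(신문)을 發行(발행)하였다
    print(flagged)    # ['發行']
```

Taking the first candidate reading is deliberately naive: 發行 happens to come out right (발행) but is still flagged, whereas if 音 were added to the table, 音樂 would be read 음락 rather than 음악. Looking up whole Hanja words in a dictionary before falling back to per-character readings would likely resolve many such cases. The sketch also ignores CJK Compatibility Ideographs (U+F900 to U+FAFF), which Korean encodings used for some reading variants; Unicode NFC normalization maps these to their unified counterparts, so a real pipeline would need an explicit policy for them.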