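Machine Learning Approaches with Multilingual Bibliographic, Quotation, and Terminology Databases for the Study of the Chinese Buddhist Canon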

2022 Pacific Neighborhood Consortium Annual Conference and Joint Meetings (PNC) | Pub Date: 2022-09-01 | DOI: 10.23919/PNC56605.2022.9982732

Alex Amies


Citations: 0

Abstract


Machine Learning Approaches with Multilingual Bibliographic, Quotation, and Terminology Databases for the Study of the Chinese Buddhist Canon

This paper describes machine learning approaches for the study of the Chinese Buddhist canon using bibliographic, quotation, and terminology databases. These techniques bring the traditional artifacts of academic study of the canon to scholars’ digital desktops, along with English translations of modern Chinese literature discussing the canon.

First, the bibliographic database includes English translations of titles, references to modern translations, parallels in Sanskrit and other languages, and general references. Machine learning can help the scholar discover relevant information in full-text search, with key bibliographical information surfaced for context. The goal is to make the study and translation of Chinese Buddhist texts easier and faster by presenting bibliographical content to the reader in an easily discoverable way, through user interface elements such as mouse-over tooltips, popovers, and search snippets. The machine learning approach in this case is a logistic regression model for relevance based on document similarity. Examples will be given from the Taishō Tripitaka.

Second, a database of multilingual quotations found in Buddhist literature will be briefly described. The paper will describe the use of a decision tree classifier trained for relevance on a corpus of bilingual phrases. The explainability and simplicity of the model, and the smaller amount of training material it requires, will be contrasted with the needs of deep learning models. Examples will be given from the Blue Cliff Record.

Third, the use of machine translation of modern Chinese text into English with the Fo Guang Shan Glossary of Humanistic Buddhism embedded will be described. This approach uses deep learning through the Google Translate API. Although many of the translations are surprisingly good and the overall result is helpful, the challenges relating to the accuracy of the results will be examined critically. The need for modifications to the glossary to improve translation quality will be explained. Examples comparing machine and human translation of Venerable Master Hsing Yun’s works will be given.
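The abstract names a logistic regression relevance model built on document similarity but does not give its features or training data. The following is only a minimal sketch of that general idea, not the paper's implementation. It assumes character-bigram Jaccard overlap as the similarity measure, two hypothetical features (query versus title, query versus body), and a handful of made-up labeled rows; all function names are illustrative.

```python
import math

def char_bigrams(text):
    """Set of character bigrams; a simple unit for unsegmented Chinese text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def features(query, title, body):
    """Hypothetical feature vector: query-title and query-body similarity."""
    q = char_bigrams(query)
    return [jaccard(q, char_bigrams(title)), jaccard(q, char_bigrams(body))]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relevance(w, x):
    """Predicted probability that a document is relevant; w[0] is the bias."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

def train(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on log loss."""
    w = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = relevance(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

# Made-up training rows: [title similarity, body similarity] -> relevant (1) or not (0).
toy_rows = [[0.9, 0.6], [0.7, 0.4], [0.6, 0.5], [0.1, 0.2], [0.0, 0.1], [0.2, 0.0]]
toy_labels = [1, 1, 1, 0, 0, 0]

weights = train(toy_rows, toy_labels)
print("weights (bias, title, body):", [round(v, 2) for v in weights])
print("high-similarity score:", round(relevance(weights, [0.8, 0.5]), 3))
print("low-similarity score:", round(relevance(weights, [0.05, 0.05]), 3))
```

In a search setting, the score from `relevance` could be used to rank results or to decide which bibliographic entries to surface in a snippet. The fitted weights remain directly inspectable, which matches the abstract's emphasis on simple, explainable models.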

