Machine Learning Approaches with Multilingual Bibliographic, Quotation, and Terminology Databases for the Study of the Chinese Buddhist Canon
Alex Amies
2022 Pacific Neighborhood Consortium Annual Conference and Joint Meetings (PNC), 2022-09-01
DOI: https://doi.org/10.23919/PNC56605.2022.9982732
This paper describes machine learning approaches for the study of the Chinese Buddhist canon using bibliographic, quotation, and terminology databases. These techniques bring the traditional artifacts of academic study of the canon, together with English translations of modern Chinese literature discussing it, to scholars’ digital desktops.

First, the bibliographic database includes English translations of titles, references to modern translations, parallels in Sanskrit and other languages, and general references. Machine learning helps the scholar discover relevant information in full-text search, with key bibliographic information surfaced for context. The goal is to make the study and translation of Chinese Buddhist texts easier and faster by presenting bibliographic content to the reader in an easily discoverable way, through user interface elements such as mouse-over tooltips, popovers, and search snippets. The machine learning approach here is a logistic regression model for relevance based on document similarity. Examples will be given from the Taishō Tripitaka.

Second, a database of multilingual quotations found in Buddhist literature will be briefly described. The paper describes a decision tree classifier trained for relevance on a corpus of bilingual phrases. The explainability and simplicity of this model, and the smaller amount of training material it requires, will be contrasted with the needs of deep learning models. Examples will be given from the Blue Cliff Record.

Third, machine translation of modern Chinese text into English with the Fo Guang Shan Glossary of Humanistic Buddhism embedded will be described. This approach uses deep learning through the Google Translate API. Although many of the translations are surprisingly good and the overall result is helpful, the challenges relating to accuracy will be examined critically. The need for modifications to the glossary to improve translation quality will be explained. Examples comparing machine and human translations of Venerable Master Hsing Yun’s works will be given.
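
The logistic regression relevance model mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the two similarity features (character-bigram overlap and exact substring match), the function names, the hand-labelled training pairs, and the learning settings are all assumptions chosen for illustration. Only the general idea, a logistic regression over document-similarity features used to score search results, comes from the abstract.

```python
import math

def char_bigrams(text):
    """Character bigrams, a simple unit for unsegmented Chinese text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def features(query, title):
    """Two hypothetical similarity features, each in [0, 1]."""
    q = char_bigrams(query)
    # Fraction of the query's bigrams that also occur in the title.
    overlap = len(q & char_bigrams(title)) / len(q) if q else 0.0
    # 1.0 when the query appears verbatim inside the title.
    exact = 1.0 if query in title else 0.0
    return [overlap, exact]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=2000, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log loss.

    examples: list of (query, title, label) with label 1 = relevant.
    """
    data = [(features(q, t), y) for q, t, y in examples]
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in data:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def relevance(model, query, title):
    """Estimated probability that the title is relevant to the query."""
    w, b = model
    x = features(query, title)
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

def rank(model, query, titles):
    """Order titles from most to least relevant."""
    return sorted(titles, key=lambda t: relevance(model, query, t), reverse=True)

# Toy data: the titles are real sutra titles, but the relevance labels
# are invented for this sketch and are not taken from the paper.
TRAINING = [
    ("法華經", "妙法蓮華經", 1),
    ("心經", "般若波羅蜜多心經", 1),
    ("阿彌陀", "佛說阿彌陀經", 1),
    ("金剛經", "妙法蓮華經", 0),
    ("心經", "佛說阿彌陀經", 0),
    ("法華經", "金剛般若波羅蜜經", 0),
]
CATALOG = ["妙法蓮華經", "般若波羅蜜多心經", "佛說阿彌陀經", "金剛般若波羅蜜經"]

model = train(TRAINING)
for title in rank(model, "金剛經", CATALOG):
    print(title, round(relevance(model, "金剛經", title), 3))
```

The abbreviated query 金剛經 is not a substring of the full title 金剛般若波羅蜜經, so an exact-match search would miss it. The learned weight on bigram overlap is what lets that title rank first here. A production system would presumably use richer similarity features and far more labelled data.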