Collaborative Collection of Multilingual Pronoun Substitutes and Address Terms
Virach Sornlertlamvanich, Hiroki Nomoto, Sunisa Wittayapanyanon Saito, Atsushi Kasuga, Kenji Okano, Wataru Okubo, Yunjin Nam, Yoshimi Miyake, Thuzar Hlaing, Ryuko Taniguchi, Sri Budi Lestari
2022 7th International Conference on Business and Industrial Research (ICBIR), published 2022-05-19
DOI: 10.1109/ICBIR54589.2022.9786494 (https://doi.org/10.1109/ICBIR54589.2022.9786494)
Citations: 2
Abstract
This paper describes an encoding scheme for pronoun substitutes and address terms in eight Asian languages, based on vocative studies. The target languages were selected according to the availability of language experts and resources. The nature of pronoun substitute and address term expressions across the languages can be confirmed against the concepts defined in WordNet. For this study, a text data collection workbench, WordList, was designed to maintain input data consistency and the semantic linkage between the target languages. WordList is a web-based application that supports online collaborative data input. It stores the data in MongoDB and exports JSON and CSV files for database backup and for batch data cleansing ahead of further study of expression patterns.
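To make the export path concrete, the following is a minimal sketch of how document-style entries might be serialized to JSON and CSV and regrouped by a shared concept. It is not the WordList implementation: the field names (`concept`, `language`, `term`, `category`), the category labels, the concept key `older_sibling`, and the two sample entries are all assumptions chosen for illustration. They are not taken from the paper's encoding scheme or its data, and the sample languages are not necessarily among the paper's eight. Plain dictionaries stand in for MongoDB documents so the example runs without a database.

```python
import csv
import io
import json

# Hypothetical field names; the abstract does not specify the actual scheme.
FIELDS = ["concept", "language", "term", "category"]
# Hypothetical category labels mirroring the two kinds of terms the paper names.
CATEGORIES = {"pronoun_substitute", "address_term"}


def make_entry(concept, language, term, category):
    """Build one entry, rejecting categories outside the assumed set."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return {
        "concept": concept,
        "language": language,
        "term": term,
        "category": category,
    }


def to_json(entries):
    """Serialize entries to JSON, keeping non-ASCII scripts readable."""
    return json.dumps(entries, ensure_ascii=False, indent=2)


def to_csv(entries):
    """Serialize entries to CSV with a header row, one entry per line."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def group_by_concept(entries):
    """Map each concept to its terms per language (the semantic linkage)."""
    grouped = {}
    for entry in entries:
        grouped.setdefault(entry["concept"], {})[entry["language"]] = entry["term"]
    return grouped


# Illustrative entries only, not drawn from the WordList data.
entries = [
    make_entry("older_sibling", "th", "พี่", "pronoun_substitute"),
    make_entry("older_sibling", "id", "kakak", "address_term"),
]

if __name__ == "__main__":
    print(to_json(entries))
    print(to_csv(entries))
    print(group_by_concept(entries))
```

Keying every entry to a shared concept identifier is one simple way to keep terms from different languages linked; a JSON export preserves the record structure for backup, while the flat CSV form is convenient for batch cleansing in a spreadsheet or script.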