Collecting Chinese-Vietnamese Texts From Bilingual Websites

2018 5th NAFOSTED Conference on Information and Computer Science (NICS) Pub Date : 2018-11-01 DOI:10.1109/NICS.2018.8606890

M. Trinh, Phuoc Tran, Nhung Tran


Citations: 1

Abstract



Monolingual and bilingual corpora are essential for natural language processing, especially for machine translation. In this paper, we propose a method to automatically collect bilingual Chinese-Vietnamese documents from bilingual Chinese-Vietnamese websites. These bilingual documents are the basis for extracting bilingual sentence pairs in our future work. Our collection system was evaluated on 10 Vietnamese-Chinese bilingual websites and gave encouraging initial results. The system can also be deployed to automatically collect documents for other language pairs.
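
The abstract does not say how the system decides which pages are Chinese and which are Vietnamese. Purely as an illustration, and not the authors' method, the Python sketch below guesses the language of a page's text from character classes: CJK ideographs for Chinese, and letters specific to Vietnamese orthography for Vietnamese. The function name `guess_language`, the `min_ratio` threshold, and the chosen character ranges are all assumptions made for this example.

```python
import re
import unicodedata

# Assumption: "Chinese" is approximated by the basic CJK Unified Ideographs block.
CJK = re.compile(r"[\u4e00-\u9fff]")
# Assumption: "Vietnamese" is approximated by letters rarely used elsewhere:
# Latin Extended Additional (U+1EA0-U+1EF9) plus a-breve, d-stroke, o-horn, u-horn.
VIET = re.compile(r"[\u1ea0-\u1ef9\u0102\u0103\u0110\u0111\u01a0\u01a1\u01af\u01b0]")


def guess_language(text: str, min_ratio: float = 0.05) -> str:
    """Return 'zh', 'vi', or 'unknown' from character-class ratios.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Normalize to NFC so accented Vietnamese letters are single code points.
    text = unicodedata.normalize("NFC", text)
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return "unknown"
    cjk_ratio = sum(1 for c in chars if CJK.match(c)) / len(chars)
    viet_ratio = sum(1 for c in chars if VIET.match(c)) / len(chars)
    if cjk_ratio >= min_ratio and cjk_ratio > viet_ratio:
        return "zh"
    if viet_ratio >= min_ratio and viet_ratio > cjk_ratio:
        return "vi"
    return "unknown"
```

A collector built on such a check could label each downloaded page before trying to match it with its counterpart in the other language. A real system would likely need a stronger language identifier, since this heuristic cannot separate Chinese from Japanese text that uses the same ideographs.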


Source publication

2018 5th NAFOSTED Conference on Information and Computer Science (NICS)

Self-citation rate

0.00%

Publication count