Tibetan Word Segmentation Based on Word-Position Tagging
Caijun Kang, Di Jiang, Congjun Long
2013 International Conference on Asian Language Processing, 2013-08-17
DOI: 10.1109/IALP.2013.74
The main advantage of word-position-based Tibetan word segmentation is that it reduces segmentation errors on unknown words. In this article, the authors extend the usual 4-tag set to a 6-tag set to fit the features of Tibetan characters, use a conditional random field (CRF) as the tagging model for training and testing on corpus data, and then build post-processing modules to revise the results. The experimental results show that this method performs well and deserves further study, including expanding the corpus and optimizing the tag set and feature templates.
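
To make the idea of word-position tagging concrete, the following is a minimal sketch that assumes the conventional 4-tag scheme (B = word-initial, M = word-medial, E = word-final, S = single-unit word). The abstract does not list the 6-tag set, so it is not reproduced here. The function names and the romanized placeholder syllables are illustrative assumptions, not taken from the paper. In the described approach a CRF would predict the tags; here they are simply derived from an already segmented sentence and then decoded back into words.

```python
from typing import List, Tuple

Word = List[str]  # a word as a list of units (e.g. syllables); an assumption


def words_to_tags(words: List[Word]) -> List[Tuple[str, str]]:
    """Label each unit with its position in its word: B, M, E or S."""
    tagged = []
    for word in words:
        if len(word) == 1:
            tagged.append((word[0], "S"))  # single-unit word
            continue
        for i, unit in enumerate(word):
            if i == 0:
                tag = "B"  # first unit of a multi-unit word
            elif i == len(word) - 1:
                tag = "E"  # last unit of a multi-unit word
            else:
                tag = "M"  # any unit in between
            tagged.append((unit, tag))
    return tagged


def tags_to_words(tagged: List[Tuple[str, str]]) -> List[Word]:
    """Rebuild words from (unit, tag) pairs; E and S close a word."""
    words, current = [], []
    for unit, tag in tagged:
        current.append(unit)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


# Placeholder romanized syllables, used only for illustration.
sentence = [["bod"], ["skad", "yig"], ["slob", "grwa", "ba"]]
tagged = words_to_tags(sentence)
print(tagged)
print(tags_to_words(tagged) == sentence)
```

Because every unit receives a tag regardless of whether its word appears in a dictionary, a tagger trained on such labels can propose boundaries for unseen words, which is the advantage the abstract highlights.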