2013 International Conference on Asian Language Processing: Latest Publications

Tibetan Text Classification Based on the Feature of Position Weight
2013 International Conference on Asian Language Processing Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.63
Hui Cao, Huiqiang Jia
Abstract: Drawing on a study of Tibetan characters and grammar, this paper investigates term-weighting algorithms for Tibetan text categorization under the vector space model. Taking into account the position at which Tibetan words appear in a text, it proposes an improved TF-IDF weighting algorithm. The χ² (CHI) statistic is used to extract features from Tibetan word documents, and cosine similarity is used to distinguish between similar Tibetan documents. Tibetan texts are classified with a linearly separable support vector machine, and the standard TF-IDF algorithm is compared with the improved one. The results show that the improved TF-IDF algorithm classifies Tibetan text better.
Citations: 12
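The abstract above names position-aware TF-IDF weighting and cosine similarity but does not give the formula. The following is a minimal sketch of the general idea, not the authors' algorithm: the `POSITION_WEIGHT` values, the two position names, and the placeholder tokens are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical position weights; the abstract does not state the paper's values.
POSITION_WEIGHT = {"title": 3.0, "body": 1.0}

def weighted_tf(doc):
    """doc maps a position name to its token list; returns position-weighted term counts."""
    tf = Counter()
    for position, tokens in doc.items():
        for tok in tokens:
            tf[tok] += POSITION_WEIGHT[position]
    return tf

def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    n = len(docs)
    tfs = [weighted_tf(d) for d in docs]
    # Document frequency: each term is counted once per document it occurs in.
    df = Counter(term for tf in tfs for term in tf)
    return [{t: w * math.log(n / df[t]) for t, w in tf.items()} for tf in tfs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    {"title": ["a"], "body": ["b", "c"]},
    {"title": ["b"], "body": ["a", "c"]},
    {"title": ["d"], "body": ["d"]},
]
vectors = tfidf_vectors(docs)
similarity = cosine(vectors[0], vectors[1])
```

A term in the title contributes three times as much as the same term in the body, so two documents that share vocabulary but emphasize different terms score lower than they would under plain TF-IDF.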
Categorization and Identification of Fragments with Shi Plus Punctuation
2013 International Conference on Asian Language Processing Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.16
Guonian Wang, Lin He
Abstract: Studies of Chinese sentences with shi (是) as predicate have been highly fruitful from the perspectives of syntax, semantics and pragmatics. In a broader sense, however, many sentences in actual use have shi in other syntactic roles (adverb, conjunction, auxiliary and even interjection), and these stand as barriers to natural language processing (NLP) and machine translation (MT). The special fragments consisting of shi plus punctuation are divided into "shi plus comma" and "comma plus shi", which are examined and discussed using corpora, illustrations and comparison. Two exceptional fragments are also briefly described to improve the precision of computer identification of these shi-plus-punctuation fragments.
Citations: 0
Dependency Parsing for Traditional Mongolian
2013 International Conference on Asian Language Processing Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.55
Xiangdong Su, Guanglai Gao, Xueliang Yan
Abstract: Dependency parsing has become increasingly popular in natural language processing in recent years. Nevertheless, dependency parsing for Traditional Mongolian has not attracted much attention. We investigate it with a Maximum Spanning Tree (MST) based model on the Traditional Mongolian dependency treebank (TMDT). This paper briefly introduces Traditional Mongolian and the TMDT, and discusses the details of MST. Much emphasis is placed on comparing the performance of eight kinds of features and their combinations in order to find a suitable feature representation. Evaluation results show that the combination of Basic Unigram Features, Basic Bi-gram Features and C-C Sibling Features obtains the best performance. Our work establishes a baseline for dependency parsing of Traditional Mongolian.
Citations: 1
An Empirical Evaluation of Dimensionality Reduction Using Latent Semantic Analysis on Hindi Text
2013 International Conference on Asian Language Processing Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.11
Karthik Krishnamurthi, Ravi Kumar Sudi, Vijayapal Reddy Panuganti, Vishnu Vardhan Bulusu
Abstract: Dimensionality reduction is the process of deriving an approximate representation of a dataset that reflects most of the correlations underlying the dataset. In text processing, dimensionality reduction is used to transform a text into a precise representation that efficiently identifies the main insights of the original text. Latent Semantic Analysis (LSA) is a technique for finding correlations between words and sentences based on how words are used within the text. This paper addresses dimensionality reduction in representing relevant data from Hindi text using LSA. An empirical evaluation is performed to measure the influence of language complexity and of various weighting schemes on dimensionality reduction. The results are presented using the standard measures of recall, precision and F-score.
Citations: 4
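For readers unfamiliar with LSA, the reduction step can be sketched with a truncated singular value decomposition. This is a generic illustration using NumPy, not the paper's experimental setup; the toy matrix `X`, the function name `lsa_reduce`, and the choice `k=2` are assumptions.

```python
import numpy as np

def lsa_reduce(term_doc, k):
    """Project documents (columns of term_doc) into a k-dimensional latent space."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values; each column of the result is one document.
    return np.diag(s[:k]) @ vt[:k, :]

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy term-by-document count matrix (rows: terms, columns: documents); values are made up.
X = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
])
docs_2d = lsa_reduce(X, k=2)
same_topic = cosine(docs_2d[:, 0], docs_2d[:, 1])
other_topic = cosine(docs_2d[:, 0], docs_2d[:, 2])
```

In this toy matrix the first two documents share terms and the last two share different terms, so after reduction the first pair ends up much closer in the latent space than a pair drawn from different groups. A weighting scheme such as TF-IDF would be applied to `X` before the decomposition.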
Varying or Unvarying-Logarithmic Quotient Model of Vowel Formants
2013 International Conference on Asian Language Processing Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.71
Xuewen Zhou
Abstract: This paper studies the relations among F1, F2 and F3 of vowels spoken at reading speed by 3 speakers of 2 languages (Yi and Mandarin Chinese). The results show that vowel formants keep a stable logarithmic quotient relation (Z value, Z1 = log F2 / log F1, Z2 = log F3 / log F2). The ratio of standard deviation to average stays below 3% for most vowels, and the variation across speakers also stays below 3%. The paper concludes that the logarithmic quotient is an ideal vowel-normalizing model with potential applications in speech recognition and speech comparison.
Citations: 0
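The Z values defined in the abstract above are simple to compute. The sketch below follows the stated definitions (Z1 = log F2 / log F1, Z2 = log F3 / log F2) and the ratio of standard deviation to average; the formant values in `tokens` are hypothetical and are not data from the paper, and the use of the sample standard deviation is an assumption.

```python
import math
from statistics import mean, stdev

def log_quotients(f1, f2, f3):
    """Return (Z1, Z2) with Z1 = log F2 / log F1 and Z2 = log F3 / log F2."""
    return math.log(f2) / math.log(f1), math.log(f3) / math.log(f2)

def relative_spread(values):
    """Ratio of the sample standard deviation to the mean."""
    return stdev(values) / mean(values)

# Hypothetical (F1, F2, F3) measurements in Hz for one vowel from three speakers.
tokens = [(300, 2300, 3000), (320, 2450, 3150), (280, 2200, 2900)]
z1_values = [log_quotients(*t)[0] for t in tokens]
z2_values = [log_quotients(*t)[1] for t in tokens]
spread_z1 = relative_spread(z1_values)
spread_z2 = relative_spread(z2_values)
```

Because each Z value is a ratio of two logarithms, the result does not depend on the logarithm base, although it does depend on the unit in which the formants are measured.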
Improving the Accuracy of Large Vocabulary Continuous Speech Recognizer Using Dependency Parse Tree and Chomsky Hierarchy in Lattice Rescoring
2013 International Conference on Asian Language Processing Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.53
Kai Sze Hong, T. Tan, E. Tang
Abstract: This work describes our approach to using dependency parse tree information to derive useful hidden word statistics that improve the baseline of a Malay large vocabulary automatic speech recognition system. Traditional approaches to training language models are mainly based on Chomsky hierarchy type 3, which approximates natural language as a regular language. This approach ignores the characteristics of natural language. Our work attempts to overcome these limitations by extending the approach to consider Chomsky hierarchy types 1 and 2. We extract dependency-tree-based lexical information and incorporate it into the language model. A second-pass lattice rescoring is then performed to produce better hypotheses for the Malay large vocabulary continuous speech recognition system. The absolute WER reduction was 2.2% and 3.8% for the MASS and MASS-NEWS corpora, respectively.
Citations: 1
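The result above is reported as an absolute reduction in word error rate (WER). As a reminder of how that metric is usually computed, here is a minimal sketch based on word-level edit distance. It is the standard textbook definition, not code from the paper, and it assumes whitespace-separated words and a non-empty reference.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1] / len(ref)

def absolute_reduction(baseline_wer, rescored_wer):
    """Difference in WER expressed in percentage points."""
    return (baseline_wer - rescored_wer) * 100
```

An absolute reduction is a difference in percentage points between the baseline and the rescored system, which is distinct from a relative reduction expressed as a fraction of the baseline WER.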
Research of the Modern Uyghur Data Analysis Technology
2013 International Conference on Asian Language Processing Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.39
Mengchen Pan, Xiangwei Qi, Weimin Pan
Abstract: As society develops, languages also constantly evolve. To understand word usage in modern Uyghur, this study applies modern Uyghur data analysis technology to the word frequency list of standard Uyghur language textbooks for elementary and junior high school and surveys word usage. The paper first introduces the theme types and theme sources of the corpus used. Second, it presents the algorithms of the modern Uyghur data analysis system. Third, it describes the functions of the data analysis software and the working principle of each module. Fourth, it uses the textbook frequency list to validate the reliability and validity of the system's frequency range analysis, coverage rate analysis and text number distribution analysis functions. The experiments yielded satisfactory results. The system provides tools and techniques for further in-depth analysis of modern Uyghur.
Citations: 0
Research of Modern Uyghur Word Frequency Statistical Technology
2013 International Conference on Asian Language Processing Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.20
Azragul, Nianmei, Yasen Yimin
Abstract: As society develops, languages also constantly evolve. The word is the smallest meaningful unit of language that can be used independently; it is also an important carrier of knowledge and the basic unit of operation in natural language processing systems. Uyghur word frequency statistics is the process by which a computer automatically identifies term boundaries in texts, and it is the most important preprocessing step in information processing. However, there is not yet a mature Uyghur word frequency statistics system, which has become one of the bottlenecks seriously hampering the development of Uyghur information processing. This paper first discusses the idea and algorithms of the Uyghur word frequency statistics system in detail. Second, it introduces the functional design process of the system. Third, it describes the methods and techniques the system uses. Finally, it reports the test results.
Citations: 0
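Word frequency statistics of the kind described in the Uyghur entries, together with a cumulative coverage rate, can be sketched in a few lines. This is a generic illustration, not the system the authors built; it assumes the text has already been split into word tokens, which for real Uyghur text is itself part of the problem the abstract describes.

```python
from collections import Counter

def frequency_table(tokens):
    """Return (word, count, cumulative coverage) rows, most frequent word first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    rows, covered = [], 0
    for word, count in counts.most_common():
        covered += count
        # Coverage: share of all running words accounted for by this word and those above it.
        rows.append((word, count, covered / total))
    return rows

# Placeholder tokens standing in for a tokenized corpus.
table = frequency_table("a b a c a b".split())
```

Reading down the third column shows how many distinct words are needed to cover a given share of the running text, which is one common way to interpret a coverage rate analysis.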
Recognizing Chinese Elementary Discourse Unit on Comma
2013 International Conference on Asian Language Processing Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.8
Shengqin Xu, Peifeng Li
Abstract: Elementary discourse unit (EDU) recognition is the primary task of discourse analysis. Chinese punctuation is viewed as a delimiter of elementary discourse units in Chinese. In this paper, we treat the Chinese comma as the boundary of discourse units and as the anchor of discourse relations between the units it separates. We divide the comma into seven major types based on syntactic patterns and propose three different machine learning methods to automatically disambiguate the type of a Chinese comma. Experimental results on Chinese Treebank 6.0 show that our method outperforms the baseline.
Citations: 6
Findings and Considerations in Active Learning Based Framework for Resource-Poor SMT
2013 International Conference on Asian Language Processing Pub Date : 2013-08-17 DOI: 10.1109/IALP.2013.28
Jinhua Du, Meng Zhang
Abstract: Active learning (AL) for resource-poor SMT is an efficient and feasible way to acquire high-quality parallel data and improve translation quality. This paper first studies two mainstream sentence selection algorithms, Geom-phrase and Geom n-gram, and then proposes a selection method based on sentence perplexity. Comparison experiments on Chinese-English NIST data yield some important findings, such as the impact of sentence length on AL performance. Accordingly, a preprocessing strategy is presented that filters the original monolingual corpus to obtain more informative sentences. Experimental results on the preprocessed data show that the performance of all three selection algorithms improves significantly compared with the results on the original data.
Citations: 1
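The abstract above mentions a sentence selection method based on perplexity without describing it. The sketch below shows one plausible reading under stated assumptions: a unigram model with add-one smoothing is trained on the existing data, and the pool sentences with the highest perplexity are picked for translation. The model choice, the ranking direction, and the names `unigram_perplexity` and `select_sentences` are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def unigram_perplexity(sentence, counts, total, vocab_size):
    """Perplexity of a non-empty sentence under an add-one smoothed unigram model."""
    tokens = sentence.split()
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)
    return math.exp(-log_prob / len(tokens))

def select_sentences(pool, seed_corpus, k):
    """Return the k pool sentences that the seed-corpus model finds most surprising."""
    counts = Counter(t for s in seed_corpus for t in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot reserves probability for unseen words
    ranked = sorted(
        pool,
        key=lambda s: unigram_perplexity(s, counts, total, vocab_size),
        reverse=True,
    )
    return ranked[:k]

# Placeholder sentences standing in for a seed corpus and a monolingual pool.
seed = ["a b a b", "a a b"]
pool = ["a b", "c d", "a c"]
chosen = select_sentences(pool, seed, k=2)
```

Because perplexity is normalized by sentence length, very short sentences made of rare words can dominate the ranking, which is one reason a length-based filter of the pool may matter.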