{"title":"Refining Unit Boundaries for Mandarin Text-to-Speech Database","authors":"M. Dong, Ling Cen, P. Chan, Haizhou Li","doi":"10.1109/IALP.2009.59","DOIUrl":"https://doi.org/10.1109/IALP.2009.59","url":null,"abstract":"In unit selection based Text-to-Speech (TTS) synthesis, the accurate position of the unit boundaries in the unit selection database is one of the factors that determine the quality of the synthesized speech. To ensure the accuracy of the boundary positions, developers often have to manually verify the speech boundaries that are generated by automatic speech recognition techniques. In order to reduce the manual workload, it is necessary to use automatic methods of refining the position of the unit boundaries. This paper proposes a frame-shift method to find the globally optimal joint position for unit concatenation between any two matching units. Experiment results show that this method can improve the boundary accuracy compared to manual labeling.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121122195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Study on the Classification of Mixed Text Based on Conceptual Vector Space Model and Bayes","authors":"Yaxiong Li, Dan Hu","doi":"10.1109/IALP.2009.64","DOIUrl":"https://doi.org/10.1109/IALP.2009.64","url":null,"abstract":"Traditional vector-space-based text-classification models are established by calculating the weights of feature words on the lexical level. In such models, words are independent on one another and their semantic relations are unrevealed. This paper proposes a vector-space-based text analyzer by introducing conceptual semantic similarity into traditional vector-space-based models. Naive Bayes classification technology is also adopted into this new analyzer. Experiment results indicate that the new analyzer can improve text classification.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123559265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Performance of the Link Parser","authors":"V. Naidu, Anil Kumar Singh, D. Sharma, Akshar Bharati","doi":"10.1109/IALP.2009.53","DOIUrl":"https://doi.org/10.1109/IALP.2009.53","url":null,"abstract":"The paper describes an approach to extend the coverage of a Link Grammar based parser on the constructions that are not being handled currently by the grammar. There are about thirty types of constructions which we have identified till now. In order to make Link Grammar handle these constructions, we introduce a preprocessor and a postprocessor. The idea is to handle such constructions via some analysis and transformations in a preprocessing phase before the sentence is given to the Link Parser and then by adding the missing links in the postprocessing phase. The main part of the paper discusses the constructions not handled by the parser and introduces rule based preprocessor and postprocessor. This simple and flexible approach is able to increase the coverage of the parser significantly and allows even a relatively naive user to improve the performance of the parser without disturbing the core grammar.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129078602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Experimental Study on Vietnamese POS Tagging","authors":"Oanh T. K. Tran, C. Le, Quang-Thuy Ha, Quynh Lê","doi":"10.1109/IALP.2009.14","DOIUrl":"https://doi.org/10.1109/IALP.2009.14","url":null,"abstract":"In Natural Language Processing (NLP), Part-of-speech tagging is one of the important tasks. It, however, has not drawn much attention of Vietnamese researchers all over the world. In this paper, we present an experimental study on Vietnamese POS tagging. Motivated from Chinese research and Vietnamese characteristics, we present a new kind of features based on the idea of word composition. We call it morpheme based features. To verify the effectiveness of these features, we use three powerful machine learning techniques - MEM, CRF and SVM. In addition, we also built a Vietnamese POS-tagged corpus with approximately 8000 sentences of different genres to conduct experiments. Experimental results showed that morpheme-based features always give higher precision in comparison with previous approaches - usually word-based features. We achieved the precision of 91.64% by using these morpheme-based features.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"115 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116190943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy Sensitivity: Application in Arabic","authors":"S. Al-Fedaghi, Faisal Alhaqan","doi":"10.1109/IALP.2009.40","DOIUrl":"https://doi.org/10.1109/IALP.2009.40","url":null,"abstract":"Personal Identifiable Information (PII) describes a relationship between information and a uniquely identifiable person. Sensitive PII refers to a category of PII that contains significant information about individuals. In general, sources of sensitivity of PII can be tracked by partitioning the basic unit of linguistic information into three parts: identity, verb, and the reminder of the linguistic construct. In this paper, we analyze the anatomy of PII with respect to its sensitivity and apply it to Arabic. The paper reports on an experimental system that uses such a method.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134151560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Domain-Ontology Relation Extraction from Semi-structured Texts","authors":"Cheng Xiao, Dequan Zheng, Yuhang Yang, G. Shao","doi":"10.1109/IALP.2009.51","DOIUrl":"https://doi.org/10.1109/IALP.2009.51","url":null,"abstract":"This paper presents a new method to acquire Domain-Ontology relations from semi-structured data sources. First, obtain Web documents according to the co-occurrence of concept instance and attribute value. Further, define formats of relation patterns, and extract pattern instances from Web documents, including pattern clustering and pattern combining in each cluster. Finally, relation pattern instances are applied to gain attribute values of new concept instances in Domain-Ontology. Experiments are carried out in the field of film, the rate of pattern incorrect-division and pattern leakage are respectively 0.19% and 1.31%, the highest precision of combined relation patterns reaches 85%. Experimental results demonstrate that the method developed in this paper is fairly efficient.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"248 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134278483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Chinese Essay Scoring Using Connections between Concepts in Paragraphs","authors":"Tao-Hsing Chang, Chia-Hoang Lee","doi":"10.1109/IALP.2009.63","DOIUrl":"https://doi.org/10.1109/IALP.2009.63","url":null,"abstract":"Automatic essay scoring (AES) system is a very important research tool for educational studies. Many studies indicate that current AES systems should be able to analyze semantic characteristics of an essay and include more such features to score essays. This study proposes a novel method which uses the similarity between the paragraphic conceptual connections in different essays to predict the scores of essays. Preliminary experiments show that the paragraphic conceptual structure in an essay can be an efficient feature for scoring the essay.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124299658","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Hybrid Method of Chinese Prosodic Word Tagging Based on Keyword Anchor and Hidden Markov Model","authors":"Quan Zhou, Pan Deng, Hongjian Liu, Defeng Guo, Kenji Nagamatsu","doi":"10.1109/IALP.2009.24","DOIUrl":"https://doi.org/10.1109/IALP.2009.24","url":null,"abstract":"In this paper, a new method of Chinese prosodic word tagging is presented. This method consists of a rule-based algorithm named “Keyword Anchor” and a statistical algorithm based on Hidden Markov Model (HMM). For keyword anchor algorithm, an anchor of the prosodic word is defined to help the system to find the whole prosodic word. For statistical algorithm, a length-based Hidden Markov Model (HMM) is used to find the best result of prosodic word tagging. The experiments of this method prove the better result than preceding methods in this field. The “Open Set F Score” of prosodic word based on this method is up to about 0.96.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"221 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124378090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Role Based Tamil Sentence Generator","authors":"S. Lakshmana Pandian, T.V. Geetha","doi":"10.1109/IALP.2009.26","DOIUrl":"https://doi.org/10.1109/IALP.2009.26","url":null,"abstract":"A Machine learning technique called memory based bigram models are developed for a system that generate a simple sentence for Tamil language from a set of concept terms and their semantic role. This system consists of a learner to learn how to realize a sentence from the content of semantic role information. This learner has been designed as a statistical model that is formulated from a preprocessed corpus of sentences. This preprocessing work is handled by annotating the corpus using part of speech tagging, chunking and semantic role labeling processes. This collective annotated corpus is statistically analyzed and developed the memory based bigram models. These models thus obtained are capable of producing the appropriate sequence of semantic roles of the concept terms for realizing sentence. A phrase generator has been developed to generate the appropriate phrases involved in sentence generation.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"261 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114607952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building Thai WordNet with a Bi-directional Translation Method","authors":"Dhanon Leenoi, T. Supnithi, Wirote Aroonmanakun","doi":"10.1109/IALP.2009.19","DOIUrl":"https://doi.org/10.1109/IALP.2009.19","url":null,"abstract":"This research presents a method of building Thai WordNet using an automatic bi-directional translation system with two EnglishThai dictionaries, LEXiTRON and Thiengburanathum Dictionary. The former was compiled using a corpus-based approach, whilst the latter was compiled on the basis of the author’s expertise. The results show that using LEXiTRON gives an F-measure of 50.36 for synset aspect, and 25.01 for word aspect, while using the Thiengburanathum Dictionary results in F-measure of 64.51 for synset aspect and 34.54 for word aspect. Furthermore, for a combination of two dictionaries, the F-measure increases to 67.16 for synset aspect and 36.27 for word aspect.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116875301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}