{"title":"Exploring Both Flat and Structured Features for Number Type Identification of Chinese Personal Noun Phrases","authors":"Jun Lang","doi":"10.1109/IALP.2011.69","DOIUrl":"https://doi.org/10.1109/IALP.2011.69","url":null,"abstract":"Different from English, Chinese does not explicitly show grammatical number information by inflection. The Number information in a Chinese sentence is implied by the noun phrase itself and its surrounding context. In this paper, we explore diverse features, including both flat and structured, for number identification of Chinese personal noun phrase. The flat features explore the knowledge within the noun phrase while the structured features capture the surrounding context information of the noun phrase in the parse tree of the given sentence. These two kinds of features together with kernel-based SVM are utilized in this study. Evaluation on the ACE 2005 corpus shows that our method achieves 89.23% in accuracy, which significantly advances the state-of-the-art.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126208612","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining Parallel Data from Comparable Corpora via Triangulation","authors":"T. Do, E. Castelli, L. Besacier","doi":"10.1109/IALP.2011.57","DOIUrl":"https://doi.org/10.1109/IALP.2011.57","url":null,"abstract":"This paper improves an unsupervised method for extracting parallel sentence pairs from a comparable corpus by using the triangulation through a third language. Before, an unsupervised method for extracting parallel sentence pairs from a comparable corpus has been proposed. This method is based on technique of cross-language information retrieval with iterative process and requires no more additional parallel data. The method has been validated on the Vietnamese-French and Vietnamese-English bilingual data. In this paper, we address the problem of using triangulation through a third language to improve the parallel data mining processes: English is used in the Vietnamese-French parallel data mining process, and French is used in the Vietnamese-English parallel data mining process. The experiments conducted show that using triangulation can improve the quality of the extracted data and the quality of the translation system as well.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114912692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Chinese Dependency Parsing with Self-Disambiguating Patterns","authors":"Likun Qiu, Lei Wu, Kai Zhao, Changjian Hu, Lingpeng Kong","doi":"10.1109/IALP.2011.36","DOIUrl":"https://doi.org/10.1109/IALP.2011.36","url":null,"abstract":"To solve the data sparseness problem in dependency parsing, most previous studies used features constructed from large-scale auto-parsed data. Unlike previous work, we propose a new approach to improve dependency parsing with context-free dependency triples (CDT) extracted by using self-disambiguating patterns (SDP). The use of SDP makes it possible to avoid the dependency on a baseline parser and explore the influence of different types of substructures one by one. Additionally, taking the available CDTs as seeds, a label propagation process is used to tag a large number of unlabeled word pairs as CDTs. Experiments show that, when CDT features are integrated into a maximum spanning tree (MST) dependency parser, the new parser improves significantly over the baseline MST parser. Comparative results also show that CDTs with dependency relation labels perform much better than CDT without dependency relation label.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117113707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Experimental Study on Vietnamese Speech Synthesis","authors":"Liping Kui, Jian Yang, Bin He, Enxing Hu","doi":"10.1109/IALP.2011.40","DOIUrl":"https://doi.org/10.1109/IALP.2011.40","url":null,"abstract":"The modern Vietnamese is a monosyllabic tone language. Each syllable can be marked with initial, final and tone. In this paper, Vietnamese speech synthesis system is realized by using a trainable HMM-based speech synthesis method. The basic synthesis units of this system are initials and finals. According to the characteristics of Vietnamese, we have conducted such works as collecting corpus, recording, labeling, determining the phonemes list, and designing context attributes and question set. Then Vietnamese speech synthesis system is constructed by using the STRAIGHT synthesizer under the HTS platform. At last, we conduct a subjective test to synthetic speech signals. The results of preliminary evaluation show that the intelligibility of the utterances is approximately 100%, and the quality of synthesis speech is from fair to good.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"16 11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125625997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How Vietnamese Attitudes can be Recognized and Confused: Cross-Cultural Perception and Speech Prosody Analysis","authors":"Dang-Khoa Mac, E. Castelli, V. Aubergé, A. Rilliard","doi":"10.1109/IALP.2011.39","DOIUrl":"https://doi.org/10.1109/IALP.2011.39","url":null,"abstract":"Prosodic attitudes, or social affects, are main part of face-to-face interaction and linked to the language through the culture. This paper presents a study on prosodic attitudes in Vietnamese, a tonal language. Perception experiments on 16 Vietnamese attitudes were carried out with Vietnamese and French participants. The results revealed perception differences between native and non-native listeners. As attitudinal expression are partially carried through speech prosody, an analysis was also carried out, in order to have a better understanding of why these attitudes are recognized or confused, and to bring out some prosodic characteristics of Vietnamese social affects.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117272153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lexical Word Similarity for Re-ranking in Vietnamese-English Named Entity Back Transliteration","authors":"Diem Thi Hoang Le, AiTi Aw","doi":"10.1109/IALP.2011.44","DOIUrl":"https://doi.org/10.1109/IALP.2011.44","url":null,"abstract":"Transliteration is the transformation of word in original language to another language based on its pronunciation. Back transliteration is the transformation of already transliterated word in another language back to its original form. This backward process is in nature more challenging than the forward direction because of more information lost. In many cases, the back transliteration can return almost exact result, which has a minor difference in spelling compared with the original word form. We propose in this work a lexical word similarity for dictionary matching in order to re-rank the candidates and enhance the performance of a grapheme-based location name back transliteration. This method is experimented on Vietnamese-English language pair and showed improvement.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134564636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Query Reformulation Model Using Markov Graphic Method","authors":"Jiali Zuo, Mingwen Wang","doi":"10.1109/IALP.2011.62","DOIUrl":"https://doi.org/10.1109/IALP.2011.62","url":null,"abstract":"Information retrieval models still cannot achieve satisfactory performance after decades of development. One of the reasons is that queries cannot express information needs precisely. Research has shown that query reformulation can improve the performance of retrieval models. In this paper, we propose a query reformulation model, which uses a Markov network to represent term relationships and obtain useful information from the corpus to reformulate the query. Experimental results show that our model can avoid topic drift and thereby improve retrieval performance.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115014669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sentence Boundary Detection in Colloquial Arabic Text: A Preliminary Result","authors":"A. Al-Subaihin, Hend Suliman Al-Khalifa, A. Al-Salman","doi":"10.1109/IALP.2011.38","DOIUrl":"https://doi.org/10.1109/IALP.2011.38","url":null,"abstract":"Recently, natural language processing tasks are more frequently conducted over online content. This poses a special problem for applications over Arabic language. Online Arabic content is usually written in informal colloquial Arabic, which is characterized to be ill-structured and lacks specific linguistic standardization. In this paper, we investigate a preliminary step to conduct successful NLP processing which is the problem of sentence boundary detection. As informal Arabic lacks basic linguistic rules, we establish a list of commonly used punctuation marks after extensively studying a large amount of informal Arabic text. Moreover, we evaluated the correct usage of these punctuation marks as sentence delimiters; the result yielded a preliminary accuracy of 70%.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121480948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Issues with the Unergative/Unaccusative Classification of the Intransitive Verbs","authors":"Nitesh Surtani, Khushboo Jha, Soma Paul","doi":"10.1109/IALP.2011.54","DOIUrl":"https://doi.org/10.1109/IALP.2011.54","url":null,"abstract":"The paper abandons a strict two-way sub-classification of intransitive verbs into unaccusative and unergative for Hindi and proposes a distribution plotting of the same in a diffusion chart. The diagnostic tests that Bhatt (2003) applied on Hindi data are ranked for their efficiency of attributing correct sub-class to verbs. The diffusion chart shows that a tripartite classification handles the issue of classification of intransitive verbs in a better manner than the classical binary approach. The tripartite classification is as follows: (1) Verbs that take animate subject and are compatible with adverb of volitionality; (2) Verbs that take animate subject but are not compatible with adverb of volitionality; and (3) Verbs that take inanimate subject. The classification is of immense advantage for various NLP tasks such as machine translation, natural language generation.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115897276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Extracting Pseudo-Labeled Samples for Sentiment Classification Using Emotion Keywords","authors":"Sophia Yat-Mei Lee, Daming Dai, Shoushan Li, K. Ahrens","doi":"10.1109/IALP.2011.61","DOIUrl":"https://doi.org/10.1109/IALP.2011.61","url":null,"abstract":"Sentiment and emotion analysis have been traditionally established as independent research topics in NLP. Although they are two important aspects of subjective information and are closely related, there have been few attempts to combine the two analyses. As a preliminary attempt, we integrate emotion information into sentiment analysis by employing emotion keywords to help automatically extract pseudo-labeled samples. The extracted pseudo-labeled samples are then used as the initial training data to perform semi-supervised learning for sentiment classification. Experimental results across four domains show that our approach using emotion keywords is capable of extracting pseudo-labeled samples with high precision (about 90%). Moreover, the pseudo-labeled samples along with the semi-supervised learning approach further improve the classification performance.","PeriodicalId":297167,"journal":{"name":"2011 International Conference on Asian Language Processing","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126641534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}