{"title":"A Bi-directional Translation Approach for Building Thai Wordnet","authors":"Prissana Akaraputthiporn, K. Kosawat, Wirote Aroonmanakun","doi":"10.1109/IALP.2009.29","DOIUrl":"https://doi.org/10.1109/IALP.2009.29","url":null,"abstract":"In this paper we introduce a bi-directional translation approach for building Thai WordNet automatically. The 2nd Order Entity of common base concepts were selected as the target for constructing Thai WordNet in this study. Manual construction was carried out to set up a gold standard for evaluating the bi-directional translation approach as well as other automatic Thai WordNet construction methods. The bi-directional translation method was found to be good for precision but not recall. Issues relating to the number of word senses, whether it is monosemic or polysemic, and the relation between source and target words, whether it is 1:1, 1:many, many:1, or many:many, were investigated.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128339778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised Learning of Domain-Specific Language Models from General Domain Data","authors":"Shuanhu Bai, Min Zhang, Haizhou Li","doi":"10.1109/IALP.2009.65","DOIUrl":"https://doi.org/10.1109/IALP.2009.65","url":null,"abstract":"We present a semi-supervised learning method for building domain-specific language models (LM) from general-domain data. This method is aimed to use small amount of domain-specific data as seeds to tap domain-specific resources residing in larger amount of general-domain data with the help of topic modeling technologies. The proposed algorithm first performs topic decomposition (TD) on the combined dataset of domain-specific and general-domain data using probabilistic latent semantic analysis (PLSA). Then it derives domain-specific word n-gram counts with mixture modeling scheme of PLSA. Finally, it uses traditional n-gram modeling approach to construct domain-specific LMs from the domain-specific word n-gram counts. Experimental results show that this approach can outperform both state-of-the-art methods and the simulated supervised learning method with our data sets. In particular, the semi-supervised learning method can achieve better performance even with very small amount of domain-specific data.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122753511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Foxinfo1.0: A Chinese Topic-Oriented Search Engine","authors":"Ke Sun, Lei Lin, Bingquan Liu, Chengjie Sun, Xiaolong Wang","doi":"10.1109/IALP.2009.28","DOIUrl":"https://doi.org/10.1109/IALP.2009.28","url":null,"abstract":"Topic-oriented search engine (topic-search) is a new IR service which provides compounded types of information with certain user queried topic in one page. It firstly categorizes user query into a certain domain, and then organizes several types of information based on the query keywords into a magazine-style topic page for user. In this paper, we propose a Chinese topic-oriented search engine service, named as Foxinfo1.0, which provides a 360 degree view of the topic that interests the users who're seeking it. Different from the original topic-search which employs the keyword-based topic to retrieve and aggregate relevant information, Foxinfo1.0 could organize information from different abstraction level by employing tag to describe the topics of queries and information. Further, in order to predict tags from queries and web documents, a tag prediction algorithm named as CTAG is proposed, which could concern tags from different levels, and performs much better than the baseline method like AutoTag.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123444318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study on Eye Movement of Korean Students Reading Chinese Texts with or without Marks for Word Boundaries","authors":"Yu Peng, Liu Su","doi":"10.1109/IALP.2009.83","DOIUrl":"https://doi.org/10.1109/IALP.2009.83","url":null,"abstract":"By comparing the eye movement data of Korean students reading Chinese texts with or without marks for word boundaries, the researcher has found that: (1) To Korean students, the Chinese words represent more psychological reality than Chinese characters. (2) The reading efficiency of Korean students could be immensely improved by inserting word boundary marks. It is, therefore, suggested that when publishing Chinese textbooks intended for Korean students, we adjust the way we do type-setting today and insert word boundary marks into the text.","PeriodicalId":156840,"journal":{"name":"2009 International Conference on Asian Language Processing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128768503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}