2009 International Conference on Asian Language Processing: Latest Publications

Sogou Query Log Analysis: A Case Study for Collaborative Recommendation or Personalized IR
2009 International Conference on Asian Language Processing Pub Date : 2009-12-07 DOI: 10.1109/IALP.2009.72
Zhitao Zhang, Muyun Yang, Sheng Li, Haoliang Qi, Chao Song
Abstract: By analyzing search engine logs, we can better understand the patterns of users' search behavior and mine users' individual characteristics, thereby improving the performance of web information retrieval. This paper analyzes the user, query, and clickthrough data of Sogou, a large-scale Chinese search engine. We focus on the relations among users, queries, and URLs, revealing some new characteristics of Web users. The results show that portal websites are visited most frequently. The average Sogou user clicks 4.82 URLs, of which 1.72 are distinct. This paper demonstrates the necessity of personalized information retrieval, which is instructive for improving the performance of Chinese search engines.
Citations: 4
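The per-user averages quoted in the Sogou abstract (clicks per user and distinct URLs per user) can be illustrated with a small sketch. This is not the authors' code: the record layout `(user_id, query, url)`, the function name `click_stats`, and the sample data are all assumptions made for illustration, and the real Sogou log format may differ.

```python
from collections import defaultdict

def click_stats(records):
    """Return (average clicks per user, average distinct URLs per user).

    `records` is an iterable of hypothetical (user_id, query, url) click events.
    """
    clicks = defaultdict(int)    # user_id -> total number of clicks
    distinct = defaultdict(set)  # user_id -> set of URLs clicked
    for user_id, _query, url in records:
        clicks[user_id] += 1
        distinct[user_id].add(url)
    n_users = len(clicks)
    if n_users == 0:
        return 0.0, 0.0
    avg_clicks = sum(clicks.values()) / n_users
    avg_distinct = sum(len(urls) for urls in distinct.values()) / n_users
    return avg_clicks, avg_distinct

# Made-up sample: u1 clicks three times on two URLs, u2 clicks once.
sample = [
    ("u1", "news", "a.example"),
    ("u1", "news", "a.example"),
    ("u1", "mail", "b.example"),
    ("u2", "maps", "c.example"),
]
print(click_stats(sample))  # (2.0, 1.5)
```

A large gap between the two averages suggests that users often return to the same URLs, which is the kind of regularity a personalized retrieval system could exploit.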
The Improved Logistic Regression Models for Spam Filtering
2009 International Conference on Asian Language Processing Pub Date : 2009-12-07 DOI: 10.1109/IALP.2009.74
Yong Han, Muyun Yang, Haoliang Qi, Xiaoning He, Sheng Li
Abstract: The logistic regression model has achieved success in spam filtering, but it is disadvantaged by adjusting equally the weights of features that appear in both spam and ham messages during training. This paper presents an improved logistic regression model that reduces the impact of features appearing in both spam and ham messages. Byte-level n-grams are employed to extract features from messages, and TONE (Train On or Near Error) is adopted; both have proved effective in state-of-the-art spam filtering systems. The official runs of the CEAS (Conference on Email and Anti-Spam) Spam-filter Challenge 2008 show that the proposed model is one of the best methods. Our system achieved competitive results in all tasks and won the active learning task on the live stream as measured by 1-ROCA.
Citations: 9
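The two building blocks named in the spam filtering abstract, byte-level n-gram features and TONE (Train On or Near Error), can be sketched as a plain online logistic regression filter. This is a minimal sketch of the baseline only: the abstract does not give the details of the improved weighting for features shared by spam and ham, so it is not reproduced here. The class name `ToneLogisticFilter`, the learning rate, the margin, and the n-gram length are all assumed values.

```python
import math

def byte_ngrams(message: bytes, n: int = 4):
    """Return the set of distinct byte n-grams in a message (binary features)."""
    return {message[i:i + n] for i in range(len(message) - n + 1)}

class ToneLogisticFilter:
    """Online logistic regression that updates only on or near an error."""

    def __init__(self, rate=0.1, margin=0.2, n=4):
        self.weights = {}     # n-gram -> weight
        self.rate = rate      # learning rate (assumed value)
        self.margin = margin  # "near error" band around p = 0.5 (assumed value)
        self.n = n

    def score(self, message: bytes) -> float:
        """Estimated probability that the message is spam."""
        z = sum(self.weights.get(f, 0.0) for f in byte_ngrams(message, self.n))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def train(self, message: bytes, is_spam: bool) -> bool:
        """Apply one gradient step if the prediction is wrong or uncertain."""
        p = self.score(message)
        wrong = (p >= 0.5) != is_spam
        near = abs(p - 0.5) < self.margin
        if not (wrong or near):
            return False  # confident and correct: skip the update
        y = 1.0 if is_spam else 0.0
        for f in byte_ngrams(message, self.n):
            self.weights[f] = self.weights.get(f, 0.0) + self.rate * (y - p)
        return True

spam = b"buy cheap pills now"
ham = b"meeting at noon tomorrow"
flt = ToneLogisticFilter()
for _ in range(5):
    flt.train(spam, True)
    flt.train(ham, False)
print(flt.score(spam) > 0.5, flt.score(ham) < 0.5)  # True True
```

Skipping updates on confident, correct predictions keeps the weights of already well-classified messages from growing without bound, which is the usual motivation for TONE-style training.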
BEST Corpus Development and Analysis
2009 International Conference on Asian Language Processing Pub Date : 2009-12-07 DOI: 10.1109/IALP.2009.76
M. Boriboon, Kanyanut Kriengket, P. Chootrakool, Sitthaa Phaholphinyo, Sumonmas Purodakananda, T. Thanakulwarapas, K. Kosawat
Abstract: This document describes the development process of the BEST 2009 word-segmented corpus. It is the first corpus for benchmarking Thai word segmentation software. The corpus is composed of four genres: news, novels, encyclopedia, and academic articles. It contains 509 files totaling 64.1 MB. There are 5,036,229 tokens, of which 83,027 are unique. 4,556 tokens appear in all genres; they cover 85.13% of the corpus. The most frequent token in the corpus is ที่ /thi2/. The 50 most frequent tokens cover 37.65% of the corpus, and the 119 most frequent tokens account for about 50% of it. All tokens are grouped into 8 categories. Except for the Thai spelling category, the categories play different major parts in specific genres.
Citations: 12
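Coverage figures like those in the BEST corpus abstract (the share of the corpus covered by the k most frequent tokens, and the number of tokens needed to reach a target share) can be computed from a token frequency table. The sketch below is illustrative only: the function names and the toy token list are assumptions, and it does not use BEST corpus data.

```python
from collections import Counter

def coverage_of_top(tokens, k):
    """Fraction of all token occurrences covered by the k most frequent tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(k)) / total

def tokens_needed(tokens, target):
    """Smallest number of top-ranked tokens whose coverage reaches `target`."""
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    for rank, (_, c) in enumerate(counts.most_common(), start=1):
        covered += c
        if covered / total >= target:
            return rank
    return len(counts)

# Toy data: "a" occurs 5 times, "b" 3 times, "c" twice.
toy = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
print(coverage_of_top(toy, 1))  # 0.5
print(tokens_needed(toy, 0.8))  # 2
```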
Challenges in Developing Persian Corpora from Online Resources
2009 International Conference on Asian Language Processing Pub Date : 2009-12-07 DOI: 10.1109/IALP.2009.31
Masood Ghayoomi, S. Momtazi
Abstract: Persian is an Indo-European language that has borrowed its script from Arabic, a member of the Semitic language family. Because the Persian and Arabic scripts are so similar, problems arise when processing electronic text. This paper discusses some common problems encountered in practice when developing a Persian corpus from online materials. The sources of the problems are the Persian script itself; mixture with the Arabic script; Persian orthography; typists' typing styles; and the mixing of Persian and Arabic code pages in operating systems.
Citations: 9
A Modular Cascaded Approach to Complete Parsing
2009 International Conference on Asian Language Processing Pub Date : 2009-12-07 DOI: 10.1109/IALP.2009.37
Samar Husain, Phani Gadde, Bharat Ram Ambati, D. Sharma, R. Sangal
Abstract: In this paper, we propose a modular cascaded approach to data-driven dependency parsing. Each module or layer leading to the complete parse produces a linguistically valid partial parse. We do this by introducing an artificial root node in the dependency structure of a sentence and by catering to distinct dependency label sets that reflect the function of the set-internal labels vis-à-vis a distinct and identifiable linguistic unit, at different layers. The linguistic unit in our approach is a clause. Output (a partial parse) from each layer can be accessed independently. We applied this approach to Hindi, a morphologically rich, free-word-order language, using MST Parser. We ran all our experiments on a part of the Hyderabad Dependency Treebank. The final results show an increase of 1.35% in unlabeled attachment accuracy and 1.36% in labeled attachment accuracy over the state-of-the-art data-driven Hindi parser.
Citations: 19
Multilingual Multimodal Integration of Sketch and Speech: A Generic Speech Representation Model for Spatial Description
2009 International Conference on Asian Language Processing Pub Date : 2009-12-07 DOI: 10.1109/IALP.2009.13
L. Teh, A. Yeo
Abstract: This paper details how multiple languages are accommodated in the multimodal integration of sketch and speech, specifically in spatial applications. The study encompasses English, Malay, Mandarin, and two under-resourced languages of Malaysia, Melanau and Iban. The preliminary study revealed that not all spatial terms (prepositions) appear in all languages. Based on these findings, we propose a method to assist in the design and development of multilingual multimodal applications. This method employs a generic representation model for spatial description.
Citations: 0
Exploring the Effects of Text Clustering on On-Line Military News Based on Quantitative Association Rule
2009 International Conference on Asian Language Processing Pub Date : 2009-12-07 DOI: 10.1109/IALP.2009.48
Liang-Chu Chen, Chyi-Bao Yang, Jih-Hsin Chen, Yen-Hsuan Lien
Abstract: Text clustering is an automatic technique for grouping texts that uses feature extraction and term connection to calculate similarities among the subject contents of texts. Since the properties of terms in Chinese text (e.g. segmentation and annotation) are not as clear as in other languages, extracting and distinguishing features from Chinese text is much more difficult, which greatly affects clustering quality. Taking military news as its domain, this paper applies both quantitative association rules and a hierarchical agglomerative algorithm to cluster Chinese news published in Youth Daily News, and compares the results with those of the traditional vector space model approach and the general association rule approach. F-measure is used as the evaluation metric in the experiments. Experimental results show that the quantitative association rule approach clusters texts more accurately than both the vector space model and the general association rule approaches.
Citations: 2
Semantic Genes and the Semantic Composition of Adjectives in Modern Chinese
2009 International Conference on Asian Language Processing Pub Date : 2009-12-07 DOI: 10.1109/IALP.2009.61
Dan Hu, Jinglian Gao
Abstract: Words cluster together semantically by sharing semantic genes. New words are produced through the inheritance, recombination, and variation of semantic genes. The semantics of an adjective is composed of core semantic genes and attribute semantic genes. With these genes and the semantic composition formula, we can accurately construct a semantic knowledge base of adjectives for NLP.
Citations: 0
Vietnamese Final Stop Consonants /p, t, k/ Described in Terms of Formant Transition Slopes
2009 International Conference on Asian Language Processing Pub Date : 2009-12-07 DOI: 10.1109/IALP.2009.27
Viet Son Nguyen, E. Castelli, R. Carré
Abstract: It is well known that bursts and voiced formant transitions serve as separate cues to the place of articulation of initial stop consonants. Vietnamese has three final voiceless stop consonants, /p, t, k/, produced without bursts. This offers an opportunity to study these final stop consonants and to compare their characteristics with those of the corresponding initial stop consonants. As final consonants have never been studied before, this paper analyzes vowel-consonant (VC) and consonant-vowel-consonant (CVC) productions in terms of transition duration, starting formant transition values, and the slopes of the VC transitions. Measurements show that, in the same preceding vowel context, the three final stop consonants /p, t, k/ always differ clearly in at least one of the three slopes of F1, F2, and F3. These final stop consonants can also be differentiated in the locus equation space. The results also reveal the effects of the final consonants on both long and short vowels. This explains why Vietnamese speakers cannot pronounce the short vowels in isolation.
Citations: 3
Repetition in Mandarin Interaction: A Case Study on TV Shopping Channels in Taiwan
2009 International Conference on Asian Language Processing Pub Date : 2009-12-07 DOI: 10.1109/IALP.2009.18
Fuhui Hsieh, Ying Liang
Abstract: Repetition is a pervasive type of spontaneous prepatterning in conversation. From an evolutionary perspective, repetition or imitation is a safe way to protect oneself from the dangers caused by uncertainty. By repeating or imitating the behavior of other group members, one may survive in many situations. From a learning or pedagogical perspective, repetition or imitation is a fast way to acquire a skill or a language, including its lexicon and structures. The main purpose of this paper is to investigate this pervasive yet somewhat neglected phenomenon in Mandarin discourse. In this study, we examine repetitions in social interactions on TV shopping channels in Taiwan. It is hoped that such a study may contribute to natural language processing and information processing by providing a detailed analysis of the patterns and functions of repetition in social interactions.
Citations: 0