2015 International Conference on Asian Language Processing (IALP): Latest Publications

Toward better keywords extraction
2015 International Conference on Asian Language Processing (IALP) Pub Date: 2015-10-01 DOI: 10.1109/IALP.2015.7451561
Shihua Xu, Fang Kong
Abstract: Automatic keyword extraction (AKE) is the task of identifying a small set of keywords that describe the meaning of a given document, and it plays an important role in information retrieval. This paper proposes a clustering-based approach to the task and examines how keyword length and the centroid window size affect AKE performance. By introducing a keyword-length constraint and extending the number of centroids per cluster, the system's F-score improves by 7.5%.
Citations: 0
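The abstract gives only a high-level view of the clustering approach. The sketch below is an assumption, not the authors' implementation: represent candidate words by co-occurrence vectors over a sliding window, cluster them with plain k-means, and take the word nearest each centroid as a keyword, subject to a length constraint (mirroring the paper's keyword-length constraint and per-cluster centroids).

```python
import random
from collections import defaultdict

def dist2(a, b):
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cooc_vectors(tokens, vocab, window=2):
    """Build co-occurrence count vectors for each word over a sliding window."""
    idx = {w: i for i, w in enumerate(vocab)}
    vecs = {w: [0.0] * len(vocab) for w in vocab}
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                vecs[w][idx[tokens[j]]] += 1.0
    return vecs

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on dense lists; returns centroids and assignments."""
    rnd = random.Random(seed)
    cents = [list(p) for p in rnd.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: dist2(p, cents[c]))
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                cents[c] = [sum(col) / len(members) for col in zip(*members)]
    return cents, assign

def extract_keywords(tokens, k=2, min_len=4):
    """Pick, per cluster, the length-constrained word nearest its centroid."""
    vocab = sorted(set(tokens))
    vecs = cooc_vectors(tokens, vocab)
    cents, assign = kmeans([vecs[w] for w in vocab], k)
    keywords = []
    for c in range(k):
        cands = [w for i, w in enumerate(vocab)
                 if assign[i] == c and len(w) >= min_len]
        if cands:
            keywords.append(min(cands, key=lambda w: dist2(vecs[w], cents[c])))
    return keywords
```

The window size and `min_len` stand in for the two factors the paper studies (centroid window size and keyword length).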
Mongolian Named Entity Recognition using suffixes segmentation
2015 International Conference on Asian Language Processing (IALP) Pub Date: 2015-10-01 DOI: 10.1109/IALP.2015.7451558
Weihua Wang, F. Bao, Guanglai Gao
Abstract: Mongolian is an agglutinative language with complex morphological structure, so building an accurate Named Entity Recognition (NER) system for it is challenging and worthwhile. This paper analyzes the behavior of Mongolian suffixes attached via the Narrow No-Break Space and investigates three treatments of suffixes within a Conditional Random Field framework. Experiments show that segmenting each suffix into an individual token outperforms both leaving suffixes unsegmented and using them only as a feature, reaching an F-measure of 82.71. The approach suits large-vocabulary Mongolian NER and is also relevant to NER for other agglutinative languages.
Citations: 5
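The best-performing setup segments each suffix into its own token. In traditional Mongolian script, suffixes attach to stems with the Narrow No-Break Space (U+202F), so a minimal pre-tokenizer can split on that character. This is a sketch, not the authors' code; the `##` suffix marker is an assumption borrowed from subword-tokenization conventions.

```python
NNBSP = "\u202f"  # Narrow No-Break Space joining Mongolian stems and suffixes

def segment_suffixes(words):
    """Split each word at NNBSP so every suffix becomes its own token —
    the treatment the paper found to outperform leaving suffixes attached
    or encoding them only as a CRF feature."""
    tokens = []
    for word in words:
        parts = word.split(NNBSP)
        tokens.append(parts[0])                      # stem
        tokens.extend("##" + p for p in parts[1:])   # mark suffix tokens
    return tokens
```

For example, `segment_suffixes(["stem\u202fsuffix"])` yields `["stem", "##suffix"]`, ready for per-token CRF labeling.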
Weighted Document Frequency for feature selection in text classification
2015 International Conference on Asian Language Processing (IALP) Pub Date: 2015-10-01 DOI: 10.1109/IALP.2015.7451549
Baoli Li, Q. Yan, Zhenqiang Xu, Guicai Wang
Abstract: Document Frequency (DF) has been validated as a simple yet effective measure for feature selection in text classification. It counts how many documents in a collection contain a feature (a word, phrase, n-gram, or specially derived attribute) using a binary strategy: if a feature appears in a document, its DF increases by one. This traditional metric only asks whether a feature appears in a document, not how important the feature is there, so the resulting counts are likely to carry considerable noise. A Weighted Document Frequency (WDF) is therefore proposed to reduce such noise. Extensive experiments on two text classification datasets demonstrate the effectiveness of the proposed measure.
Citations: 9
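The contrast between binary DF and WDF can be made concrete. The sketch below is one reading of the abstract, with relative term frequency as an assumed per-document weight (the paper's exact weighting scheme is not stated in the abstract): each document contributes an importance score instead of a flat 1.

```python
from collections import Counter

def document_frequency(docs):
    """Classic binary DF: +1 for every document a feature appears in."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df

def weighted_document_frequency(docs):
    """WDF sketch: each document contributes the feature's relative
    frequency in that document rather than a flat 1, so features that
    are marginal within a document add little to the count."""
    wdf = Counter()
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        for feat, c in counts.items():
            wdf[feat] += c / total
    return wdf
```

A feature occurring once in many long documents thus scores lower under WDF than under DF, which is the noise-reduction effect the paper targets.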
Construction of Japanese semantically compatible words resource
2015 International Conference on Asian Language Processing (IALP) Pub Date: 2015-10-01 DOI: 10.1109/IALP.2015.7451532
Kazuhide Yamamoto, Kanji Takahashi
Abstract: We have constructed a resource of Japanese semantically compatible words and attached it to the dictionary used by our word-segmenting language analyzer. Semantically compatible words are expected to ease the data sparseness problem of corpus-based Natural Language Processing: by grouping compatible words together, the number of distinct words to process is greatly reduced. In this study, hyponymy-and-hypernymy relation groups and synonym groups are defined as semantically compatible words. The resource contains 343 concepts as hyponymy-and-hypernymy relation groups and 21,784 concepts as synonymy groups, and the words can be obtained from the Japanese word analyzer SNOWMAN. The constructed resource will be made publicly available.
Citations: 1
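Grouping compatible words reduces the effective vocabulary: once a word maps to its group, downstream processing sees only the group representative. A minimal sketch of such a lookup follows; the group data here is invented English for illustration, whereas the real resource is Japanese and ships with the SNOWMAN analyzer.

```python
# Hypothetical groups: each synonym / hyponym-hypernym group maps to one
# representative word, standing in for the paper's concept groups.
GROUPS = {
    "automobile": "car",
    "auto": "car",
    "sedan": "car",   # hyponym folded into its hypernym's group
}

def normalize(tokens):
    """Replace each token by its group representative when one exists,
    shrinking the vocabulary a downstream model must handle."""
    return [GROUPS.get(t, t) for t in tokens]
```

With such a table, `automobile`, `auto`, and `sedan` all collapse to a single vocabulary entry, which is the sparseness reduction the paper aims at.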
Towards improving the performance of Vector Space Model for Chinese Frequently Asked Question Answering
2015 International Conference on Asian Language Processing (IALP) Pub Date: 2015-10-01 DOI: 10.1109/IALP.2015.7451550
Ridong Jiang, Seokhwan Kim, Rafael E. Banchs, Haizhou Li
Abstract: This paper presents a method that improves the performance of the Vector Space Model (VSM) when applied to Chinese Frequently Asked Question (FAQ) answering. The method combines unigram and bigram models when computing the similarity of document vectors, and performance is further improved by adding shallow lexical semantics and document length information. Experiments show that the proposed methods outperform segmentation and bigram baselines across datasets that include FAQs from both restricted and open domains.
Citations: 1
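The core idea, combining unigram and bigram evidence when scoring similarity, can be sketched as follows. The linear interpolation and its weight `lam` are assumptions; the abstract does not specify the exact combination scheme, nor the shallow-semantics and length components.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def faq_similarity(q1, q2, lam=0.6):
    """Interpolate unigram and bigram cosine scores (lam is assumed)."""
    u = cosine(Counter(ngrams(q1, 1)), Counter(ngrams(q2, 1)))
    b = cosine(Counter(ngrams(q1, 2)), Counter(ngrams(q2, 2)))
    return lam * u + (1 - lam) * b
```

The bigram term rewards matching word order, which pure unigram VSM ignores — a natural fit for short FAQ questions.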
A comparative study on collectives of term weighting methods for extractive presentation speech summarization
2015 International Conference on Asian Language Processing (IALP) Pub Date: 2015-10-01 DOI: 10.1109/IALP.2015.7451553
Jian Zhang, Huaqiang Yuan
Abstract: This paper presents a comparative study of collectives of term weighting methods for extractive summarization of Mandarin presentation speech. Summarization is cast as a binary classification process, and a collective of different term weighting methods provides better summarization performance than any single method under the same classification algorithm. Several unsupervised and supervised term weighting methods and their collectives were evaluated with a summarizer based on a support vector machine (SVM) classifier, using a majority-vote strategy to handle the collectives. The best result is obtained by voting over the collective of all term weighting methods, and Term Relevance Ratio (TRR) contributes more to presentation speech summarization than the other term weighting methods.
Citations: 1
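Independent of the SVM details, the voting step over per-method classifiers can be sketched as below. The method names in the example are placeholders; each classifier emits a 0/1 summary-inclusion label per sentence.

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: {method_name: [0/1 label per sentence]} -> voted labels.
    A sentence enters the summary when most term-weighting variants select
    it; exact ties resolve to whichever label Counter.most_common lists
    first."""
    methods = list(predictions.values())
    voted = []
    for labels in zip(*methods):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted
```

With an odd number of methods, as when all term weighting variants vote, ties cannot occur and the result is a strict majority.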
Japanese sentence compression using Simple English Wikipedia
2015 International Conference on Asian Language Processing (IALP) Pub Date: 2015-10-01 DOI: 10.1109/IALP.2015.7451533
Shunsuke Takeno, Kazuhide Yamamoto
Abstract: We describe a cross-lingual approach to compressing sentences of Japanese Wikipedia articles using their correspondence with Simple English Wikipedia articles. Taking advantage of the nature of this corpus, essential parts of an encyclopedic description can be found without relying heavily on noisy statistical information. We manually explored the correspondences between Japanese Wikipedia articles and their Simple English Wikipedia counterparts, then proposed a cross-lingual alignment method based on a simple matching algorithm. We provide an analysis of these correspondences and preliminary sentence compression results obtained with Simple English Wikipedia.
Citations: 1
Rumor diffusion purpose analysis from social attribute to social content
2015 International Conference on Asian Language Processing (IALP) Pub Date: 2015-10-01 DOI: 10.1109/IALP.2015.7451543
Dazhen Lin, Yanping Lv, Donglin Cao
Abstract: Rumors are an important problem for social media. Previous work mainly uses social attribute features for rumor analysis, but such features do not reveal the purpose of a rumor, which is one of its most important aspects. To address this, we examine both social attribute and social content features to determine which are useful for exploring a rumor's purpose, and propose six kinds of features: four social attribute features and two social content features. To uncover rumor purposes from these features, we crawled 11,676 rumors from Sina Weibo, the largest micro-blog platform in China. The analysis shows that the diffusion purpose of rumors can be inferred from social content attributes, and that the proposed two-layer KL divergence approach is useful for perceiving diffusion-purpose words.
Citations: 8
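The abstract mentions a two-layer KL divergence approach for perceiving diffusion-purpose words but gives no details, so the fragment below only illustrates the basic ingredient: ranking words by their pointwise contribution to the KL divergence between smoothed word distributions of rumor text and background text. The two-layer structure itself is not reconstructed here.

```python
import math
from collections import Counter

def word_dist(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """D_KL(p || q) in nats; p and q share the same smoothed vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def purpose_word_scores(rumor_tokens, background_tokens):
    """Rank words by their pointwise contribution to D_KL(rumor || background):
    words far more probable in rumors than in background text rank first."""
    vocab = set(rumor_tokens) | set(background_tokens)
    p = word_dist(rumor_tokens, vocab)
    q = word_dist(background_tokens, vocab)
    return sorted(vocab, key=lambda w: p[w] * math.log(p[w] / q[w]),
                  reverse=True)
```

Smoothing keeps the divergence finite even for words that occur on only one side, which matters for short micro-blog posts.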
Topic2Vec: Learning distributed representations of topics
2015 International Conference on Asian Language Processing (IALP) Pub Date: 2015-06-28 DOI: 10.1109/IALP.2015.7451564
Liqiang Niu, Xinyu Dai, Jianbing Zhang, Jiajun Chen
Abstract: Latent Dirichlet Allocation (LDA), which mines the thematic structure of documents, plays an important role in natural language processing and machine learning. However, the probability distributions produced by LDA only describe statistical co-occurrence in the corpus, and in practice probabilities are often not the best choice for feature representations. Recently, embedding methods such as Word2Vec and Doc2Vec have been proposed to represent words and documents by learning their essential concepts and representations, and these embedded representations have proven more effective than LDA-style representations on many tasks. This paper proposes Topic2Vec, which learns topic representations in the same semantic vector space as words, as an alternative to probability distributions. Experimental results show that Topic2Vec achieves interesting and meaningful results.
Citations: 74
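A toy sketch of the idea, not the authors' implementation: each token arrives with a topic id (in the paper's setting these would come from LDA inference; here they are hard-coded), and both the word's and the topic's input vectors are trained to predict surrounding words via skip-gram with negative sampling, so topics land in the same vector space as words. Hyperparameters and the exact objective are assumptions.

```python
import math
import random

def sigmoid(x):
    """Logistic function, clipped to avoid overflow in exp."""
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, x))))

def train_topic2vec(tagged_corpus, dim=8, epochs=20, lr=0.1, neg=2, seed=0):
    """tagged_corpus: list of sentences, each a list of (word, topic_id).
    Returns input vectors keyed by word strings and ("topic", id) tuples."""
    rnd = random.Random(seed)
    words = sorted({w for sent in tagged_corpus for w, _ in sent})
    topics = sorted({t for sent in tagged_corpus for _, t in sent})
    vec = {k: [rnd.uniform(-0.5, 0.5) / dim for _ in range(dim)]
           for k in words + [("topic", t) for t in topics]}
    out = {w: [0.0] * dim for w in words}  # output (context) vectors

    def update(inp, target, label):
        """One SGD step of logistic loss on an (input, context) pair."""
        g = lr * (label - sigmoid(sum(a * b
                                      for a, b in zip(vec[inp], out[target]))))
        for i in range(dim):
            vec[inp][i], out[target][i] = (vec[inp][i] + g * out[target][i],
                                           out[target][i] + g * vec[inp][i])

    for _ in range(epochs):
        for sent in tagged_corpus:
            for i, (w, t) in enumerate(sent):
                for j in (i - 1, i + 1):            # context window of 1
                    if 0 <= j < len(sent):
                        ctx = sent[j][0]
                        for inp in (w, ("topic", t)):
                            update(inp, ctx, 1)     # positive pair
                            for _ in range(neg):    # negative samples
                                update(inp, rnd.choice(words), 0)
    return vec
```

Because word and topic vectors share the output matrix, nearest-neighbor queries between topics and words become meaningful, which is the property the paper highlights.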