Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval: Latest Publications

Topic Modeling on Indonesian Online Shop Chat
A. Hidayatullah, Wisnu Kurniawan, Chanifah Indah Ratnasari
{"title":"Topic Modeling on Indonesian Online Shop Chat","authors":"A. Hidayatullah, Wisnu Kurniawan, Chanifah Indah Ratnasari","doi":"10.1145/3342827.3342831","DOIUrl":"https://doi.org/10.1145/3342827.3342831","url":null,"abstract":"This paper aims to discover topics from an Indonesian online shop chat. We employed Latent Dirichlet Allocation to find out what kinds of topics are often discussed and the conversation trends between buyers and customer service. Several tasks were performed, such as collecting data, preprocessing, phrase aggregation, topic modeling, and topic analysis. We found several interesting findings during our experiments. In the preprocessing task, product name extraction from URLs helped to discover the intended product from the customer's conversation. On the other hand, the phrase aggregation task helped us to merge various terms which have the same intended meaning, so that we could obtain a better topic model result and more easily determine the topic label.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126673506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Evaluation of Pseudo-Relevance Feedback using Wikipedia
Murtadha Aljubran
{"title":"Evaluation of Pseudo-Relevance Feedback using Wikipedia","authors":"Murtadha Aljubran","doi":"10.1145/3342827.3342845","DOIUrl":"https://doi.org/10.1145/3342827.3342845","url":null,"abstract":"Users have specific information needs which are expressed in short queries to information retrieval systems. The queries are unstructured, and they tend to be short and ambiguous in most cases. Using shallow language statistics, including probabilistic or language models such as BM25 or Indri respectively, can enhance retrieval system metrics like Mean Average Precision (MAP). However, such methods depend on query terms and their presence in the retrieved document to define relevance. Query expansion is a technique that can be used to overcome this problem by expanding the query with terms from an initial top few relevant documents. The question that we try to answer is whether the quality of the corpus used for expansion produces a significant improvement in MAP and precision at the top 30 retrieved documents. We show that the quality and the selection criteria of expansion documents are important factors in query expansion performance.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129201296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep Speaker Embedding for Speaker-Targeted Automatic Speech Recognition
Guan-Lin Chao, John Paul Shen, Ian Lane
{"title":"Deep Speaker Embedding for Speaker-Targeted Automatic Speech Recognition","authors":"Guan-Lin Chao, John Paul Shen, Ian Lane","doi":"10.1145/3342827.3342847","DOIUrl":"https://doi.org/10.1145/3342827.3342847","url":null,"abstract":"In this work, we investigate three types of deep speaker embedding as text-independent features for speaker-targeted speech recognition in cocktail party environments. The text-independent speaker embedding is extracted from the target speaker's existing speech segment (i-vector and x-vector) or face image (f-vector), which is concatenated with acoustic features of any new speech utterances as input features. Since the proposed model extracts the speaker embedding of the target speaker once and for all, it is computationally more efficient than many prior approaches which estimate the target speaker's characteristics on the fly. Empirical evaluation shows that using speaker embedding along with acoustic features improves Word Error Rate over the audio-only model, from 65.7% to 29.5%. Among the three types of speaker embedding, x-vector and f-vector show robustness against environment variations while i-vector tends to overfit to the specific speaker and environment condition.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131659915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Task-oriented Chatbot Based on LSTM and Reinforcement Learning
Tai-Liang Chou, Yu-Ling Hsueh
{"title":"A Task-oriented Chatbot Based on LSTM and Reinforcement Learning","authors":"Tai-Liang Chou, Yu-Ling Hsueh","doi":"10.1145/3342827.3342844","DOIUrl":"https://doi.org/10.1145/3342827.3342844","url":null,"abstract":"Traditional conversational chatbots usually adopt a retrieval-based model. Developers have to provide a large amount of conversational data and classify those data into different intents. To avoid cumbersome development processes, we propose a method to build a chatbot by a sentence generation model which generates sequence sentences based on the generative adversarial network. The architecture of our model contains a generator that generates a diverse sentence, and a discriminator that judges the sentences between the generated and the raw data. In the generator, we combine the attention model that is responsible for tracking conversational states with the sequence-to-sequence model using hierarchical long short-term memory to extract sentence information. For the discriminator, we calculate two types of rewards to assign low rewards for repeated sentences and high rewards for diverse sentences. Extensive experiments are presented to demonstrate the utility of our model which generates more diverse and information-rich sentences than those of the existing approaches.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124649232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Building the Language Resource for a Cebuano-Filipino Neural Machine Translation System
Kristine Mae M. Adlaon, N. Marcos
{"title":"Building the Language Resource for a Cebuano-Filipino Neural Machine Translation System","authors":"Kristine Mae M. Adlaon, N. Marcos","doi":"10.1145/3342827.3342833","DOIUrl":"https://doi.org/10.1145/3342827.3342833","url":null,"abstract":"Parallel corpus is a critical resource in machine learning based translation. The task of collecting, extracting, and aligning texts in order to build an acceptable corpus for doing translation is very tedious most especially for low-resource languages. In this paper, we present the efforts made to build a parallel corpus for Cebuano and Filipino from two different domains: biblical texts and the web. For the biblical resource, subword unit translation for verbs and copy-able approach for nouns were applied to correct inconsistencies in translation. This correction mechanism was applied as a preprocessing technique. On the other hand, for Wikipedia being the main web resource, commonly occurring topic segments were extracted from both the source and the target languages. These observed topic segments are unique in 4 different categories. The identification of these topic segments may be used for automatic extraction of sentences. A Recurrent Neural Network was used to implement the translation using OpenNMT sequence modeling tool in TensorFlow. The two different corpora were then evaluated by using them as two separate inputs in the neural network. Results have shown a difference in BLEU score in both corpora.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126710273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Text Compression for Myanmar Information Retrieval
N. Lin, A. KudinovVitaly, Y. Soe
{"title":"Text Compression for Myanmar Information Retrieval","authors":"N. Lin, A. KudinovVitaly, Y. Soe","doi":"10.1145/3342827.3342830","DOIUrl":"https://doi.org/10.1145/3342827.3342830","url":null,"abstract":"Myanmar word segmentation is an important task for construction of a dictionary file for Myanmar information retrieval and Myanmar text compression. Although Myanmar word segmentation using dictionary and orthography already exists for the Myanmar language, the performance of word segmentation depends on the coverage of the dictionary and training dataset and can cause the out-of-vocabulary (OOV) problem, leading to lower precision and recall in information retrieval. To compress Myanmar text, words in the text need to be recognized first. In this paper, we propose a new method for Myanmar word segmentation by local statistical dataset without the use of any additional data (e.g., training corpus) and a new compressed Myanmar Information Retrieval (MIR) model which uses the End Tagged Dense Code (ETDC) text compression method. The experimental results showed that the method can improve evaluation of the vocabulary file with precision 75%, recall 87%, and F-measure 80%, and that the average compression ratio is 32% of texts for the Myanmar language.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125993827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Applicability of Text-representing Centroids for Thai Language Documents
Sureeporn Nualnim, Nirach Romyen, M. Sodanil
{"title":"Applicability of Text-representing Centroids for Thai Language Documents","authors":"Sureeporn Nualnim, Nirach Romyen, M. Sodanil","doi":"10.1145/3342827.3342853","DOIUrl":"https://doi.org/10.1145/3342827.3342853","url":null,"abstract":"Text-representing centroids are a recently investigated method used to categorize and compare documents written in European languages. As will be shown, Asian languages, and in particular Thai, exhibit completely different language structures. Nevertheless, a strong justification will be given that the methodology of text-representing centroids can be successfully applied to Thai documents, too. For the experiments, a corpus which contained 100 randomly selected articles from an offline Thai Wikipedia was used. The obtained centroids reflect the topic of those documents well, as in the original publication. In addition, the centroids are quite suitable to compare any two files.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130076598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Text Classification of Network Pyramid Scheme based on Topic Model
Pengyu Mu, Jingsha He, Nafei Zhu
{"title":"Text Classification of Network Pyramid Scheme based on Topic Model","authors":"Pengyu Mu, Jingsha He, Nafei Zhu","doi":"10.1145/3342827.3342835","DOIUrl":"https://doi.org/10.1145/3342827.3342835","url":null,"abstract":"At present, the network pyramid scheme has become a major tumor that hinders social development. In order to curb the propagation of the network pyramid scheme and effectively identify the pyramid scheme text in the network, this study proposes a joint topic model, Paragraph Vector Latent Dirichlet Allocation (PV_LDA), based on the characteristics of high-yield, high rebate, hierarchical salary and text topic diversity described in the text. The model uses the paragraph as the minimum processing unit to generate the topic distribution matrix of \"high-interest rate\" and \"hierarchical salary\" from the network pyramid scheme text. Gibbs sampling is used to derive the \"pyramid scheme\" topic distribution matrix represented by the two features, which is used for classification processing by the classifier. The classification accuracy rate for the network pyramid scheme text can reach 86.25%. The conclusions show that the topic model proposed in this paper can capture the characteristics of the pyramid scheme more reasonably.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131029558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Using Sentiment Analysis for Comparing Attitudes between Computer Professionals and Laypersons on the Topic of Artificial Intelligence
Xueying Wang
{"title":"Using Sentiment Analysis for Comparing Attitudes between Computer Professionals and Laypersons on the Topic of Artificial Intelligence","authors":"Xueying Wang","doi":"10.1145/3342827.3342829","DOIUrl":"https://doi.org/10.1145/3342827.3342829","url":null,"abstract":"Most research investigating computer professionals' and laypersons' attitudes toward artificial intelligence (AI) is limited to online or offline surveys. This paper analyzes computer professionals' and laypersons' attitudes toward AI by using a sentiment lexicon developed by Wilson et al. To explore whether there is a correlation between the occupation categories (computer-related versus non-computer-related occupations) and people's attitudes toward artificial intelligence, I conducted a polarity classification of over 0.6 million tweets containing references to \"AI\", \"artificial intelligence\", or both. The result did not provide evidence of a relationship between public attitudes toward AI and the occupation categories. In the end, several future directions in the data collection and the data analysis are discussed.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117242519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Novel Task-Oriented Text Corpus in Silent Speech Recognition and its Natural Language Generation Construction Method
Dong Cao, Dongdong Zhang, Haibo Chen
{"title":"A Novel Task-Oriented Text Corpus in Silent Speech Recognition and its Natural Language Generation Construction Method","authors":"Dong Cao, Dongdong Zhang, Haibo Chen","doi":"10.1145/3342827.3342838","DOIUrl":"https://doi.org/10.1145/3342827.3342838","url":null,"abstract":"Millions of people with severe speech disorders around the world may regain their communication capabilities through techniques of silent speech recognition (SSR). Using electroencephalography (EEG) as a biomarker for speech decoding has been popular for SSR. However, the lack of SSR text corpus has impeded the development of this technique. Here, we construct a novel task-oriented text corpus, which is utilized in the field of SSR. In the process of construction, we propose a task-oriented hybrid construction method based on natural language generation (NLG) algorithm. The algorithm focuses on the strategy of data-to-text generation, and has two advantages including linguistic quality and high diversity. These two advantages use template-based method and deep neural networks respectively. In an SSR experiment with the generated text corpus, analysis results show that the performance of our hybrid construction method outperforms the pure method such as template-based natural language generation or neural natural language generation models.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133856133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3