Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval: Latest Publications

Semantic Embeddings for Food Search Using Siamese Networks

Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval Pub Date : 2020-12-18 DOI: 10.1145/3443279.3443303

Rutvik Vijjali, Anurag Mishra, Srinivas Nagamalla, Jairaj Sathyanarayna

Abstract: Efficient and effective search is a key driver of business in e-commerce. Functionally, most search systems consist of retrieval and ranking phases. While the use of methods like Learning to Rank (LTR) for (re)ranking has been studied widely, most retrieval systems in the industry are still predominantly based on variants of text matching. Because text matching cannot capture the semantic intent of the query, most out-of-vocabulary (OOV) queries are either not handled at all or poorly handled by matching to similarly-spelled entities. For niche e-commerce like food delivery apps operating on phonetically spelled, non-Western dish names, this problem is even more acute. Pre-trained word embedding models are of limited help because the majority of dish names are words that occur rarely or not at all in most openly available vocabularies. In this work, we present experiments and efficient Siamese network based models to learn dish embeddings from scratch. Compared to current baselines, we demonstrate that these models lead to a 3-5% improvement in Mean Reciprocal Rank (MRR) and Recall@k. We also quantify, using a combination of an in-house Food Taxonomy and the Davies-Bouldin (DB) index, that the new embeddings capture semantic information with an improvement of up to 20% over baseline.
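The two retrieval metrics named in the abstract, Mean Reciprocal Rank and Recall@k, can be sketched as follows. This is a minimal illustration of the standard definitions, not the authors' evaluation code; the function names, queries, and dish names are all hypothetical.

```python
from typing import List, Sequence, Set


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: List[Sequence[str]],
                         relevants: List[Set[str]]) -> float:
    """Average the reciprocal rank over all queries."""
    if len(rankings) != len(relevants):
        raise ValueError("need exactly one relevant set per ranking")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants))
    return total / len(rankings)


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Hypothetical retrieval results, invented purely for illustration.
rankings = [["paneer tikka", "paneer roll", "biryani"], ["idli", "dosa", "vada"]]
relevants = [{"paneer roll"}, {"idli", "vada"}]

print(mean_reciprocal_rank(rankings, relevants))    # (1/2 + 1/1) / 2 = 0.75
print(recall_at_k(rankings[1], relevants[1], k=2))  # 1 of 2 relevant items = 0.5
```

MRR rewards placing the first correct dish near the top of the list, while Recall@k measures how many of the relevant dishes are retrieved at all within the first k results, so the two are usually reported together.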

Citations: 0

Age Inference on Twitter using SAGE and TF-IGM

Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval Pub Date : 2020-12-18 DOI: 10.1145/3443279.3443300

J. Cornelisse, Reshmi Gopalakrishna Pillai

Citations: 3

Research on Information Extraction of Municipal Solid Waste Crisis using BERT-LSTM-CRF

Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval Pub Date : 2020-12-18 DOI: 10.1145/3443279.3443314

Tianyu Wan, Wenhui Wang, Hui Zhou

Citations: 2

Impact of Statistical Language Model on Example Based Machine Translation System between Kazakh and Turkish Languages

Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval Pub Date : 2020-12-18 DOI: 10.1145/3443279.3443286

Gulshat Kessikbayeva, I. Çiçekli

Citations: 0

Sentiment Analysis for Review Rating Prediction in a Travel Journal

Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval Pub Date : 2020-12-18 DOI: 10.1145/3443279.3443282

Jovelyn C. Cuizon, Carlos Giovanni Agravante

Abstract: This paper presents sentiment analysis to predict the numerical rating of text reviews in a web-based travel journal application. The application allows users to record and provide text reviews on tourist spots visited. Text reviews undergo part-of-speech (POS) tagging, rule-based phrase chunking and dependency parsing to extract opinion phrases in noun-adjective and noun-verb pairs from the original text. Each pair is further classified into one of four categories (accommodation, food, entertainment and tourist attraction) by matching the noun against a curated bag-of-words (BOW), to ensure that only relevant statements are included in the scoring. Word Sense Disambiguation is performed using WordNet to identify the word sense that matches the meaning of the sentence. SentiWordNet, a lexical resource for sentiment analysis, was used to determine a polarity score representing the emotional intensity of the review. The system-predicted star rating was compared with the actual author rating in Google Maps and with ratings from human annotators who were asked to label the text reviews. The mean absolute error (MAE) between the system rating and the human rating was low, which means that the predicted rating is close to the human interpretation of the text reviews. Overall rating prediction accuracy is 82%.
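The mean absolute error used above to compare system and human ratings can be sketched in a few lines. This is the standard definition only; the function name and the star ratings below are made up for illustration and do not come from the paper.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between paired predicted and actual ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one rating pair is required")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical star ratings (1 to 5), invented for illustration.
system_ratings = [4, 3, 5]
human_ratings = [5, 3, 3]

print(mean_absolute_error(system_ratings, human_ratings))  # (1 + 0 + 2) / 3 = 1.0
```

A lower MAE means the predicted stars sit closer, on average, to the reference ratings; an MAE of 1.0 on a five-star scale would mean predictions are off by one star on average.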

Citations: 3

A Comparative Study of Dictionary-based and Machine Learning-based Named Entity Recognition in Pashto

Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval Pub Date : 2020-12-18 DOI: 10.1145/3443279.3443307

R. Momand, Shakirullah Waseeb, Ahmad Masood Latif Rai

Abstract: Information Extraction (IE) is the process of extracting structured information from unstructured text using natural language processing (NLP). One important sub-task of IE is the extraction of names of persons, places, and organizations, called Named Entity Recognition (NER). NER plays an important role in many NLP applications such as Question Answering, Machine Translation, and Text Summarization. It has been widely studied for high-resource languages like English. However, no research has taken place in this regard for Pashto. We hypothesized that, based on the research done for English and other languages in the area of NER, a system can be developed for Pashto. We have developed two NER systems for detecting names of persons, places, and organizations in Pashto text. The first is a dictionary-based NER that uses three dictionaries containing names of persons, locations, and organizations, respectively. The second is a learning-based approach that uses a Hidden Markov Model (HMM) for the task. We have evaluated both systems on a dataset collected from sports news. Our evaluation showed an F-Measure of 82% for HMM and 60% for dictionary-based NER. Our findings highlight that HMM outperforms dictionary-based NER.
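A dictionary-based tagger of the kind described, together with the F-measure used to score it, might look like the sketch below. It assumes single-token lookup in three gazetteers and token-level scoring; the paper's actual matching and evaluation procedure may differ, and the names, tokens, and labels here are hypothetical English placeholders rather than Pashto data.

```python
from typing import Dict, List, Set


def tag_tokens(tokens: List[str], gazetteers: Dict[str, Set[str]]) -> List[str]:
    """Label each token with the first dictionary that contains it, else 'O'."""
    tags = []
    for token in tokens:
        label = "O"
        for entity_type, names in gazetteers.items():
            if token in names:
                label = entity_type
                break
        tags.append(label)
    return tags


def f_measure(predicted: List[str], gold: List[str]) -> float:
    """Token-level F1 over entity labels (the 'O' label is not counted)."""
    true_positives = sum(1 for p, g in zip(predicted, gold) if p == g and g != "O")
    predicted_entities = sum(1 for p in predicted if p != "O")
    gold_entities = sum(1 for g in gold if g != "O")
    precision = true_positives / predicted_entities if predicted_entities else 0.0
    recall = true_positives / gold_entities if gold_entities else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical dictionaries and sentence, invented for illustration.
gazetteers = {"PER": {"Rashid"}, "LOC": {"Kabul"}, "ORG": {"ICC"}}
tokens = ["Rashid", "played", "in", "Kabul", "for", "Afghanistan"]
gold = ["PER", "O", "O", "LOC", "O", "LOC"]

predicted = tag_tokens(tokens, gazetteers)
print(predicted)
print(round(f_measure(predicted, gold), 2))
```

In this toy sentence the dictionaries miss "Afghanistan", so precision is perfect but recall drops. That coverage gap is the usual weakness of dictionary lookup, and it is consistent with the lower F-measure the abstract reports for the dictionary-based system.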

Citations: 1

German Speech Recognition System using DeepSpeech

Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval Pub Date : 2020-12-18 DOI: 10.1145/3443279.3443313

Jiahua Xu, Kaveen Matta, Shaiful Islam, A. Nürnberger

Citations: 4

Building a Chatbot on a Closed Domain using RASA

Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval Pub Date : 2020-12-18 DOI: 10.1145/3443279.3443308

Khang Nhut Lam, Nam Nhat Le, J. Kalita

Citations: 11

Long-Short Term Memory (LSTM) Networks with Time Series and Spatio-Temporal Approaches Applied in Forecasting Earthquakes in the Philippines

Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval Pub Date : 2020-12-18 DOI: 10.1145/3443279.3443288

A. C. Fabregas, Patrick Arellano, Andrea Nicole D. Pinili

Abstract: A series of large earthquakes was observed in different places in the Philippines in 2019. These earthquake events led to the destruction of infrastructure, households, and heritage sites, and to the loss of multiple human lives. Earthquakes are hard to predict or forecast, which is why forecasting them is considered a big challenge in the field of seismology. In this work, a rule-based algorithm was used to classify the regions based on the latitude and longitude values, while Long Short-Term Memory (LSTM) networks were used to forecast the following variables: frequency, maximum magnitude, and average depth of earthquake events in a specific region in a given year. The developed system was able to produce satisfactory results in the classification of regions, as well as in forecasting the maximum magnitude of earthquake events. The obtained results showed an improved prediction for the maximum magnitude, by considering both time series and spatiotemporal analysis, compared to previous prediction studies.
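The data preparation implied by the abstract (assign each event to a region by latitude and longitude, then summarize frequency, maximum magnitude, and average depth per region and year) can be sketched as below. The bounding boxes, region names, and events are hypothetical placeholders, not the rules or data used in the paper, and the LSTM forecasting step itself is not shown.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    year: int
    latitude: float
    longitude: float
    magnitude: float
    depth_km: float


# Hypothetical bounding boxes: name -> (lat_min, lat_max, lon_min, lon_max).
# These are illustrative only and do not reproduce the paper's rules.
REGION_BOXES: Dict[str, Tuple[float, float, float, float]] = {
    "region_a": (0.0, 10.0, 120.0, 130.0),
    "region_b": (10.0, 20.0, 120.0, 130.0),
}


def classify_region(latitude: float, longitude: float) -> str:
    """Return the first region whose box contains the point, else 'unknown'."""
    for name, (lat_min, lat_max, lon_min, lon_max) in REGION_BOXES.items():
        if lat_min <= latitude < lat_max and lon_min <= longitude < lon_max:
            return name
    return "unknown"


def yearly_features(events: List[Event]) -> Dict[Tuple[str, int], Tuple[int, float, float]]:
    """Map (region, year) to (frequency, maximum magnitude, average depth)."""
    grouped: Dict[Tuple[str, int], List[Event]] = defaultdict(list)
    for event in events:
        region = classify_region(event.latitude, event.longitude)
        grouped[(region, event.year)].append(event)
    features = {}
    for key, group in grouped.items():
        frequency = len(group)
        max_magnitude = max(e.magnitude for e in group)
        average_depth = sum(e.depth_km for e in group) / frequency
        features[key] = (frequency, max_magnitude, average_depth)
    return features


# Made-up events for illustration; not real catalogue entries.
events = [
    Event(2019, 6.8, 125.0, 6.6, 10.0),
    Event(2019, 6.9, 125.2, 6.5, 20.0),
    Event(2019, 15.0, 121.0, 5.0, 30.0),
]
features = yearly_features(events)
print(features[("region_a", 2019)])  # (2, 6.6, 15.0)
```

Each (region, year) tuple yields one row of the three target variables; a sequence of such rows per region is the kind of time series an LSTM forecaster would then consume.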

Citations: 4

Categorical Perception of Mandarin Tones Based on Acoustic Features by Japanese Learners

Proceedings of the 4th International Conference on Natural Language Processing and Information Retrieval Pub Date : 2020-12-18 DOI: 10.1145/3443279.3443293

Hong Zhu, K. Yoshimoto

Abstract: Based on acoustic features of the four Mandarin tones, this study investigated the perceptual pattern between Tone1 (T1) and Tone4 (T4), and between Tone2 (T2) and Tone3 (T3), which are considered difficult for Japanese learners and Chinese native speakers to distinguish. We compared the performance of Mandarin and Japanese listeners on the perception of Mandarin tones in a classical categorical perception experiment that employed identification and discrimination tasks. Experiments on T1 and T4 were designed using the fundamental frequency (fo) of the endpoint as the acoustic cue, while experiments on T2 and T3 were designed using continual sound stimuli, which gradually changed from T2 to T3, varying in the timing of the turning point (inflection point of the tone), Δfo (pitch difference between onset and turning point), or both acoustic dimensions. The results showed that when endpoint pitch was taken as the acoustic parameter, categorical perception was found between T1 and T4 by both Chinese native speakers and Japanese learners. When the timing of the turning point and Δfo were both taken as the acoustic parameters, both advanced Chinese learners and beginners demonstrated quasi-categorical perception of T2 and T3, whereas when the timing of the turning point was used as the sole parameter, only a tendency toward categorical perception was observed.

Citations: 0