Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval: Latest Articles

Catapa Resume Parser: End to End Indonesian Resume Extraction
Berty Chrismartin Lumban Tobing, Immanuel Rhesa Suhendra, Christian Halim
{"title":"Catapa Resume Parser: End to End Indonesian Resume Extraction","authors":"Berty Chrismartin Lumban Tobing, Immanuel Rhesa Suhendra, Christian Halim","doi":"10.1145/3342827.3342832","DOIUrl":"https://doi.org/10.1145/3342827.3342832","url":null,"abstract":"This paper proposes a method to solve the problem of extracting contents from a resume, especially for Indonesian resumes using segmentation method by header followed by models for each corresponding headers. An end to end resume extraction system is created using some heuristic rules and machine learning algorithms to solve the problem. On average, an accuracy of ~91.41% is achieved for personal information entities (name, email, phone, gender, date of birth, and religion), ~68.47% accuracy for job experiences entities (company, job title, start date, and end date), and ~80.85% accuracy for educations entities (institution, major, level, start date, end date, and GPA) out of 221 random resumes using the aforementioned method.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122577445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
A Hybrid Method for Vietnamese Text Normalization
Nguyen Thi Thu Trang, Dang Xuan Bach, N. X. Tung
{"title":"A Hybrid Method for Vietnamese Text Normalization","authors":"Nguyen Thi Thu Trang, Dang Xuan Bach, N. X. Tung","doi":"10.1145/3342827.3342851","DOIUrl":"https://doi.org/10.1145/3342827.3342851","url":null,"abstract":"This paper presents a hybrid method for normalizing written text often found on newspapers to its spoken form. To normalize raw text with a number of non-standard words (NSWs), a two-step model is proposed. The first step involves classifying NSWs into different categories using Random Forest. The latter one is to expand them, depending on their NSW types, into pronounceable syllables using a hybrid method. Most of numeric types can be expanded by well-defined rules while most of alphabetic ones must be expanded by a deep learning (i.e. sequence-to-sequence) model and a post adjustment. The experiment on a Vietnamese corpus with proposed NSW categories shows that the most ambiguous cases of the classification model are for abbreviation and read-as-sequence types, hence combined into one category for the latter expansion with more complex model and better context. The classification model gives an enhanced result of 99.20% with the category combination and the feature optimization. In the expansion, the sequence-to-sequence model shows a good result of 96.53% for abbreviations and 96.25% for loanwords with a post-adjustment for some completely wrong cases. This model can predict effectively the expansions of abbreviations in context.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129772251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Authorship Attribution of Russian Forum Posts with Different Types of N-gram Features
T. Litvinova, O. Litvinova, Polina Panicheva
{"title":"Authorship Attribution of Russian Forum Posts with Different Types of N-gram Features","authors":"T. Litvinova, O. Litvinova, Polina Panicheva","doi":"10.1145/3342827.3342834","DOIUrl":"https://doi.org/10.1145/3342827.3342834","url":null,"abstract":"Authorship attribution is an important field in online security. Recently there have been numerous successful works in authorship attribution in various European languages. Character n-grams are reported to be the best choice in authorship attribution, as they encode both style and content information. We evaluate different types of character n-gram features in an authorship attribution task in a real-world noisy dataset of Russian forum posts. We also supplement them with a number of new simple n-gram features capturing syntactic and discourse patterns. We perform authorship attribution in a single-topic and a cross-topic setting, as the research question is whether character n-grams capture both style and content information. Our results show that character n-grams are indeed very successful in Russian forum post authorship attribution. However, there is no clear distinction of style and content n-grams, as the same types of n-grams work well for both single-topic and cross-topic settings. In our experiments the generalized simple n-gram features which reveals syntactic and discourse patterns were proved to be also very important in authorship attribution of short informal Russian texts. They represent a different kind of authorship information and are a successful addition to the character n-grams in authorship attribution of forum texts in the Russian language.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121688805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Is It Possible to Use Chatbot for the Chinese Word Segmentation?
K. Chang, Hsien-Tsung Chang
{"title":"Is It Possible to Use Chatbot for the Chinese Word Segmentation?","authors":"K. Chang, Hsien-Tsung Chang","doi":"10.1145/3342827.3342836","DOIUrl":"https://doi.org/10.1145/3342827.3342836","url":null,"abstract":"A word is the smallest item in Natural Language Processing. However, there is no obvious boundary for Chinese words. How to segment Chinese words always obstructs Chinese researches and applications. Nowadays, a neural network model, Seq2Seq with LSTM, is well-known for translation or chatbot application. In this paper, we try to transform the Chinese word segmentation problem into a translation problem. And we utilized an open-source chatbot to simulate the translation task. In our experimental results, we can produce similar Chinese word segmentation results when we provide training data which is automatically generated from famous Chinese word segmentation services.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124444381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Improving Vietnamese WordNet using word embedding
Khang Nhut Lam, Tuan Huynh To, Thong Tri Tran, J. Kalita
{"title":"Improving Vietnamese WordNet using word embedding","authors":"Khang Nhut Lam, Tuan Huynh To, Thong Tri Tran, J. Kalita","doi":"10.1145/3342827.3342854","DOIUrl":"https://doi.org/10.1145/3342827.3342854","url":null,"abstract":"This paper presents a simple but effective method to improve the quality of WordNet synsets and extract glosses for synsets. We translate the Princeton WordNet and other intermediate WordNets to a target language using a machine translator, then the correct candidates are selected by applying different ranking methods: occurrence count, cosine similarity between words, cosine similarity between word embeddings and cosine similarity between Doc2Vec of sentences. Our approaches may be applicable to build WordNets in any language which has some bilingual dictionaries and at least a monolingual corpus in the target language.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114720256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Guideline for Academic Support of Student Career Path Using Mining Algorithm
M. Sodanil, Saranlita Chotirat, L. Poomhiran, Kanchana Viriyapant
{"title":"Guideline for Academic Support of Student Career Path Using Mining Algorithm","authors":"M. Sodanil, Saranlita Chotirat, L. Poomhiran, Kanchana Viriyapant","doi":"10.1145/3342827.3342841","DOIUrl":"https://doi.org/10.1145/3342827.3342841","url":null,"abstract":"In general, higher education is an important step in preparing a career for students in the future. Graduates should have qualifications that are recognized by both entrepreneurs and society. Therefore, every higher educational institution should make an effort to consider how to assist students' performance. This research aims to analyze the relationships between courses that are likely to produce a future career for students using the Apriori algorithm. The data used in the operation of the association rule was the student's grades from 25 main courses in the field of information technology, Department of Information Technology, Faculty of Science and Technology, Suan Sunandha Rajabhat University. This data was recorded between 2011 and 2019 and stored in the registration and graduate career system. The 14 association rules were determined from the operation by using the Weka 3.8.3 data mining software, this indicated that there were a few courses in which students could have future careers. Most importantly, the results can contribute to guidelines for the academic support of students' future career.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"146-147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124044282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Analysis of Native and Non-native Speakers' English Compositions based on Word-frequency Distribution and Text Statistics
H. Tsubaki
{"title":"Analysis of Native and Non-native Speakers' English Compositions based on Word-frequency Distribution and Text Statistics","authors":"H. Tsubaki","doi":"10.1145/3342827.3342856","DOIUrl":"https://doi.org/10.1145/3342827.3342856","url":null,"abstract":"In this paper, word-frequency distribution of JACET 8000 basic words and text statistics were researched to compare and analyze differentials of English compositions (essays) written by native speakers and non-native speakers. As for the native speakers' essays, the Guiraud Index in each Level 2-8 to Average sentence length and Automated Readability Index had higher correlation coefficients. Meanwhile, on the non-native speakers' essays, the index values to Sentence count showed moderate correlation coefficients. It was observed that the productivity and readability of the compositions seem to depend on ranges of basic content words which native or non-native writers have acquired and can use in English. To verify the word-frequency distribution as proficiency rating measurement for non-native speakers, the estimation experiment was carried out based on a multiple-regression model using word-frequency distribution of 68 English compositions written by the non-native writers. The estimated scores of the learners showed a correlation score 0.475 to their actual TOEIC scores. These results confirmed the possibility of the word usage statistics for the objective evaluation of L2 (second language) learners' language proficiency.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127114961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Evaluation of Morphological Embeddings for the Russian Language
V. Romanov, A. Khusainova
{"title":"Evaluation of Morphological Embeddings for the Russian Language","authors":"V. Romanov, A. Khusainova","doi":"10.1145/3342827.3342846","DOIUrl":"https://doi.org/10.1145/3342827.3342846","url":null,"abstract":"A number of morphology-based word embedding models were introduced in recent years. However, their evaluation was mostly limited to English, which is known to be a morphologically simple language. In this paper, we explore whether and to what extent incorporating morphology into word embeddings improves performance on downstream NLP tasks, in the case of morphologically rich Russian language. NLP tasks of our choice are POS tagging, Chunking, and NER -- for Russian language, all can be mostly solved using only morphology without understanding the semantics of words. Our experiments show that morphology-based embeddings trained with Skipgram objective do not outperform existing embedding model -- FastText. Moreover, a more complex, but morphology unaware model, BERT, allows to achieve significantly greater performance on the tasks that presumably require understanding of a word's morphology.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126070043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
HWE: Hybrid Word Embeddings For Text Classification
Xuebo Song, P. Srimani, James Ze Wang
{"title":"HWE: Hybrid Word Embeddings For Text Classification","authors":"Xuebo Song, P. Srimani, James Ze Wang","doi":"10.1145/3342827.3342837","DOIUrl":"https://doi.org/10.1145/3342827.3342837","url":null,"abstract":"Text classification is one of the most important tasks in natural language processing and information retrieval due to the increasing availability of documents in digital form and the ensuing need to access them in flexible ways. By assigning documents to labeled classes, text classification can reduce the search space and expedite the process of retrieving relevant documents. In this paper, we propose a novel text representation method, Hybrid Word Embeddings (HWE), which combines semantic information obtained from WordNet and contextual information extracted from text documents to provide concise and accurate representations of text documents. The proposed HWE method can improve the efficiency of deriving word semantics from text by taking advantage of the semantic relationships extracted from WordNet with less training corpus. Experimental study on classification of documents shows that the proposed HWE outperforms existing methods, including Doc2Vec and Word2Vec, in terms of classification accuracy, recall, precision, etc.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116206202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Natural Language Understanding in Smartdialog: A Platform for Vietnamese Intelligent Interactions
Nguyen Thi Thu Trang, Nguyen Hoang Ky, H. Sơn, N. T. Hung, Nguyễn Danh Huân
{"title":"Natural Language Understanding in Smartdialog: A Platform for Vietnamese Intelligent Interactions","authors":"Nguyen Thi Thu Trang, Nguyen Hoang Ky, H. Sơn, N. T. Hung, Nguyễn Danh Huân","doi":"10.1145/3342827.3342857","DOIUrl":"https://doi.org/10.1145/3342827.3342857","url":null,"abstract":"Nowadays in the modern world, interactive smart dialogs with text or voice are gaining traction as the main digital interaction channel between human and machine. However, most of the current platforms do not support or have not fully developed for Vietnamese. In this paper, the authors propose a smart conversational platform through a text channel and/or voice channel in Vietnamese language, including these main steps: (i) Input Conversion and Pre-Processing, (ii) Entity Recognition, (iii) Intent Classification, (iv) Action Prediction and Execution, and (v) Output Generation. This paper focuses on presenting problems related to natural language understanding. To recognize entities in a sentence, the authors studied and optimized the features for Vietnamese with the Conditional Random Field model. With the problem of predicting user intent, this work proposed, experimented, and compared of Random Forest and BiLSTM deep learning model to optimize for the Vietnamese language. A platform was built and deployed for Milo smart speaker application (LUMI smart home) and VADI driver virtual assistant with the accuracy of around 98.7%.","PeriodicalId":254461,"journal":{"name":"Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115209329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0