Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval: Latest Publications

Semantic Preserving Siamese Autoencoder for Binary Quantization of Word Embeddings
Wouter Mostard, Lambert Schomaker, M. Wiering
DOI: 10.1145/3508230.3508235
Abstract: Word embeddings are used as building blocks for a wide range of natural language processing and information retrieval tasks. These embeddings are usually represented as continuous vectors, requiring significant memory capacity and computationally expensive similarity measures. In this study, we introduce a novel method for semantic hashing continuous vector representations into lower-dimensional Hamming space while explicitly preserving semantic information between words. This is achieved by introducing a Siamese autoencoder combined with a novel semantic preserving loss function. We show that our quantization model induces only a 4% loss of semantic information over continuous representations and outperforms the baseline models on several word similarity and sentence classification tasks. Finally, we show through cluster analysis that our method learns binary representations where individual bits hold interpretable semantic information. In conclusion, binary quantization of word embeddings significantly decreases time and space requirements while offering new possibilities through exploiting the semantic information of individual bits in downstream information retrieval tasks.
Published: 2021-12-17
Citations: 0
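The Hamming-space comparison this abstract relies on can be shown with a toy sketch. This is an editorial illustration, not the paper's model: threshold-at-the-mean binarization stands in for the learned Siamese autoencoder, and the four-dimensional "embeddings" are invented.

```python
# Binarize continuous word vectors by thresholding each dimension at the
# per-dimension mean, then compare codes by Hamming distance instead of
# cosine similarity over floats.

def binarize(vector, means):
    """Map each component to a bit: 1 if above that dimension's mean."""
    return [1 if v > m else 0 for v, m in zip(vector, means)]

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return sum(x != y for x, y in zip(a, b))

# Toy 4-dimensional "embeddings"
vectors = {
    "king":  [0.9, 0.1, 0.8, 0.2],
    "queen": [0.8, 0.2, 0.9, 0.1],
    "apple": [0.1, 0.9, 0.2, 0.8],
}
dims = len(next(iter(vectors.values())))
means = [sum(v[d] for v in vectors.values()) / len(vectors) for d in range(dims)]

codes = {w: binarize(v, means) for w, v in vectors.items()}
print(hamming(codes["king"], codes["queen"]))  # → 0 (related words share a code)
print(hamming(codes["king"], codes["apple"]))  # → 4 (unrelated word flips every bit)
```

Once codes are this compact, similarity search reduces to popcounts over bit strings, which is the time and space saving the abstract claims.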
A Contrastive Study on Linguistic Features between HT and MT based on NLPIR-ICTCLAS: A Case Study of Philosophical Text
Yumei Ge, Bin Xu
DOI: 10.1145/3508230.3508240
Abstract: This paper, with the aid of NLPIR-ICTCLAS, analyzes and compares original English texts and different translation versions of a philosophical text. A 1:6 English-Chinese translation corpus is used to study the linguistic structural features of human translation (HT) and machine translation (MT). The study shows that HT is characterized by more complicated language and complex sentences. In the process of translation, compared with MT engines, human translators can intentionally avoid using too many functional words, conveying the grammatical structures and logical relations of sentences mainly through the meanings of words or clauses. The five MT versions share similarities in their use of notional and functional words.
Published: 2021-12-17
Citations: 0
Text Sentiment Analysis based on BERT and Convolutional Neural Networks
Ping Huang, Huijuan Zhu, Lei Zheng, Ying Wang
DOI: 10.1145/3508230.3508231
Abstract: The rapid development of networks has accelerated the circulation of information. Analyzing the emotional tendency of network text is very helpful for identifying users' needs. However, most existing sentiment classification models rely on manually labeled text features, leaving the deep semantic features hidden in the text insufficiently mined and making significant improvements in classification performance difficult. This paper presents a text sentiment classification model combining BERT and convolutional neural networks (CNN). The model uses BERT to produce word embeddings of the text, then uses a CNN to learn deep semantic information, so as to mine the emotional tendency of the text. Verified on the Large Movie Review dataset, the BERT-CNN model achieves an accuracy of 86.67%, significantly better than the traditional textCNN classification method. The results show that the method performs well in this field.
Published: 2021-12-17
Citations: 3
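The CNN half of the pipeline described above can be sketched in miniature. This is an editorial illustration: the token vectors and filter weights below are made up, the BERT encoder is stubbed out, and a real model would learn many filters of several widths.

```python
# A 1-D convolution slides a window over the token embeddings, and
# max-over-time pooling collapses the feature map to one scalar per filter;
# concatenating such scalars gives the fixed-size vector fed to the classifier.

def conv1d_max_pool(embeddings, kernel, width):
    """Apply one convolution filter of the given width, then max-pool."""
    feats = []
    for i in range(len(embeddings) - width + 1):
        window = embeddings[i:i + width]
        # dot product of the flattened window with the filter weights
        flat = [x for vec in window for x in vec]
        feats.append(sum(w * x for w, x in zip(kernel, flat)))
    return max(feats)

# Pretend these came from a BERT encoder: 5 tokens, 3-dim embeddings.
tokens = [[0.2, 0.1, 0.0], [0.9, 0.7, 0.3], [0.1, 0.0, 0.2],
          [0.5, 0.4, 0.6], [0.0, 0.1, 0.1]]
kernel = [1.0, 0.5, -0.2, 0.3, 1.0, 0.1]  # one width-2 filter over 3-dim vectors

feature = conv1d_max_pool(tokens, kernel, width=2)
print(round(feature, 3))  # → 1.25, the strongest filter response in the sequence
```

Max-over-time pooling is what lets the same filter bank handle reviews of any length, which matters for documents as long as movie reviews.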
Query Disambiguation to Enhance Biomedical Information Retrieval Based on Neural Networks
Wided Selmi, Hager Kammoun, Ikram Amous
DOI: 10.1145/3508230.3508253
Abstract: Information Retrieval Systems (IRS) use a query to find relevant documents. Often a query term has more than one sense; this is known as the ambiguity problem, and it is a cause of poor IRS performance. Word Sense Disambiguation (WSD) deals with choosing the right sense of an ambiguous term, among a set of given candidate senses, according to its context (surrounding text). Obtaining all candidate senses is therefore a challenge for WSD. Word Sense Induction (WSI) automatically induces the different senses of a target word from its different contexts. In this work, we propose a biomedical query disambiguation method in which WSI uses the K-means algorithm to cluster the different contexts of an ambiguous query term (a MeSH descriptor) in order to induce its senses. The contexts are sentences extracted from PubMed containing the target MeSH descriptor. To represent sentences as vectors, we use the contextualized embedding model BioBERT. Our method is derived from the intuitive idea that the correct sense is the candidate sense with the highest similarity to the ambiguous term's context. Experiments conducted on the OHSUMED test collection yielded significant results.
Published: 2021-12-17
Citations: 0
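The selection step, picking the candidate sense most similar to the query's context, can be sketched as follows. This is an editorial illustration: the sense labels and vectors are invented rather than induced from PubMed, and cosine similarity over toy centroids stands in for BioBERT representations.

```python
# Given candidate sense centroids (e.g. from K-means over context sentences),
# assign the sense whose centroid is most similar to the context vector.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def disambiguate(context_vec, sense_centroids):
    """Return the sense label with the highest cosine similarity."""
    return max(sense_centroids,
               key=lambda s: cosine(context_vec, sense_centroids[s]))

# Toy centroids for an ambiguous term such as "culture"
senses = {
    "cell_culture": [0.9, 0.1, 0.0],
    "society":      [0.1, 0.8, 0.3],
}
context = [0.85, 0.15, 0.05]  # context vector leaning toward the lab sense
print(disambiguate(context, senses))  # → cell_culture
```

In the paper's setting the centroids would come from clustering BioBERT sentence vectors; here the principle is the same with hand-picked numbers.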
Retrieval-based End-to-End Tamil language Conversational Agent for Closed Domain using Machine Learning
Kumaran Kugathasan, Uthayasanker Thayasivam
DOI: 10.1145/3508230.3508251
Abstract: Businesses around the world have started to adopt text-based conversational agents to provide a good customer experience as an alternative to expensive customer service agents. Building a conversational agent is comparatively easy for businesses serving customers who speak high-resource languages like English, since many paid and open-source chatbot frameworks are available. For a low-resource language like Tamil, there is no such framework support, and the approaches proposed in research on high-resource-language chatbots are not suitable due to the lack of language resources. This paper proposes a new approach for building a Tamil conversational agent using a dataset scraped from an FAQ corpus and expanded to capture the morphological richness and highly inflectional nature of the Tamil language. Each question is mapped to an intent, and a multiclass intent classifier is built to identify the intent of the user. A CNN-based classifier performed best, with 98.72% accuracy.
Published: 2021-12-17
Citations: 0
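The retrieval-based setup, mapping a new query to the intent of the most similar stored FAQ question, can be sketched as below. This is an editorial illustration with English strings and Jaccard word overlap standing in for the paper's Tamil data and learned CNN classifier; the FAQ entries and intent names are invented.

```python
# Each stored question carries an intent label; a new query receives the
# intent of its most lexically similar stored question.

def jaccard(a, b):
    """Word-overlap similarity between two whitespace-tokenized strings."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

faq = {
    "how do i reset my password": "account_reset",
    "what are your opening hours": "hours",
    "how can i change my password": "account_reset",
}

def classify(query):
    best = max(faq, key=lambda q: jaccard(query, q))
    return faq[best]

print(classify("i want to reset my password"))  # → account_reset
```

For a highly inflectional language like Tamil, raw word overlap would miss morphological variants, which is precisely why the paper expands the corpus and trains a classifier instead.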
Method of Graphical User Interface Adaptation Using Reinforcement Learning and Automated Testing
Victor Fyodorov, A. Karsakov
DOI: 10.1145/3508230.3508255
Abstract: Graphical user interface adaptation is becoming an increasingly time-consuming and resource-intensive task due to the complexity of modern programs and the wide variety of information output devices. In this paper we propose a method for adapting a graphical user interface based on a person's workflow with a specific implementation of the interface. The method makes it possible to adapt the interface to the peculiarities of the user's workflow by optimizing navigation between program windows.
Published: 2021-12-17
Citations: 0
Annotation and Evaluation of Utterance Intention Tag for Interview Dialogue Corpus
M. Sasayama, Kazuyuki Matsumoto
DOI: 10.1145/3508230.3508236
Abstract: In this paper, we propose utterance intention tags for an interview dialogue corpus and construct a corpus annotated with the tags we designed. Three or five annotators annotated the corpus (49,999 utterances across 30 dialogues) with the tags. We conducted an evaluation experiment using Fleiss's kappa to assess the reliability of the proposed tags: when three annotators assigned 18 different tags to the corpus, we obtained a kappa value of 0.55.
Published: 2021-12-17
Citations: 1
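Fleiss's kappa, the agreement measure reported above, can be computed as follows. The count matrix here is toy data (4 utterances, 3 annotators, 3 candidate tags), not the corpus annotations.

```python
# Fleiss's kappa: observed per-item agreement corrected by the agreement
# expected from the marginal tag distribution alone.

def fleiss_kappa(matrix):
    """matrix[i][j] = number of annotators assigning tag j to item i."""
    n_items = len(matrix)
    n_raters = sum(matrix[0])
    # mean per-item agreement
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in matrix
    ) / n_items
    # chance agreement from the marginal tag proportions
    totals = [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)

counts = [
    [3, 0, 0],  # all three annotators chose tag 0
    [0, 3, 0],
    [2, 1, 0],  # one annotator disagreed
    [0, 0, 3],
]
print(round(fleiss_kappa(counts), 3))  # → 0.745
```

A kappa of 0.55 with 18 candidate tags, as the paper reports, sits in the "moderate agreement" band of the usual interpretation scales.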
STIF: Semi-Supervised Taxonomy Induction using Term Embeddings and Clustering
Maryam Mousavi, Elena Steiner, S. Corman, Scott W. Ruston, Dylan Weber, H. Davulcu
DOI: 10.1145/3508230.3508247
Abstract: In this paper, we developed a semi-supervised taxonomy induction framework using term embedding and clustering methods for a blog corpus comprising 145,000 posts from 650 Ukraine-related blog domains dated between 2010 and 2020. We extracted 32,429 noun phrases (NPs) and split them into two categories: general/ambiguous phrases, which might appear under any topic, and topical/non-ambiguous phrases, which pertain to a topic's specifics. We used term representation and clustering methods to partition the topical/non-ambiguous phrases into 90 groups using the Silhouette method. Next, a team of 10 communications scientists analyzed the NP clusters and induced a two-level taxonomy alongside its codebook. Upon achieving intercoder reliability of 94%, the coders mapped all topical/non-ambiguous phrases into a gold-standard taxonomy. We evaluated a range of term representation and clustering methods using extrinsic and intrinsic measures, and determined that GloVe embeddings with K-Means achieved the highest performance (74% purity) on this real-world dataset.
Published: 2021-12-17
Citations: 0
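The purity figure cited above is a standard extrinsic clustering measure: each cluster is credited with its most frequent gold label, and purity is the fraction of items covered that way. A minimal sketch with invented phrase ids and labels:

```python
# Purity of a clustering against gold-standard labels.
from collections import Counter

def purity(clusters, gold_labels):
    """clusters: list of lists of item ids; gold_labels: id -> gold class."""
    total = sum(len(c) for c in clusters)
    correct = sum(
        Counter(gold_labels[i] for i in cluster).most_common(1)[0][1]
        for cluster in clusters
    )
    return correct / total

# Toy example: two induced clusters over six phrases
clusters = [["p1", "p2", "p3"], ["p4", "p5", "p6"]]
gold = {"p1": "military", "p2": "military", "p3": "politics",
        "p4": "politics", "p5": "politics", "p6": "politics"}
print(purity(clusters, gold))  # 5 of 6 phrases match their cluster's majority label
```

Purity is easy to read but rewards many small clusters, which is why the paper pairs it with intrinsic measures when comparing embedding and clustering choices.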
Named Entity Recognition using Knowledge Graph Embeddings and DistilBERT
Shreya R. Mehta, Mansi A. Radke, Sagar Sunkle
DOI: 10.1145/3508230.3508252
Abstract: Named Entity Recognition (NER) is the Natural Language Processing (NLP) task of identifying entities in natural language text and classifying them into categories such as Person, Location, and Organization. Pre-trained neural language models (PNLMs) based on transformers are state-of-the-art in many NLP tasks, including NER. Analysis of the output of DistilBERT, a popular PNLM, reveals that misclassifications occur when a non-entity word appears in a position contextually suited to an entity. This paper is based on the hypothesis that the performance of a PNLM can be improved by combining it with Knowledge Graph Embeddings (KGE). We show that fine-tuning DistilBERT together with NumberBatch KGE yields performance improvements on various open-domain as well as biomedical-domain datasets.
Published: 2021-12-17
Citations: 1
A Study of Predicting the Sincerity of a Question Asked Using Machine Learning
T. Nguyen, P. Meesad
DOI: 10.1145/3508230.3508258
Abstract: The growth of applications makes it increasingly difficult to assess whether a question is sincere, an assessment that is mandatory for many marketing and financial companies. Many applications, especially those handling text and images, are being reconfigured beyond recognition, while others face potential extinction as a corollary of advances in technology and computer science. Analyzing text and image data is truly needed for extracting valuable insights. In this paper, we analyzed the Quora dataset obtained from Kaggle.com to filter insincere and spam content. We used different preprocessing algorithms and analysis models provided in PySpark, analyzed the way users write their posts via the proposed prediction models, and identified the most accurate of the selected algorithms for classifying questions on Quora. The Gradient Boosted Tree was the best model, with an accuracy of 79.5%, followed by Long Short-Term Memory (LSTM) at 78.0%. Compared with the same models built in Scikit-Learn and with GRU, BiLSTM, and BiGRU, applying the models in PySpark gave better results in classifying questions on Quora.
Published: 2021-12-17
Citations: 4