International Conference on Applications of Natural Language to Data Bases: Latest Publications

Adversarial Capsule Networks for Romanian Satire Detection and Sentiment Analysis
International Conference on Applications of Natural Language to Data Bases, Pub Date: 2023-06-13, DOI: 10.1007/978-3-031-35320-8_31
Sebastian-Vasile Echim, Răzvan-Alexandru Smădu, Andrei-Marius Avram, Dumitru-Clementin Cercel, Florin-Claudiu Pop
Citations: 0
RoBERTweet: A BERT Language Model for Romanian Tweets
International Conference on Applications of Natural Language to Data Bases, Pub Date: 2023-06-11, DOI: 10.48550/arXiv.2306.06598
Iulian-Marius Tăiatu, Andrei-Marius Avram, Dumitru-Clementin Cercel, Florin-Claudiu Pop
Abstract: Developing natural language processing (NLP) systems for social media analysis remains an important topic in artificial intelligence research. This article introduces RoBERTweet, the first Transformer architecture trained on Romanian tweets. Our RoBERTweet comes in two versions, following the base and large architectures of BERT. The corpus used for pre-training the models represents a novelty for the Romanian NLP community and consists of all tweets collected from 2008 to 2022. Experiments show that RoBERTweet models outperform the previous general-domain Romanian and multilingual language models on three NLP tasks with tweet inputs: emotion detection, sexist language identification, and named entity recognition. We make our models and the newly created corpus of Romanian tweets freely available.
Citations: 0
LonXplain: Lonesomeness as a Consequence of Mental Disturbance in Reddit Posts
International Conference on Applications of Natural Language to Data Bases, Pub Date: 2023-05-30, DOI: 10.48550/arXiv.2305.18736
Muskan Garg, Chandni Saxena, Debabrata Samanta, B. Dorr
Abstract: Social media is a potential source of information that infers latent mental states through Natural Language Processing (NLP). While narrating real-life experiences, social media users convey their feeling of loneliness or isolated lifestyle, impacting their mental well-being. Existing literature on psychological theories points to loneliness as the major consequence of interpersonal risk factors, propounding the need to investigate loneliness as a major aspect of mental disturbance. We formulate lonesomeness detection in social media posts as an explainable binary classification problem, discovering the users at-risk, suggesting the need of resilience for early control. To the best of our knowledge, there is no existing explainable dataset, i.e., one with human-readable, annotated text spans, to facilitate further research and development in loneliness detection causing mental disturbance. In this work, three experts: a senior clinical psychologist, a rehabilitation counselor, and a social NLP researcher define annotation schemes and perplexity guidelines to mark the presence or absence of lonesomeness, along with the marking of text-spans in original posts as explanation, in 3,521 Reddit posts. We expect the public release of our dataset, LonXplain, and traditional classifiers as baselines via GitHub.
Citations: 0
A Few-shot Approach to Resume Information Extraction via Prompts
International Conference on Applications of Natural Language to Data Bases, Pub Date: 2022-09-20, DOI: 10.1007/978-3-031-35320-8_32
Chengguang Gan, Tatsunori Mori
Citations: 1
Zero and Few-shot Learning for Author Profiling
International Conference on Applications of Natural Language to Data Bases, Pub Date: 2022-04-22, DOI: 10.48550/arXiv.2204.10543
Mara Chinea-Rios, Thomas Müller, Gretel Liz De la Peña Sarracén, Francisco Rangel, Marc Franco-Salvador
Abstract: Author profiling classifies author characteristics by analyzing how language is shared among people. In this work, we study that task from a low-resource viewpoint: using little or no training data. We explore different zero and few-shot models based on entailment and evaluate our systems on several profiling tasks in Spanish and English. In addition, we study the effect of both the entailment hypothesis and the size of the few-shot training sample. We find that entailment-based models outperform supervised text classifiers based on roberta-XLM and that we can reach 80% of the accuracy of previous approaches using less than 50% of the training data on average.
Citations: 8
Metric Learning and Adaptive Boundary for Out-of-Domain Detection
International Conference on Applications of Natural Language to Data Bases, Pub Date: 2022-04-22, DOI: 10.48550/arXiv.2204.10849
Petr Lorenc, Tommaso Gargiani, Jan Pichl, Jakub Konrád, Petro Marek, Ondrej Kobza, J. Šedivý
Abstract: Conversational agents are usually designed for closed-world environments. Unfortunately, users can behave unexpectedly. Based on the open-world environment, we often encounter the situation that the training and test data are sampled from different distributions. Then, data from different distributions are called out-of-domain (OOD). A robust conversational agent needs to react to these OOD utterances adequately. Thus, the importance of robust OOD detection is emphasized. Unfortunately, collecting OOD data is a challenging task. We have designed an OOD detection algorithm independent of OOD data that outperforms a wide range of current state-of-the-art algorithms on publicly available datasets. Our algorithm is based on a simple but efficient approach of combining metric learning with adaptive decision boundary. Furthermore, compared to other algorithms, we have found that our proposed algorithm has significantly improved OOD performance in a scenario with a lower number of classes while preserving the accuracy for in-domain (IND) classes.
Citations: 0
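The abstract above names two ingredients, metric learning and an adaptive decision boundary, without spelling out the decision rule. The Python sketch below illustrates the general idea only, assuming the utterance embeddings have already been produced by some metric-learned encoder (not shown): each in-domain class gets a centroid and a spherical radius, and an utterance that falls outside the sphere of its nearest class is rejected as OOD. The functions `fit_boundaries` and `predict` and the `margin` parameter are hypothetical, and setting the radius to the largest training distance is a simple stand-in for a learned boundary, not the authors' method.

```python
import math
from collections import defaultdict


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fit_boundaries(embeddings, labels, margin=1.0):
    """Compute a centroid and a spherical radius for each in-domain class.

    Hypothetical stand-in: the radius is the largest training distance to the
    centroid, scaled by `margin`, rather than a learned boundary.
    """
    by_class = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        by_class[label].append(vec)
    boundaries = {}
    for label, vecs in by_class.items():
        dim = len(vecs[0])
        centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        radius = margin * max(euclidean(v, centroid) for v in vecs)
        boundaries[label] = (centroid, radius)
    return boundaries


def predict(vec, boundaries, ood_label="OOD"):
    """Return the nearest class, or `ood_label` if vec lies outside its sphere."""
    label, (centroid, radius) = min(
        boundaries.items(), key=lambda item: euclidean(vec, item[1][0])
    )
    return label if euclidean(vec, centroid) <= radius else ood_label
```

A `margin` above 1.0 widens every sphere so fewer utterances are rejected; below 1.0 it tightens them. Fitting uses only in-domain examples, which mirrors the abstract's point that the approach does not require OOD data.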
Detecting early signs of depression in the conversational domain: The role of transfer learning in low-resource scenarios
International Conference on Applications of Natural Language to Data Bases, Pub Date: 2022-04-22, DOI: 10.48550/arXiv.2204.10841
Petr Lorenc, Ana Sabina Uban, Paolo Rosso, Jan Šedivý
Abstract: The high prevalence of depression in society has given rise to the need for new digital tools to assist in its early detection. To this end, existing research has mainly focused on detecting depression in the domain of social media, where there is a sufficient amount of data. However, with the rise of conversational agents like Siri or Alexa, the conversational domain is becoming more critical. Unfortunately, there is a lack of data in the conversational domain. We perform a study focusing on domain adaptation from social media to the conversational domain. Our approach mainly exploits the linguistic information preserved in the vector representation of text. We describe transfer learning techniques to classify users who suffer from early signs of depression with high recall. We achieve state-of-the-art results on a commonly used conversational dataset, and we highlight how the method can easily be used in conversational agents. We publicly release all source code.
Citations: 1
Unsupervised Ranking and Aggregation of Label Descriptions for Zero-Shot Classifiers
International Conference on Applications of Natural Language to Data Bases, Pub Date: 2022-04-20, DOI: 10.48550/arXiv.2204.09481
Angelo Basile, Marc Franco-Salvador, Paolo Rosso
Abstract: Zero-shot text classifiers based on label descriptions embed an input text and a set of labels into the same space: measures such as cosine similarity can then be used to select the most similar label description to the input text as the predicted label. In a true zero-shot setup, designing good label descriptions is challenging because no development set is available. Inspired by the literature on Learning with Disagreements, we look at how probabilistic models of repeated rating analysis can be used for selecting the best label descriptions in an unsupervised fashion. We evaluate our method on a set of diverse datasets and tasks (sentiment, topic and stance). Furthermore, we show that multiple, noisy label descriptions can be aggregated to boost the performance.
Citations: 1
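The abstract above describes the basic zero-shot prediction rule: embed the input text and each label description in the same space, then pick the label whose description is most cosine-similar to the text. The Python sketch below implements only that rule, plus a naive aggregation over several descriptions per label (mean similarity). It assumes the vectors come from some sentence encoder, which is not shown; the function names are hypothetical, and mean aggregation is a simple substitute for, not a reproduction of, the paper's probabilistic ranking and aggregation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def predict_label(text_vec, label_vecs):
    """Return the label whose single description embedding is most similar to the text."""
    return max(label_vecs, key=lambda label: cosine(text_vec, label_vecs[label]))


def predict_label_aggregated(text_vec, label_desc_vecs):
    """Naive aggregation (assumption): average the similarity over each label's
    several description embeddings, then take the best-scoring label."""
    scores = {
        label: sum(cosine(text_vec, v) for v in vecs) / len(vecs)
        for label, vecs in label_desc_vecs.items()
    }
    return max(scores, key=scores.get)
```

For example, with made-up 3-dimensional embeddings, a text vector `[0.9, 0.1, 0.0]` scored against descriptions embedded as `[1.0, 0.0, 0.0]` ("positive") and `[0.0, 1.0, 0.0]` ("negative") is assigned "positive".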
Active Few-Shot Learning with FASL
International Conference on Applications of Natural Language to Data Bases, Pub Date: 2022-04-20, DOI: 10.48550/arXiv.2204.09347
Thomas Müller, Guillermo Pérez-Torró, Angelo Basile, Marc Franco-Salvador
Abstract: Recent advances in natural language processing (NLP) have led to strong text classification models for many tasks. However, still often thousands of examples are needed to train models with good quality. This makes it challenging to quickly develop and deploy new models for real world problems and business needs. Few-shot learning and active learning are two lines of research, aimed at tackling this problem. In this work, we combine both lines into FASL, a platform that allows training text classification models using an iterative and fast process. We investigate which active learning methods work best in our few-shot setup. Additionally, we develop a model to predict when to stop annotating. This is relevant as in a few-shot setup we do not have access to a large validation set.
Citations: 9
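FASL combines few-shot learning with active learning, and the abstract says the authors compare several active learning methods without naming them. As a generic illustration only, the Python sketch below shows one widely used strategy, entropy-based uncertainty sampling: given a model's predicted class probabilities for each unlabeled example, send the k highest-entropy examples to the annotator. The function names are hypothetical, and this is not claimed to be the method FASL found to work best.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(pool_probs, k):
    """Return indices of the k unlabeled examples whose predictions are most uncertain.

    pool_probs: list of per-example class-probability lists from the current model.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]
```

In an iterative loop, one would retrain the few-shot classifier on the labeled set, recompute `pool_probs` for the remaining pool, and call `select_for_annotation` again until some stopping criterion is met. The abstract mentions a learned model for deciding when to stop annotating, which is not sketched here.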
Named Entity Recognition for Partially Annotated Datasets
International Conference on Applications of Natural Language to Data Bases, Pub Date: 2022-04-19, DOI: 10.48550/arXiv.2204.09081
Michael Strobl, Amine Trabelsi, Osmar R. Zaïane
Abstract: The most common Named Entity Recognizers are usually sequence taggers trained on fully annotated corpora, i.e. the class of all words for all entities is known. Partially annotated corpora, i.e. some but not all entities of some types are annotated, are too noisy for training sequence taggers since the same entity may be annotated one time with its true type but not another time, misleading the tagger. Therefore, we are comparing three training strategies for partially annotated datasets and an approach to derive new datasets for new classes of entities from Wikipedia without time-consuming manual data annotation. In order to properly verify that our data acquisition and training approaches are plausible, we manually annotated test datasets for two new classes, namely food and drugs.
Citations: 0