Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval: Latest Publications

Feature Extraction Technique Based on Conv1D and Conv2D Network for Thai Speech Emotion Recognition
Naris Prombut, S. Waijanya, Nuttachot Promrit
DOI: 10.1145/3508230.3508238 (published 2021-12-17)
Abstract: Speech emotion recognition is one of the challenges in the Natural Language Processing (NLP) area. Many factors are used to identify emotions in speech, such as pitch, intensity, frequency, duration, and the speaker's nationality. This paper implements a speech emotion recognition model specifically for the Thai language, classifying speech into five emotions: angry, frustrated, neutral, sad, and happy. The research uses a dataset of 21,562 sounds (scripts) from the VISTEC-depa AI Research Institute of Thailand, divided into 70% training data and 30% test data. We use the Mel spectrogram and Mel-frequency Cepstral Coefficients (MFCC) for feature extraction, and a 1D Convolutional Neural Network (Conv1D) together with a 2D Convolutional Neural Network (Conv2D) to classify emotions. MFCC with Conv2D provides the highest accuracy at 80.59%, higher than the 71.35% reported in the baseline study.
Citations: 4
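The paper feeds the same MFCC features to both network types. A minimal numpy sketch of how one MFCC matrix is shaped for each (the sizes here are illustrative assumptions, not the authors' configuration):

```python
import numpy as np

# Hypothetical MFCC matrix for one utterance: 40 coefficients x 200 frames.
# (The paper uses Mel-spectrogram/MFCC features; exact sizes are assumptions.)
n_mfcc, n_frames = 40, 200
mfcc = np.random.randn(n_mfcc, n_frames)

# Conv1D view: time is the sequence axis, the 40 coefficients are input
# channels -> shape (batch, channels, length).
conv1d_input = mfcc[np.newaxis, :, :]              # (1, 40, 200)

# Conv2D view: the whole matrix is a one-channel "image"
# -> shape (batch, channels, height, width).
conv2d_input = mfcc[np.newaxis, np.newaxis, :, :]  # (1, 1, 40, 200)

print(conv1d_input.shape, conv2d_input.shape)
```

The 2D view lets the network learn joint time-frequency patterns, which is consistent with the paper's finding that MFCC with Conv2D scored highest.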
Automated Intention Mining with Comparatively Fine-tuning BERT
Xuan Sun, Luqun Li, F. Mercaldo, Yichen Yang, A. Santone, F. Martinelli
DOI: 10.1145/3508230.3508254 (published 2021-12-17)
Abstract: In the field of software engineering, intention mining is an interesting but challenging task: the goal is to understand user-generated texts well enough to capture requirements that are useful for software maintenance and evolution. Recently, BERT and its variants have achieved state-of-the-art performance on various natural language processing tasks such as machine translation, machine reading comprehension, and natural language inference. However, few studies have investigated the efficacy of pre-trained language models on this task. In this paper, we present a new baseline with a fine-tuned BERT model. Our method achieves state-of-the-art results on three benchmark datasets, outperforming baselines by a substantial margin. We also investigate the efficacy of the pre-trained BERT model at shallower network depths through a simple layer-selection strategy.
Citations: 0
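The layer-selection idea, using only the first k encoder layers of a deep pre-trained model, can be illustrated with a toy stand-in (random linear maps rather than real transformer blocks; this only sketches the strategy, not BERT itself):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a 12-layer encoder: each "layer" is a fixed random
# linear map followed by tanh. Real BERT layers are transformer blocks;
# this only illustrates keeping a prefix of the layer stack.
hidden = 16
layers = [rng.standard_normal((hidden, hidden)) / np.sqrt(hidden) for _ in range(12)]

def encode(x, k):
    """Run only the first k layers and return the result as the
    sentence representation (analogous to truncating BERT's depth)."""
    h = x
    for w in layers[:k]:
        h = np.tanh(h @ w)
    return h

x = rng.standard_normal(hidden)
shallow = encode(x, 4)   # 4-layer representation
deep = encode(x, 12)     # full-depth representation
print(shallow.shape, deep.shape)
```

In the real setting, each truncated depth would get its own classification head and be fine-tuned, and the depths compared on the intention-mining benchmarks.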
CBCP: A Method of Causality Extraction from Unstructured Financial Text
Lang Cao, Shihuangzhai Zhang, Juxing Chen
DOI: 10.1145/3508230.3508250 (published 2021-12-17)
Abstract: Extracting causality information from unstructured natural language text is a challenging problem in natural language processing, and no mature, dedicated causality extraction systems exist. Most work applies basic sequence labeling methods, such as the BERT-CRF model, to extract causal elements from unstructured text, and the results are usually unsatisfactory. At the same time, the finance domain contains a large number of causal event relations; extracting financial causality at scale would help us better understand the relationships between financial events and build related event evolutionary graphs in the future. In this paper, we propose a causality extraction method named CBCP (Center word-based BERT-CRF with Pattern extraction), which directly extracts cause and effect elements from unstructured text. Compared to the BERT-CRF model, our model incorporates center-word information as a prior condition and achieves better entity extraction performance. Combined with pattern extraction, our method further improves causality extraction. We evaluate our method against basic sequence labeling methods and show that it performs better on causality extraction tasks in the finance domain. Finally, we summarize our work and discuss future directions.
Citations: 1
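The pattern-extraction component can be sketched with cue-phrase rules. These regexes and sentences are illustrative assumptions; the paper's actual pattern set is not given in the abstract:

```python
import re

# Illustrative cue-phrase patterns for financial causality; hypothetical,
# not the paper's pattern inventory.
PATTERNS = [
    re.compile(r"(?P<cause>.+?)\s+(?:led to|resulted in|caused)\s+(?P<effect>.+)", re.I),
    re.compile(r"(?:due to|owing to)\s+(?P<cause>.+?),\s*(?P<effect>.+)", re.I),
]

def extract_causality(sentence):
    """Return (cause, effect) if a cue pattern matches, else None."""
    for pat in PATTERNS:
        m = pat.search(sentence)
        if m:
            return m.group("cause").strip(), m.group("effect").strip()
    return None

pair = extract_causality("Rising interest rates led to a decline in bond prices")
print(pair)  # → ('Rising interest rates', 'a decline in bond prices')
```

In CBCP such patterns complement the neural tagger; rules catch explicit cue phrases cheaply, while the center-word-conditioned BERT-CRF handles implicit cases.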
Improved Bi-GRU Model for Imbalanced English Toxic Comments Dataset
Zhongguo Wang, Bao Zhang
DOI: 10.1145/3508230.3508234 (published 2021-12-17)
Abstract: Deep learning is widely used in the study of English toxic comment classification, but most existing studies fail to consider data imbalance. For the imbalanced English Toxic Comments Dataset, we propose an improved Bi-gated recurrent unit (Bi-GRU) model that combines oversampling with a cost-sensitive method. The improved model uses random oversampling to reduce the data imbalance, introduces a cost-sensitive method, and proposes a new loss function for the Bi-GRU model. Experimental results show that the improved Bi-GRU model achieves significantly better classification performance on the imbalanced English Toxic Comments Dataset.
Citations: 1
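The two ingredients, random oversampling and cost-sensitive class weights, can be sketched in a few lines. The weighting scheme below (inverse class frequency) is one common choice and an assumption; the paper defines its own loss function:

```python
import random

random.seed(0)

# Toy imbalanced dataset: (features, label); label 1 is the minority
# "toxic" class. Sizes are illustrative.
data = [([0.1], 0)] * 90 + [([0.9], 1)] * 10

# Random oversampling: duplicate minority samples until classes balance.
minority = [d for d in data if d[1] == 1]
majority = [d for d in data if d[1] == 0]
oversampled = majority + [random.choice(minority) for _ in range(len(majority))]

# Cost-sensitive weighting: weight each class inversely to its frequency
# in the ORIGINAL data, so minority errors cost more in the loss.
n = len(data)
counts = {0: len(majority), 1: len(minority)}
class_weight = {c: n / (2 * counts[c]) for c in counts}

print(len(oversampled), class_weight)
```

These weights would multiply the per-sample loss terms during Bi-GRU training; the paper combines both mechanisms rather than choosing one.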
Scored and Error-annotated Essay Dataset of Chinese EFL/ESL Learners
Kai Jin, Wuying Liu
DOI: 10.1145/3508230.3508245 (published 2021-12-17)
Abstract: A finely annotated essay dataset of EFL/ESL (English as a foreign or second language) learners at a certain scale is not only an important language resource for language research and teaching, but also contributes materials to language-related computing. Unfortunately, such data open on the Internet are small in quantity and uneven in quality, especially for Chinese learners. We collected 147 essays by Chinese EFL/ESL learners, had four teachers score them under the same criteria and one teacher annotate major errors, and also had them scored by the Pigai scoring system. We then structured the score file, the error-annotated files, and the essay files together with context information to build the Scored and Error-annotated Essay Dataset of Chinese EFL/ESL Learners (SeedCel), which is open on the Internet and will be incrementally updated. This paper explains how SeedCel is constructed, what its details are, and where it can be used.
Citations: 0
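A dataset combining scores, error annotations, and context might be structured per essay roughly as below. SeedCel's real schema and field names are not given in the abstract, so everything here is a hypothetical illustration:

```python
import json

# Hypothetical record for one essay; all field names are assumptions,
# not SeedCel's published schema.
record = {
    "essay_id": "001",
    "context": {"learner_level": "EFL", "prompt": "My hometown"},
    "scores": {"teacher_scores": [78, 80, 75, 82], "pigai_score": 79.5},
    "errors": [
        {"span": [12, 14], "type": "verb tense", "correction": "went"},
    ],
    "text": "Last summer I go to my hometown ...",
}

serialized = json.dumps(record, ensure_ascii=False)
rec2 = json.loads(serialized)
print(len(rec2["errors"]))
```

Keeping scores, error spans, and context in one serializable record makes the dataset directly usable for automated-scoring and grammatical-error-correction research.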
Topic Segmentation for Interview Dialogue System
Taiga Kirihara, Kazuyuki Matsumoto, M. Sasayama, Minoru Yoshida, K. Kita
DOI: 10.1145/3508230.3508237 (published 2021-12-17)
Abstract: In this study, topic segmentation was performed on an interview dialogue corpus. Utterance intention tags were added to the existing corpus, and uttered sentences were vectorized using BERT, Sentence-BERT, and DistilBERT. Topic classification was then performed using the utterance intention tags and the features of the preceding and following uttered sentences. The greatest accuracy was achieved when the utterance intention tags were used with DistilBERT.
Citations: 1
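One simple way vectorized utterances support segmentation is that similarity between consecutive utterance vectors drops at a topic change. A minimal sketch with hand-made vectors standing in for BERT-family embeddings (the paper's actual classifier also uses intention tags and neighboring-sentence features):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy utterance vectors; in the paper these come from BERT, Sentence-BERT,
# or DistilBERT. Real vectors are noisy, but the idea is the same: a drop
# in similarity between consecutive utterances signals a topic boundary.
topic_a = np.array([1.0, 1.0, 1.0, 1.0])
topic_b = np.array([1.0, -1.0, 1.0, -1.0])  # orthogonal to topic_a
utterances = [topic_a, topic_a, topic_a, topic_b, topic_b, topic_b]

sims = [cosine(utterances[i], utterances[i + 1]) for i in range(len(utterances) - 1)]
boundary = int(np.argmin(sims)) + 1  # index of the first utterance of the new topic
print(boundary)
```

The paper goes further than this similarity heuristic by training a classifier over the embeddings plus utterance intention tags.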
Research on judgment reasoning using natural language inference in Chinese medical texts
Xin Li, Wenping Kong
DOI: 10.1145/3508230.3508248 (published 2021-12-17)
Abstract: Machine reading comprehension (MRC) tests the degree to which a machine understands natural language by asking it to answer questions about a given context. Judgment reasoning is an MRC task in which, given a context and questions, the machine must answer true or false; for some real-world data there is a third option, unknown. Given the current state of research, this paper applies natural language inference (NLI) models, which judge the semantic relationship between two sentences, to this judgment reasoning task. We first explain how the NLI task can be used to train universal sentence encoding models for judgment reasoning, and then describe the architectures used in the NLI task, covering a suitable range of sentence encoders currently in use; as an example, we explain a bi-directional long short-term memory (Bi-LSTM) model with max-pooling over the hidden representations. Comparative experiments verify that our NLI models are an effective strategy for improving judgment reasoning performance on Chinese medical texts, yielding clear accuracy gains.
Citations: 0
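The max-pooling step the abstract names is easy to show concretely: given the per-timestep hidden states of a (here hypothetical) Bi-LSTM, the sentence vector keeps the strongest activation per dimension, so any sentence length maps to a fixed-size vector:

```python
import numpy as np

# Toy hidden states from a hypothetical BiLSTM: T time steps x d dims.
# (A real Bi-LSTM would produce these; the numbers here are made up.)
T, d = 5, 4
hidden_states = np.array([
    [0.1, -0.2, 0.3, 0.0],
    [0.5, 0.1, -0.1, 0.2],
    [-0.3, 0.4, 0.2, 0.9],
    [0.0, 0.0, 0.7, -0.5],
    [0.2, -0.6, 0.1, 0.3],
])

# Max-pooling over the time axis: per dimension, keep the maximum
# activation across all time steps.
sentence_vec = hidden_states.max(axis=0)  # shape (d,)
print(sentence_vec)
```

For NLI, the two sentence vectors (premise and hypothesis) are then typically combined, e.g. by concatenation with their difference and product, before the classification layer.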
Low-Resource NMT: A Case Study on the Written and Spoken Languages in Hong Kong
Hei Yi Mak, Tan Lee
DOI: 10.1145/3508230.3508242 (published 2021-12-17)
Abstract: The majority of inhabitants of Hong Kong are able to read and write standard Chinese but use Cantonese as the primary spoken language in daily life. Spoken Cantonese can be transcribed into Chinese characters, which constitute so-called written Cantonese. Written Cantonese exhibits significant lexical and grammatical differences from standard written Chinese, and its rise is increasingly evident in the cyber world. The growing interaction between Mandarin speakers and Cantonese speakers is creating a clear demand for automatic translation between Chinese and Cantonese. This paper describes a transformer-based neural machine translation (NMT) system for written-Chinese-to-written-Cantonese translation. Given that parallel text data for Chinese and Cantonese are extremely scarce, a major focus of this study is preparing a good amount of training data for NMT. In addition to collecting 28K parallel sentences from previous linguistic studies and scattered internet resources, we devise an effective approach that obtains 72K parallel sentences by automatically extracting pairs of semantically similar sentences from parallel articles on Chinese Wikipedia and Cantonese Wikipedia. Leveraging highly similar sentence pairs mined from Wikipedia improves translation performance on all test sets. Our system outperforms Baidu Fanyi's Chinese-to-Cantonese translation on 6 out of 8 test sets in BLEU score, and translation examples show that it captures important linguistic transformations between standard Chinese and spoken Cantonese.
Citations: 1
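The mining step, keeping cross-article sentence pairs whose similarity clears a threshold, can be sketched with a deliberately cheap similarity function. Character-set Jaccard is used here only as a stand-in (Chinese and Cantonese share many characters); the paper's actual similarity measure is not specified in the abstract:

```python
def char_jaccard(s1, s2):
    """Character-set Jaccard similarity: a cheap stand-in for the
    semantic similarity computed between candidate sentence pairs."""
    a, b = set(s1), set(s2)
    return len(a & b) / len(a | b)

def mine_pairs(sents_a, sents_b, threshold=0.5):
    """Keep cross-article sentence pairs whose similarity clears the bar."""
    return [
        (x, y)
        for x in sents_a
        for y in sents_b
        if char_jaccard(x, y) >= threshold
    ]

pairs = mine_pairs(["the cat sat"], ["the cat sat down", "unrelated text"])
print(len(pairs))
```

Applied over aligned Chinese/Cantonese Wikipedia articles, thresholded mining of this kind is how the study grows 28K seed sentences into a 100K-pair training set.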
Natural Language Processing Applied on Large Scale Data Extraction from Scientific Papers in Fuel Cells
Feifan Yang
DOI: 10.1145/3508230.3508256 (published 2021-12-17)
Abstract: Natural language processing (NLP) has great potential to help scientists automatically extract information from large-scale text datasets. In this paper, we apply an NLP pipeline, including text acquisition, text preprocessing, word embedding training, and named entity recognition, to 106,181 abstracts of fuel cell papers. We then evaluate the trained model's analogy ability, use the model to analyze research trends in fuel cell materials, and predict new materials. To the best of our knowledge, this is the first time NLP has been applied in the field of fuel cells. This data-driven technique is demonstrated to have the potential to promote the discovery of new fuel cell materials.
Citations: 0
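The analogy evaluation mentioned above is the classic vector-offset test over trained word embeddings. A self-contained sketch with toy vectors (a real run would use word2vec/FastText embeddings trained on the fuel-cell abstracts; the words and vectors below are placeholders, not results from the paper):

```python
import numpy as np

# Toy embedding table constructed so that king - man + woman ~= queen;
# domain terms would replace these words in the actual study.
emb = {
    "king":  np.array([1.0, 1.0, 0.0]),
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "queen": np.array([0.0, 1.0, 1.0]),
    "metal": np.array([0.3, -0.5, 0.2]),
}

def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' by nearest cosine neighbor
    to the offset vector b - a + c, excluding the query words."""
    target = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -2.0
    for w, v in emb.items():
        if w in (a, b, c):
            continue
        sim = float(target @ v / (np.linalg.norm(target) * np.linalg.norm(v)))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

print(analogy("man", "king", "woman"))  # → queen
```

The same query mechanism, run over domain embeddings, is what lets such models surface candidate materials related to known fuel-cell compounds.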
Examination of the quality of Conceptnet relations for PubMed abstracts
Rajeswaran Viswanathan, S. Priya
DOI: 10.1145/3508230.3508243 (published 2021-12-17)
Abstract: ConceptNet is a crowd-sourced knowledge graph used to find relationships between words and concepts, and PubMed is the largest source of documents in the biomedical domain. Stop words are removed from PubMed abstracts, and the remaining words are used as seed words. For each seed word, nearest-neighbor words are identified as candidate words using three popular word vector (WV) models: Word2Vec, GloVe, and FastText. Similarity is calculated for these words for each stratum of relationship, and a bootstrap estimator in a random effects model (REM) is used to study the relationships via the similarity scores. The analysis shows heterogeneity among the relationships regardless of the WV model used as the base.
Citations: 0