Extract Aspect-based Financial Opinion Using Natural Language Inference

Proceedings of the 2022 International Conference on E-business and Mobile Commerce. Pub Date: 2022-05-13. DOI: 10.1145/3543106.3543120

Raymond So, Chun Fai Carlin Chu, Cheuk Wing Jessie Lee


Citations: 3

Abstract

The emergence of transformer-based pre-trained language models (PTLMs) has brought new and improved techniques to natural language processing (NLP). Traditional rule-based NLP, for instance, is known for its inability to create context-aware representations of words and sentences. Natural language inference (NLI) addresses this deficiency by using PTLMs to create context-sensitive embeddings for contextual reasoning. This paper outlines a system design that uses traditional rule-based NLP and deep learning to extract aspect-based financial opinions from financial commentaries written in colloquial Cantonese, a dialect of the Chinese language used in Hong Kong. We need to confront the issue that existing off-the-shelf PTLMs, such as BERT and RoBERTa, are not pre-trained to understand the semantics of colloquial Cantonese, let alone the slang, jargon, and code words that people in Hong Kong use to articulate opinions. As a result, we approach the opinion extraction problem differently from the mainstream approaches, which use model-based named entity recognition (NER) to detect and extract opinion aspects as named entities and named entity relations. Because there is no PTLM for our specific language and problem domain, we solve the opinion extraction problem using a combination of rule-based NLP and deep learning techniques. We report our experience of creating a lexicon and identifying candidate opinion aspects in the input text using rule-based NLP. We discuss how to improve BERT's linguistic knowledge of colloquial Cantonese through a fine-tuning procedure. We illustrate how to prepare the input text for contextual reasoning and demonstrate how to use NLI to confirm candidate opinion aspects as extractable.
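The two-stage flow described in the abstract (lexicon-based candidate detection, followed by NLI confirmation) might be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the lexicon entries, the hypothesis template, the function names, and the `toy_entail_prob` scorer are all hypothetical, English placeholders stand in for colloquial Cantonese terms, and a real system would replace the toy scorer with a fine-tuned NLI model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical mini-lexicon: surface term -> aspect type (illustrative entries only).
LEXICON: Dict[str, str] = {
    "HSBC": "stock",
    "Tencent": "stock",
    "buy": "action",
    "sell": "action",
}

Candidate = Tuple[str, str, int]  # (term, aspect type, start offset)


def find_candidates(text: str, lexicon: Dict[str, str]) -> List[Candidate]:
    """Rule-based step: case-insensitive substring lookup of every lexicon term."""
    lowered = text.lower()
    hits: List[Candidate] = []
    for term, aspect in lexicon.items():
        needle = term.lower()
        start = lowered.find(needle)
        while start != -1:
            hits.append((term, aspect, start))
            start = lowered.find(needle, start + 1)
    # Order candidates by where they appear in the text.
    return sorted(hits, key=lambda hit: hit[2])


def build_nli_pairs(text: str, candidates: List[Candidate]) -> List[Tuple[str, str]]:
    """Prepare (premise, hypothesis) pairs: one per stock/action combination."""
    stocks = [term for term, aspect, _ in candidates if aspect == "stock"]
    actions = [term for term, aspect, _ in candidates if aspect == "action"]
    # Assumed hypothesis template; the paper's actual wording is not given here.
    return [(text, f"The author wants to {act} {stk}.") for stk in stocks for act in actions]


def confirm(
    pairs: List[Tuple[str, str]],
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[str]:
    """NLI step: keep hypotheses whose entailment score reaches the threshold."""
    return [hyp for premise, hyp in pairs if entail_prob(premise, hyp) >= threshold]


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: checks that 'action stock' occurs verbatim."""
    phrase = " ".join(hypothesis.rstrip(".").split()[-2:])
    return 1.0 if phrase in premise else 0.0


if __name__ == "__main__":
    text = "I would buy Tencent and sell HSBC this week."
    candidates = find_candidates(text, LEXICON)
    for opinion in confirm(build_nli_pairs(text, candidates), toy_entail_prob):
        print(opinion)
```

Passing the scorer in as a callable keeps the rule-based stage independent of any particular model, so the same candidate pairs can be re-scored after the language model has been fine-tuned.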



