Raymond So, Chun Fai Carlin Chu, Cheuk Wing Jessie Lee
Title: Extract Aspect-based Financial Opinion Using Natural Language Inference
Published in: Proceedings of the 2022 International Conference on E-business and Mobile Commerce
Publication date: 2022-05-13
DOI: 10.1145/3543106.3543120 (https://doi.org/10.1145/3543106.3543120)
Citations: 3
Abstract
The emergence of transformer-based pre-trained language models (PTLMs) has brought new and improved techniques to natural language processing (NLP). Traditional rule-based NLP, for instance, is known for its inability to create context-aware representations of words and sentences. Natural language inference (NLI) addresses this deficiency by using PTLMs to create context-sensitive embeddings for contextual reasoning. This paper outlines a system design that uses traditional rule-based NLP and deep learning to extract aspect-based financial opinion from financial commentaries written in colloquial Cantonese, a dialect of the Chinese language used in Hong Kong. We need to confront the issue that existing off-the-shelf PTLMs, such as BERT and RoBERTa, are not pre-trained to understand the language semantics of colloquial Cantonese, let alone the slang, jargon, and code words that people in Hong Kong use to articulate opinions. As a result, we approach the opinion extraction problem differently from the mainstream approaches, which use model-based named entity recognition (NER) to detect and extract opinion aspects as named entities and named entity relations. Because there is no PTLM for our specific language and problem domain, we solve the opinion extraction problem using rule-based NLP and deep learning techniques. We report our experience of creating a lexicon and identifying candidate opinion aspects in the input text using rule-based NLP. We discuss how to improve BERT’s linguistic knowledge of colloquial Cantonese through a fine-tuning procedure. We illustrate how to prepare the input text for contextual reasoning and demonstrate how to use NLI to confirm candidate opinion aspects as extractable.
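The two-stage idea in the abstract (rule-based candidate detection followed by NLI confirmation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy lexicon, the English hypothesis template, the 0.5 threshold, the sample sentence, and the names `find_candidates`, `build_nli_pair`, `confirm_aspects`, and `stub_entail_prob` are all hypothetical. The NLI model is represented by a pluggable scoring function; in practice it would presumably be a fine-tuned PTLM that returns an entailment probability.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lexicon: surface form -> canonical aspect label.
# The paper's actual lexicon is not reproduced in the abstract.
LEXICON: Dict[str, str] = {
    "HSBC": "HSBC",
    "大笨象": "HSBC",  # illustrative nickname entry (assumption)
    "Tencent": "Tencent",
}


@dataclass
class Candidate:
    aspect: str   # canonical aspect label from the lexicon
    surface: str  # text span that triggered the match
    start: int    # character offset of the match in the input


def find_candidates(text: str, lexicon: Dict[str, str]) -> List[Candidate]:
    """Rule-based step: find every lexicon surface form in the text."""
    found: List[Candidate] = []
    for surface, aspect in lexicon.items():
        pos = text.find(surface)
        while pos != -1:
            found.append(Candidate(aspect, surface, pos))
            pos = text.find(surface, pos + len(surface))
    return sorted(found, key=lambda c: c.start)


def build_nli_pair(
    text: str,
    cand: Candidate,
    template: str = "This comment expresses an opinion about {aspect}.",
) -> Tuple[str, str]:
    """Prepare a (premise, hypothesis) pair for one candidate aspect."""
    return text, template.format(aspect=cand.aspect)


def confirm_aspects(
    text: str,
    lexicon: Dict[str, str],
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[str]:
    """NLI step: keep candidates whose hypothesis the scorer judges entailed."""
    confirmed: List[str] = []
    for cand in find_candidates(text, lexicon):
        premise, hypothesis = build_nli_pair(text, cand)
        if entail_prob(premise, hypothesis) >= threshold:
            confirmed.append(cand.aspect)
    return confirmed


def stub_entail_prob(premise: str, hypothesis: str) -> float:
    """Stand-in for a real NLI model; scores are hard-coded for the demo."""
    return 0.9 if "HSBC" in hypothesis else 0.1


if __name__ == "__main__":
    # Invented sample sentence, used only to exercise the functions.
    sample = "我睇好大笨象，Tencent 就麻麻地"
    print(confirm_aspects(sample, LEXICON, stub_entail_prob))
```

Passing the scorer in as a function keeps the rule-based stage testable on its own and leaves the choice of NLI model open.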