Topic Modeling for Mining Opinion Aspects from a Customer Feedback Corpus
Author: O. I. Babina
Journal: Automatic Documentation and Mathematical Linguistics
Publication date: 2024-04-02
DOI: 10.3103/S0005105524010060
URL: https://link.springer.com/article/10.3103/S0005105524010060
The paper introduces a methodology for extracting opinion aspects from textual content by identifying the customer-evaluated parameters regarding a given object. These parameters form the foundation for shaping the customer’s attitudes toward the product or service. The proposed approach leverages topic modeling tools to delineate classes of vocabulary exhibiting semantics aligned with the parameters influencing the customer’s opinion about the object. Our study specifically explores the application of the BERTopic model as a topic modeling tool to address this challenge. The outlined methodology encompasses several sequential steps, including the preprocessing of textual data involving the removal of stopwords, conversion to lowercase characters, and lemmatization. Additionally, special consideration is given to the distinct lexical manifestations of opinion aspects, obtained as a result of the extraction of nominal, verbal, and adjectival single- and multicomponent phrases from the corpus. Subsequently, the corpus sentences are represented as vectors in a feature space expressed by the extracted words and phrases. The final step involves the application of topic modeling using the BERTopic model on the customer review corpus, utilizing the vector representations of corpus sentences. The experimental inquiry is conducted on a domain-specific Russian-language corpus comprising customer feedback on airline services gathered from customer review websites. The resultant topic distribution is then juxtaposed against a manually constructed conceptual model of the domain. The comparative analysis reveals that the automatic topic distribution aligns with the conceptual structure of the domain, demonstrating a precision of 0.955 and a recall of 0.875. These findings affirm the efficacy of employing the BERTopic model to address the problem of the corpus-based mining of opinion aspects.
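The preprocessing, vectorization, and evaluation steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the author's implementation: the stopword list, the lemma table, the feature list, and the topic labels are hypothetical English stand-ins for the Russian-language resources the study would need, the count-based vector is an assumed representation, and exact set overlap is an assumed way of matching automatic topics to the manual conceptual model. The topic modeling step itself (BERTopic) is deliberately left out.

```python
from collections import Counter

# Hypothetical resources; a real pipeline would use a Russian stopword
# list and a morphological analyzer rather than these toy tables.
STOPWORDS = {"the", "a", "was", "were", "and"}
LEMMAS = {"flights": "flight", "delayed": "delay", "seats": "seat"}


def preprocess(sentence):
    """Lowercase, strip punctuation, drop stopwords, and lemmatize."""
    tokens = [t.strip(".,!?;:") for t in sentence.lower().split()]
    return [LEMMAS.get(t, t) for t in tokens if t and t not in STOPWORDS]


def vectorize(tokens, features):
    """Represent a sentence as counts over extracted words and phrases.

    Adjacent token pairs stand in for multicomponent phrases (assumption).
    """
    counts = Counter(tokens)
    counts.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return [counts[f] for f in features]


def precision_recall(predicted, reference):
    """Compare automatic topic labels with a manually built reference set."""
    matched = predicted & reference
    precision = len(matched) / len(predicted) if predicted else 0.0
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical feature space and topic labels, for illustration only.
features = ["flight", "seat", "flight delay", "crew"]
tokens = preprocess("The flights were delayed and the seats were cramped.")
vector = vectorize(tokens, features)
precision, recall = precision_recall(
    {"baggage", "crew", "food"},
    {"baggage", "crew", "seat", "delay"},
)
```

In this sketch precision is the share of automatically derived topics that have a counterpart in the reference model, and recall is the share of reference concepts recovered by the automatic topics. The abstract reports the resulting figures (0.955 and 0.875) but does not state the underlying counts or the matching criterion, so the set-overlap rule here is only one plausible reading.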
About the journal:
Automatic Documentation and Mathematical Linguistics is an international peer-reviewed journal that covers all aspects of the automation of information processes and systems, as well as algorithms and methods for automatic language analysis. Emphasis is on the practical applications of new technologies and techniques for information analysis and processing.