Md. Sabbir Hossain, Nishat Nayla, Annajiat Alim Rasel
Product Market Demand Analysis Using NLP in Banglish Text with Sentiment Analysis and Named Entity Recognition
2022 56th Annual Conference on Information Sciences and Systems (CISS), published 2022-03-09
DOI: 10.48550/arXiv.2204.01827 (https://doi.org/10.48550/arXiv.2204.01827)
Citation count: 7
Abstract
Product market demand analysis plays a significant role in shaping business strategy because of its noticeable impact on competitive markets. There are roughly 228 million native Bengali speakers, the majority of whom use Banglish text (Bengali written in the Latin alphabet) to interact with one another on social media. As social media emerges as an online marketplace for entrepreneurs, consumers are buying and reviewing products there in Banglish. People use social media to find preferred smartphone brands and models by sharing their good and bad experiences with them. For this reason, our goal is to gather Banglish text data and use sentiment analysis and named entity recognition to assess Bangladeshi market demand for smartphones and determine the most popular smartphones by gender. We scraped product-related data from social media with instant data scrapers and crawled Wikipedia and other sites for product information with Python web scrapers. The raw data is filtered using NLP methods with Python's Pandas and Seaborn libraries. For named entity recognition, we trained two models on our datasets: spaCy's custom NER model and Amazon Comprehend Custom NER. For sentiment analysis, we deployed a TensorFlow Sequential model with parameter tuning. Meanwhile, we used the Google Cloud Translation API together with the BanglaLinga library to estimate the gender of the reviewers. In this article, we use natural language processing (NLP) approaches and several machine learning models to identify the most in-demand items and services in the Bangladeshi market. Our models achieve an accuracy of 87.99% with spaCy custom named entity recognition, 95.51% with Amazon Comprehend Custom NER, and 87.02% with the Sequential model for demand analysis. After the spaCy analysis, we were able to handle 80% of the mistakes related to misspelled words using a combination of Levenshtein distance and ratio algorithms.
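The misspelling correction mentioned at the end of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary `KNOWN_BRANDS`, the helper `correct_token`, and both thresholds are hypothetical, and the ratio used here is a simple length-normalized similarity, which may differ from the ratio definition the authors used.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Assumed ratio: 1 - distance / length of the longer string, in [0, 1]."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Hypothetical vocabulary of canonical brand names (not taken from the paper).
KNOWN_BRANDS = ["samsung", "xiaomi", "realme", "oppo", "vivo", "iphone"]


def correct_token(token: str,
                  vocabulary: List[str] = KNOWN_BRANDS,
                  max_distance: int = 2,
                  min_ratio: float = 0.7) -> Optional[str]:
    """Return the closest known brand, or None if nothing is close enough.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    token = token.lower()
    best = min(vocabulary, key=lambda word: levenshtein(token, word))
    close_enough = (levenshtein(token, best) <= max_distance
                    and similarity_ratio(token, best) >= min_ratio)
    return best if close_enough else None


if __name__ == "__main__":
    for raw in ["samsang", "Realmi", "valo"]:
        print(raw, "->", correct_token(raw))
```

Requiring both a small absolute distance and a high ratio is one way to keep short, unrelated Banglish words from being mapped onto short brand names: a word that is two edits away from a four-letter brand has a small distance but a low ratio, so it is rejected.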