{"title":"NLP-based smart decision making for business and academics","authors":"Pradnya Sawant, Kavita Sonawane","doi":"10.1016/j.nlp.2024.100090","DOIUrl":"10.1016/j.nlp.2024.100090","url":null,"abstract":"<div><p>Natural Language Processing (NLP) systems enable machines to understand, interpret, and generate human-like language, bridging the gap between human communication and computer understanding. Natural Language Interface to Databases (NLIDB) and Natural Language Interface to Visualization (NLIV) systems are designed to enable non-technical users to retrieve and visualize data through natural language queries. However, these systems often struggle with complex correlation and analytical questions, limiting their effectiveness for comprehensive data analysis. Current Business Intelligence (BI) tools likewise struggle to understand the context and semantics of complex questions, further hindering their usability for strategic decision-making. Moreover, when building models that generate queries from natural language, existing systems address only the semantic parsing issues: every column header must be renamed manually to its natural name, a process that is time-consuming, tedious, and subjective.</p><p>Recent studies reflect the need for attention to context, semantics, and especially ambiguities in dealing with natural language questions. To address this problem, the proposed architecture focuses on understanding the context, correlation-based semantic analysis, and removal of ambiguities using a novel approach. An Enhanced Longest Common Subsequence (ELCS) algorithm is proposed, in which the existing LCS is extended with a memoization component for mapping natural language question tokens to ambiguous table column headers. This speeds up the overall process because no human intervention is required to manually change the column headers. 
This is evidenced by thorough experimentation and a comparative study in terms of precision, recall, and F1 score. By synthesizing the latest advancements and addressing these challenges, this paper demonstrates how NLP can significantly enhance the accuracy and efficiency of information retrieval and visualization, broadening the inclusivity and usability of NLIDB, NLIV, and BI systems.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"8 ","pages":"Article 100090"},"PeriodicalIF":0.0,"publicationDate":"2024-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949719124000384/pdfft?md5=fd2e14b2d3243c083595a7e1f7015f23&pid=1-s2.0-S2949719124000384-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141842666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
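The ELCS idea described in the record above — a longest-common-subsequence matcher sped up with memoization and used to map question tokens onto ambiguous column headers — can be sketched roughly as follows. The paper's exact algorithm is not given in the abstract, so the normalized scoring rule and the sample headers (`cust_nm` and friends) are illustrative assumptions.

```python
from functools import lru_cache

def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b, with memoization."""
    @lru_cache(maxsize=None)
    def rec(i: int, j: int) -> int:
        if i == len(a) or j == len(b):
            return 0
        if a[i] == b[j]:
            return 1 + rec(i + 1, j + 1)
        return max(rec(i + 1, j), rec(i, j + 1))
    return rec(0, 0)

def map_token_to_header(token: str, headers: list[str]) -> str:
    """Pick the header whose normalized LCS score against the token is highest."""
    def score(header: str) -> float:
        return lcs_length(token.lower(), header.lower()) / max(len(token), len(header))
    return max(headers, key=score)

# Hypothetical abbreviated headers that would otherwise need manual renaming.
headers = ["cust_nm", "ord_dt", "tot_amt"]
print(map_token_to_header("customer", headers))  # -> cust_nm
```

Because the memoized LCS is reused for every token–header pair, no manual mapping table is needed; headers only have to be comparable as strings.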
{"title":"Exploring transformer models in the sentiment analysis task for the under-resource Bengali language","authors":"Md. Nesarul Hoque , Umme Salma , Md. Jamal Uddin , Md. Martuza Ahamad , Sakifa Aktar","doi":"10.1016/j.nlp.2024.100091","DOIUrl":"10.1016/j.nlp.2024.100091","url":null,"abstract":"<div><p>The sentiment analysis (SA) task determines whether a comment or piece of feedback from an online user or customer about an object, such as a movie, drama, or food item, is positive or negative. Such sentiments can positively influence various decision-making processes. Many studies have addressed sentiment identification from text in high-resource languages like English. However, only a few studies target the under-resourced Bengali language, owing to the unavailability of benchmark corpora, the limitations of text processing software, and similar constraints. Furthermore, there is still considerable room to improve the classification performance of the SA task. In this research, we experiment on a recognized Bengali dataset of 11,807 comments to identify positive or negative sentiments. We employ five state-of-the-art transformer-based pretrained models, namely multilingual Bidirectional Encoder Representations from Transformers (mBERT), BanglaBERT, Bangla-Bert-Base, DistilmBERT, and XLM-RoBERTa-base (XLM-R-base), with hyperparameter tuning. 
We then propose a combined model named Transformer-ensemble, which delivers outstanding detection performance compared to recent existing methods in the Bengali SA task, with an accuracy of 95.97% and an F1-score of 95.96%.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"8 ","pages":"Article 100091"},"PeriodicalIF":0.0,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949719124000396/pdfft?md5=224e6ebbfc8811318218e54f481e4c76&pid=1-s2.0-S2949719124000396-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141847924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
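The abstract above does not state how the Transformer-ensemble combines its five base models; a common choice for such ensembles is soft voting, i.e. averaging the per-class probabilities of the individual models. The sketch below uses made-up model outputs purely for illustration.

```python
def soft_vote(prob_lists):
    """Soft-voting ensemble: average per-class probabilities across models,
    then predict the class with the highest averaged probability."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    avg = [sum(probs[c] for probs in prob_lists) / n_models for c in range(n_classes)]
    return avg.index(max(avg)), avg

# Made-up outputs of three fine-tuned models for one comment: [P(negative), P(positive)]
model_probs = [[0.30, 0.70], [0.45, 0.55], [0.20, 0.80]]
label, avg = soft_vote(model_probs)
print(label)  # -> 1 (positive)
```

Soft voting tends to be more robust than majority voting when the base models are well calibrated, since a confident model can outweigh two weakly confident ones.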
{"title":"TransLSTM: A hybrid LSTM-Transformer model for fine-grained suggestion mining","authors":"Samad Riaz , Amna Saghir , Muhammad Junaid Khan , Hassan Khan , Hamid Saeed Khan , M. Jaleed Khan","doi":"10.1016/j.nlp.2024.100089","DOIUrl":"10.1016/j.nlp.2024.100089","url":null,"abstract":"<div><p>Digital platforms on the internet are invaluable for collecting user feedback, suggestions, and opinions about various topics, such as company products and services. This data is instrumental in shaping business strategies, enhancing product development, and refining service delivery. Suggestion mining is a key natural language processing task that focuses on extracting and analysing suggestions from these digital sources. Initially, suggestion mining utilized manually crafted features, but recent advancements have highlighted the efficacy of deep learning models, which automatically learn features. Models like Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT) have been employed in this field. However, considering the relatively small datasets and the faster training time of LSTM compared to BERT, we introduce TransLSTM, a novel LSTM-Transformer hybrid model for suggestion mining. The model combines the sequential dependency handling of LSTM with the contextual interaction capabilities of the Transformer, harnessing both local and global text dependencies to identify and extract suggestions effectively. We evaluated our method against state-of-the-art approaches using the SemEval Task-9 dataset, a benchmark for suggestion mining. Our model shows promising performance, surpassing existing deep learning methods by 6.76% with an F1 score of 0.834 for SubTask A and 0.881 for SubTask B. 
Additionally, our paper presents an exhaustive literature review on suggestion mining from digital platforms, covering both traditional and state-of-the-art text classification techniques.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"8 ","pages":"Article 100089"},"PeriodicalIF":0.0,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949719124000372/pdfft?md5=01d5468c4cb646548ed9ac72a0da2eb9&pid=1-s2.0-S2949719124000372-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141706671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive survey on answer generation methods using NLP","authors":"Prashant Upadhyay, Rishabh Agarwal, Sumeet Dhiman, Abhinav Sarkar, Saumya Chaturvedi","doi":"10.1016/j.nlp.2024.100088","DOIUrl":"10.1016/j.nlp.2024.100088","url":null,"abstract":"<div><p>Recent advancements in question-answering systems have significantly enhanced the capability of computers to understand and respond to queries in natural language. This paper presents a comprehensive review of the evolution of question answering systems, with a focus on the developments over the last few years. We examine the foundational aspects of a question answering framework, including question analysis, answer extraction, and passage retrieval. Additionally, we delve into the challenges that question answering systems encounter, such as the intricacies of question processing, the necessity of contextual data sources, and the complexities involved in real-time question answering. Our study categorizes existing question answering systems based on the types of questions they address, the nature of the answers they produce, and the various approaches employed to generate these answers. We also explore the distinctions between opinion-based, extraction-based, retrieval-based, and generative answer generation. The classification provides insight into the strengths and limitations of each method, paving the way for future innovations in the field. 
This review aims to offer a clear understanding of the current state of question answering systems and to identify the advances needed to meet users' rising expectations of coherent and accurate automated responses in natural language.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"8 ","pages":"Article 100088"},"PeriodicalIF":0.0,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949719124000360/pdfft?md5=57245c441a09df1168241bb40a6f9e06&pid=1-s2.0-S2949719124000360-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141623096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topic-aware response selection for dialog systems","authors":"Wei Yuan , Zongyang Ma , Aijun An , Jimmy Xiangji Huang","doi":"10.1016/j.nlp.2024.100087","DOIUrl":"https://doi.org/10.1016/j.nlp.2024.100087","url":null,"abstract":"<div><p>It is challenging for a persona-based chitchat system to return responses consistent with the dialog context and the persona of the agent. This particularly holds for a retrieval-based chitchat system that selects the most appropriate response from a set of candidates according to the dialog context and the persona of the agent. A persona usually has some dominant topics (e.g., <em>sports</em>, <em>music</em>). Adhering to these topics can enhance the consistency of responses. However, previous studies rarely explore the topical semantics of the agent’s persona in the chitchat system, which often fails to return responses coherent with the persona. In this paper, we propose a Topic-Aware Response Selection (TARS) model, capturing multi-grained matching between the dialog context and a response and also between the persona and a response at both the word and the topic levels, to select the appropriate topic-aware response from the pool of response candidates. 
Empirical results on the public persona-based empathetic conversation (PEC) data demonstrate the promising performance of the TARS model for response selection.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"8 ","pages":"Article 100087"},"PeriodicalIF":0.0,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949719124000359/pdfft?md5=460e17e8ab71eeba6fb71be3795c94c0&pid=1-s2.0-S2949719124000359-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141541701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A modified Vector Space Model for semantic information retrieval","authors":"Callistus Ireneous Nakpih","doi":"10.1016/j.nlp.2024.100081","DOIUrl":"10.1016/j.nlp.2024.100081","url":null,"abstract":"<div><p>In this research, we present a modified Vector Space Model (VSM) which focuses on the semantic relevance of words for retrieving documents. The modified VSM resolves the problem that the classical model performs only lexical matching of query terms to document terms during retrieval. This restriction also prevents the classical model from retrieving documents that lack an exact match of query terms even when they are semantically relevant to the query. In the modified model, we introduce a Query Relevance Update technique, which pads the original query set with semantically relevant document terms for optimised semantic retrieval results. The modified model also includes a novel <span><math><mrow><mi>t</mi><mi>f</mi><mo>−</mo><mi>p</mi></mrow></math></span> weighting scheme, which replaces the <span><math><mrow><mi>t</mi><mi>f</mi><mo>−</mo><mi>i</mi><mi>d</mi><mi>f</mi></mrow></math></span> technique that the classical VSM uses to compute term frequency weights. Replacing <span><math><mrow><mi>t</mi><mi>f</mi><mo>−</mo><mi>i</mi><mi>d</mi><mi>f</mi></mrow></math></span> resolves the problem of the classical model penalising terms that occur across documents on the assumption that they are stop words; in practice, such terms often carry semantic information relevant to document retrieval. We also extend the cosine similarity function with a proportionality weight <span><math><msub><mrow><mi>p</mi></mrow><mrow><mi>q</mi><mi>d</mi></mrow></msub></math></span>, which moderates biases toward high term frequencies in longer documents. 
The <span><math><msub><mrow><mi>p</mi></mrow><mrow><mi>q</mi><mi>d</mi></mrow></msub></math></span> weight ensures that the frequencies of query terms, including the updated ones, are accounted for in proportion to document size in the overall ranking of documents. The simulation results reveal that the modified VSM achieves semantic retrieval of documents beyond lexical matching of query and document terms.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"8 ","pages":"Article 100081"},"PeriodicalIF":0.0,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949719124000293/pdfft?md5=3a5de846966e83dc34ea6a2e3b7d202f&pid=1-s2.0-S2949719124000293-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141405572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
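The length-moderated cosine similarity described in the record above can be illustrated as follows. The abstract does not give the actual formulas for tf−p or p_qd, so the length-ratio form of the proportionality weight used here is purely an assumption that captures the stated intent: damping the score of documents much longer than average.

```python
import math

def weighted_cosine(query_vec, doc_vec, doc_len, avg_doc_len):
    """Cosine similarity scaled by a hypothetical proportionality weight p_qd.

    The ratio avg_doc_len / (avg_doc_len + doc_len) shrinks as doc_len grows,
    moderating the raw term-frequency bias toward longer documents.
    """
    dot = sum(q * d for q, d in zip(query_vec, doc_vec))
    q_norm = math.sqrt(sum(q * q for q in query_vec))
    d_norm = math.sqrt(sum(d * d for d in doc_vec))
    if q_norm == 0.0 or d_norm == 0.0:
        return 0.0
    p_qd = avg_doc_len / (avg_doc_len + doc_len)  # assumed form, not the paper's
    return p_qd * dot / (q_norm * d_norm)

# Identical term weights, but the much longer document is ranked lower.
print(weighted_cosine([1.0, 2.0], [1.0, 2.0], 100, 100))  # shorter doc: higher score
print(weighted_cosine([1.0, 2.0], [1.0, 2.0], 400, 100))  # longer doc: lower score
```

Any monotonically decreasing function of document length would serve the same moderating role; the ratio above is just the simplest choice for a sketch.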
{"title":"Cutting through the noise to motivate people: A comprehensive analysis of COVID-19 social media posts de/motivating vaccination","authors":"Ashiqur Rahman , Ehsan Mohammadi , Hamed Alhoori","doi":"10.1016/j.nlp.2024.100085","DOIUrl":"10.1016/j.nlp.2024.100085","url":null,"abstract":"<div><p>The COVID-19 pandemic exposed significant weaknesses in the healthcare information system. The overwhelming volume of misinformation on social media and other socioeconomic factors created extraordinary challenges to motivate people to take proper precautions and get vaccinated. In this context, our work explored a novel direction by analyzing an extensive dataset collected over two years, identifying the topics de/motivating the public about COVID-19 vaccination. We analyzed these topics based on time, geographic location, and political orientation. We noticed that while the motivating topics remain the same over time and geographic location, the demotivating topics change rapidly. We also identified that intrinsic motivation, rather than external mandate, is more advantageous to inspire the public. This study addresses scientific communication and public motivation in social media. 
It can help public health officials, policymakers, and social media platforms develop more effective messaging strategies to cut through the noise of misinformation and educate the public about scientific findings.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"8 ","pages":"Article 100085"},"PeriodicalIF":0.0,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949719124000335/pdfft?md5=bda60786d7ac110df5894be0ee669f0e&pid=1-s2.0-S2949719124000335-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141392804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generating dynamic lip-syncing using target audio in a multimedia environment","authors":"Diksha Pawar, Prashant Borde, Pravin Yannawar","doi":"10.1016/j.nlp.2024.100084","DOIUrl":"https://doi.org/10.1016/j.nlp.2024.100084","url":null,"abstract":"<div><p>The presented research focuses on the challenging task of creating lip-synced facial videos that align with a specified target speech segment. A novel deep-learning model has been developed to produce precise synthetic lip movements corresponding to the speech extracted from an audio source. When the audio is replaced, portions of the visual data may fall out of sync with the updated audio; this challenge is handled through a novel strategy that leverages insights from a robust lip-sync discriminator. Additionally, this study introduces fresh criteria and evaluation benchmarks for assessing lip synchronization in unconstrained videos. The proposed LipChanger model demonstrates improved PSNR values, indicative of enhanced image quality. Furthermore, it exhibits highly accurate lip synthesis, as evidenced by lower LMD values and higher SSIM values. These outcomes suggest that the LipChanger approach holds significant potential for enhancing lip synchronization in talking face videos, resulting in more realistic lip movements. 
The proposed LipChanger model and its associated evaluation benchmarks show promise and could potentially contribute to advancements in lip-sync technology for unconstrained talking face videos.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"8 ","pages":"Article 100084"},"PeriodicalIF":0.0,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949719124000323/pdfft?md5=84516d2e22e4420f113635a3914da66f&pid=1-s2.0-S2949719124000323-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141328741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating LLMs on document-based QA: Exact answer selection and numerical extraction using CogTale dataset","authors":"Zafaryab Rasool , Stefanus Kurniawan , Sherwin Balugo , Scott Barnett , Rajesh Vasa , Courtney Chesser , Benjamin M. Hampstead , Sylvie Belleville , Kon Mouzakis , Alex Bahar-Fuchs","doi":"10.1016/j.nlp.2024.100083","DOIUrl":"https://doi.org/10.1016/j.nlp.2024.100083","url":null,"abstract":"<div><p>Document-based Question-Answering (QA) tasks are crucial for precise information retrieval. While some existing work focuses on evaluating the performance of large language models (LLMs) in retrieving and answering questions from documents, their performance on QA types that require exact answer selection from predefined options and numerical extraction has yet to be fully assessed. In this paper, we focus on this underexplored context and conduct an empirical analysis of LLMs (GPT-4 and GPT-3.5) on question types including single-choice, yes–no, multiple-choice, and number extraction questions from documents. We use the CogTale dataset for evaluation, which provides human expert-tagged responses, offering a robust benchmark for precision and factual grounding. We found that LLMs, particularly GPT-4, can precisely answer many single-choice and yes–no questions given relevant context, demonstrating their efficacy in information retrieval tasks. However, their performance diminishes when confronted with multiple-choice and number extraction formats, lowering the overall performance of the models and indicating that they may not yet be sufficiently reliable for this task. This limits the applicability of LLMs in applications demanding precise information extraction and inference from documents, such as meta-analysis tasks. 
Our work offers a framework for ongoing dataset evaluation, ensuring that LLM applications for information retrieval and document analysis continue to meet evolving standards.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"8 ","pages":"Article 100083"},"PeriodicalIF":0.0,"publicationDate":"2024-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S2949719124000311/pdfft?md5=99895c63882405f8b66929d134da8f31&pid=1-s2.0-S2949719124000311-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141438456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
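One practical detail behind "exact answer selection from predefined options" in the record above is mapping a model's free-text reply onto one of the allowed options before scoring it against the expert-tagged answer. The sketch below is a minimal normalization-and-matching routine, not the paper's evaluation code.

```python
import re
from typing import Optional

def select_option(llm_answer: str, options: list[str]) -> Optional[str]:
    """Map a free-text LLM reply onto one of the predefined answer options.

    Minimal normalization plus word-level matching; a real evaluation
    pipeline for single-choice and yes-no questions typically needs
    stricter rules (e.g. handling negations and multi-word options).
    """
    def norm(text: str) -> str:
        return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

    answer = norm(llm_answer)
    for option in options:
        if norm(option) == answer or norm(option) in answer.split():
            return option
    return None

print(select_option("The answer is: Yes.", ["Yes", "No"]))  # -> Yes
```

Matching on whole words rather than substrings avoids false hits such as "no" matching inside "not"; replies that match no option can then be scored as incorrect or abstained.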
{"title":"Topic specificity: A descriptive metric for algorithm selection and finding the right number of topics","authors":"Emil Rijcken , Kalliopi Zervanou , Pablo Mosteiro , Floortje Scheepers , Marco Spruit , Uzay Kaymak","doi":"10.1016/j.nlp.2024.100082","DOIUrl":"10.1016/j.nlp.2024.100082","url":null,"abstract":"<div><p>Topic modeling is a prevalent task for discovering the latent structure of a corpus, identifying a set of topics that represent the underlying themes of the documents. Despite its popularity, issues with its evaluation metric, the coherence score, result in two common challenges: <em>algorithm selection</em> and <em>determining the number of topics</em>. To address these two issues, we propose the <em>topic specificity</em> metric, which captures the relative frequency of topic words in the corpus and is used as a proxy for the specificity of a word. In this work, we first formulate the metric. Second, we demonstrate that algorithms train topics at different specificity levels. This insight can be used to address algorithm selection, as it allows users to distinguish and select algorithms with the desired specificity level. Lastly, we show a strictly positive monotonic correlation between topic specificity and the number of topics for LDA, FLSA-W, NMF and LSI. This correlation can be used to address the selection of the number of topics, as it allows users to adjust the number of topics to their desired level. 
Moreover, our descriptive metric provides a new perspective to characterize topic models, allowing them to be understood better.</p></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"8 ","pages":"Article 100082"},"PeriodicalIF":0.0,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S294971912400030X/pdfft?md5=af15e6c29d867b39aae58eedf84c6eda&pid=1-s2.0-S294971912400030X-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141406979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
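One plausible reading of the topic specificity metric in the record above — the mean relative corpus frequency of a topic's top words, so that frequent top words signal a general topic and rare top words a specific one — can be sketched as follows. The paper's exact formula may differ, and the toy corpus is illustrative only.

```python
from collections import Counter

def topic_specificity(topic_words, corpus_tokens):
    """Mean relative corpus frequency of a topic's top words.

    A hypothetical instantiation of the metric: higher values indicate
    topics built from common (general) words, lower values indicate
    topics built from rare (specific) words.
    """
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    return sum(counts[w] / total for w in topic_words) / len(topic_words)

corpus = "the cat sat on the mat the dog sat".split()
# A topic of rare words scores lower than a topic of frequent words.
print(topic_specificity(["cat", "dog"], corpus) < topic_specificity(["the", "sat"], corpus))  # -> True
```

Because the value depends only on corpus word counts, it can be compared across topic modeling algorithms and across different numbers of topics, which is exactly the use the abstract describes.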