{"title":"Improving the clarity of questions in Community Question Answering networks","authors":"Alireza Khabbazan, Ahmad Ali Abin, Viet-Vu Vu","doi":"10.1007/s10844-024-00847-y","DOIUrl":null,"url":null,"abstract":"<p>Every day, thousands of questions are asked on the Community Question Answering network, making these questions and answers extremely valuable for information seekers around the world. However, a significant proportion of these questions do not elicit proper answers. There are several reasons for this, with the lack of clarity in questions being one of the most crucial factors. In this study, our primary focus is on enhancing the clarity of unclear questions in Community Question Answering networks. In the first step, DistilBERT, which uses Siamese and triplet network structures for meaningful sentence embeddings, is combined with HDBSCAN, effective in diverse noise datasets and less sensitive to density variations, to extract unique features from each question. Questions were then categorized as clear or unclear using an Extremely Randomized Trees ensemble model, known for its robust resistance to class imbalance, with more than 90% accuracy. Next, efforts were made to extract information that could enhance the clarity of unclear questions by comparing them with similar, clearer questions using Dynamic Time Warping, a versatile technique suitable for time series analyses in information systems and applicable across various domains. Finally, the extracted information was incorporated into the feature vector of unclear questions based on histogram-coverage methods to enhance their clarity. When a question is made clearer, the missing information and its importance are shown to the questioner. This enables the questioner to be aware of the missing information and facilitates them in clarifying the question.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"30 1","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Intelligent Information Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10844-024-00847-y","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Every day, thousands of questions are asked on the Community Question Answering network, making these questions and answers extremely valuable for information seekers around the world. However, a significant proportion of these questions do not elicit proper answers. There are several reasons for this, with the lack of clarity in questions being one of the most crucial factors. In this study, our primary focus is on enhancing the clarity of unclear questions in Community Question Answering networks. In the first step, DistilBERT, which uses Siamese and triplet network structures for meaningful sentence embeddings, is combined with HDBSCAN, effective in diverse noise datasets and less sensitive to density variations, to extract unique features from each question. Questions were then categorized as clear or unclear using an Extremely Randomized Trees ensemble model, known for its robust resistance to class imbalance, with more than 90% accuracy. Next, efforts were made to extract information that could enhance the clarity of unclear questions by comparing them with similar, clearer questions using Dynamic Time Warping, a versatile technique suitable for time series analyses in information systems and applicable across various domains. Finally, the extracted information was incorporated into the feature vector of unclear questions based on histogram-coverage methods to enhance their clarity. When a question is made clearer, the missing information and its importance are shown to the questioner. This enables the questioner to be aware of the missing information and facilitates them in clarifying the question.
期刊介绍:
The mission of the Journal of Intelligent Information Systems: Integrating Artifical Intelligence and Database Technologies is to foster and present research and development results focused on the integration of artificial intelligence and database technologies to create next generation information systems - Intelligent Information Systems.
These new information systems embody knowledge that allows them to exhibit intelligent behavior, cooperate with users and other systems in problem solving, discovery, access, retrieval and manipulation of a wide variety of multimedia data and knowledge, and reason under uncertainty. Increasingly, knowledge-directed inference processes are being used to:
discover knowledge from large data collections,
provide cooperative support to users in complex query formulation and refinement,
access, retrieve, store and manage large collections of multimedia data and knowledge,
integrate information from multiple heterogeneous data and knowledge sources, and
reason about information under uncertain conditions.
Multimedia and hypermedia information systems now operate on a global scale over the Internet, and new tools and techniques are needed to manage these dynamic and evolving information spaces.
The Journal of Intelligent Information Systems provides a forum wherein academics, researchers and practitioners may publish high-quality, original and state-of-the-art papers describing theoretical aspects, systems architectures, analysis and design tools and techniques, and implementation experiences in intelligent information systems. The categories of papers published by JIIS include: research papers, invited papters, meetings, workshop and conference annoucements and reports, survey and tutorial articles, and book reviews. Short articles describing open problems or their solutions are also welcome.