Automatic Question Tagging using Machine Learning and Deep learning Algorithms

Mihir Prajapati, Mitul Nakrani, Tarjni Vyas, Lata Gohil, Shivani Desai, S. Degadwala
2022 6th International Conference on Electronics, Communication and Aerospace Technology. Published 2022-12-01. DOI: 10.1109/ICECA55336.2022.10009632 (https://doi.org/10.1109/ICECA55336.2022.10009632)

Abstract

Stack Overflow is a well-known question-answering website used by nearly everyone who learns to code, both to share knowledge and to participate publicly in the forum. Every question posted on Stack Overflow requires at least one tag, which the user must enter manually. Tagging most commonly means associating a single word of information about the context with a given text or question. Tagging a question helps identify the category to which the question or text belongs. It also makes access easier for people who need questions from specific categories. An analysis of the tags associated with questions on the website found that a large number of questions are labelled with more than one tag, and that many of them are not tagged accurately. This makes it challenging for users to search for relevant tags. The main aim of this research is therefore to explore and compare different techniques for building an automatic tagging system using machine learning and deep learning, together with data preprocessing steps. The dataset used for this purpose is the StackSample dataset from Kaggle, which contains 10 percent of the questions on the website. The research produced satisfactory results, with scope for improvement.
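Because a question can carry more than one tag, the task described above is a multi-label problem: each question maps to a set of tags rather than a single class. The following is a minimal sketch of that framing, not the authors' pipeline. The function names `clean_question` and `multi_hot`, the tokenization rule, and the three-tag vocabulary are all assumptions chosen for illustration; the abstract does not specify the preprocessing steps or models used.

```python
import re
from html import unescape


def clean_question(text: str) -> list[str]:
    """Illustrative preprocessing (assumed, not from the paper):
    drop HTML tags, decode entities, lowercase, and keep word-like tokens."""
    no_tags = re.sub(r"<[^>]+>", " ", text)      # remove markup such as <p> or <code>
    decoded = unescape(no_tags).lower()          # turn &amp; etc. into plain characters
    # Tokens start with a letter and may contain digits, '+' or '#' (e.g. "c++", "c#").
    return re.findall(r"[a-z][a-z0-9+#]*", decoded)


def multi_hot(tags_per_question: list[list[str]], vocabulary: list[str]) -> list[list[int]]:
    """Encode each question's tag set as a 0/1 vector over a fixed tag vocabulary.
    Tags outside the vocabulary are ignored."""
    index = {tag: i for i, tag in enumerate(vocabulary)}
    rows = []
    for tags in tags_per_question:
        row = [0] * len(vocabulary)
        for tag in tags:
            if tag in index:
                row[index[tag]] = 1
        rows.append(row)
    return rows


# Hypothetical example data, not taken from StackSample.
tokens = clean_question("<p>How do I sort a <code>list</code> in Python?</p>")
targets = multi_hot([["python", "list"], ["c++"]], ["c++", "list", "python"])
```

Each row of `targets` is the label vector a multi-label classifier would be trained to predict from the cleaned tokens; a row with several ones corresponds to a question with several tags.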