Automatic Question Tagging using k-Nearest Neighbors and Random Forest
Virik Jain, Jash Lodhavia
2020 International Conference on Intelligent Systems and Computer Vision (ISCV), 2020-06-01
DOI: 10.1109/ISCV49265.2020.9204309
Citations: 4
Abstract
Stack Overflow is one of the most widely used platforms for asking questions about computer science, software development, and general programming. Tagging questions is particularly useful because it allows information to be indexed by tag. Currently, users enter tags manually, and every question must carry at least one user-supplied tag. In practice, most questions either lack tags they should have or are not tagged accurately and appropriately. Because the number of available tags is huge, searching through them manually to find the relevant ones is cumbersome, so most users skip this step. This research explores methods for building an automatic tagging system using machine learning methods such as k-Nearest Neighbors and Random Forest, together with key data preprocessing steps: stemming, tokenization, and stop-word removal. The dataset comes from kaggle.com, which hosts an openly available dataset containing 10% of Stack Overflow questions. The results of the proposed system were satisfactory: Random Forest achieved an average accuracy of 70% across all tags, while k-Nearest Neighbors performed slightly better with an accuracy of 75%.
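The preprocessing and k-Nearest Neighbors steps named in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the authors' implementation: the abstract does not state the feature representation, distance measure, or value of k, so the bag-of-words counts, cosine similarity, `k=3`, the tiny stop-word list, the `crude_stem` suffix stripper, and the `TRAINING` examples are all assumptions made for the example. Random Forest is omitted for brevity.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper does not list its own).
STOP_WORDS = {"a", "an", "the", "is", "in", "to", "of", "how", "do", "i",
              "my", "for", "and", "with", "on"}


def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, and stem; returns a bag-of-words Counter."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_tags(question, labelled, k=3, max_tags=2):
    """Suggest tags by voting over the k most similar labelled questions."""
    query = preprocess(question)
    ranked = sorted(labelled,
                    key=lambda item: cosine(query, preprocess(item[0])),
                    reverse=True)
    votes = Counter(tag for _, tags in ranked[:k] for tag in tags)
    return [tag for tag, _ in votes.most_common(max_tags)]


# Made-up labelled questions, for illustration only (not from the Kaggle dataset).
TRAINING = [
    ("How do I reverse a list in Python?", ["python", "list"]),
    ("Sorting a list of tuples in Python", ["python", "sorting"]),
    ("Python list comprehension with condition", ["python", "list"]),
    ("How to center a div with CSS", ["css", "html"]),
    ("CSS flexbox alignment issue", ["css", "flexbox"]),
    ("Java NullPointerException when parsing strings", ["java", "exception"]),
]

suggested = knn_tags("Removing duplicates from a Python list", TRAINING)
print(suggested)
```

On this toy data the three nearest neighbours are the Python questions, so the vote suggests `python` and `list`. A realistic system would likely use TF-IDF weighting and a proper stemmer, and would evaluate each tag against held-out questions.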