Jaiwin Shah, Rishabh Jain, Vedant Jolly, D. Kalbande
Smart Intelligent System that can Code Like a Human Being
DOI: 10.1109/INCET57972.2023.10169940 (https://doi.org/10.1109/INCET57972.2023.10169940)
Published in: 2023 4th International Conference for Emerging Technology (INCET), 2023-05-26
Citations: 0
Abstract
According to recent studies, many data scientists spend most of their time on tasks such as cleaning and organizing data. They must memorize long, complex syntax for every major task in the data science life cycle, and these tasks are often repetitive. We therefore propose an intelligent system that enables data scientists to perform tedious, time-consuming tasks such as exploratory data analysis (EDA), data cleansing, data preprocessing, data visualization, modeling, and evaluation in the data science life cycle. The user simply states the logic of a query in natural language, and the system automatically outputs the relevant Python code snippets. Existing text-to-code generation and code search applications are limited, and many of them fail under non-ideal conditions. The reason lies in the datasets on which the existing models were built: these datasets do not account for real-world factors such as slang, acronyms, and paraphrases. We therefore created a new dataset of real-world user queries that represents the scenarios a user is most likely to face daily. We aim to build a logic-oriented system in which the user only needs to convey the logic correctly in natural language. This saves considerable time, allowing data scientists to focus on building logic rather than on writing code.
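To make the query-to-snippet idea concrete, here is a minimal sketch. It is not the authors' method: the abstract does not describe their model, so this swaps in simple token-overlap (Jaccard) retrieval over a tiny hand-written catalog. The `NORMALIZE` table, the `CATALOG` entries, and the `normalize` and `search` functions are all hypothetical. The normalization step only illustrates how slang and acronyms such as "EDA" or "df" might be mapped to canonical terms before matching.

```python
import re

# Hypothetical slang/acronym table (illustrative only, not the paper's dataset).
NORMALIZE = {
    "eda": "exploratory data analysis",
    "df": "dataframe",
    "na": "missing",
    "nan": "missing",
    "nulls": "missing",
    "viz": "visualize",
    "plot": "visualize",
}

# Hypothetical catalog: intent description -> pandas-style Python snippet.
CATALOG = {
    "drop rows with missing values from dataframe": "df = df.dropna()",
    "show summary statistics exploratory data analysis": "df.describe()",
    "visualize histogram of column": "df['col'].hist()",
}


def normalize(query: str) -> list[str]:
    """Lowercase, tokenize, and expand slang/acronyms into canonical words."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    expanded = []
    for token in tokens:
        expanded.extend(NORMALIZE.get(token, token).split())
    return expanded


def search(query: str) -> str:
    """Return the catalog snippet whose description best overlaps the query."""
    query_terms = set(normalize(query))

    def jaccard(description: str) -> float:
        desc_terms = set(description.split())
        return len(query_terms & desc_terms) / len(query_terms | desc_terms)

    best = max(CATALOG, key=jaccard)  # ties resolve to the first catalog entry
    return CATALOG[best]


if __name__ == "__main__":
    for q in ["drop NaN rows in df", "quick EDA stats", "viz histogram for age column"]:
        print(f"{q!r} -> {search(q)}")
```

For example, `search("drop NaN rows in df")` returns `"df = df.dropna()"` because "NaN" and "df" are expanded to "missing" and "dataframe" before matching. Pure word overlap cannot handle paraphrases that share no words with the catalog; that kind of real-world variation is what the authors' dataset is meant to capture.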