Smart Intelligent System that can Code Like a Human Being

Jaiwin Shah, Rishabh Jain, Vedant Jolly, D. Kalbande
2023 4th International Conference for Emerging Technology (INCET), published 2023-05-26
DOI: 10.1109/INCET57972.2023.10169940 (https://doi.org/10.1109/INCET57972.2023.10169940)
Citations: 0

Abstract

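According to recent studies, many data scientists spend most of their time on tasks such as cleaning and organizing data. They must memorize long, complex syntax for every major task in the data science lifecycle, and these tasks are often repetitive. We therefore propose an intelligent system that lets data scientists carry out tedious, time-consuming tasks such as exploratory data analysis (EDA), data cleansing, data preprocessing, data visualization, modeling, and evaluation within the data science lifecycle. The user simply states the logic of a query in natural language, and the system automatically outputs the relevant Python code snippets. Existing text-to-code generation and code search applications are limited, and many of them fail under non-ideal conditions. The reason lies in the datasets on which the existing models were built: these datasets do not account for real-world factors such as slang, acronyms, and paraphrases. We therefore created a new dataset of real-world user queries that represents the scenarios a user is most likely to face daily. We plan to build a logic-oriented system in which the user only needs to express the logic correctly in natural-language text. This saves considerable time, allowing data scientists to spend most of their effort on building logic rather than on writing code.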
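The abstract does not say how a natural-language query is mapped to a code snippet. Purely as a toy illustration of why slang and acronyms matter, the sketch below assumes a simple retrieval design: the alias table (`ALIASES`), the snippet library (`SNIPPETS`), the Jaccard-overlap scoring, and the `normalize`/`search` helpers are all hypothetical and are not taken from the paper or its dataset.

```python
import re

# Hypothetical slang/acronym table; illustrative only, not the paper's dataset.
ALIASES = {
    "eda": "exploratory data analysis",
    "df": "dataframe",
    "nan": "missing",
    "nulls": "missing",
    "viz": "plot",
    "dupes": "duplicate",
}

# Hypothetical snippet library: natural-language description -> Python snippet.
SNIPPETS = {
    "drop rows with missing values from dataframe": "df = df.dropna()",
    "remove duplicate rows from dataframe": "df = df.drop_duplicates()",
    "plot histogram of a column": "df['col'].hist()",
    "summary statistics exploratory data analysis": "df.describe()",
}


def normalize(text):
    """Lowercase, split into word tokens, and expand known slang/acronyms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    expanded = []
    for tok in tokens:
        # Unknown tokens map to themselves; aliases may expand to several words.
        expanded.extend(ALIASES.get(tok, tok).split())
    return set(expanded)


def search(query, k=1):
    """Return up to k snippets ranked by Jaccard overlap with the query."""
    q = normalize(query)
    scored = []
    for desc, code in SNIPPETS.items():
        d = normalize(desc)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, code))
    # Stable sort: ties keep the library's insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [code for score, code in scored[:k] if score > 0]


if __name__ == "__main__":
    print(search("drop nulls from my df"))
    print(search("quick eda"))
```

Here `search("drop nulls from my df")` returns `["df = df.dropna()"]`. Without the alias table, `search("quick eda")` would return nothing, because "eda" shares no token with any description; with it, the query matches the EDA snippet. A real system would more likely use learned embeddings than token overlap, but the normalization step illustrates the gap the authors attribute to existing datasets.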