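Efficient Dataset Preparation Techniques for Regional/Marathi Language Analysis: Creating Customized Dataset for Regional Language/Marathi Language Text Analysis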

Sudashan Sirsat, Nitish Zulpe
2023 Somaiya International Conference on Technology and Information Management (SICTIM), published 2023-03-24
DOI: https://doi.org/10.1109/SICTIM56495.2023.10104666
Citations: 0

Abstract

Regional-language content is key to globalizing any successful internet-based business model. A large population now prefers to access the internet in its mother tongue or regional language, and this has become the new normal. The resulting volume of opinionated regional-language text on social media and World Wide Web pages has drawn the attention of business analysts, data scientists, and social reformers who want to understand regional-language sentiment. Regional-language sentiment analysis, and Marathi sentiment analysis in particular, becomes possible only with a dataset that can cope with the text-analytics challenges of a regional language, such as uniformity and syntactic and semantic difficulties. This study is a small attempt to create a basic dataset that can support future regional-language or Marathi sentiment analysis using algorithmic approaches based on Natural Language Processing (NLP) and Sentiment Analysis (SA). It generates a Marathi dataset from opinionated social media text and from web scraping of a Marathi-language webpage. All technical issues encountered while generating the regional-language or Marathi dataset will be recorded, rectified, and refined through rigorous iterations so that the dataset is ready for future Marathi sentiment analysis. The study also tries to identify what regional sentiment analysis requires of a dataset, the most suitable file structure, and an efficient way of creating and customizing the Marathi text dataset so that it is ready for NLP and SA in follow-up studies.
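
The abstract does not say which cleaning steps or file structure the authors settled on. As a purely illustrative sketch of the kind of preparation it describes, the following assumes that "uniformity" is handled by Unicode NFC normalization, that rows are kept only when they are mostly Devanagari script, and that the dataset is stored as a UTF-8 CSV. The function names, the 0.5 threshold, and the `id`/`source`/`text`/`label` columns are hypothetical choices and are not taken from the paper.

```python
import csv
import io
import re
import unicodedata

# Marathi is written in Devanagari, Unicode block U+0900-U+097F.
DEVANAGARI = re.compile(r"[\u0900-\u097F]")
# Assumption: URLs and @mentions carry no sentiment and are dropped.
NOISE = re.compile(r"https?://\S+|@\S+")
WHITESPACE = re.compile(r"\s+")


def clean_marathi(text):
    """Normalize one raw snippet so equivalent strings compare equal."""
    text = unicodedata.normalize("NFC", text)
    text = NOISE.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


def is_mostly_devanagari(text, threshold=0.5):
    """Return True if Devanagari characters outweigh other alphabetic ones."""
    devanagari = sum(1 for ch in text if DEVANAGARI.match(ch))
    other = sum(1 for ch in text if ch.isalpha() and not DEVANAGARI.match(ch))
    total = devanagari + other
    return total > 0 and devanagari / total >= threshold


def build_rows(snippets, source):
    """Clean, filter, and de-duplicate raw snippets into dataset rows."""
    seen = set()
    rows = []
    for raw in snippets:
        text = clean_marathi(raw)
        if not text or text in seen or not is_mostly_devanagari(text):
            continue
        seen.add(text)
        # The label column is left empty for later sentiment annotation.
        rows.append({"id": len(rows) + 1, "source": source,
                     "text": text, "label": ""})
    return rows


def write_csv(rows, handle):
    """Write rows to an open text handle in CSV form."""
    writer = csv.DictWriter(handle, fieldnames=["id", "source", "text", "label"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    sample = ["मला  हा चित्रपट आवडला https://example.com",
              "This line is English only"]
    buffer = io.StringIO()
    write_csv(build_rows(sample, "web"), buffer)
    print(buffer.getvalue())
```

When writing to disk rather than to an in-memory buffer, opening the file with `encoding="utf-8"` and `newline=""` keeps the Devanagari text intact and avoids blank lines between rows on some platforms.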