Classification and Optimization Scheme for Text Data using Machine Learning Naïve Bayes Classifier

2018 IEEE World Symposium on Communication Engineering (WSCE), published 2018-12-01. DOI: 10.1109/WSCE.2018.8690536

Venkatesh, K. Ranjitha


Citations: 20

Abstract


Text classification is an essential step in natural language processing, and it can be performed with a variety of classification algorithms. Hadoop MapReduce is widely used to classify very large volumes of text data. However, MapReduce jobs take a long time to run, which increases latency, and because the data is distributed across the cluster, processing takes even longer and throughput drops. Hadoop jobs also require many lines of code. Motivated by this, we propose a simple yet effective machine learning method that uses a Naïve Bayes classifier for text data. In the machine learning approach, the classifier is built automatically by learning the properties of each category from a set of predefined training data, so it can handle complex and heterogeneous data in dynamic settings. The proposed Naïve Bayes classifier scales linearly with the number of predictors (features) and data points, and it can be used for both binary and multiclass classification problems. We implemented the presented schemes using a machine learning tool. The experimental results demonstrate improved classification performance.
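The abstract gives no implementation and does not name the machine learning tool used. As a hedged illustration of the general technique it describes (learning per-category statistics from labeled training text, then assigning the most probable category), here is a minimal multinomial Naïve Bayes sketch in Python. The multinomial variant, the whitespace tokenizer, add-α (Laplace) smoothing with α = 1, the class name `MultinomialNB`, and the toy documents and labels are all assumptions for illustration, not details taken from the paper. For a document with tokens w₁…wₙ, the classifier picks the class c that maximizes log P(c) + Σᵢ log P(wᵢ | c).

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase + whitespace split; the paper does not describe its preprocessing.
    return text.lower().split()


class MultinomialNB:
    """Minimal multinomial Naive Bayes text classifier with add-alpha smoothing.

    Hypothetical illustration, not the authors' implementation.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        # One counting pass: cost grows linearly with documents and tokens.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> token -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, doc):
        # Unnormalized log P(c | doc) = log P(c) + sum of log P(token | c).
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior
            denom = self.total_words[c] + self.alpha * v
            for tok in tokenize(doc):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((self.word_counts[c][tok] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, doc):
        # Argmax over classes, so binary and multiclass problems are handled alike.
        scores = self.log_posterior(doc)
        return max(scores, key=scores.get)


# Toy three-class example (made-up data for illustration only).
docs = [
    "cheap pills buy now",
    "limited offer buy cheap",
    "meeting agenda for monday",
    "project report and agenda",
    "team won the match",
    "match score and goals",
]
labels = ["spam", "spam", "work", "work", "sports", "sports"]

clf = MultinomialNB().fit(docs, labels)
print(clf.predict("buy cheap pills"))         # spam
print(clf.predict("monday project meeting"))  # work
print(clf.predict("who won the match"))       # sports
```

Training here is a single counting pass over the tokens, so its cost grows linearly with the number of documents and features, which matches the kind of scaling the abstract claims. Because prediction is an argmax over per-class scores, the same code serves binary and multiclass problems; log-probabilities are summed instead of multiplying raw probabilities to avoid floating-point underflow on long documents.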
