基于数据挖掘的网上购物服务水平比较

International Journal of Data Mining Techniques and Applications Pub Date : 2016-06-15 DOI:10.20894/IJDMTA.102.005.001.005

K. Chandini, A. Roshini, A. Kokila, B. Aishwarya

{"title":"基于数据挖掘的网上购物服务水平比较","authors":"K. Chandini, A. Roshini, A. Kokila, B. Aishwarya","doi":"10.20894/IJDMTA.102.005.001.005","DOIUrl":null,"url":null,"abstract":"The term knowledge discovery in databases (KDD) is the analysis step of data mining. The data mining goal is to extract the knowledge and patterns from large data sets, not the data extraction itself. Big-Data Computing is a critical challenge for the ICT industry. Engineers and researchers are dealing with the cloud computing paradigm of petabyte data sets. Thus the demand for building a service stack to distribute, manage and process massive data sets has risen drastically. We investigate the problem for a single source node to broadcast the big chunk of data sets to a set of nodes to minimize the maximum completion time. These nodes may locate in the same datacenter or across geo-distributed data centers. The Big-data broadcasting problem is modeled into a LockStep Broadcast Tree (LSBT) problem. And the main idea of the LSBT is defining a basic unit of upload bandwidth, r, a node with capacity c broadcasts data to a set of [c=r] children at the rate r. Note that r is a parameter to be optimized as part of the LSBT problem. The broadcast data are further divided into m chunks. In a pipeline manner, these m chunks can then be broadcast down the LSBT. In a homogeneous network environment in which each node has the same upload capacity c, the optimal uplink rate r, of LSBT is either c=2 or 3, whichever gives the smaller maximum completion time. For heterogeneous environments, an O(nlog2n) algorithm is presented to select an optimal uplink rate r, and to construct an optimal LSBT. With lower computational complexity and low maximum completion time, the numerical results shows better performance.The methodology includes Various Web applications Building and Broadcasting followed by the Gateway Application and Batch Processing over the TSV Data after which the Web Crawling for Resources and MapReduce process takes place and finally Picking Products from Recommendations and Purchasing it.","PeriodicalId":414709,"journal":{"name":"International Journal of Data Mining Techniques and Applications","volume":"61 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Service Level Comparison for Online Shopping using Data Mining\",\"authors\":\"K. Chandini, A. Roshini, A. Kokila, B. Aishwarya\",\"doi\":\"10.20894/IJDMTA.102.005.001.005\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The term knowledge discovery in databases (KDD) is the analysis step of data mining. The data mining goal is to extract the knowledge and patterns from large data sets, not the data extraction itself. Big-Data Computing is a critical challenge for the ICT industry. Engineers and researchers are dealing with the cloud computing paradigm of petabyte data sets. Thus the demand for building a service stack to distribute, manage and process massive data sets has risen drastically. We investigate the problem for a single source node to broadcast the big chunk of data sets to a set of nodes to minimize the maximum completion time. These nodes may locate in the same datacenter or across geo-distributed data centers. The Big-data broadcasting problem is modeled into a LockStep Broadcast Tree (LSBT) problem. And the main idea of the LSBT is defining a basic unit of upload bandwidth, r, a node with capacity c broadcasts data to a set of [c=r] children at the rate r. Note that r is a parameter to be optimized as part of the LSBT problem. The broadcast data are further divided into m chunks. In a pipeline manner, these m chunks can then be broadcast down the LSBT. In a homogeneous network environment in which each node has the same upload capacity c, the optimal uplink rate r, of LSBT is either c=2 or 3, whichever gives the smaller maximum completion time. For heterogeneous environments, an O(nlog2n) algorithm is presented to select an optimal uplink rate r, and to construct an optimal LSBT. With lower computational complexity and low maximum completion time, the numerical results shows better performance.The methodology includes Various Web applications Building and Broadcasting followed by the Gateway Application and Batch Processing over the TSV Data after which the Web Crawling for Resources and MapReduce process takes place and finally Picking Products from Recommendations and Purchasing it.\",\"PeriodicalId\":414709,\"journal\":{\"name\":\"International Journal of Data Mining Techniques and Applications\",\"volume\":\"61 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-06-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Data Mining Techniques and Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.20894/IJDMTA.102.005.001.005\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Data Mining Techniques and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.20894/IJDMTA.102.005.001.005","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

数据库中的知识发现(KDD)是数据挖掘的分析步骤。数据挖掘的目标是从大型数据集中提取知识和模式，而不是数据提取本身。大数据计算是ICT行业面临的重大挑战。工程师和研究人员正在处理pb数据集的云计算范式。因此，构建服务堆栈来分发、管理和处理大量数据集的需求急剧上升。我们研究了单个源节点将大块数据集广播到一组节点以最小化最大完成时间的问题。这些节点可能位于同一数据中心，也可能位于地理分布的数据中心之间。将大数据广播问题建模为LockStep广播树(LSBT)问题。LSBT的主要思想是定义一个上传带宽的基本单位r，一个容量为c的节点以速率r向一组[c=r]子节点广播数据。注意，r是作为LSBT问题的一部分需要优化的参数。将广播数据进一步划分为m个块。然后，这些块可以通过管道的方式在LSBT上广播。在同质网络环境下，各节点具有相同的上传容量c，则LSBT的最优上行速率r为c=2或3，二者最大完成时间较小。对于异构环境，提出了一种O(nlog2n)算法来选择最优上行速率r，并构造最优LSBT。数值结果具有较低的计算复杂度和较短的最大完成时间，具有较好的性能。该方法包括各种Web应用程序的构建和广播，然后是网关应用程序和TSV数据的批处理，之后进行资源的网络爬行和MapReduce过程，最后从推荐中挑选产品并购买它。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Service Level Comparison for Online Shopping using Data Mining

The term knowledge discovery in databases (KDD) is the analysis step of data mining. The data mining goal is to extract the knowledge and patterns from large data sets, not the data extraction itself. Big-Data Computing is a critical challenge for the ICT industry. Engineers and researchers are dealing with the cloud computing paradigm of petabyte data sets. Thus the demand for building a service stack to distribute, manage and process massive data sets has risen drastically. We investigate the problem for a single source node to broadcast the big chunk of data sets to a set of nodes to minimize the maximum completion time. These nodes may locate in the same datacenter or across geo-distributed data centers. The Big-data broadcasting problem is modeled into a LockStep Broadcast Tree (LSBT) problem. And the main idea of the LSBT is defining a basic unit of upload bandwidth, r, a node with capacity c broadcasts data to a set of [c=r] children at the rate r. Note that r is a parameter to be optimized as part of the LSBT problem. The broadcast data are further divided into m chunks. In a pipeline manner, these m chunks can then be broadcast down the LSBT. In a homogeneous network environment in which each node has the same upload capacity c, the optimal uplink rate r, of LSBT is either c=2 or 3, whichever gives the smaller maximum completion time. For heterogeneous environments, an O(nlog2n) algorithm is presented to select an optimal uplink rate r, and to construct an optimal LSBT. With lower computational complexity and low maximum completion time, the numerical results shows better performance.The methodology includes Various Web applications Building and Broadcasting followed by the Gateway Application and Batch Processing over the TSV Data after which the Web Crawling for Resources and MapReduce process takes place and finally Picking Products from Recommendations and Purchasing it.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal of Data Mining Techniques and Applications

自引率

0.00%

发文量