2018 IEEE International Congress on Big Data (BigData Congress): Latest Publications

Diagnosis Recommendation Using Machine Learning Scientific Workflows
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00018
Ishtiaq Ahmed, Shiyong Lu, Changxin Bai, F. Bhuyan
Abstract: Diagnosis recommendation plays a significant role in healthcare, where a clinician infers an optimal diagnosis for a patient. This problem has a major impact on improving patients’ quality of life. Existing machine learning techniques for solving this problem require many labeled instances, which are not readily available. To overcome this limitation, we present a scientific workflow for representing a semi-supervised, clustering-based diagnosis recommendation model. In this approach, initial clusters are formed from a labeled dataset; then, by imposing a relative threshold on each cluster, frequent patterns and their corresponding labels are obtained. Subsequently, unlabeled instances are labeled by assigning them to the most similar clusters. Finally, we form clusters on the newly generated datasets and recommend the diagnosis label by applying a minimum threshold. To evaluate our model, we perform extensive experiments on the i2b2 datasets and compare our proposed algorithms with the self-training and co-training methods. The experimental results show that our proposed algorithm outperforms these methods in most cases. The proposed workflow is implemented in the DATAVIEW system.
Citations: 10
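The labeling step in the abstract above (assigning unlabeled instances to the most similar cluster) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the frequent-pattern thresholds, so this example assumes one centroid per label and Euclidean distance; the function names `centroids_by_label` and `assign_labels` are hypothetical and are not taken from the paper or from the DATAVIEW system.

```python
from collections import defaultdict
from math import dist


def centroids_by_label(labeled):
    """Build one centroid per label from (feature_vector, label) pairs.

    Assumption: each cluster is summarized by the mean of its members.
    """
    groups = defaultdict(list)
    for features, label in labeled:
        groups[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in groups.items()
    }


def assign_labels(unlabeled, centroids):
    """Give each unlabeled vector the label of its nearest centroid."""
    return [
        min(centroids, key=lambda label: dist(vector, centroids[label]))
        for vector in unlabeled
    ]
```

In a fuller pipeline the newly labeled instances would be merged with the original labeled set and clustered again, which this sketch leaves out.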
Stream Analytics and Adaptive Windows for Operational Mode Identification of Time-Varying Industrial Systems
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00042
A. Khodabakhsh, Ismail Ari, Mustafa Bakir, Serhat Murat Alagoz
Abstract: It is necessary to develop accurate, yet simple and efficient models that can be used with high-speed industrial data streams. In this paper, we develop a mode identification technique using stream analytics and show that it may be more effective than batch models, especially for time-varying systems. These industrial systems continuously monitor hundreds of sensors, but the relationships among variables change over time; these changing relationships are identified as different operational modes. To detect drifts among modes, predictive modeling techniques such as regression analysis, K-means, and DBSCAN clustering are used over sensor data streams from an oil refinery, and models are updated in real time using window-based analysis. Finally, an adaptive window size tuning approach based on the TCP congestion control algorithm is discussed, which reduces model update costs as well as prediction errors.
Citations: 2
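The abstract above mentions window size tuning modeled on TCP congestion control but does not give the update rule. A minimal sketch of the classic additive-increase/multiplicative-decrease (AIMD) pattern is shown here under stated assumptions: the window grows by a fixed step while the prediction error stays at or below a threshold and is cut by a factor when the error exceeds it. The function name, parameters, and default values are all hypothetical, not the authors' settings.

```python
def next_window_size(size, error, threshold,
                     step=10, factor=0.5, min_size=20, max_size=2000):
    """Return the next analysis window size using an AIMD rule.

    Assumption: a high prediction error signals drift between modes,
    so the window shrinks multiplicatively to forget stale data;
    otherwise it grows additively to reduce model update cost.
    """
    if error > threshold:
        return max(min_size, int(size * factor))
    return min(max_size, size + step)
```

For example, with a threshold of 0.5 a window of 100 samples becomes 110 after a low-error step and 50 after a high-error step.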
Insights on Apache Spark Usage by Mining Stack Overflow Questions
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00037
L. J. Rodríguez, Xiaoran Wang, Jilong Kuang
Abstract: Apache Spark is one of the most popular big data tools. Despite its popularity, there are no studies regarding its overall usage among software developers. As such, essential questions remain unanswered. For instance, it is not known what the common issues faced by Spark users are, what the most popular Spark libraries are, or what technologies are most commonly used together with Spark. In this paper, we mine Stack Overflow questions to shed some light on these issues. Specifically, we first apply Latent Dirichlet Allocation (LDA) to Stack Overflow questions and obtain the main topics of discussion. By computing previously proposed metrics and a novel modification, we provide insights into Spark usage while taking question view count into account. Further insights are then given by applying newly proposed metrics to the question tags. Temporal trends are finally discussed after analyzing the proposed metrics over time.
Citations: 7
Short-Term Traffic Prediction Using Long Short-Term Memory Neural Networks
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00015
Zainab Abbas, A. Al-Shishtawy, Sarunas Girdzijauskas, Vladimir Vlassov
Abstract: Short-term traffic prediction allows Intelligent Transport Systems to proactively respond to events before they happen. With the rapid increase in the amount, quality, and detail of traffic data, new techniques are required that can exploit the information in the data in order to provide better results while being able to scale and cope with increasing amounts of data and growing cities. We propose and compare three models for short-term road traffic density prediction based on Long Short-Term Memory (LSTM) neural networks. We have trained the models using real traffic data collected by the Motorway Control System in Stockholm, which monitors highways and collects flow and speed data per lane every minute from radar sensors. In order to deal with the challenge of scale and to improve prediction accuracy, we propose to partition the road network into road stretches and junctions, and to model each of the partitions with one or more LSTM neural networks. Our evaluation results show that partitioning of roads improves the prediction accuracy by reducing the root mean square error by a factor of 5. We show that we can reduce the complexity of the LSTM network by limiting the number of input sensors, on average to 35% of the original number, without compromising the prediction accuracy.
Citations: 22
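The accuracy claim in the abstract above is stated in terms of root mean square error (RMSE). As a reminder of what that metric measures, here is a minimal sketch of the standard definition; the function name `rmse` and the sample values in the usage note are illustrative only and are not data from the paper.

```python
from math import sqrt


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))
```

A reduction "by a factor of 5" means the ratio of the baseline RMSE to the improved RMSE is 5, for instance an error of 5.0 falling to 1.0.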
Compile-Time Code Generation for Embedded Data-Intensive Query Languages
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00008
L. Fegaras, Md Hasanuzzaman Noor
Abstract: Many emerging Big Data programming environments, such as Spark and Flink, provide powerful APIs that are inspired by functional programming. However, because of the complexity involved in developing and fine-tuning data analysis applications using the provided APIs, many programmers prefer to use declarative languages, such as Hive and Spark SQL, to code their distributed applications. Unfortunately, current data analysis query languages, which are typically based on the relational model, cannot effectively capture the rich data types and computations required for complex data analysis applications. Furthermore, these query languages are not well integrated with the host programming language, as they are based on an incompatible data model, and are checked for correctness at run-time, which results in a significantly longer program development time. To address these shortcomings, we introduce a new query language for data-intensive scalable computing, called DIQL, that is deeply embedded in Scala, and a query optimization framework that optimizes and translates DIQL queries to byte code at compile-time. In contrast to other query languages, our query embedding eliminates impedance mismatch, as any Scala code can be seamlessly mixed with SQL-like syntax without having to add any special declaration. DIQL supports nested collections and hierarchical data and allows query nesting at any place in a query. With DIQL, programmers can express complex data analysis tasks, such as PageRank and matrix factorization, using SQL-like syntax exclusively. The DIQL query optimizer can find any possible join in a query, including joins hidden across deeply nested queries, thus unnesting any form of query nesting. Currently, DIQL can run on three Big Data platforms: Apache Spark, Apache Flink, and Twitter's Cascading/Scalding.
Citations: 11
Analysing Customer Engagement of Turkish Airlines Using Big Social Data
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00017
Fie Sternberg, Kasper Hedegaard Pedersen, Niklas Klve Ryelund, R. Mukkamala, Ravikiran Vatrapu
Abstract: Companies have started taking advantage of the potential of Big Social Data; however, research on airlines’ use of social media is limited. This research aims to investigate to what extent Turkish Airlines can utilize its Facebook page to improve performance metrics. The study draws on the concepts of Big Social Data, customer satisfaction, and sentiment analysis to answer the research questions by employing data and text mining and machine learning. The results showed a weak relationship between the business data and Facebook data; however, the findings provided explanations of customer behavior and showed that most of the company’s Facebook users were likely to purchase a Turkish Airlines ticket. Therefore, Turkish Airlines could utilize its Facebook page in the short term to improve revenue-generating indicators such as customer satisfaction and likelihood of purchase.
Citations: 13
Big Web Colors: Analyzing the World Top Sites
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00020
M. Marchiori, Giulio Rigoni
Abstract: Colors are obviously important for web sites, but how much? In this paper we study the problem by abstracting from the actual content, and analyze if and how colors in images have a higher-level fundamental importance. Focusing on the world top web sites, we collected a large pool (almost two million) of images, and then investigated the relationships of colors with the attractiveness of a page. Can colors alone boost the success of a page, and in what terms? To answer this question we developed an experiment involving a large number of people, measuring how and how much colors affect a page, abstracting from the content. The results show that, rather surprisingly, colors do have a more fundamental significance that can be decoupled from the underlying shapes. We provide qualitative and quantitative insights on how important colors are, and how they actually impact the success of a site in terms of user perception.
Citations: 0
Performance Modeling and Task Scheduling in Distributed Graph Processing
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00025
Daniel Presser, Frank Siqueira, Fábio Reina
Abstract: The accelerated growth of datasets observed in modern applications also applies to datasets modeled as graphs. To handle this problem, several large-scale distributed graph processing models have been proposed, such as Pregel. These systems are designed to run in large clusters, where the resources must be allocated efficiently. In this paper we present a prediction model and a scheduler for Pregel-based distributed graph processing jobs. The scheduler treats the jobs as moldable tasks and, based on the predictions, allocates the best number of workers to each job in order to minimize makespan. Experimental results show that the prediction model has accuracy close to 90%, allowing the scheduler to work within the theoretical approximation limits of the optimal makespan.
Citations: 2
Incorporating Word Embedding into Cross-Lingual Topic Modeling
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00010
Chia-Hsuan Chang, San-Yih Hwang, Tou-Hsiang Xui
Abstract: In this paper, we address cross-lingual topic modeling, an important technique that enables global enterprises to detect and compare topic trends across global markets. Previous works in cross-lingual topic modeling have proposed methods that utilize parallel or comparable corpora in constructing the polylingual topic model. However, parallel or comparable corpora are in many cases not available. In this research, we combine techniques for mapping cross-lingual word spaces with topic modeling (LDA) and propose two methods: Translated Corpus with LDA (TC-LDA) and Post Match LDA (PM-LDA). The cross-lingual word space mapping allows us to compare words of different languages, and LDA enables us to group words into topics. Neither TC-LDA nor PM-LDA needs a parallel or comparable corpus, and hence both apply to more domains. The effectiveness of both methods is evaluated using UM-Corpus and WS-353. Our evaluation results indicate that both methods are able to identify similar documents written in different languages. In addition, PM-LDA is shown to achieve better performance than TC-LDA, especially when document length is short.
Citations: 3
Sensor Data Based System-Level Anomaly Prediction for Smart Manufacturing
2018 IEEE International Congress on Big Data (BigData Congress) Pub Date : 2018-07-01 DOI: 10.1109/BigDataCongress.2018.00028
Jianwu Wang, Chen Liu, Meiling Zhu, Pei Guo, Yapeng Hu
Abstract: With the popularity of Supervisory Information System (SIS), Supervisory Control and Data Acquisition (SCADA) systems, and Internet of Things (IoT) sensors, we can easily obtain abundant sensor data in manufacturing. We could save manufacturing maintenance costs and prevent further damage if we could accurately predict system anomalies from the sensor data. Yet learning from individual sensors often cannot directly determine whether the system will have an anomaly, because each sensor only measures a partial state of a large system. By collectively detecting events across sensors and their temporal dependencies, this paper proposes a new system-level anomaly prediction framework that mines an anomaly dependency graph from sensor data. The advantages of the approach include explainability, collective prediction, and temporal sensitivity. We applied our approach to a real-world power plant dataset to evaluate its feasibility.
Citations: 17