2018 IEEE International Congress on Big Data (BigData Congress): Latest Papers

Diagnosis Recommendation Using Machine Learning Scientific Workflows

2018 IEEE International Congress on Big Data (BigData Congress), Pub Date: 2018-07-01, DOI: 10.1109/BigDataCongress.2018.00018

Ishtiaq Ahmed, Shiyong Lu, Changxin Bai, F. Bhuyan

Abstract: Diagnosis recommendation plays a significant role in healthcare, where a clinician infers an optimal diagnosis for a patient. This problem has a major impact on improving patients' quality of life. Existing machine learning techniques for solving this problem require many labeled instances, which are not readily available. To overcome this limitation, in this paper, we present a scientific workflow for representing a semi-supervised, clustering-based diagnosis recommendation model. In this approach, initial clusters are formed from a labeled dataset; then, by imposing a relative threshold on each cluster, frequent patterns and their corresponding labels are obtained. Subsequently, unlabeled instances are labeled by assigning them to the most similar clusters. Finally, we form clusters on the newly generated datasets and recommend the diagnosis label by applying a minimum threshold. To evaluate our model, we perform extensive experiments on the i2b2 datasets and compare our proposed algorithms with the self-training and co-training methods. The experimental results show that our proposed algorithm outperforms these methods in most cases. The proposed workflow is implemented in the DATAVIEW system.

Citations: 10
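The labeling pipeline summarized in the Ahmed et al. abstract above (cluster the labeled data, extract frequent patterns under a relative threshold, assign unlabeled instances to the most similar cluster, re-cluster, then recommend a diagnosis under a minimum threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' DATAVIEW workflow: each instance is assumed to be a set of clinical features, the initial clusters are assumed to be one group per diagnosis label, "frequent patterns" are simplified to single frequent features, and similarity is assumed to be Jaccard similarity. All function names are hypothetical.

```python
from collections import Counter

# Hypothetical sketch: instances are sets of features; clusters are one group per label.

def frequent_patterns(instances, rel_threshold):
    """Features occurring in at least rel_threshold of a cluster's instances
    (a simplification of frequent-pattern mining to single features)."""
    counts = Counter(f for inst in instances for f in set(inst))
    n = len(instances)
    return {f for f, c in counts.items() if c / n >= rel_threshold}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_profiles(labeled, rel_threshold):
    """Group labeled instances by label and keep each group's frequent features."""
    clusters = {}
    for features, label in labeled:
        clusters.setdefault(label, []).append(set(features))
    return {label: frequent_patterns(insts, rel_threshold)
            for label, insts in clusters.items()}

def recommend(features, profiles, min_similarity):
    """Label of the most similar profile, or None if no profile overlaps
    or the best similarity is below min_similarity."""
    best_label, best_sim = None, 0.0
    for label, pattern in profiles.items():
        sim = jaccard(set(features), pattern)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label if best_sim >= min_similarity else None

def self_label(labeled, unlabeled, rel_threshold=0.6):
    """Assign unlabeled instances to their most similar cluster, then rebuild
    the profiles on the enlarged dataset."""
    profiles = build_profiles(labeled, rel_threshold)
    newly_labeled = []
    for features in unlabeled:
        label = recommend(features, profiles, 0.0)  # most similar cluster, no floor
        if label is not None:
            newly_labeled.append((features, label))
    return build_profiles(list(labeled) + newly_labeled, rel_threshold)

if __name__ == "__main__":
    labeled = [({"fever", "cough"}, "flu"), ({"fever", "cough", "ache"}, "flu"),
               ({"rash", "itch"}, "allergy"), ({"rash", "sneeze"}, "allergy")]
    unlabeled = [{"fever", "ache"}, {"rash", "itch", "sneeze"}]
    profiles = self_label(labeled, unlabeled)
    print(recommend({"fever", "cough"}, profiles, 0.3))  # prints flu
    print(recommend({"headache"}, profiles, 0.3))        # prints None
```

Keeping the labeling step and the final recommendation step separate mirrors the abstract's distinction between assigning unlabeled instances to the most similar cluster and applying a minimum threshold only when recommending.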

Stream Analytics and Adaptive Windows for Operational Mode Identification of Time-Varying Industrial Systems

2018 IEEE International Congress on Big Data (BigData Congress), Pub Date: 2018-07-01, DOI: 10.1109/BigDataCongress.2018.00042

A. Khodabakhsh, Ismail Ari, Mustafa Bakir, Serhat Murat Alagoz

Citations: 2

Insights on Apache Spark Usage by Mining Stack Overflow Questions

2018 IEEE International Congress on Big Data (BigData Congress), Pub Date: 2018-07-01, DOI: 10.1109/BigDataCongress.2018.00037

L. J. Rodríguez, Xiaoran Wang, Jilong Kuang

Citations: 7

Short-Term Traffic Prediction Using Long Short-Term Memory Neural Networks

2018 IEEE International Congress on Big Data (BigData Congress), Pub Date: 2018-07-01, DOI: 10.1109/BigDataCongress.2018.00015

Zainab Abbas, A. Al-Shishtawy, Sarunas Girdzijauskas, Vladimir Vlassov

Abstract: Short-term traffic prediction allows Intelligent Transport Systems to respond proactively to events before they happen. With the rapid increase in the amount, quality, and detail of traffic data, new techniques are required that can exploit the information in the data to provide better results while scaling to cope with increasing amounts of data and growing cities. We propose and compare three models for short-term road traffic density prediction based on Long Short-Term Memory (LSTM) neural networks. We trained the models using real traffic data collected by the Motorway Control System in Stockholm, which monitors highways and collects per-lane flow and speed data every minute from radar sensors. To deal with the challenge of scale and to improve prediction accuracy, we propose to partition the road network into road stretches and junctions, and to model each partition with one or more LSTM neural networks. Our evaluation results show that partitioning roads improves prediction accuracy, reducing the root mean square error by a factor of 5. We also show that we can reduce the complexity of the LSTM network by limiting the number of input sensors, on average to 35% of the original number, without compromising prediction accuracy.

Citations: 22

Compile-Time Code Generation for Embedded Data-Intensive Query Languages

2018 IEEE International Congress on Big Data (BigData Congress), Pub Date: 2018-07-01, DOI: 10.1109/BigDataCongress.2018.00008

L. Fegaras, Md Hasanuzzaman Noor

Abstract: Many emerging Big Data programming environments, such as Spark and Flink, provide powerful APIs that are inspired by functional programming. However, because of the complexity involved in developing and fine-tuning data analysis applications using the provided APIs, many programmers prefer to use declarative languages, such as Hive and Spark SQL, to code their distributed applications. Unfortunately, current data analysis query languages, which are typically based on the relational model, cannot effectively capture the rich data types and computations required for complex data analysis applications. Furthermore, these query languages are not well-integrated with the host programming language, as they are based on an incompatible data model, and are checked for correctness at run-time, which results in a significantly longer program development time. To address these shortcomings, we introduce a new query language for data-intensive scalable computing, called DIQL, that is deeply embedded in Scala, and a query optimization framework that optimizes and translates DIQL queries to byte code at compile-time. In contrast to other query languages, our query embedding eliminates impedance mismatch, as any Scala code can be seamlessly mixed with SQL-like syntax, without having to add any special declaration. DIQL supports nested collections and hierarchical data and allows query nesting at any place in a query. With DIQL, programmers can express complex data analysis tasks, such as PageRank and matrix factorization, using SQL-like syntax exclusively. The DIQL query optimizer can find any possible join in a query, including joins hidden across deeply nested queries, thus unnesting any form of query nesting. Currently, DIQL can run on three Big Data platforms: Apache Spark, Apache Flink, and Twitter's Cascading/Scalding.

Citations: 11

Analysing Customer Engagement of Turkish Airlines Using Big Social Data

2018 IEEE International Congress on Big Data (BigData Congress), Pub Date: 2018-07-01, DOI: 10.1109/BigDataCongress.2018.00017

Fie Sternberg, Kasper Hedegaard Pedersen, Niklas Kløve Ryelund, R. Mukkamala, Ravikiran Vatrapu

Citations: 13

Big Web Colors: Analyzing the World Top Sites

2018 IEEE International Congress on Big Data (BigData Congress), Pub Date: 2018-07-01, DOI: 10.1109/BigDataCongress.2018.00020

M. Marchiori, Giulio Rigoni

Citations: 0

Performance Modeling and Task Scheduling in Distributed Graph Processing

2018 IEEE International Congress on Big Data (BigData Congress), Pub Date: 2018-07-01, DOI: 10.1109/BigDataCongress.2018.00025

Daniel Presser, Frank Siqueira, Fábio Reina

Citations: 2

Incorporating Word Embedding into Cross-Lingual Topic Modeling

2018 IEEE International Congress on Big Data (BigData Congress), Pub Date: 2018-07-01, DOI: 10.1109/BigDataCongress.2018.00010

Chia-Hsuan Chang, San-Yih Hwang, Tou-Hsiang Xui

Abstract: In this paper, we address cross-lingual topic modeling, an important technique that enables global enterprises to detect and compare topic trends across global markets. Previous work on cross-lingual topic modeling has proposed methods that use parallel or comparable corpora to construct a polylingual topic model. However, parallel or comparable corpora are often unavailable. In this research, we combine cross-lingual word-space mapping with topic modeling (LDA) and propose two methods: Translated Corpus with LDA (TC-LDA) and Post Match LDA (PM-LDA). The cross-lingual word-space mapping allows us to compare words in different languages, and LDA enables us to group words into topics. Neither TC-LDA nor PM-LDA requires a parallel or comparable corpus, so both are applicable to more domains. The effectiveness of both methods is evaluated using UM-Corpus and WS-353. Our evaluation results indicate that both methods are able to identify similar documents written in different languages. In addition, PM-LDA is shown to achieve better performance than TC-LDA, especially when documents are short.

Citations: 3
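The Chang et al. abstract above does not spell out how PM-LDA matches topics, but its name suggests fitting LDA separately per language and matching topics afterwards in the shared word space. A minimal sketch of such a post-hoc matching step is shown below, under loudly stated assumptions: each topic is given as a list of its top words, source-language word vectors are projected into the target space by a linear matrix `W` (a common way to build a cross-lingual word space, assumed here rather than taken from the paper), a topic is represented by the mean of its word vectors, and topics are matched by cosine similarity. The function names and toy vectors are hypothetical.

```python
import math

# Hypothetical sketch of post-hoc cross-lingual topic matching.

def mat_vec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def mean_vector(words, vectors):
    """Mean of the vectors of covered words; None if no word has a vector."""
    vecs = [vectors[w] for w in words if w in vectors]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_topics(src_topics, tgt_topics, src_vecs, tgt_vecs, W):
    """For each source-language topic, return the index of the most similar
    target-language topic, or None if the topic cannot be compared."""
    # Assumed linear mapping of source word vectors into the target space.
    mapped = {w: mat_vec(W, v) for w, v in src_vecs.items()}
    tgt_centroids = [mean_vector(t, tgt_vecs) for t in tgt_topics]
    matches = []
    for topic in src_topics:
        centroid = mean_vector(topic, mapped)
        if centroid is None:
            matches.append(None)
            continue
        candidates = [(cosine(centroid, c), i)
                      for i, c in enumerate(tgt_centroids) if c is not None]
        # max() with a key returns the first index among equal best scores.
        matches.append(max(candidates, key=lambda p: p[0])[1] if candidates else None)
    return matches

if __name__ == "__main__":
    src_vecs = {"stock": [1.0, 0.0], "market": [0.9, 0.1],
                "football": [0.0, 1.0], "goal": [0.1, 0.9]}
    tgt_vecs = {"股票": [0.0, 1.0], "市场": [0.1, 0.9],
                "足球": [1.0, 0.0], "进球": [0.9, 0.1]}
    W = [[0.0, 1.0], [1.0, 0.0]]  # toy mapping that swaps the two axes
    print(match_topics([["stock", "market"], ["football", "goal"]],
                       [["足球", "进球"], ["股票", "市场"]],
                       src_vecs, tgt_vecs, W))  # prints [1, 0]
```

Because the matching happens only after each language's topics are fitted, this style of approach needs no parallel or comparable corpus, which is consistent with the property the abstract claims for both methods.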

Sensor Data Based System-Level Anomaly Prediction for Smart Manufacturing

2018 IEEE International Congress on Big Data (BigData Congress), Pub Date: 2018-07-01, DOI: 10.1109/BigDataCongress.2018.00028

Jianwu Wang, Chen Liu, Meiling Zhu, Pei Guo, Yapeng Hu

Citations: 17