2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA): Latest Papers, Page 4

DNA: General Deterministic Network Adaptive Framework for Multi-Round Multi-Party Influence Maximization

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00038

Tzu-Hsin Yang, Hao-Shang Ma, Jen-Wei Huang

Abstract: The influence maximization problem has been considered a vital problem when companies provide similar products or services. Since resources are limited, companies must determine a strategy to capture as much market share as possible. In this paper, we propose a general Deterministic Network Adaptive (DNA) framework to solve the multi-round multi-party influence maximization problem. To obtain the most market share, using a single strategy to determine seed nodes is not sufficient in the long term, because the network status changes during the multi-round procedure. The strategy for selecting seed nodes in each round should depend on the current status of influence diffusion in the network. The DNA framework leverages the concept of reinforcement learning to maximize the expected cumulative influence. In addition, the learning process is deterministic, so it does not spend time exploring less important regions of the search space. We further design a similarity function to measure the similarity between two networks, which lets the DNA framework avoid redundant computation when similar networks have been trained before. Moreover, we propose a method for making policy decisions that maximize the influence spread in a coopetition scenario, based on the DNA framework. The proposed framework is evaluated with synthetic data and real-world data. In the experimental results, the DNA framework outperforms existing work on influence maximization problems. The coopetition policy generated by DNA has the best performance in most cases.

Citations: 1

Using Data Analytics to Optimize Public Transportation on a College Campus

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00059

K. Zimmer, H. Kurban, Mark Jenne, Logan Keating, P. Maull, Mehmet M. Dalkilic

Citations: 7

Willingness to Share Emotion Information on Social Media: Influence of Personality and Social Context

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00086

Damien Dupré, G. McKeown, Nicole Andelic, Gawain Morrison

Citations: 2

Coolabilities API

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00063

D. Nordfors, S. Dasgupta, Ganapathy Subramanian, V. R. Ferose, Chally Grundwag, Behrang Zandi

Citations: 2

Big Data-Driven Platform for Cross-Media Monitoring

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00051

L. Napalkova, Pablo Aragón, Juan Carlos Castro Robles

Abstract: The abundance of online media content requires highly scalable architectures to allow cross-media monitoring. This paper presents an innovative big data-as-a-service platform for analysing large complex networks in order to enhance cross-media monitoring. In contrast to existing media monitoring systems, the platform equips marketers with several distinctive features. First, while most systems perform quantitative exploratory analysis of social media, our platform applies graph analytics to reveal social interaction types, hidden patterns in the cross-media network, and information diffusion over time. Second, our platform integrates and implements distributed versions of graph analytics algorithms (Louvain, HITS and others) that can scale to a large volume of data. Third, the creation of cross-media graphs is triggered by user-defined queries that can be easily specified by marketers. Thus, end-users can build and analyse different graphs according to the specific goals of the study. Finally, the platform reduces Hadoop cluster usage costs by executing the graph mining algorithms on demand, triggered by user-defined queries. Instead of running costly streaming processes that continuously listen for new queries, we implemented a Spark-as-a-service approach via the Apache Livy REST interface.
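The abstract above names HITS as one of the graph analytics algorithms the platform runs in distributed form. As a rough illustration of what HITS computes, here is a minimal single-machine sketch in Python. It is not the platform's distributed implementation; the function name `hits`, the edge-list input, and the fixed iteration count are assumptions made for this example.

```python
def hits(edges, iterations=50):
    """Plain power-iteration HITS on a directed edge list (illustrative only)."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the nodes linking to this node.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score: sum of authority scores of the nodes this node links to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth


# Hypothetical toy graph: "a" and "b" both link to "c"; "a" also links to "d".
edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub, auth = hits(edges)
top_authority = max(auth, key=auth.get)
top_hub = max(hub, key=hub.get)
```

In this toy graph the node with the most incoming links from good hubs ends up as the top authority, and the node linking to the most authorities ends up as the top hub.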

Citations: 4

Scalable and Interpretable Predictive Models for Electronic Health Records

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00045

Amela Fejza, P. Genevès, Nabil Layaïda, J. Bosson

Abstract: Early identification of patients at risk of developing complications during their hospital stay is currently one of the most challenging issues in healthcare. Complications include hospital-acquired infections, admissions to intensive care units, and in-hospital mortality. Being able to accurately predict patients' outcomes is a crucial prerequisite for tailoring the care that certain patients receive, if it is believed that they will do poorly without additional intervention. We consider the problem of complication risk prediction, such as inpatient mortality, from the electronic health records of the patients. We study the question of making predictions on the first day at the hospital, and of making updated mortality predictions day after day during the patient's stay. We develop distributed models that are scalable and interpretable. Key insights include analysing diagnoses known at admission and drugs served, which evolve during the hospital stay. We leverage a distributed architecture to learn interpretable models from very large training datasets. We test our analyses with more than one million patients from hundreds of hospitals, and report on the lessons learned from these experiments.

Citations: 7

Latent Dirichlet Allocation in Discovering Goals in Patients Undergoing Bladder Cancer Surgery

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00069

T. Atkinson

Abstract: As we begin to leverage Big Data in health care settings and particularly in assessing patient-reported outcomes, there is a need for novel analytics to address unique challenges. One such challenge is in coding transcribed interview data, typically free-text entries of statements made by interviewees during face-to-face interviews. Conventional coding of such qualitative data into themes is labor-intensive and prone to inconsistencies. Latent Dirichlet Allocation (LDA) may offer statistical rigor in summarizing patients' concerns and coping strategies in a life-threatening illness. We aim to apply LDA to interview data collected as part of a prospective, longitudinal study of QOL in patients undergoing radical cystectomy and urinary diversion for bladder cancer. LDA showed that, prior to surgery, patients' priorities were primarily in cancer surgery and recovery. Six months after the surgery, however, their goals shifted to a desire to spend more time with family, resume work, and enjoy life to its fullest extent. Novel analytics such as LDA offer the possibility of summarizing personal goals in real time without the need for conventional fixed-length measures and qualitative data coding.

Citations: 1

Predicting Worker Disagreement for More Effective Crowd Labeling

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00028

Stefan Räbiger, Gizem Gezici, Y. Saygin, M. Spiliopoulou

Abstract: Crowdsourcing is a popular mechanism used for labeling tasks to produce large corpora for training. However, producing a reliable crowd-labeled training corpus is challenging and resource-consuming. Research on crowdsourcing has shown that label quality is strongly affected by worker engagement and expertise. In this study, we postulate that label quality can also be affected by inherent ambiguity of the documents to be labeled. Such ambiguities are not known in advance, of course, but, once encountered by the workers, they lead to disagreement in the labeling, a disagreement that cannot be resolved by employing more workers. To deal with this problem, we propose a crowd labeling framework: we train a disagreement predictor on a small seed of documents, and then use this predictor to decide which documents of the complete corpus should be labeled and which should be checked for document-inherent ambiguities before assigning (and potentially wasting) worker effort on them. We report on the findings of the experiments we conducted on crowdsourcing a Twitter corpus for sentiment classification.
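The abstract above does not say how disagreement is measured or which model predicts it. Purely as an illustration of the first step, deriving a per-document disagreement target from the seed set, the following Python sketch scores each seed document by the share of workers who deviate from the majority label. The measure, the function names `disagreement` and `split_by_disagreement`, and the threshold are all assumptions for this example, not the authors' method.

```python
from collections import Counter


def disagreement(labels):
    """Fraction of worker labels that differ from the majority label (assumed measure)."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)


def split_by_disagreement(seed_labels, threshold):
    """Split seed documents into low- and high-disagreement groups."""
    agreed, ambiguous = [], []
    for doc_id, labels in seed_labels.items():
        target = ambiguous if disagreement(labels) > threshold else agreed
        target.append(doc_id)
    return agreed, ambiguous


# Hypothetical seed set: three workers labeled each tweet.
seed_labels = {
    "t1": ["pos", "pos", "pos"],
    "t2": ["pos", "neg", "neutral"],
}
agreed, ambiguous = split_by_disagreement(seed_labels, threshold=0.5)
```

Scores like these could serve as training targets for a disagreement predictor, which would then be applied to unlabeled documents before worker effort is assigned.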

Citations: 10

Parallel Continuous Outlier Mining in Streaming Data

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00033

Theodoros Toliopoulos, A. Gounaris, K. Tsichlas, A. Papadopoulos, Sandra Sampaio

Abstract: In this work, we focus on distance-based outliers in a metric space, where the status of an entity as to whether it is an outlier is based on the number of other entities in its neighborhood. In recent years, several solutions have tackled the problem of distance-based outliers in data streams, where outliers must be mined continuously as new elements become available. An interesting research problem is to combine the streaming environment with massively parallel systems to provide scalable stream-based algorithms. However, none of the previously proposed techniques refer to a massively parallel setting. Our proposal fills this gap and studies transferring state-of-the-art techniques to Apache Flink, a modern platform for intensive streaming analytics. We thoroughly present the technical challenges encountered and the alternatives that may be applied. We show speed-ups of up to 117 (resp. 2076) times over a naive parallel (resp. non-parallel) solution in Flink, using just an ordinary 4-core machine and a real-world dataset. Our results demonstrate that outlier mining can be achieved in an efficient and scalable manner. The resulting techniques have been made publicly available as open source.
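The distance-based outlier definition in the abstract above (an entity is an outlier when too few other entities lie in its neighborhood) can be sketched on a static batch as follows. This is a naive quadratic Python illustration, not the streaming or Flink-based techniques the paper studies; the parameter names `radius` and `min_neighbors` and the Euclidean metric are assumptions for this example.

```python
from math import dist


def distance_based_outliers(points, radius, min_neighbors):
    """Return indices of points with fewer than min_neighbors other points within radius."""
    outliers = []
    for i, p in enumerate(points):
        # Count the other points that fall inside the neighborhood of p.
        neighbors = sum(
            1
            for j, q in enumerate(points)
            if j != i and dist(p, q) <= radius
        )
        if neighbors < min_neighbors:
            outliers.append(i)
    return outliers


# Hypothetical data: three points clustered near the origin and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
result = distance_based_outliers(points, radius=0.5, min_neighbors=2)
```

In a streaming setting the same test has to be re-evaluated as points enter and leave a sliding window, which is where the continuous and parallel techniques discussed in the abstract come in.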

Citations: 12

SeCredISData 2018: Special Session on Sentiment, Emotion, and Credibility of Information in Social Data

2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA) Pub Date : 2018-10-01 DOI: 10.1109/DSAA.2018.00082

F. Benamara, C. Bosco, E. Fersini, G. Pasi, V. Patti, Marco Viviani

Abstract: The Social Web nowadays represents the principal means to support and foster social interactions among people through Web 2.0 technologies. Individuals interact in virtual communities to pursue mutual interests or goals by exchanging multiple kinds of content (i.e., textual, acoustic, visual), the so-called User-Generated Content (UGC). In this context, the SeCredISData Special Session is devoted to discussing the implications that the analysis of big social data has in tackling open issues related to society from different perspectives. On one side, there is the need to push forward the research on emotion and sentiment, and the investigation of affective cognitive models and their possible integration into intelligent systems. On the other side, it is urgent to address the issue of online information credibility assessment, in an era where trusted intermediaries have disappeared and people must rely only on their cognitive capacities to judge information. The Special Session is therefore aimed at promoting the development of models and applications able to tackle these issues.

Citations: 0