SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining: latest articles

Current and Future Challenges in Mining Large Networks: Report on the Second SDM Workshop on Mining Networks and Graphs
L. Holder, R. Caceres, D. Gleich, E. J. Riedy, Maleq Khan, N. Chawla, Ravi Kumar, Yinghui Wu, Christine Klymko, Tina Eliassi-Rad, B. Prakash
{"title":"Current and Future Challenges in Mining Large Networks: Report on the Second SDM Workshop on Mining Networks and Graphs","authors":"L. Holder, R. Caceres, D. Gleich, E. J. Riedy, Maleq Khan, N. Chawla, Ravi Kumar, Yinghui Wu, Christine Klymko, Tina Eliassi-Rad, B. Prakash","doi":"10.1145/2980765.2980770","DOIUrl":"https://doi.org/10.1145/2980765.2980770","url":null,"abstract":"We report on the Second Workshop on Mining Networks and Graphs held at the 2015 SIAM International Conference on Data Mining. This half-day workshop consisted of a keynote talk, four technical paper presentations, one demonstration, and a panel on future challenges in mining large networks. We summarize the main highlights of the workshop, including expanded written summaries of the future challenges provided by the panelists. The current and future challenges discussed at the workshop and elaborated here provide valuable guidance for future research in the field","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"23 1","pages":"39-45"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81320971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
The Internet of Things: Opportunities and Challenges for Distributed Data Analysis
Marco Stolpe
{"title":"The Internet of Things: Opportunities and Challenges for Distributed Data Analysis","authors":"Marco Stolpe","doi":"10.1145/2980765.2980768","DOIUrl":"https://doi.org/10.1145/2980765.2980768","url":null,"abstract":"Nowadays, data is created by humans as well as automatically collected by physical things, which embed electronics, software, sensors and network connectivity. Together, these entities constitute the Internet of Things (IoT). The automated analysis of its data can provide insights into previously unknown relationships between things, their environment and their users, facilitating an optimization of their behavior. Especially the real-time analysis of data, embedded into physical systems, can enable new forms of autonomous control. These in turn may lead to more sustainable applications, reducing waste and saving resources. IoT's distributed and dynamic nature, resource constraints of sensors and embedded devices as well as the amounts of generated data are challenging even the most advanced automated data analysis methods known today. In particular, the IoT requires a new generation of distributed analysis methods. Many existing surveys have strongly focused on the centralization of data in the cloud and big data analysis, which follows the paradigm of parallel high-performance computing. However, bandwidth and energy can be too limited for the transmission of raw data, or it is prohibited due to privacy constraints. Such communication-constrained scenarios require decentralized analysis algorithms which at least partly work directly on the generating devices. After listing data-driven IoT applications, in contrast to existing surveys, we highlight the differences between cloud-based and decentralized analysis from an algorithmic perspective. We present the opportunities and challenges of research on communication-efficient decentralized analysis algorithms. Here, the focus is on the difficult scenario of vertically partitioned data, which covers common IoT use cases. The comprehensive bibliography aims at providing readers with a good starting point for their own work","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"12 1","pages":"15-34"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72719008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 94
MultiClust 2013: Multiple Clusterings, Multiview Data, and Multisource Knowledge-driven Clustering: [Workshop Report]
I. Assent, C. Domeniconi, Francesco Gullo, Andrea Tagarelli, A. Zimek
{"title":"MultiClust 2013: Multiple Clusterings, Multiview Data, and Multisource Knowledge-driven Clustering: [Workshop Report]","authors":"I. Assent, C. Domeniconi, Francesco Gullo, Andrea Tagarelli, A. Zimek","doi":"10.1145/2980765.2980769","DOIUrl":"https://doi.org/10.1145/2980765.2980769","url":null,"abstract":"In this workshop report, we give a summary of the MultiClust workshop held in Chicago in conjunction with KDD 2013. We provide an overview on the history of this workshop series and the general topics covered. Furthermore, we provide summaries of the invited talks and of the contributed papers.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"128 1","pages":"35-38"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88696836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Interactive Data Repository with Visual Analytics
Ryan A. Rossi, Nesreen Ahmed
{"title":"An Interactive Data Repository with Visual Analytics","authors":"Ryan A. Rossi, Nesreen Ahmed","doi":"10.1145/2897350.2897355","DOIUrl":"https://doi.org/10.1145/2897350.2897355","url":null,"abstract":"Scientific data repositories have historically made data widely accessible to the scientific community, and have led to better research through comparisons, reproducibility, as well as further discoveries and insights. Despite the growing importance and utilization of data repositories in many scientific disciplines, the design of existing data repositories has not changed for decades. In this paper, we revisit the current design and envision interactive data repositories, which not only make data accessible, but also provide techniques for interactive data exploration, mining, and visualization in an easy, intuitive, and free-flowing manner.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"57 1","pages":"37-41"},"PeriodicalIF":0.0,"publicationDate":"2016-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84578060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 64
Web Content Extraction
Tim Weninger, Rodrigo Palacios, Valter Crescenzi, Thomas Gottron, Paolo Merialdo
{"title":"Web Content Extraction","authors":"Tim Weninger, Rodrigo Palacios, Valter Crescenzi, Thomas Gottron, Paolo Merialdo","doi":"10.1007/springerreference_66087","DOIUrl":"https://doi.org/10.1007/springerreference_66087","url":null,"abstract":"","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"52982540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Shedding Light on the Performance of Solar Panels: A Data-Driven View
S. A. Chen, A. Vishwanath, Saket K. Sathe, S. Kalyanaraman
{"title":"Shedding Light on the Performance of Solar Panels: A Data-Driven View","authors":"S. A. Chen, A. Vishwanath, Saket K. Sathe, S. Kalyanaraman","doi":"10.1145/2897350.2897354","DOIUrl":"https://doi.org/10.1145/2897350.2897354","url":null,"abstract":"The significant adoption of solar photovoltaic (PV) systems in both commercial and residential sectors has spurred an interest in monitoring the performance of these systems. This is facilitated by the increasing availability of regularly logged PV performance data in recent years. In this paper, we present a data-driven framework to systematically characterise the relationship between performance of an existing photovoltaic (PV) system and various environmental factors. We demonstrate the efficacy of our proposed framework by applying it to a PV generation dataset from a building located in northern Australia. We show how, in light of limited site-specific weather information, this data set may be coupled with publicly available data to yield rich insights on the performance of the building's PV system.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"21 1","pages":"24-36"},"PeriodicalIF":0.0,"publicationDate":"2016-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87317460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Question Quality in Community Question Answering Forums: a survey
Antoaneta Baltadzhieva, Grzegorz Chrupała
{"title":"Question Quality in Community Question Answering Forums: a survey","authors":"Antoaneta Baltadzhieva, Grzegorz Chrupała","doi":"10.1145/2830544.2830547","DOIUrl":"https://doi.org/10.1145/2830544.2830547","url":null,"abstract":"Community Question Answering websites (CQA) offer a new opportunity for users to provide, search and share knowledge. Although the idea of receiving a direct, targeted response to a question sounds very attractive, the quality of the question itself can have an important effect on the likelihood of getting useful answers. High quality questions improve the CQA experience and therefore it is essential for CQA forums to better understand what characterizes questions that are more appealing for the forum community. In this survey, we review existing research on question quality in CQA websites. We discuss the possible measures of question quality and the question features that have been shown to influence question quality.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"64 1","pages":"8-13"},"PeriodicalIF":0.0,"publicationDate":"2015-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74031725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 54
Theoretical Foundations and Algorithms for Outlier Ensembles
C. Aggarwal, Saket K. Sathe
{"title":"Theoretical Foundations and Algorithms for Outlier Ensembles","authors":"C. Aggarwal, Saket K. Sathe","doi":"10.1145/2830544.2830549","DOIUrl":"https://doi.org/10.1145/2830544.2830549","url":null,"abstract":"Ensemble analysis has recently been studied in the context of the outlier detection problem. In this paper, we investigate the theoretical underpinnings of outlier ensemble analysis. In spite of the significant differences between the classification and the outlier analysis problems, we show that the theoretical underpinnings between the two problems are actually quite similar in terms of the bias-variance trade-off. We explain the existing algorithms within this traditional framework, and clarify misconceptions about the reasoning underpinning these methods. We propose more effective variants of subsampling and feature bagging. We also discuss the impact of the combination function and discuss the specific trade-offs of the average and maximization functions. We use these insights to propose new combination functions that are robust in many settings.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"48 1","pages":"24-47"},"PeriodicalIF":0.0,"publicationDate":"2015-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90769482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 212
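The abstract above mentions the trade-offs between the average and maximization combination functions. The following is a minimal sketch of those two functions only, not the authors' algorithms: the z-score standardization step, the function names `standardize` and `combine`, and the toy scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert one detector's raw scores to z-scores (assumed normalization)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def combine(score_lists, how="average"):
    """Combine per-detector score lists into one score per data point."""
    z = [standardize(scores) for scores in score_lists]
    per_point = list(zip(*z))  # one tuple of z-scores per data point
    if how == "average":
        return [mean(p) for p in per_point]
    if how == "maximum":
        return [max(p) for p in per_point]
    raise ValueError(f"unknown combination function: {how}")


# Hypothetical scores from three detectors over four points;
# the last point is scored as unusual by two of the three detectors.
score_lists = [
    [0.1, 0.2, 0.1, 0.9],
    [1.0, 1.2, 0.9, 1.1],
    [5.0, 6.0, 5.0, 30.0],
]
avg_scores = combine(score_lists, "average")
max_scores = combine(score_lists, "maximum")
```

Averaging dampens the influence of any single detector, while the maximum lets one strongly dissenting detector dominate a point's final score.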
A Framework for Collocation Error Correction in Web Pages and Text Documents
Alan Varghese, A. Varde, Jing Peng, Eileen Fitzpatrick
{"title":"A Framework for Collocation Error Correction in Web Pages and Text Documents","authors":"Alan Varghese, A. Varde, Jing Peng, Eileen Fitzpatrick","doi":"10.1145/2830544.2830548","DOIUrl":"https://doi.org/10.1145/2830544.2830548","url":null,"abstract":"Much of the English in text documents today comes from non-native speakers. Web searches are also conducted very often by non-native speakers. Though highly qualified in their respective fields, these speakers could potentially make errors in collocation, e.g., \"dark money\" and \"stock agora\" (instead of the more appropriate English expressions \"black money\" and \"stock market\" respectively). These may arise due to literal translation from the respective speaker's native language or other factors. Such errors could cause problems in contexts such as querying over Web pages, correct understanding of text documents and more. This paper proposes a framework called CollOrder to detect such collocation errors and suggest correctly ordered collocated responses for improving the semantics. This framework integrates machine learning approaches with natural language processing techniques, proposing suitable heuristics to provide responses to collocation errors, ranked in the order of correctness. We discuss the proposed framework with algorithms and experimental evaluation in this paper. We claim that it would be useful in semantically enhancing Web querying e.g., financial news, online shopping etc. It would also help in providing automated error correction in machine translated documents and offering assistance to people using ESL tools.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"67 1","pages":"14-23"},"PeriodicalIF":0.0,"publicationDate":"2015-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79844414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Load-Balancing the Distance Computations in Record Linkage
Dimitrios Karapiperis, Vassilios S. Verykios
{"title":"Load-Balancing the Distance Computations in Record Linkage","authors":"Dimitrios Karapiperis, Vassilios S. Verykios","doi":"10.1145/2830544.2830546","DOIUrl":"https://doi.org/10.1145/2830544.2830546","url":null,"abstract":"In this paper, we propose a novel method for distributing the distance computations of record pairs generated by a blocking mechanism to the reduce tasks of a Map/Reduce system. The proposed solutions in the literature analyze the blocks and then construct a profile, which contains the number of record pairs in each block. However, this deterministic process, including all its variants, might incur considerable overhead given massive data sets. In contrast, our method utilizes two Map/Reduce jobs where the first job formulates the record pairs while the second job distributes these pairs to the reduce tasks, which perform the distance computations, using repetitive allocation rounds. In each such round, we utilize all the available reduce tasks on a random basis by generating permutations of their indexes. A series of experiments demonstrate an almost-equal distribution of the record pairs, or equivalently of the distance computations, to the reduce tasks, which makes our method a simple, yet efficient, solution for applying a blocking mechanism given massive data sets.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"28 1","pages":"1-7"},"PeriodicalIF":0.0,"publicationDate":"2015-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74371530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
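The allocation idea described in the abstract above (rounds in which every reduce task is used once, in a randomly permuted order) can be sketched in a single process as follows. This is only an illustration under stated assumptions, not the authors' Map/Reduce implementation: the function name `allocate_pairs`, the `seed` parameter, and the toy record pairs are hypothetical.

```python
import random
from collections import Counter


def allocate_pairs(pairs, num_reducers, seed=None):
    """Assign each record pair to a reducer index using rounds of random permutations."""
    rng = random.Random(seed)
    assignment = []
    order = []  # reducer indexes still unused in the current round
    for pair in pairs:
        if not order:
            # Start a new allocation round: a fresh random permutation of all indexes.
            order = list(range(num_reducers))
            rng.shuffle(order)
        assignment.append((pair, order.pop()))
    return assignment


# Hypothetical input: all 45 candidate pairs among 10 records in one block.
pairs = [(i, j) for i in range(10) for j in range(i + 1, 10)]
assignment = allocate_pairs(pairs, num_reducers=4, seed=0)
loads = Counter(reducer for _, reducer in assignment)
```

Because each round hands exactly one pair to every reducer, the loads in this sketch can differ by at most one pair, regardless of how the pairs are spread across blocks.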