Latest articles from SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining

The sum is greater than the parts: ensembling models of student knowledge in educational software
Z. Pardos, S. M. Gowda, R. Baker, N. Heffernan
{"title":"The sum is greater than the parts: ensembling models of student knowledge in educational software","authors":"Z. Pardos, S. M. Gowda, R. Baker, N. Heffernan","doi":"10.1145/2207243.2207249","DOIUrl":"https://doi.org/10.1145/2207243.2207249","url":null,"abstract":"Many competing models have been proposed in the past decade for predicting student knowledge within educational software. Recent research attempted to combine these models in an effort to improve performance but have yielded inconsistent results. While work in the 2010 KDD Cup data set showed the benefits of ensemble methods, work in the Genetics Tutor failed to show similar benefits. We hypothesize that the key factor has been data set size. We explore the potential for improving student performance prediction with ensemble methods in a data set drawn from a different tutoring system, the ASSISTments Platform, which contains 15 times the number of responses of the Genetics Tutor data set. We evaluated the predictive performance of eight student models and eight methods of ensembling predictions. Within this data set, ensemble approaches were more effective than any single method with the best ensemble approach producing predictions of student performance 10% better than the best individual student knowledge model.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"1 1","pages":"37-44"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81475322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 88
A study on the importance of and time spent on different modeling steps
M. A. Munson
{"title":"A study on the importance of and time spent on different modeling steps","authors":"M. A. Munson","doi":"10.1145/2207243.2207253","DOIUrl":"https://doi.org/10.1145/2207243.2207253","url":null,"abstract":"Applying data mining and machine learning algorithms requires many steps to prepare data and to make use of modeling results. This study investigates two questions: (1) how time consuming are the pre- and post-processing steps? (2) how much research energy is spent on these steps? To answer these questions I surveyed practitioners about their experiences in applying modeling techniques and categorized data mining and machine learning research papers from 2009 according to the modeling step(s) they addressed. Survey results show that model building consumes only 14% of the time spent on a typical project; the remaining time is spent on pre- and post-processing steps. Both survey responses and the categorization of research papers show that data mining and machine learning researchers spend the majority of their energy on algorithms for constructing models and significantly less energy on other steps. These findings collectively suggest that there are research opportunities to simplify the steps that precede and follow model building.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"1 1","pages":"65-71"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74874242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 48
A conversation with Dr. Yong Shi
Yong Shi
{"title":"A conversation with Dr. Yong Shi","authors":"Yong Shi","doi":"10.1145/2207243.2207262","DOIUrl":"https://doi.org/10.1145/2207243.2207262","url":null,"abstract":"1. Please share with us your view on the history and important milestones of the Chinese KDD research and application areas. Chinese KDD or Data Mining research started from the early 1990's when the first group of international KDD communities was formed. Since the demand of using information technology (IT), including computing tools, and Internet communications, was growing in Chinese economic reform, many scholars in universities and institutes have paid attention to KDD related research. A number of books in the field, such as Artificial Intelligence by Ruqian Lu (1996 in Chinese) and Knowledge Discovery by Zhongzhi Shi (2011 in Chinese), have recorded the progress of KDD in China. These works have also influenced young generations who were working on KDD at home and abroad. In recent years, Chinese governmental branches and industries have built their own databases. Some of them, especially China's commercial banks and mobile communication industries, have been migrating the databases into data warehouse and applying techniques of KDD to solve their business decision making problems. Chinese ICT (information and communications technology) market becomes the largest one in the world. For example, there are 457 million Chinese using Internet service in 2011. It was forecasted that by 2015, the Internet population of Chinese will reach 1.2 billion. As a significant number of researchers from both research institutes and universities are showing their increasing interest in doing various KDD problems, the National Natural Science Foundation of China (NSFC) has sponsored a large number of KDD proposals since 1990's. According to China's National Science and Technology Development Mid-Long Term Planning (2006-2020), \"theories and methods of large-scale information processing and knowledge mining\" have been identified as one of the key supporting technologies in fundamental scientific research for the national prioritized strategic needs. This has demonstrated the Chinese government's strong commitment to KDD related research and applications. 2. Please describe your expertise and contribution to KDD. With the multidisciplinary nature of KDD, financial markets, environmental sciences and public management, CASFEDS has three major functions: fundamental and theoretical development, application-oriented research, and think tank of Chinese government. For the last seven years, CASFEDS has been granted more than 30 million RMB by NSFC, the Ministry of Chinese Science and Technology, CAS and National Audit Office of China for its KDD related research projects. It has published more than 300 research papers in the international journals and conferences, including a number …","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"97 1","pages":"87-88"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80617374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An introduction to SIGKDD and a reflection on the term 'data mining'
G. Piatetsky-Shapiro, U. Fayyad
{"title":"An introduction to SIGKDD and a reflection on the term 'data mining'","authors":"G. Piatetsky-Shapiro, U. Fayyad","doi":"10.1145/2207243.2207269","DOIUrl":"https://doi.org/10.1145/2207243.2207269","url":null,"abstract":"The primary focus of SIGKDD is to provide the premier forum for advancement and adoption of the \"science\" of knowledge discovery and data mining. SIGKDD's main activity is to organize KDD, the leading conference on data mining and knowledge discovery, held since 1995. The KDD conference is top-ranked in Data Mining, according to Microsoft Research Asia. KDD-2011, held in San Diego, CA, USA, was the largest data-mining meeting in the world, with over 1,100 participants from around the world.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"57 1","pages":"102-103"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81910096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Introduction to the special section on educational data mining
T. Calders, Mykola Pechenizkiy
{"title":"Introduction to the special section on educational data mining","authors":"T. Calders, Mykola Pechenizkiy","doi":"10.1145/2207243.2207245","DOIUrl":"https://doi.org/10.1145/2207243.2207245","url":null,"abstract":"Educational Data Mining (EDM) is an emerging multidisciplinary research area, in which methods and techniques for exploring data originating from various educational information systems have been developed. EDM is both a learning science, as well as a rich application area for data mining, due to the growing availability of educational data. EDM contributes to the study of how students learn, and the settings in which they learn. It enables data-driven decision making for improving the current educational practice and learning material. We present a brief overview of EDM and introduce four selected EDM papers representing a crosscut of different application areas for data mining in education.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"14 1","pages":"3-6"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88468844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 87
Process mining: making knowledge discovery process centric
Wil M.P. van der Aalst
{"title":"Process mining: making knowledge discovery process centric","authors":"Wil M.P. van der Aalst","doi":"10.1145/2207243.2207251","DOIUrl":"https://doi.org/10.1145/2207243.2207251","url":null,"abstract":"Recently, the Task Force on Process Mining released the Process Mining Manifesto. The manifesto is supported by 53 organizations and 77 process mining experts contributed to it. The active contributions from end-users, tool vendors, consultants, analysts, and researchers illustrate the growing relevance of process mining as a bridge between data mining and business process modeling. This paper summarizes the manifesto and explains why process mining is a highly relevant, but also very challenging, research area. This way we hope to stimulate the broader ACM SIGKDD community to look at process-centric knowledge discovery.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"66 1","pages":"45-49"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88973357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2500
Survey on web spam detection: principles and algorithms
N. Spirin, Jiawei Han
{"title":"Survey on web spam detection: principles and algorithms","authors":"N. Spirin, Jiawei Han","doi":"10.1145/2207243.2207252","DOIUrl":"https://doi.org/10.1145/2207243.2207252","url":null,"abstract":"Search engines became a de facto place to start information acquisition on the Web. Though due to web spam phenomenon, search results are not always as good as desired. Moreover, spam evolves that makes the problem of providing high quality search even more challenging. Over the last decade research on adversarial information retrieval has gained a lot of interest both from academia and industry. In this paper we present a systematic review of web spam detection techniques with the focus on algorithms and underlying principles. We categorize all existing algorithms into three categories based on the type of information they use: content-based methods, link-based methods, and methods based on non-traditional data such as user behaviour, clicks, HTTP sessions. In turn, we perform a subcategorization of link-based category into five groups based on ideas and principles used: labels propagation, link pruning and reweighting, labels refinement, graph regularization, and feature-based. We also define the concept of web spam numerically and provide a brief survey on various spam forms. Finally, we summarize the observations and underlying principles applied for web spam detection.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"48 1","pages":"50-64"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75248021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 267
A conversation with MSRA researchers
Wei-Ying Ma, Tie-Yan Liu, Ji-Rong Wen, Zheng Chen, Zaiqing Nie, Xing Xie, Hang Li, Haixun Wang, Yu Zheng
{"title":"A conversation with MSRA researchers","authors":"Wei-Ying Ma, Tie-Yan Liu, Ji-Rong Wen, Zheng Chen, Zaiqing Nie, Xing Xie, Hang Li, Haixun Wang, Yu Zheng","doi":"10.1145/2207243.2207260","DOIUrl":"https://doi.org/10.1145/2207243.2207260","url":null,"abstract":"Ten years ago, KDD research was still in its infancy in China. Things have changed significantly. With the push from the technological advancement in academia and the pull from the explosive growth of application needs in industry, KDD research is flourishing. At Microsoft Research Asia, we have been conducting research in many areas related to KDD research, including web search, data mining, information retrieval, multimedia mining, natural language processing, and visualization. In addition to publishing papers in KDD and developing technologies for commercial products, we have also contributed to the talent development in China by supervising students and growing young researchers who later become well known in the field related to KDD in universities and industries.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"43 5","pages":"82-84"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72563966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A conversation with Professor Zhi-Hua Zhou
Zhi-Hua Zhou
{"title":"A conversation with Professor Zhi-Hua Zhou","authors":"Zhi-Hua Zhou","doi":"10.1145/2207243.2207268","DOIUrl":"https://doi.org/10.1145/2207243.2207268","url":null,"abstract":"I came into the KDD area in late 1990s. From my point of view, an important event of Chinese KDD development was the 3rd PAKDD conference, which was held in Beijing in April 1999. That was the first international conference on KDD held in China, and it helped to form the Chinese KDD community. Later, in 2007 the 11th PAKDD conference was held in Nanjing, for which I was the program chair. The PAKDD 2007 conference attracted more than 730 submissions and more than 270 attendees from China as well as other Asia-Pacific countries/regions; I think this is a good sign of the growth of the Chinese KDD community. Another important event is the CCDM (China Conference on Data Mining) conference held in Yantai in August 2009. This is a biennial conference, sponsored by the Artificial Intelligence and Pattern Recognition Technical Committee of the China Computer Federation (CCF), and the Machine Learning Technical Committee of the China Association of Artificial Intelligence (CAAI); fortunately I served as the general co-chair. The origin of the conference was two editions of China Conference on Classification Technology and Application (CCTA), held in 2005 and 2007, in Beijing and Zhengzhou, respectively. With the growth of the Chinese data mining community, and the lack of a domestic data mining conference, the CCTA conference changed to CCDM from 2009, while in 2011 the CCDM conference was held in Guangzhou, attracting about 150 attendees. I think the IEEE ICDM 2006 conference held in Hong Kong is also a milestone, which greatly promoted the communication of China and international KDD community. I believe KDD 2012 will definitely become a milestone.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"33 1","pages":"101"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78604232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A conversation with the Chinese KDD leaders
Qiang Yang
{"title":"A conversation with the Chinese KDD leaders","authors":"Qiang Yang","doi":"10.1145/2207243.2207255","DOIUrl":"https://doi.org/10.1145/2207243.2207255","url":null,"abstract":"In August 2012, the 18th Annual ACM SIGKDD Conference, KDD 2012, will be held in Beijing, China. This is the first time for this flagship knowledge discovery and data mining conference to be held in Asia, and the second time for it to be held outside North America. As before, the KDD 2012 conference will be a central place where researchers, practitioners and students from academia, business, government and industry converge to exchange the newest and most exciting ideas and results in the KDD area. Unlike previous KDD conferences, however, many new faces will be seen, new voices heard, and new perspectives discussed. This is particularly true because this KDD will be held in Beijing, the heartbeat of China’s universities, research institutes, industrial and governmental offices; the epicentre of the rapidly rising and opening China.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"10 1","pages":"72"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79939335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0