{"title":"The sum is greater than the parts: ensembling models of student knowledge in educational software","authors":"Z. Pardos, S. M. Gowda, R. Baker, N. Heffernan","doi":"10.1145/2207243.2207249","DOIUrl":"https://doi.org/10.1145/2207243.2207249","url":null,"abstract":"Many competing models have been proposed in the past decade for predicting student knowledge within educational software. Recent research has attempted to combine these models in an effort to improve performance but has yielded inconsistent results. While work in the 2010 KDD Cup data set showed the benefits of ensemble methods, work in the Genetics Tutor failed to show similar benefits. We hypothesize that the key factor has been data set size. We explore the potential for improving student performance prediction with ensemble methods in a data set drawn from a different tutoring system, the ASSISTments Platform, which contains 15 times the number of responses of the Genetics Tutor data set. We evaluated the predictive performance of eight student models and eight methods of ensembling predictions. Within this data set, ensemble approaches were more effective than any single method, with the best ensemble approach producing predictions of student performance 10% better than the best individual student knowledge model.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"1 1","pages":"37-44"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81475322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A study on the importance of and time spent on different modeling steps","authors":"M. A. Munson","doi":"10.1145/2207243.2207253","DOIUrl":"https://doi.org/10.1145/2207243.2207253","url":null,"abstract":"Applying data mining and machine learning algorithms requires many steps to prepare data and to make use of modeling results. This study investigates two questions: (1) how time consuming are the pre- and post-processing steps? (2) how much research energy is spent on these steps? To answer these questions I surveyed practitioners about their experiences in applying modeling techniques and categorized data mining and machine learning research papers from 2009 according to the modeling step(s) they addressed. Survey results show that model building consumes only 14% of the time spent on a typical project; the remaining time is spent on pre- and post-processing steps. Both survey responses and the categorization of research papers show that data mining and machine learning researchers spend the majority of their energy on algorithms for constructing models and significantly less energy on other steps. These findings collectively suggest that there are research opportunities to simplify the steps that precede and follow model building.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"1 1","pages":"65-71"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74874242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A conversation with Dr. Yong Shi","authors":"Yong Shi","doi":"10.1145/2207243.2207262","DOIUrl":"https://doi.org/10.1145/2207243.2207262","url":null,"abstract":"1. Please share with us your view on the history and important milestones of the Chinese KDD research and application areas. Chinese KDD or Data Mining research started in the early 1990s when the first group of international KDD communities was formed. As the demand for information technology (IT), including computing tools and Internet communications, grew during China's economic reform, many scholars in universities and institutes have paid attention to KDD-related research. A number of books in the field, such as Artificial Intelligence by Ruqian Lu (1996 in Chinese) and Knowledge Discovery by Zhongzhi Shi (2011 in Chinese), have recorded the progress of KDD in China. These works have also influenced young generations who were working on KDD at home and abroad. In recent years, Chinese governmental branches and industries have built their own databases. Some of them, especially China's commercial banks and mobile communication industries, have been migrating their databases into data warehouses and applying KDD techniques to solve their business decision-making problems. The Chinese ICT (information and communications technology) market has become the largest in the world. For example, 457 million Chinese were using Internet service in 2011. It was forecast that by 2015 the Chinese Internet population would reach 1.2 billion. As a significant number of researchers from both research institutes and universities are showing increasing interest in various KDD problems, the National Natural Science Foundation of China (NSFC) has sponsored a large number of KDD proposals since the 1990s. According to China's National Science and Technology Development Mid-Long Term Planning (2006-2020), \"theories and methods of large-scale information processing and knowledge mining\" have been identified as one of the key supporting technologies in fundamental scientific research for the national prioritized strategic needs. This demonstrates the Chinese government's strong commitment to KDD-related research and applications. 2. Please describe your expertise and contribution to KDD. With the multidisciplinary nature of KDD, financial markets, environmental sciences and public management, CASFEDS has three major functions: fundamental and theoretical development, application-oriented research, and think tank for the Chinese government. For the last seven years, CASFEDS has been granted more than 30 million RMB by NSFC, the Ministry of Chinese Science and Technology, CAS and National Audit Office of China for its KDD-related research projects. It has published more than 300 research papers in international journals and conferences, including a number …","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"97 1","pages":"87-88"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80617374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An introduction to SIGKDD and a reflection on the term 'data mining'","authors":"G. Piatetsky-Shapiro, U. Fayyad","doi":"10.1145/2207243.2207269","DOIUrl":"https://doi.org/10.1145/2207243.2207269","url":null,"abstract":"The primary focus of SIGKDD is to provide the premier forum for advancement and adoption of the \"science\" of knowledge discovery and data mining. SIGKDD's main activity is to organize KDD, the leading conference on data mining and knowledge discovery, held since 1995. The KDD conference is top-ranked in Data Mining, according to Microsoft Research Asia. KDD-2011, held in San Diego, CA, USA, was the largest data-mining meeting in the world, with over 1,100 participants from around the world.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"57 1","pages":"102-103"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81910096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Introduction to the special section on educational data mining","authors":"T. Calders, Mykola Pechenizkiy","doi":"10.1145/2207243.2207245","DOIUrl":"https://doi.org/10.1145/2207243.2207245","url":null,"abstract":"Educational Data Mining (EDM) is an emerging multidisciplinary research area, in which methods and techniques for exploring data originating from various educational information systems have been developed. EDM is both a learning science and a rich application area for data mining, due to the growing availability of educational data. EDM contributes to the study of how students learn, and the settings in which they learn. It enables data-driven decision making for improving current educational practice and learning material. We present a brief overview of EDM and introduce four selected EDM papers representing a crosscut of different application areas for data mining in education.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"14 1","pages":"3-6"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88468844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Process mining: making knowledge discovery process centric","authors":"Wil M.P. van der Aalst","doi":"10.1145/2207243.2207251","DOIUrl":"https://doi.org/10.1145/2207243.2207251","url":null,"abstract":"Recently, the Task Force on Process Mining released the Process Mining Manifesto. The manifesto is supported by 53 organizations and 77 process mining experts contributed to it. The active contributions from end-users, tool vendors, consultants, analysts, and researchers illustrate the growing relevance of process mining as a bridge between data mining and business process modeling. This paper summarizes the manifesto and explains why process mining is a highly relevant, but also very challenging, research area. This way we hope to stimulate the broader ACM SIGKDD community to look at process-centric knowledge discovery.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"66 1","pages":"45-49"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88973357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Survey on web spam detection: principles and algorithms","authors":"N. Spirin, Jiawei Han","doi":"10.1145/2207243.2207252","DOIUrl":"https://doi.org/10.1145/2207243.2207252","url":null,"abstract":"Search engines have become the de facto place to start information acquisition on the Web. However, due to the web spam phenomenon, search results are not always as good as desired. Moreover, spam evolves, which makes the problem of providing high-quality search even more challenging. Over the last decade, research on adversarial information retrieval has gained a lot of interest from both academia and industry. In this paper we present a systematic review of web spam detection techniques with a focus on algorithms and underlying principles. We categorize all existing algorithms into three categories based on the type of information they use: content-based methods, link-based methods, and methods based on non-traditional data such as user behaviour, clicks, and HTTP sessions. In turn, we subcategorize the link-based category into five groups based on the ideas and principles used: label propagation, link pruning and reweighting, label refinement, graph regularization, and feature-based. We also define the concept of web spam numerically and provide a brief survey of various spam forms. Finally, we summarize the observations and underlying principles applied for web spam detection.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"48 1","pages":"50-64"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75248021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A conversation with MSRA researchers","authors":"Wei-Ying Ma, Tie-Yan Liu, Ji-Rong Wen, Zheng Chen, Zaiqing Nie, Xing Xie, Hang Li, Haixun Wang, Yu Zheng","doi":"10.1145/2207243.2207260","DOIUrl":"https://doi.org/10.1145/2207243.2207260","url":null,"abstract":"Ten years ago, KDD research was still in its infancy in China. Things have changed significantly. With the push from technological advancement in academia and the pull from the explosive growth of application needs in industry, KDD research is flourishing. At Microsoft Research Asia, we have been conducting research in many areas related to KDD, including web search, data mining, information retrieval, multimedia mining, natural language processing, and visualization. In addition to publishing papers in KDD and developing technologies for commercial products, we have also contributed to talent development in China by supervising students and growing young researchers who later became well known in KDD-related fields in universities and industry.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"43 5","pages":"82-84"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72563966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A conversation with Professor Zhi-Hua Zhou","authors":"Zhi-Hua Zhou","doi":"10.1145/2207243.2207268","DOIUrl":"https://doi.org/10.1145/2207243.2207268","url":null,"abstract":"I came into the KDD area in the late 1990s. From my point of view, an important event in Chinese KDD development was the 3rd PAKDD conference, which was held in Beijing in April 1999. That was the first international conference on KDD held in China, and it helped to form the Chinese KDD community. Later, in 2007, the 11th PAKDD conference was held in Nanjing, for which I was the program chair. The PAKDD 2007 conference attracted more than 730 submissions and more than 270 attendees from China as well as other Asia-Pacific countries/regions; I think this is a good sign of the growth of the Chinese KDD community. Another important event is the CCDM (China Conference on Data Mining) conference held in Yantai in August 2009. This is a biennial conference, sponsored by the Artificial Intelligence and Pattern Recognition Technical Committee of the China Computer Federation (CCF), and the Machine Learning Technical Committee of the China Association of Artificial Intelligence (CAAI); fortunately I served as the general co-chair. The origin of the conference was two editions of the China Conference on Classification Technology and Application (CCTA), held in 2005 and 2007, in Beijing and Zhengzhou, respectively. With the growth of the Chinese data mining community, and the lack of a domestic data mining conference, the CCTA conference changed to CCDM from 2009, while in 2011 the CCDM conference was held in Guangzhou, attracting about 150 attendees. I think the IEEE ICDM 2006 conference held in Hong Kong is also a milestone, which greatly promoted communication between the Chinese and international KDD communities. I believe KDD 2012 will definitely become a milestone.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"33 1","pages":"101"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78604232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A conversation with the Chinese KDD leaders","authors":"Qiang Yang","doi":"10.1145/2207243.2207255","DOIUrl":"https://doi.org/10.1145/2207243.2207255","url":null,"abstract":"In August 2012, the 18th Annual ACM SIGKDD Conference, KDD 2012, will be held in Beijing, China. This is the first time for this flagship knowledge discovery and data mining conference to be held in Asia, and the second time for it to be held outside North America. As before, the KDD 2012 conference will be a central place where researchers, practitioners and students from academia, business, government and industry converge to exchange the newest and most exciting ideas and results in the KDD area. Unlike previous KDD conferences, however, many new faces will be seen, new voices heard, and new perspectives discussed. This is particularly true because this KDD will be held in Beijing, the heartbeat of China’s universities, research institutes, industrial and governmental offices; the epicentre of the rapidly rising and opening China.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"10 1","pages":"72"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79939335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}