SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining: latest articles

A conversation with Professor Bole Shi
Baile Shi
{"title":"A conversation with Professor Bole Shi","authors":"Baile Shi","doi":"10.1145/2207243.2207261","DOIUrl":"https://doi.org/10.1145/2207243.2207261","url":null,"abstract":"Looking back, we note that in the past 20 years, data mining has been a popular topic in academic research, but in industrial applications, we have heard few exciting examples after the \"beer and diapers\" one. This is in part due to the general pattern of technological advances, in which academic research runs ahead of industrial applications. On the other hand, we cannot ignore the fact that in the past 20 years, worldwide and especially in China, data processing and analytics have focused more on the data accumulation and integration phase, and the need for widespread data mining has yet to come. Currently, after so many years of accumulation and preparation, whether in China or in the world, the scale and complexity of available data have far exceeded our expectations, and further development of data mining will finally take center stage in research and development. In this view, we expect to see more advances in data mining in terms of its ability to handle more data types, larger scales, diverse business needs, application varieties and cross-disciplinary integration.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"46 1","pages":"85-86"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79782851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A conversation with Professors Deyi Li and Jie Tang
Deyi Li, Jie Tang
{"title":"A conversation with Professors Deyi Li and Jie Tang","authors":"Deyi Li, Jie Tang","doi":"10.1145/2207243.2207257","DOIUrl":"https://doi.org/10.1145/2207243.2207257","url":null,"abstract":"Roughly speaking, Chinese KDD research mainly underwent three stages. It was in 1993 when National Science Foundation of China (NSFC) started to sponsor research on knowledge discovery and data mining. This can be considered as the first stage. The major research around that time was focused on “Knowledge Discovery from Database”, including sub-topics such as frequent mining and association rule mining from databases. The research was mainly conducted in academic institutes. The second stage started from the end of 1990’s, with the emergence and the rapid proliferation of Web-based applications. People started to notice that the largest data source for mining is the information on the Web instead of traditional databases. At the same time the mining tasks became more diversified. In the second stage, the term “Web Mining” became popular in the field. Research labs on “knowledge engineering”, “web/internet mining” have been built in different research institutes and rapidly developed. Several web search companies also emerged in this stage such as Baidu and Sogou. The third stage began around 2005, when online social applications and media (such as, in China, Tencent, Sina Weibo, Renren) become a prevalent and complex force to influence our daily life. Indeed, Tencent, the largest social network in China, already has more than 700 million registered users, the same number of Facebook; Sina Weibo has attracted 250 million users in the past two years, a figure higher than Twitter. These online networks grow very fast and they provide a huge amount of user generated content, which presents great opportunities in understanding the science of these networks. Accordingly, the emphasis of the research started to switch to mining social networks. 
This is a more diverse research field, attracting researchers from a wide range of academic fields, including theory and algorithms, data mining and machine learning, computer systems and networks, statistical physics and complex systems, social psychology, economics and managerial science. Another important change in this stage is that Chinese companies are paying more and more attention to data mining research. Not only Chinese Internet companies (e.g., Tencent, Baidu, Sogou, Youdao, etc.) but also communication/hardware IT companies (e.g., China Mobile, Huawei, ZTE, Lenovo) started to build data mining research labs. There is little doubt that for now it is the best time for data mining in China.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"46 1","pages":"75-76"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80008526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Mapping question items to skills with non-negative matrix factorization
M. Desmarais
{"title":"Mapping question items to skills with non-negative matrix factorization","authors":"M. Desmarais","doi":"10.1145/2207243.2207248","DOIUrl":"https://doi.org/10.1145/2207243.2207248","url":null,"abstract":"Intelligent learning environments need to assess the student skills to tailor course material, provide helpful hints, and in general provide some kind of personalized interaction. To perform this assessment, question items, exercises, and tasks are presented to the student. This assessment relies on a mapping of tasks to skills. However, the process of deciding which skills are involved in a given task is tedious and challenging. Means to automate it are highly desirable, even if only partial automation that provides supportive tools can be achieved. A recent technique based on Non-negative Matrix Factorization (NMF) was shown to offer valuable results, especially due to the fact that the resulting factorization allows a straightforward interpretation in terms of a Q-matrix. We investigate the factors and assumptions under which NMF can effectively derive the underlying high level skills behind assessment results. We demonstrate the use of different techniques to analyze and interpret the output of NMF. We propose a simple model to generate simulated data and to provide lower and upper bounds for quantifying skill effect. Using the simulated data, we show that, under the assumption of independent skills, the NMF technique is highly effective in deriving the Q-matrix. However, the NMF performance degrades under different ratios of variance between subject performance, item difficulty, and skill mastery. The results corroborate conclusions from previous work in that high level skills, corresponding to general topics like World History and Biology, seem to have no substantial effect on test performance, whereas other topics like Mathematics and French do. 
The analysis and visualization techniques of the NMF output, along with the simulation approach presented in this paper, should be useful for future investigations using NMF for Q-matrix induction from data.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"1 1","pages":"30-36"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81237442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 65
A conversation with Dr. Haifeng Wang
Haifeng Wang
{"title":"A conversation with Dr. Haifeng Wang","authors":"Haifeng Wang","doi":"10.1145/2207243.2207264","DOIUrl":"https://doi.org/10.1145/2207243.2207264","url":null,"abstract":"My group mainly works on data mining applications in Internet products including the search engine. We study two types of data, Web data and user logs. First, the Web data includes different entities (e.g. websites and web pages), edges between entities (e.g. hyperlinks), and content (e.g. text and rich-media). Second, user logs contain various user behavior information produced by users when they are using the search engine or other Internet products. These two types of data have different properties, but are correlated and complementary. We build a complete view of the data, mine the most valuable knowledge from the data, and improve our various products, e.g. the Baidu search engine.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"84 1","pages":"91"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88219404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A conversation with Professor Zhongzhi Shi
Zhongzhi Shi
{"title":"A conversation with Professor Zhongzhi Shi","authors":"Zhongzhi Shi","doi":"10.1145/2207243.2207263","DOIUrl":"https://doi.org/10.1145/2207243.2207263","url":null,"abstract":"Knowledge Discovery from Data (KDD) or Data Mining is a broad area that integrates techniques from several fields including machine learning, statistics, pattern recognition, artificial intelligence, and database systems, for the analysis of large volumes of data. There have been a large number of data mining algorithms rooted in these fields to perform different data analysis tasks. In China, we can divide KDD into 3 milestones: one that is related to machine learning algorithms, one for integrated knowledge discovery from datasets, and one for distributed and parallel KDD.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"10 1","pages":"89-90"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73482202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A conversation with Dr. Edward Y. Chang
Edward Y. Chang
{"title":"A conversation with Dr. Edward Y. Chang","authors":"Edward Y. Chang","doi":"10.1145/2207243.2207256","DOIUrl":"https://doi.org/10.1145/2207243.2207256","url":null,"abstract":"1. Please share with us your view on the history and important milestones of the Chinese KDD research and application areas. Ample evidence shows that KDD has become a major topic of interest in both research and industry in China since 2006. In academia, professor Zhi-Hua Zhou at Nanjing University in 2006 chaired a National Machine Learning workshop, inviting researchers in the greater China area to share their experience. In 2009, the first Asian Conference on Machine learning was inaugurated in Nanjing. In industry, both Google and MSRA influenced China Internet leading companies such as Tencent, Baidu, Alibaba, and subsequently Renren and Shanda, to start their large-scale KDD operations. Three KDD engineers on my team were recruited to join Baidu knowledge, the primary KDD application of these Internet companies this far is monetization, improving their ad/offer relevance and hence revenue. Genome Institute (BGI) have made impressive progress in areas of computer vision, pattern recognition, and bio-genomics. Applications such as face, gesture, voice, handwriting, and license plate recognition have been widely deployed. In the bio-genomics area, a team at BGI reached a significant milestone in 2008 by sequencing the first Asian individual's diploid genome and published the result in Nature [1]. This sequencing effort took BGI one year to complete. Subsequently, speeding up genome sequencing has been among BGI's top R&D priorities. (One cannot imagine what one billion genomic sequences and their associated disease profiles can bring to advancing human health.) 
Researchers led by Ruiqiang Li from BGI and researchers from Google and universities at Canada and Hong Kong have met a couple of times to discuss large-scale data mining issues and solutions in hardware, algorithms, and data transportation. There is no doubt that KDD is thriving in China in several areas and its applications are rapidly growing, thanks to the increase of both data volume and demand for intelligent information analysis and trend prediction. 2. Please describe your expertise and contribution to KDD. In 2005, my team started working on developing parallel machine learning algorithms to mine large-scale datasets. My team were made publicly available through Apache foundation, and they have been downloaded more than 4,000 times. Several Google products also use these parallel algorithms. Prior to the large-scale machine learning work, my work with Simon Tong on using active learning to refine user query concepts published in 2001 [8] has been cited 850 times. Together with my works on …","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"30 1","pages":"73-74"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81223108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Social network analysis and mining to support the assessment of on-line student participation
Reihaneh Rabbany, M. Takaffoli, Osmar R Zaiane
{"title":"Social network analysis and mining to support the assessment of on-line student participation","authors":"Reihaneh Rabbany, M. Takaffoli, Osmar R Zaiane","doi":"10.1145/2207243.2207247","DOIUrl":"https://doi.org/10.1145/2207243.2207247","url":null,"abstract":"There is a growing number of courses delivered using e-learning environments and their online discussions play an important role in collaborative learning of students. Even in courses with a small number of students, there could be thousands of messages generated in a few months within these forums. Manually evaluating the participation of students in such cases is a significant challenge, considering the fact that current e-learning environments do not provide much information regarding the structure of interactions between students. There is a recent line of research on applying social network analysis (SNA) techniques to study these interactions. Here we propose to exploit SNA techniques, including community mining, in order to discover relevant structures in social networks we generate from student communications but also information networks we produce from the content of the exchanged messages. With visualization of these discovered relevant structures and the automated identification of central and peripheral participants, an instructor is provided with better means to assess participation in the online discussions. We implemented these new ideas in a toolbox, named Meerkat-ED, which automatically discovers relevant network structures, visualizes overall snapshots of interactions between the participants in the discussion forums, and outlines the leader/peripheral students. Moreover, it creates a hierarchical summarization of the discussed topics, which gives the instructor a quick view of what is under discussion. 
We believe exploiting the mining abilities of this toolbox would facilitate fair evaluation of students' participation in online courses.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"8 1","pages":"20-29"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82403569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 52
A conversation with Professor Jianzhong Li
Jianzhong Li
{"title":"A conversation with Professor Jianzhong Li","authors":"Jianzhong Li","doi":"10.1145/2207243.2207258","DOIUrl":"https://doi.org/10.1145/2207243.2207258","url":null,"abstract":"The research on knowledge discovery and data mining (KDD) in China started a few years later than some other countries. In 1993, the National Natural Science Foundation of China (NSFC) funded the first research project in the field of KDD. In the nearly 20 years of development, a large number of research institutes and universities have been active in carrying out innovative research on the theory and applications of KDD, including the Tsinghua University, Peking University, Fudan University, Nanjing University, Institute of Computing Technology in Chinese Academy of Sciences, Sichuan University, Harbin Institute of Technology and so on.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"48 1","pages":"77-78"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73305173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Data mining for improving textbooks
R. Agrawal, Sreenivas Gollapudi, A. Kannan, K. Kenthapadi
{"title":"Data mining for improving textbooks","authors":"R. Agrawal, Sreenivas Gollapudi, A. Kannan, K. Kenthapadi","doi":"10.1145/2207243.2207246","DOIUrl":"https://doi.org/10.1145/2207243.2207246","url":null,"abstract":"We present our early explorations into developing a data mining based approach for enhancing the quality of textbooks. We describe a diagnostic tool to algorithmically identify deficient sections in textbooks. We also discuss techniques for algorithmically augmenting textbook sections with links to selective content mined from the Web. Our evaluation, employing widely-used textbooks from India, indicates that developing technological approaches to help improve textbooks holds promise.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"21 1","pages":"7-19"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72644397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
A conversation with Professor Shan Wang et al.
Shan Wang, Cuiping Li, Hong Chen
{"title":"A conversation with Professor Shan Wang et al.","authors":"Shan Wang, Cuiping Li, Hong Chen","doi":"10.1145/2207243.2207265","DOIUrl":"https://doi.org/10.1145/2207243.2207265","url":null,"abstract":"One year later, the Chinese government launched the Dragon Star Plan [1]. It is aimed to organize a group of oversea Chinese scholars to come back to China to teach U.S. graduate level courses systematically on a particular area around the universities in China. These scholars usually had got some achievements and had certain positions in the United States’ Academia. With the support of this plan, since 2002, quite a few world-famous DM researchers such as Jiawei Han, Qiang Yang, Jian Pei, Hui Xiong were invited to China to teach DM courses in several universities. All these (the book and courses) significantly promoted the popularization of DM technology in China.","PeriodicalId":90050,"journal":{"name":"SIGKDD explorations : newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining","volume":"160 1","pages":"92-95"},"PeriodicalIF":0.0,"publicationDate":"2012-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74303811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1