Data Mining and Applied Linear Algebra
M. Chu
International Conference on Informatics Education and Research for Knowledge-Circulating Society (ICKS 2008)
Published: 2008-01-17
DOI: 10.1109/ICKS.2008.39 (https://doi.org/10.1109/ICKS.2008.39)
Citation count: 1
Abstract
In this era of hyper-technological innovation, massive amounts of data are being generated at almost every level of application in almost every discipline. Extracting interesting knowledge from raw data, or data mining in a broader sense, has become an indispensable task. Nevertheless, data collected from complex phenomena often represent the integrated result of several interrelated variables, while the variables themselves are less precisely defined. The basic principle of data mining is to distinguish which variables are related to which, and how they are related. In many situations, the digitized information is gathered and stored as a data matrix. It is often the case, or so assumed, that the exogenous variables depend on the endogenous variables in a linear relationship. Retrieving "useful" information can therefore often be characterized as finding a "suitable" matrix factorization. This paper offers a synopsis, from this perspective, of how linear algebra techniques can help carry out the task of data mining. Examples from factor analysis, cluster analysis, latent semantic indexing, and link analysis are used to demonstrate how matrix factorization helps to uncover hidden connections and to speed up computation. Low-rank matrix approximation plays a fundamental role in cleaning and compressing the data. Other types of constraints, such as nonnegativity, are also briefly discussed.
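The role of low-rank approximation in cleaning and compressing a data matrix can be illustrated with a truncated singular value decomposition. The sketch below is not taken from the paper: the helper name `low_rank_approx`, the synthetic data, the noise level, and the choice of rank 2 are all assumptions made for illustration. It relies on the standard Eckart-Young result that keeping the k largest singular triplets gives the best rank-k approximation in the Frobenius norm.

```python
import numpy as np


def low_rank_approx(A, k):
    """Return the best rank-k approximation of A in the Frobenius norm.

    Hypothetical helper: computes a thin SVD and keeps the k largest
    singular values with their singular vectors.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# Synthetic data (an assumption, not from the paper): a 50 x 20 data
# matrix driven by 2 hidden factors, observed with small additive noise.
rng = np.random.default_rng(0)
signal = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 20))
noisy = signal + 0.01 * rng.standard_normal((50, 20))

# "Cleaning": project the noisy data back onto a rank-2 matrix.
denoised = low_rank_approx(noisy, 2)

# Compare how far each matrix is from the noise-free signal.
err_noisy = np.linalg.norm(noisy - signal)
err_denoised = np.linalg.norm(denoised - signal)
print(f"error before truncation: {err_noisy:.4f}")
print(f"error after truncation:  {err_denoised:.4f}")
```

The same factorization also compresses the data: storing the rank-k factors of an m x n matrix takes k(m + n + 1) numbers instead of mn, which for this 50 x 20 example with k = 2 is 142 values rather than 1000.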