Novelty and Similarity: Detection Using Data-Driven Soft Independent Modeling of Class Analogy
O. Y. Rodionova, N. I. Kurysheva, G. A. Sharova, A. L. Pomerantsev
Journal of Chemometrics, 38(10). Published 2024-07-18. DOI: 10.1002/cem.3587
https://onlinelibrary.wiley.com/doi/10.1002/cem.3587
Abstract
Novelty and similarity are complex concepts with numerous applications in various fields, including biology and medicine. Novelty detection is a technique used to determine whether a dataset differs from another dataset taken as a standard. Similarity detection is a technique used to determine whether two datasets belong to the same population. Novelty and similarity are closely related concepts; however, they are not complementary. Novelty is the much more popular of the two, and there are many publications about it. Similarity is, in fact, a new concept that has not yet been explored in depth. Classical statistics offers a large number of tools suitable for the detection of similarity, mostly in the univariate case. At the same time, this topic has been insufficiently studied in the field of machine learning. This paper suggests several principles that are important for this research and also presents a method for the detection of both novelty and similarity. The method uses a one-class classifier known as Data-Driven Soft Independent Modeling of Class Analogy (DD-SIMCA). Three examples illustrate our approach. The first uses simulated data and demonstrates the performance of DD-SIMCA for the detection of novelty. The second uses real-world data and studies the similarity of two groups of patients who participate in the evaluation of the effectiveness of the treatment of primary angle-closure glaucoma. The third comes from medical diagnostics and uses a real-world, publicly available dataset employed for the comparison of various classification algorithms.
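The abstract does not give the details of the method, so the following is only a minimal Python sketch of a DD-SIMCA-style one-class classifier, not the authors' implementation. It assumes the commonly described construction: a PCA model of the standard (target) dataset, a score distance h and an orthogonal distance v for each sample, scaled chi-squared approximations for both with parameters estimated by the method of moments, and an acceptance threshold on the combined full distance. The function names (`fit_dd_simca`, `full_distance`, `is_member`) and the parameters `n_components` and `alpha` are hypothetical and chosen for illustration.

```python
import numpy as np
from scipy.stats import chi2


def _distances(Xc, P, lam):
    """Score distance h and orthogonal distance v for centered data Xc."""
    T = Xc @ P                          # scores in the PCA subspace
    h = (T ** 2 / lam).sum(axis=1)      # score distance (leverage-like)
    E = Xc - T @ P.T                    # residuals outside the subspace
    v = (E ** 2).sum(axis=1)            # orthogonal distance
    return h, v


def _moments(x):
    """Method-of-moments scale x0 and degrees of freedom N (assumed estimator)."""
    x0 = x.mean()
    n_dof = max(1, int(round(2.0 * x0 ** 2 / x.var(ddof=1))))
    return x0, n_dof


def fit_dd_simca(X, n_components, alpha=0.05):
    """Fit the sketch model on the standard dataset X (samples x features)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA via SVD of the centered training data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T             # loadings
    lam = s[:n_components] ** 2         # sum of squared scores per component
    h, v = _distances(Xc, P, lam)
    h0, Nh = _moments(h)
    v0, Nv = _moments(v)
    # Full distance is compared with a chi-squared quantile with Nh + Nv dof.
    f_crit = chi2.ppf(1.0 - alpha, Nh + Nv)
    return {"mean": mean, "P": P, "lam": lam, "h0": h0, "Nh": Nh,
            "v0": v0, "Nv": Nv, "f_crit": f_crit}


def full_distance(model, X):
    """Combined distance f = Nh*h/h0 + Nv*v/v0 for new samples X."""
    h, v = _distances(X - model["mean"], model["P"], model["lam"])
    return model["Nh"] * h / model["h0"] + model["Nv"] * v / model["v0"]


def is_member(model, X):
    """True where a sample is accepted as belonging to the standard class."""
    return full_distance(model, X) <= model["f_crit"]
```

Under these assumptions, one way to compare a new dataset against the standard is to fit the model on the standard and look at the share of new samples that `is_member` accepts. Whether this matches the decision rule used in the paper is not stated in the abstract.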
About the journal:
The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.