Dimensionality reduction model based on integer planning for the analysis of key indicators affecting life expectancy
Authors: Wei Cui, Zhiqiang Xu, Ren Mu
Journal: Journal of Data and Information Science (impact factor 1.5; JCR Q2, Information Science & Library Science)
Publication date: 2023-12-04
DOI: 10.2478/jdis-2023-0025 (https://doi.org/10.2478/jdis-2023-0025)
Citation count: 0
Abstract
Purpose: A dimensionality reduction model that can eliminate outliers and select an appropriate number of clusters is of considerable theoretical and practical importance. The interpretability of such models also remains a persistent challenge.
Design/methodology/approach: This paper proposes two dimensionality reduction models based on integer programming (DRMBIP). Both models assess compactness through the correlation of each indicator with its class center, and evaluate separation through the correlation between different class centers. In contrast to DRMBIP-p, which takes the correlation threshold as an input parameter, DRMBIP-v treats the threshold as a variable, aiming to optimally balance compactness and separation.
Findings: Using data from the Global Health Observatory (GHO), this study investigates 141 indicators that influence life expectancy. The findings show that DRMBIP-p effectively reduces the dimensionality of the data while ensuring compactness, and that it remains compatible with other models. DRMBIP-v finds the optimal result and shows exceptional separation. Visualization of the results shows that all classes are highly compact.
Research limitations: DRMBIP-p requires the correlation threshold as input, and this parameter plays a pivotal role in the quality of the final dimensionality reduction. In DRMBIP-v, turning the threshold into a variable can over-emphasize either separation or compactness, which necessitates a manual adjustment to the overflow component of the objective function.
Practical implications: The DRMBIP models presented in this paper are adept at uncovering the primary geometric structures within high-dimensional indicators. Validated on life expectancy data, they show potential to assist data miners in reducing data dimensions.
Originality/value: To our knowledge, this is the first time that integer programming has been used to build a dimensionality reduction model with indicator filtering. Beyond life expectancy, the approach has clear advantages in data mining work that requires precise class centers.
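The abstract does not give the integer programming formulation, so the following is only a toy sketch of the idea behind the threshold-as-parameter variant, under stated assumptions: compactness is read as "every indicator has an absolute Pearson correlation of at least the threshold with some class center", separation as "the chosen centers are as weakly correlated with each other as possible", and the integer program is replaced by brute-force enumeration that is only feasible for a handful of indicators. The function names `pearson` and `select_centers` and the sample data are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def select_centers(columns, threshold):
    """Brute-force stand-in for an integer program (toy sizes only).

    Returns the smallest set of center indicators such that every
    indicator has |correlation| >= threshold with at least one center
    (assumed compactness rule). Among equally small sets, the one whose
    centers are least correlated with each other wins (assumed
    separation rule).
    """
    m = len(columns)
    corr = [[abs(pearson(columns[i], columns[j])) for j in range(m)]
            for i in range(m)]
    for k in range(1, m + 1):
        best = None
        for centers in combinations(range(m), k):
            covered = all(
                max(corr[i][c] for c in centers) >= threshold
                for i in range(m)
            )
            if not covered:
                continue
            # Separation score: largest correlation between two centers.
            sep = max((corr[a][b] for a, b in combinations(centers, 2)),
                      default=0.0)
            if best is None or sep < best[0]:
                best = (sep, centers)
        if best is not None:
            centers = best[1]
            # Assign each indicator to its most correlated center.
            assignment = {i: max(centers, key=lambda c: corr[i][c])
                          for i in range(m)}
            return list(centers), assignment
    return [], {}


# Hypothetical indicators: 0 and 1 move together, as do 2 and 3.
columns = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 11],
    [2, -1, -2, -1, 2],
    [2.1, -0.9, -2.2, -1.0, 2.0],
]
centers, assignment = select_centers(columns, threshold=0.9)
print(centers, assignment)
```

On this sample, four indicators are reduced to two class centers, one per correlated pair. A real instance with 141 indicators would need an integer programming solver rather than enumeration, since the number of candidate center sets grows exponentially.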
About the journal:
JDIS is devoted to the study and application of theories, methods, techniques, services, and infrastructural facilities that use big data to support knowledge discovery for decision and policy making. Its basic emphasis is on work that is big-data-based, analytics-centered, knowledge-discovery-driven, and supportive of decision making. Special effort goes to knowledge discovery that detects and predicts structures, trends, behaviors, relations, evolutions, and disruptions in research, innovation, business, politics, security, media and communications, and social development, where the big data may include metadata or full-content data, textual or non-textual data, structured or unstructured data, domain-specific or cross-domain data, and dynamic or interactive data.
The main areas of interest are:
(1) New theories, methods, and techniques of big-data-based data mining, knowledge discovery, and informatics, including but not limited to scientometrics, communication analysis, social network analysis, tech & industry analysis, competitive intelligence, knowledge mapping, evidence-based policy analysis, and predictive analysis.
(2) New methods, architectures, and facilities to develop or improve knowledge infrastructure capable of supporting knowledge organization and sophisticated analytics, including but not limited to ontology construction, knowledge organization, semantic linked data, knowledge integration and fusion, semantic retrieval, domain-specific knowledge infrastructure, and semantic sciences.
(3) New mechanisms, methods, and tools to embed knowledge analytics and knowledge discovery into actual operation, service, or managerial processes, including but not limited to knowledge-assisted scientific discovery and data-mining-driven intelligent workflows in learning, communications, and management.
Specific topic areas may include:
Knowledge organization
Knowledge discovery and data mining
Knowledge integration and fusion
Semantic Web metrics
Scientometrics
Analytic and diagnostic informetrics
Competitive intelligence
Predictive analysis
Social network analysis and metrics
Semantic and interactively analytic retrieval
Evidence-based policy analysis
Intelligent knowledge production
Knowledge-driven workflow management and decision-making
Knowledge-driven collaboration and its management
Domain knowledge infrastructure with knowledge fusion and analytics
Development of data and information services