Dimensionality reduction model based on integer planning for the analysis of key indicators affecting life expectancy

Impact factor 1.5; Zone 3 (Management); JCR Q2, INFORMATION SCIENCE & LIBRARY SCIENCE

Journal of Data and Information Science. Pub date: 2023-12-04. DOI: 10.2478/jdis-2023-0025

Wei Cui, Zhiqiang Xu, Ren Mu

Citations: 0

Abstract

Purpose: Developing a dimensionality reduction model that can reliably eliminate outliers and select an appropriate number of clusters is of considerable theoretical and practical importance. The interpretability of such models also remains a persistent challenge.

Design/methodology/approach: This paper proposes two new dimensionality reduction models based on integer programming (DRMBIP). Both assess compactness through the correlation of each indicator with its class center, and separation through the correlation between different class centers. Unlike DRMBIP-p, DRMBIP-v treats the threshold parameter as a variable, aiming to balance compactness and separation optimally.

Findings: Using data from the Global Health Observatory (GHO), this study investigates 141 indicators that influence life expectancy. The findings show that DRMBIP-p effectively reduces the dimensionality of the data while ensuring compactness, and that it remains compatible with other models. DRMBIP-v finds the optimal result and shows exceptional separation. Visualization of the results shows that all classes are highly compact.

Research limitations: DRMBIP-p requires the correlation threshold as an input parameter, and this parameter plays a pivotal role in the quality of the final dimensionality reduction. In DRMBIP-v, turning the threshold parameter into a variable may end up emphasizing either separation or compactness, which necessitates manual adjustment of the overflow component in the objective function.

Practical implications: The DRMBIP models presented in this paper are adept at uncovering the primary geometric structure of high-dimensional indicators. Validated on life expectancy data, they show potential to help data miners reduce data dimensionality.

Originality/value: To our knowledge, this is the first time integer programming has been used to build a dimensionality reduction model with indicator filtering. Beyond life expectancy, the approach has clear advantages in data mining tasks that require precise class centers.
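The abstract defines compactness as the correlation of each indicator with its class center and separation as the correlation between class centers, with outlying indicators filtered out. The Python sketch below illustrates those two measures on a toy 0-1 selection problem. It is not the authors' DRMBIP formulation: choosing centers from among the indicators themselves, assigning each indicator to its most-correlated center, requiring centers to be pairwise below the threshold, and the hypothetical `center_cost` objective are all assumptions made for illustration. The toy program is solved exactly by brute-force enumeration rather than by an integer-programming solver, so it only scales to a handful of indicators.

```python
from itertools import combinations
from math import sqrt

# Toy sketch only. These modelling choices are hypothetical assumptions,
# not the published DRMBIP formulation:
#   * class centers are chosen from among the indicators themselves;
#   * an indicator joins its most-correlated center only if |r| >= t
#     (compactness); otherwise it is filtered out as an outlier;
#   * every pair of centers must satisfy |r| < t (separation);
#   * objective: covered indicators minus `center_cost` per center.


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def abs_corr_matrix(columns):
    """|r| between every pair of indicator columns (diagonal fixed at 1)."""
    n = len(columns)
    return [[1.0 if i == j else abs(pearson(columns[i], columns[j]))
             for j in range(n)] for i in range(n)]


def assign(r, centers, t):
    """Map each indicator to its most-correlated center, dropping those below t."""
    assignment = {}
    for i in range(len(r)):
        best = max(centers, key=lambda c: r[i][c])
        if r[i][best] >= t:
            assignment[i] = best
    return assignment


def select_centers(r, t, center_cost=1.5):
    """Solve the toy 0-1 program exactly by enumerating every center subset."""
    n = len(r)
    best = None
    for k in range(1, n + 1):
        for centers in combinations(range(n), k):
            # Separation constraint: centers must be weakly correlated.
            if any(r[a][b] >= t for a, b in combinations(centers, 2)):
                continue
            assignment = assign(r, centers, t)
            score = len(assignment) - center_cost * k
            if best is None or score > best[0]:
                best = (score, centers, assignment)
    return best


def compactness(r, assignment):
    """Weakest indicator-to-center correlation (higher is more compact)."""
    return min(r[i][c] for i, c in assignment.items())


def separation(r, centers):
    """Strongest center-to-center correlation (lower is better separated)."""
    return max((r[a][b] for a, b in combinations(centers, 2)), default=0.0)


# Usage: six hypothetical indicators observed over six periods.
a = [1, 2, 3, 4, 5, 6]
columns = [
    a,
    [2 * v for v in a],        # perfectly correlated with column 0
    [1, 2, 3, 4, 5, 7],        # strongly correlated with column 0
    [1, -1, 1, -1, 1, -1],     # oscillating pattern
    [2, -2, 2, -2, 2, -2],     # perfectly correlated with column 3
    [1, 1, 2, 2, 1, 1],        # weakly related to everything (outlier)
]
r = abs_corr_matrix(columns)
score, centers, assignment = select_centers(r, t=0.8)
print(centers, assignment)  # → (0, 3) {0: 0, 1: 0, 2: 0, 3: 3, 4: 3}
print(round(compactness(r, assignment), 3), round(separation(r, centers), 3))  # → 0.99 0.293
```

With a fixed threshold `t`, the sketch loosely mirrors the role of the input parameter in DRMBIP-p: indicator 5 is filtered out as an outlier and two well-separated classes remain. Re-running `select_centers` over a grid of `t` values is only a crude stand-in for DRMBIP-v, which, per the abstract, treats the threshold itself as a variable to balance compactness against separation.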

Source journal

Journal of Data and Information Science (INFORMATION SCIENCE & LIBRARY SCIENCE)

CiteScore: 3.50
Self-citation rate: 6.70%
Articles published: 495

Journal description: JDIS devotes itself to the study and application of theories, methods, techniques, services, and infrastructural facilities that use big data to support knowledge discovery for decision and policy making. The basic emphasis is on work that is big-data-based, analytics-centered, knowledge-discovery-driven, and supportive of decision making. Special effort goes to knowledge discovery that detects and predicts structures, trends, behaviors, relations, evolutions, and disruptions in research, innovation, business, politics, security, media and communications, and social development, where the big data may include metadata or full-content data, textual or non-textual data, structured or unstructured data, domain-specific or cross-domain data, and dynamic or interactive data. The main areas of interest are:
(1) New theories, methods, and techniques of big-data-based data mining, knowledge discovery, and informatics, including but not limited to scientometrics, communication analysis, social network analysis, tech and industry analysis, competitive intelligence, knowledge mapping, evidence-based policy analysis, and predictive analysis.
(2) New methods, architectures, and facilities to develop or improve knowledge infrastructure capable of supporting knowledge organization and sophisticated analytics, including but not limited to ontology construction, knowledge organization, semantic linked data, knowledge integration and fusion, semantic retrieval, domain-specific knowledge infrastructure, and semantic sciences.
(3) New mechanisms, methods, and tools to embed knowledge analytics and knowledge discovery into actual operational, service, or managerial processes, including but not limited to knowledge-assisted scientific discovery and data-mining-driven intelligent workflows in learning, communications, and management.
Specific topic areas may include: knowledge organization; knowledge discovery and data mining; knowledge integration and fusion; Semantic Web metrics; scientometrics; analytic and diagnostic informetrics; competitive intelligence; predictive analysis; social network analysis and metrics; semantic and interactively analytic retrieval; evidence-based policy analysis; intelligent knowledge production; knowledge-driven workflow management and decision-making; knowledge-driven collaboration and its management; domain knowledge infrastructure with knowledge fusion and analytics; and development of data and information services.