Predicting Prefecture-Level Well-Being Indicators in Japan Using Search Volumes in Internet Search Engines: Infodemiology Study.

IF 5.8 · Zone 2, Medicine · JCR Q1, HEALTH CARE SCIENCES & SERVICES

Journal of Medical Internet Research · Pub Date: 2024-11-11 · DOI: 10.2196/64555

Myung Si Yang, Kazuya Taira

Journal of Medical Internet Research, volume 26, e64555. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11589491/pdf/

Citations: 0

Abstract

Background: In recent years, the adoption of well-being indicators by national governments and international organizations has emerged as an important tool for evaluating state governance and societal progress. Traditionally, well-being has been gauged primarily through economic metrics such as gross domestic product, which fall short of capturing multifaceted well-being, including socioeconomic inequalities, life satisfaction, and health status. Current well-being indicators, including both subjective and objective measures, offer a broader evaluation but face challenges such as high survey costs and difficulties in evaluating at regional levels within countries. The emergence of web log data as an alternative source of well-being indicators offers the potential for more cost-effective, timely, and less biased assessments.

Objective: This study aimed to develop a model using internet search data to predict well-being indicators at the regional level in Japan, providing policy makers with a more accessible and cost-effective tool for assessing public well-being and making informed decisions.

Methods: This study used the Regional Well-Being Index (RWI) for Japan, which evaluates well-being across the 47 prefectures for the years 2010, 2013, 2016, and 2019, as the outcome variable. The RWI integrates subjective and objective indicators across 11 domains, including income, job, and life satisfaction. Predictor variables were z score-normalized relative search volumes (RSVs) from Google Trends for words relevant to each domain; unrelated words were excluded from the analysis. Elastic Net regression was applied to predict the RWI from the RSVs, with α balancing the ridge and lasso penalties and λ regulating their overall strength. Both parameters were selected by cross-validation to minimize prediction error. Root mean square error (RMSE) and the coefficient of determination (R²) were used to assess the model's predictive accuracy and fit.
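The pipeline described in Methods (z score normalization, Elastic Net fitting, cross-validated choice of α and λ, then RMSE and R² on held-out data) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not name the software used, so the sketch assumes a glmnet-style parametrization in which α is the lasso/ridge mix and λ the overall penalty strength. The data are synthetic, and every function name, the fold scheme, and the parameter grid are hypothetical choices made for the example.

```python
import math
import random

def zscore_columns(rows):
    """Standardize each column to mean 0 and population SD 1."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def soft_threshold(z, g):
    """Shrink z toward zero by g; this is the lasso part of the update."""
    return z - g if z > g else z + g if z < -g else 0.0

def fit_elastic_net(X, y, alpha, lam, n_iter=100):
    """Coordinate descent on (assumed glmnet-style objective)
    (1/2n) * sum((y - b0 - X b)^2) + lam * ((1 - alpha)/2 * |b|_2^2 + alpha * |b|_1)."""
    n, p = len(X), len(X[0])
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    b0, beta, resid = 0.0, [0.0] * p, list(y)
    for _ in range(n_iter):
        shift = sum(resid) / n          # unpenalized intercept update
        b0 += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            denom = col_sq[j] + lam * (1.0 - alpha)
            if denom == 0.0:
                continue                # all-zero column with no ridge term
            # Covariance of column j with the residual that excludes beta[j]'s contribution
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / denom
            delta = new - beta[j]
            if delta != 0.0:
                resid = [resid[i] - delta * X[i][j] for i in range(n)]
                beta[j] = new
    return b0, beta

def predict(X, b0, beta):
    return [b0 + sum(x * b for x, b in zip(row, beta)) for row in X]

def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def r_squared(y, yhat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    return 1.0 - ss_res / sum((a - mean_y) ** 2 for a in y)

def cv_rmse(X, y, alpha, lam, k=5):
    """Mean validation RMSE over k interleaved folds."""
    n, errors = len(X), []
    for f in range(k):
        tr = [i for i in range(n) if i % k != f]
        va = [i for i in range(n) if i % k == f]
        b0, beta = fit_elastic_net([X[i] for i in tr], [y[i] for i in tr], alpha, lam)
        errors.append(rmse([y[i] for i in va], predict([X[i] for i in va], b0, beta)))
    return sum(errors) / k

# Synthetic stand-in data (NOT the study's RSV or RWI values).
random.seed(0)
X = zscore_columns([[random.gauss(0, 1) for _ in range(4)] for _ in range(60)])
y = [5.0 + 2.0 * r[0] - 1.0 * r[1] + random.gauss(0, 0.5) for r in X]
# Hold out the last block, loosely mirroring "train on earlier years, test on the latest".
X_train, y_train, X_test, y_test = X[:45], y[:45], X[45:], y[45:]

grid = [(a, l) for a in (0.1, 0.5, 1.0) for l in (0.01, 0.1, 1.0)]  # illustrative grid
best_alpha, best_lam = min(grid, key=lambda g: cv_rmse(X_train, y_train, g[0], g[1]))
b0, beta = fit_elastic_net(X_train, y_train, best_alpha, best_lam)
held_out = predict(X_test, b0, beta)
print("alpha, lambda:", best_alpha, best_lam)
print("test RMSE:", round(rmse(y_test, held_out), 3), "test R2:", round(r_squared(y_test, held_out), 3))
```

For simplicity the sketch z-scores all rows at once before splitting; a stricter setup would compute the normalization statistics on the training rows only and reuse them for the held-out rows.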

Results: An analysis of Google Trends data yielded 275 words related to the RWI domains, and RSVs were collected for 211 words after filtering out irrelevant terms. The mean search frequencies for these words during 2010, 2013, 2016, and 2019 ranged from -1.587 to 3.902, with SDs ranging from 0.053 to 3.025. The best Elastic Net model (α=0.1, λ=0.906, RMSE=1.290, and R²=0.904) was built using 2010-2016 training data and 2 to 13 variables per domain. Applied to 2019 test data, it yielded an RMSE of 2.328 and an R² of 0.665.
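To give a sense of what the reported α=0.1 and λ=0.906 imply, the penalty can be written out under one common convention. The abstract does not state the exact objective, so the form below (the glmnet-style parametrization, with N observations, intercept β₀, and coefficient vector β) is an assumption; other libraries name and scale these parameters differently.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Bigr]

% With the reported \alpha = 0.1 and \lambda = 0.906:
\lambda\,\tfrac{1-\alpha}{2} = 0.906 \times 0.45 = 0.4077
  \quad\text{(weight on } \lVert\beta\rVert_2^2\text{)},
\qquad
\lambda\,\alpha = 0.906 \times 0.1 = 0.0906
  \quad\text{(weight on } \lVert\beta\rVert_1\text{)}
```

Under this assumed form, the selected model is weighted mostly toward the ridge penalty, with a comparatively small lasso component doing the variable selection.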

Conclusions: This study demonstrates the effectiveness of using internet search log data through the Elastic Net machine learning method to predict the RWI in Japanese prefectures with high accuracy, offering a rapid and cost-efficient alternative to traditional survey approaches. This study highlights the potential of this methodology to provide foundational data for evidence-based policy making aimed at enhancing well-being.


Source journal

Journal of Medical Internet Research (Medicine: Health Care)

CiteScore

14.40

Self-citation rate

5.40%

Articles published

654

Review time

1 month

About the journal: The Journal of Medical Internet Research (JMIR), founded in 1999, publishes work in health informatics and health services. It focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It ranks in the first quartile (Q1) by Impact Factor and is ranked #1 on Google Scholar within the "Medical Informatics" discipline.