Title: Predicting Prefecture-Level Well-Being Indicators in Japan Using Search Volumes in Internet Search Engines: Infodemiology Study
Authors: Myung Si Yang, Kazuya Taira
Journal: Journal of Medical Internet Research, volume 26, e64555 (Journal Article, published 2024-11-11)
DOI: 10.2196/64555 (https://doi.org/10.2196/64555)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11589491/pdf/
Impact Factor: 5.8; JCR: Q1 (Health Care Sciences & Services)
Citation count: 0
Abstract
Background: In recent years, the adoption of well-being indicators by national governments and international organizations has emerged as an important tool for evaluating state governance and societal progress. Traditionally, well-being has been gauged primarily through economic metrics such as gross domestic product, which fall short of capturing multifaceted well-being, including socioeconomic inequalities, life satisfaction, and health status. Current well-being indicators, including both subjective and objective measures, offer a broader evaluation but face challenges such as high survey costs and difficulties in evaluating at regional levels within countries. The emergence of web log data as an alternative source of well-being indicators offers the potential for more cost-effective, timely, and less biased assessments.
Objective: This study aimed to develop a model using internet search data to predict well-being indicators at the regional level in Japan, providing policy makers with a more accessible and cost-effective tool for assessing public well-being and making informed decisions.
Methods: This study used the Regional Well-Being Index (RWI) for Japan, which evaluates prefectural well-being across 47 prefectures for the years 2010, 2013, 2016, and 2019, as the outcome variable. The RWI includes a comprehensive approach integrating both subjective and objective indicators across 11 domains, including income, job, and life satisfaction. Predictor variables included z-score-normalized relative search volume (RSV) data from Google Trends for words relevant to each domain. Unrelated words were excluded from the analysis to ensure relevance. The Elastic Net methodology was applied to predict RWI using RSVs, with α balancing ridge and lasso effects and λ regulating their strengths. The model was optimized by cross-validation, determining the best mix and strength of regularization parameters to minimize prediction error. Root mean square errors (RMSE) and coefficients of determination (R²) were used to assess the model's predictive accuracy and fit.
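As an illustration of the Methods above, the following is a minimal sketch of z-score normalization followed by an Elastic Net fit using coordinate descent. It is not the authors' implementation: the abstract does not name the software used, and the objective shown (α as the lasso/ridge mixing weight, λ as the overall penalty strength, in the convention popularized by glmnet) is an assumption. The function names and the toy data are hypothetical, and the cross-validated search over α and λ is omitted.

```python
from statistics import mean, pstdev


def zscore(column):
    """Standardize one column to mean 0 and population SD 1.
    Assumes the column is not constant (SD > 0)."""
    mu, sd = mean(column), pstdev(column)
    return [(v - mu) / sd for v in column]


def soft_threshold(z, gamma):
    """Soft-thresholding operator arising from the lasso (L1) part."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, alpha, lam, n_iter=200):
    """Coordinate descent for the assumed objective
        (1/2n) * sum((y - b0 - X b)^2)
        + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2).
    X is a list of rows whose columns are already z-scored."""
    n, p = len(X), len(X[0])
    b0 = mean(y)  # intercept; valid because the columns are centered
    beta = [0.0] * p
    resid = [yi - b0 for yi in y]  # current residual y - b0 - X beta
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            # Covariance of column j with the residual that excludes its own term
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            denom = sum(c * c for c in col) / n + lam * (1 - alpha)
            new_bj = soft_threshold(rho, lam * alpha) / denom
            delta = new_bj - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
                beta[j] = new_bj
    return b0, beta


# Toy data, made up for illustration (not taken from the study)
raw_x1 = [1.0, 2.0, 3.0, 4.0]
raw_x2 = [1.0, -1.0, -1.0, 1.0]
z1, z2 = zscore(raw_x1), zscore(raw_x2)
X = [list(row) for row in zip(z1, z2)]
y = [5.0 + 2.0 * a for a in z1]  # outcome depends only on the first predictor

b0, beta = elastic_net(X, y, alpha=0.1, lam=0.906)
print("intercept:", b0, "coefficients:", beta)
```

With a small α most of the penalty is ridge-like, so coefficients are shrunk toward zero, while the L1 part can still set weakly related predictors exactly to zero. In practice α and λ would be chosen by cross-validation, as the Methods describe.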
Results: An analysis of Google Trends data yielded 275 words related to the RWI domains, and RSVs were collected for 211 words after filtering out irrelevant terms. The mean search frequencies for these words during 2010, 2013, 2016, and 2019 ranged from -1.587 to 3.902, with SDs between 0.053 and 3.025. The best Elastic Net model (α=0.1, λ=0.906, RMSE=1.290, and R²=0.904) was built using 2010-2016 training data and 2-13 variables per domain. Applied to 2019 test data, it yielded an RMSE of 2.328 and R² of 0.665.
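The evaluation reported in the Results (fit on 2010-2016, test on 2019, scored by RMSE and R²) can be sketched as follows. This is a hedged illustration only: the record layout, the helper names, and the numbers are hypothetical, and R² is computed here as 1 − SS_res/SS_tot, which is an assumption because the abstract does not state which definition was used.

```python
import math

TRAIN_YEARS = {2010, 2013, 2016}  # years used for fitting, per the abstract
TEST_YEARS = {2019}               # held-out year, per the abstract


def split_by_year(rows):
    """Split records into training and test sets by survey year."""
    train = [r for r in rows if r["year"] in TRAIN_YEARS]
    test = [r for r in rows if r["year"] in TEST_YEARS]
    return train, test


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed to be 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up records: one observed and one predicted RWI value per row
rows = [
    {"year": 2010, "observed": 48.0, "predicted": 47.5},
    {"year": 2013, "observed": 50.0, "predicted": 50.5},
    {"year": 2016, "observed": 52.0, "predicted": 51.0},
    {"year": 2019, "observed": 49.0, "predicted": 50.0},
    {"year": 2019, "observed": 53.0, "predicted": 52.0},
]

train, test = split_by_year(rows)
for name, subset in (("train", train), ("test", test)):
    observed = [r["observed"] for r in subset]
    predicted = [r["predicted"] for r in subset]
    print(name, "RMSE:", rmse(observed, predicted),
          "R2:", r_squared(observed, predicted))
```

Splitting by year rather than at random keeps the test set strictly later than the training data, which is why the gap between training and test scores is a useful check on how well the model generalizes over time.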
Conclusions: This study demonstrates the effectiveness of using internet search log data through the Elastic Net machine learning method to predict the RWI in Japanese prefectures with high accuracy, offering a rapid and cost-efficient alternative to traditional survey approaches. This study highlights the potential of this methodology to provide foundational data for evidence-based policy making aimed at enhancing well-being.
About the journal:
The Journal of Medical Internet Research (JMIR) is a highly respected publication in the field of health informatics and health services. Founded in 1999, JMIR has been a pioneer in the field for over two decades.
As a leader in the industry, the journal focuses on digital health, data science, health informatics, and emerging technologies for health, medicine, and biomedical research. It is recognized as a top publication in these disciplines, ranking in the first quartile (Q1) by Impact Factor.
JMIR is also ranked #1 on Google Scholar in the "Medical Informatics" discipline.