Forecasting mental states in schizophrenia using digital phenotyping data
Thierry Jean, Rose Guay Hottin, Pierre Orban
PLOS Digital Health, 4(2), e0000734, published 2025-02-07
DOI: 10.1371/journal.pdig.0000734
Citations: 0
Abstract
The promise of machine learning successfully exploiting digital phenotyping data to forecast mental states in psychiatric populations could greatly improve clinical practice. Previous research has focused on binary classification and continuous regression, disregarding the often ordinal nature of prediction targets derived from clinical rating scales. In addition, mental health ratings typically show substantial class imbalance or skewness that needs to be accounted for when evaluating predictive performance. Moreover, it remains unclear which machine learning algorithm is best suited for forecasting tasks, the eXtreme Gradient Boosting (XGBoost) and long short-term memory (LSTM) algorithms being 2 popular choices in digital phenotyping studies. The CrossCheck dataset includes 6,364 mental state surveys using 4-point ordinal rating scales and 23,551 days of smartphone sensor data contributed by patients with schizophrenia. We trained 120 machine learning models to forecast 10 mental states (e.g., Calm, Depressed, Seeing things) from passive sensor data on 2 predictive tasks (ordinal regression, binary classification) with 2 learning algorithms (XGBoost, LSTM) over 3 forecast horizons (same day, next day, next week). A majority of ordinal regression and binary classification models performed significantly above baseline, with macro-averaged mean absolute error values between 1.19 and 0.77 and balanced accuracy between 58% and 73%, which correspond to similar levels of performance once these metrics are scaled.
Results also showed that metrics that do not account for imbalance (mean absolute error, accuracy) systematically overestimated performance, that XGBoost models performed on par with or better than LSTM models, and that a significant yet very small decrease in performance occurred as the forecast horizon expanded.
In conclusion, when using performance metrics that properly account for class imbalance, ordinal forecast models demonstrated performance comparable to the prevalent binary classification approach without losing valuable clinical information from self-reports, thus providing richer and easier-to-interpret predictions.
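To make the imbalance point concrete, here is a minimal, hypothetical sketch (not taken from the paper's code; the toy ratings are invented for illustration) of how plain MAE and accuracy flatter a degenerate majority-class predictor on a skewed 4-point ordinal target, while the imbalance-aware counterparts the abstract relies on (macro-averaged MAE, balanced accuracy) expose it:

```python
from statistics import mean

# Hypothetical, heavily imbalanced 4-point ordinal ratings (0-3), as in
# EMA self-reports; a degenerate model always predicts the majority class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3]
y_pred = [0] * len(y_true)

# Plain MAE is dominated by the majority class, so it looks deceptively good.
plain_mae = mean(abs(t - p) for t, p in zip(y_true, y_pred))  # 0.7

# Macro-averaged MAE: compute MAE per rating level, then average, so every
# level contributes equally regardless of its frequency.
classes = sorted(set(y_true))
macro_mae = mean(
    mean(abs(t - p) for t, p in zip(y_true, y_pred) if t == c)
    for c in classes
)  # 1.5 -- the imbalance-aware metric reveals the failure

# Balanced accuracy on a binarized target (rating >= 1 vs rating 0) is the
# mean of per-class recall, so always predicting 0 scores only chance level.
yt_bin = [int(t >= 1) for t in y_true]
yp_bin = [int(p >= 1) for p in y_pred]

def recall(c):
    return mean(t == p for t, p in zip(yt_bin, yp_bin) if t == c)

bal_acc = mean(recall(c) for c in (0, 1))  # 0.5, vs 0.6 plain accuracy
```

Under this toy distribution the plain metrics (MAE 0.7, accuracy 60%) overstate performance relative to their macro-averaged versions (macro MAE 1.5, balanced accuracy 50%), mirroring the abstract's observation that unadjusted metrics systematically overestimate forecast quality.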