Imputation Strategies for Longitudinal Behavioral Studies: Predicting Depression Using GLOBEM Datasets
Published at: Companion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing (2024)
Abstract
Despite the prevalence of missing data in longitudinal passive sensing studies, data imputation, a critical preprocessing step, is often overlooked by researchers in favor of other aspects of data analysis, such as building sophisticated models or predicting outcomes. In this paper, we seek to direct the attention of the behavioral and mental health sensing community toward the importance of data imputation in such studies. We evaluate and benchmark off-the-shelf imputation strategies using the open-source GLOBEM platform and datasets. Our results show that appropriate imputation strategies could improve AUROC by up to 25% when predicting participants' future depression labels (self-reported PHQ-4) from past sensing data, using the same model-building and prediction pipeline as the GLOBEM platform, and without compromising the inherent structure of the behavioral sensing data after imputation. Furthermore, we observe that certain imputation strategies significantly improve the separability of predicted depression probabilities on the test data compared to no or trivial imputation. Lastly, we present a case study of participants whose depression labels change over time and show that, with these imputation strategies, we can better capture and trace within-person transitions in depression than with trivial or no imputation.
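To make the contrast between "trivial" and structure-preserving imputation concrete, here is a minimal Python sketch for a single participant's longitudinal feature. The three strategies shown (constant fill, per-participant mean, and last observation carried forward) are common textbook choices assumed purely for illustration; the abstract does not state that these are the strategies benchmarked in the paper. The feature (daily step count), its values, and all function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence

# Hypothetical daily values of one sensing feature for one participant.
# None marks a day with no sensor data.
Series = Sequence[Optional[float]]


def impute_constant(values: Series, fill: float = 0.0) -> List[float]:
    """Trivial imputation: replace every missing day with a fixed constant."""
    return [fill if v is None else v for v in values]


def impute_person_mean(values: Series) -> List[float]:
    """Replace missing days with this participant's own observed mean."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("participant has no observed values")
    person_mean = mean(observed)
    return [person_mean if v is None else v for v in values]


def impute_locf(values: Series) -> List[float]:
    """Last observation carried forward within one participant.

    Days before the first observation take the first observed value
    (a backward fill), so the output never contains None.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("participant has no observed values")
    last = observed[0]
    filled: List[float] = []
    for v in values:
        if v is not None:
            last = v  # update the carried-forward value on observed days
        filled.append(last)
    return filled


if __name__ == "__main__":
    # Hypothetical week of step counts with four missing days.
    steps = [None, 6200.0, None, None, 5800.0, 7000.0, None]
    for name, fn in [
        ("constant", impute_constant),
        ("person mean", impute_person_mean),
        ("LOCF", impute_locf),
    ]:
        filled = fn(steps)
        print(f"{name:12s} mean={mean(filled):8.1f} values={filled}")
```

In this toy week, filling gaps with zero drags the participant's mean step count from about 6,333 down to about 2,714, whereas person-mean imputation leaves the mean unchanged and carry-forward keeps every value inside the observed range. This is one intuition for why trivial imputation can distort the within-person structure that downstream depression models rely on.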