FLIRT: A Feature Generation Toolkit for Wearable Data

Simon Föll
Martin Maritsch
Federica Spinola
Filipe Barata
Tobias Kowatsch
Elgar Fleisch
Felix Wortmann
Published at Computer Methods and Programs in Biomedicine 2021


Background and Objective: Researchers use wearable sensing data and machine learning (ML) models to predict various health and behavioral outcomes. However, sensor data from commercial wearables are prone to noise, missing values, and artifacts. Despite the recent interest in deploying commercial wearables for long-term studies, there is no standardized way to process the raw sensor data, and researchers often use highly specific functions to preprocess, clean, normalize, and compute features. This leads to a lack of uniformity and reproducibility across studies, making it difficult to compare results. To overcome these issues, we present FLIRT: A Feature Generation Toolkit for Wearable Data, an open-source Python package that focuses on processing physiological data specifically from commercial wearables and addresses the challenges involved, from data cleaning to feature extraction.

Methods: FLIRT leverages a variety of state-of-the-art algorithms (e.g., particle filters, ML-based artifact detection) to ensure robust preprocessing of physiological data from wearables. In a subsequent step, FLIRT uses a sliding-window approach to calculate a feature vector of more than 100 dimensions, which serves as a basis for a wide variety of ML algorithms.
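
To illustrate the sliding-window idea in general terms, the following is a minimal sketch in plain Python. It is not FLIRT's API or implementation: the function name `sliding_window_features`, its parameters, the toy signal, and the handful of statistical features are all assumptions chosen for illustration, whereas FLIRT itself computes more than 100 features.

```python
from statistics import mean, pstdev


def sliding_window_features(samples, sampling_rate_hz, window_s, step_s):
    """Compute a few statistical features over overlapping windows.

    Hypothetical helper for illustration only (not part of FLIRT).
    samples: list of floats recorded at a fixed sampling rate.
    Returns one feature dictionary per complete window.
    """
    window_len = int(window_s * sampling_rate_hz)
    step_len = int(step_s * sampling_rate_hz)
    if window_len <= 0 or step_len <= 0:
        raise ValueError("window and step must cover at least one sample")

    features = []
    # Move the window forward by step_len samples; skip incomplete windows.
    for start in range(0, len(samples) - window_len + 1, step_len):
        window = samples[start:start + window_len]
        features.append({
            "start_s": start / sampling_rate_hz,
            "mean": mean(window),
            "std": pstdev(window),
            "min": min(window),
            "max": max(window),
        })
    return features


# Toy signal (assumed): 10 samples at 1 Hz, 4 s windows moved in 2 s steps.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
rows = sliding_window_features(signal, sampling_rate_hz=1, window_s=4, step_s=2)
```

Each entry in `rows` is one feature vector; stacking them yields a table in which every row describes one window and can be passed to an ML model.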

Results: We evaluated FLIRT on the publicly available WESAD dataset, which focuses on stress detection with an Empatica E4 wearable. Preprocessing the data with FLIRT ensures that unintended noise and artifacts are appropriately filtered. In the classification task, FLIRT outperforms the preprocessing baseline of the original WESAD paper.

Conclusion: FLIRT provides functionalities beyond existing packages that can address unmet needs in physiological data processing and feature generation: (a) integrated handling of common wearable file formats (e.g., Empatica E4 archives), (b) robust preprocessing, and (c) standardized feature generation that ensures reproducibility of results. While FLIRT comes with a default configuration that accommodates most situations, it also offers a highly configurable interface for all of its implemented algorithms to account for specific needs.