A Deep Generative Model for Five-Class Sleep Staging with Arbitrary Sensor Input

Published in IEEE Journal of Biomedical and Health Informatics, 2025

Hans van Gorp, Merel M van Gilst, Pedro Fonseca, Fokke B van Meulen, Johannes P van Dijk, Sebastiaan Overeem, and Ruud J G van Sloun, “A Deep Generative Model for Five-Class Sleep Staging with Arbitrary Sensor Input,” IEEE Journal of Biomedical and Health Informatics, Early Access 2025. [link] [pdf]

Gold-standard sleep scoring is based on epoch-based assignment of sleep stages based on a combination of EEG, EOG and EMG signals. However, a polysomnographic recording consists of many other signals that could be used for sleep staging, including cardio-respiratory modalities. Leveraging this signal variety would offer important advantages, for example increasing reliability, resilience to signal loss, and application to long-term non-obtrusive recordings. We developed a deep generative model for automatic sleep staging from a plurality of sensors and any -arbitrary- combination thereof. We trained a score-based diffusion model using a dataset of 1947 expert-labelled overnight recordings with 36 different signals, and achieved zero-shot inference on any sensor set by leveraging a novel Bayesian factorization of the score function across the sensors. On single-channel EEG, the model reaches the performance limit in terms of polysomnography inter-rater agreement (5- class accuracy 85.6%, Cohen’s kappa 0.791). Moreover, the method offers full flexibility to use any sensor set, for example finger photoplethysmography, nasal flow and thoracic respiratory movements, (5-class accuracy 79.0%, Cohen’s kappa of 0.697), or even derivations very unconventional for sleep staging, such as tibialis and sternocleidomastoid EMG (5-class accuracy 71.0%, kappa 0.575). Additionally, we propose a novel interpretability metric in terms of information gain per sensor and show this is linearly correlated with classification performance. Finally, our model allows for post- hoc addition of entirely new sensor modalities by merely training a score estimator on the novel input instead of having to retrain from scratch on all inputs.