A Multi-Modal CNN-LSTM Framework with Multi-Head Attention and Focal Loss for Real-Time Elderly Fall Detection

2026-03-24 19:00 GMT · aimagpro.com

arXiv:2603.22313v1 Announce Type: new
Abstract: The increasing global aging population has intensified the demand for reliable health monitoring systems, particularly those capable of detecting critical events such as falls among elderly individuals. Traditional fall detection approaches relying on single-modality acceleration data suffer from high false alarm rates, while conventional machine learning methods require extensive hand-crafted feature engineering. This paper proposes a novel multi-modal deep learning framework, MultiModalFallDetector, designed for real-time elderly fall detection using wearable sensors. Our approach integrates multiple innovations: a multi-scale CNN-based feature extractor capturing motion dynamics at varying temporal resolutions; fusion of tri-axial accelerometer, gyroscope, and four-channel physiological signals; incorporation of a multi-head self-attention mechanism for dynamic temporal weighting; adoption of Focal Loss to mitigate severe class imbalance; introduction of an auxiliary activity classification task for regularization; and implementation of transfer learning from the UCI HAR dataset to the SisFall dataset. Extensive experiments on the SisFall dataset, which includes real-world simulated fall trials from elderly participants (aged 60-85), demonstrate that our framework achieves an F1-score of 98.7, Recall of 98.9, and AUC-ROC of 99.4, significantly outperforming baseline methods including traditional machine learning and standard deep learning approaches. The model maintains sub-50 ms inference latency on edge devices, confirming its suitability for real-time deployment in geriatric care settings.
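
The abstract names Focal Loss as its remedy for class imbalance but does not give the formulation or hyperparameters. As a rough illustration only, the standard binary form is FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The sketch below is a minimal pure-Python version of that standard form, not the authors' implementation; the function name `binary_focal_loss` and the defaults `gamma=2.0` and `alpha=0.25` are assumptions (commonly used defaults, not values reported in the abstract).

```python
import math

def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over a batch (illustrative sketch).

    probs  -- predicted probability of the positive (fall) class per sample
    labels -- 1 for fall, 0 for non-fall
    gamma  -- focusing parameter; assumed value, not taken from the paper
    alpha  -- weight on the positive class; assumed value, not from the paper
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to avoid log(0) on saturated predictions.
        p = min(max(p, eps), 1.0 - eps)
        # p_t is the probability the model assigns to the true class.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        # (1 - p_t)^gamma shrinks the loss of well-classified samples.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

# A confidently correct fall prediction versus a badly missed one.
easy = binary_focal_loss([0.9], [1])
hard = binary_focal_loss([0.1], [1])
print(f"easy={easy:.6f} hard={hard:.6f}")
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy. Raising `gamma` pushes the loss of easy samples toward zero, so rare, hard samples such as falls contribute relatively more to the gradient.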
