Forecasting-based Biomedical Time-series Data Synthesis for Open Data and Robust AI

2025-11-24 20:00 GMT · aimagpro.com

arXiv:2510.04622v2 Announce Type: replace
Abstract: Limited data availability, caused by strict privacy regulations and significant resource demands, severely constrains biomedical time-series AI development and creates a critical gap between data requirements and accessibility. Synthetic data generation presents a promising solution by producing artificial datasets that maintain the statistical properties of real biomedical time-series data without compromising patient confidentiality. While GANs, VAEs, and diffusion models capture global data distributions, forecasting models offer inductive biases tailored for sequential dynamics. We propose a framework for synthetic biomedical time-series data generation, based on recent forecasting models, that replicates complex electrophysiological signals such as EEG and EMG with high fidelity. The resulting synthetic datasets can be freely shared for open AI development and consistently improve downstream model performance. Numerical results on sleep-stage classification show up to a 3.71% performance gain with augmentation and a 91.00% synthetic-only accuracy that surpasses the real-data-only baseline.
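
The general idea of forecasting-based synthesis can be illustrated with a minimal sketch. This is not the framework proposed in the paper: it assumes a plain least-squares autoregressive model as a stand-in for the "recent forecasting models" the abstract mentions, and a toy noisy sinusoid as a stand-in for an EEG channel. The function names `fit_ar` and `synthesize`, the 100 Hz sampling rate, and the model order are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_ar(signal, order):
    """Least-squares fit of a linear autoregressive forecaster.

    Each row of X holds `order` consecutive samples; y is the sample
    that immediately follows that window.
    """
    X = np.stack([signal[i:i + order] for i in range(len(signal) - order)])
    y = signal[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_std = float(np.std(y - X @ coef))  # spread of one-step forecast errors
    return coef, resid_std

def synthesize(seed, coef, resid_std, n_steps, rng):
    """Generate a synthetic series by rolling the forecaster forward.

    Each forecast, plus noise at the residual scale, is fed back in
    as input for the next step.
    """
    order = len(coef)
    window = list(seed[-order:])
    out = []
    for _ in range(n_steps):
        nxt = float(np.dot(coef, window[-order:])) + rng.normal(0.0, resid_std)
        out.append(nxt)
        window.append(nxt)
    return np.array(out)

# Toy stand-in for a real recording (assumption): a noisy 10 Hz rhythm at 100 Hz.
rng = np.random.default_rng(0)
t = np.arange(2000) / 100.0
real = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

coef, resid_std = fit_ar(real, order=16)
synthetic = synthesize(real[:16], coef, resid_std, n_steps=1000, rng=rng)
```

In this sketch the synthetic series is seeded with a short real window and then extended entirely from the model's own forecasts, which is the sense in which a forecaster's sequential inductive bias can drive generation. A learned deep forecasting model would replace `fit_ar` in any realistic setting.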