Teaching the Teacher: The Role of Teacher-Student Smoothness Alignment in Genetic Programming-based Symbolic Distillation

2026-04-13 19:00 GMT

arXiv:2507.22767v3 Announce Type: replace
Abstract: Obtaining human-readable symbolic formulas via genetic programming-based symbolic distillation of a deep neural network trained on a target dataset presents a promising yet underexplored pathway toward explainable artificial intelligence (XAI). However, the standard pipeline frequently yields symbolic models with poor predictive accuracy. We identify a fundamental misalignment in functional complexity as the primary barrier to better performance: standard artificial neural networks (ANNs) often learn accurate but highly irregular functions, whereas symbolic regression typically prioritizes parsimony, resulting in a simpler class of models that fail to adequately distill knowledge from the ANN teacher. To address this gap, we propose a framework that explicitly regularizes the teacher model's functional smoothness using Jacobian and Lipschitz penalties, with the goal of improving student model distillation. We systematically characterize the trade-off between predictive accuracy and functional complexity through a comprehensive study across 20 datasets and 50 independent trials. Our results demonstrate that students distilled from smoothness-regularized teachers achieve statistically significant improvements in R^2 scores compared to the standard pipeline. We further conduct ablation studies over alternative student-model algorithms. Our findings suggest that smoothness alignment between teacher and student models is a critical factor for effective symbolic distillation.
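The core idea — adding a Jacobian penalty to the teacher's training loss so the learned function stays smooth enough for a parsimonious symbolic student to imitate — can be sketched as follows. This is a loose illustration, not the paper's method: the exact penalty form, architecture, and optimizer are not given in the abstract, so the tiny network, finite-difference gradients, and penalty weight below are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a wiggly 1-D target the teacher must fit.
X = np.linspace(-1, 1, 32).reshape(-1, 1)
y = np.sin(3 * X) + 0.1 * rng.standard_normal(X.shape)

H = 8  # hidden width of a tiny 1-H-1 tanh network, params in one flat vector

def unpack(theta):
    w1 = theta[:H].reshape(1, H)
    b1 = theta[H:2 * H]
    w2 = theta[2 * H:3 * H].reshape(H, 1)
    b2 = theta[3 * H]
    return w1, b1, w2, b2

def forward(theta, x):
    w1, b1, w2, b2 = unpack(theta)
    return np.tanh(x @ w1 + b1) @ w2 + b2

def jacobian_penalty(theta, x, eps=1e-4):
    # Central-difference estimate of (df/dx)^2, averaged over the batch —
    # a stand-in for the Jacobian regularizer described in the abstract.
    dfdx = (forward(theta, x + eps) - forward(theta, x - eps)) / (2 * eps)
    return np.mean(dfdx ** 2)

def objective(theta, lam):
    mse = np.mean((forward(theta, X) - y) ** 2)
    return mse + lam * jacobian_penalty(theta, X)

def train(lam, steps=300, lr=0.05, eps=1e-5):
    # Plain gradient descent; gradients by numerical differencing,
    # which is fine at this 25-parameter scale.
    theta = 0.5 * rng.standard_normal(3 * H + 1)
    hist = [objective(theta, lam)]
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            e = np.zeros_like(theta)
            e[i] = eps
            grad[i] = (objective(theta + e, lam) - objective(theta - e, lam)) / (2 * eps)
        theta -= lr * grad
        hist.append(objective(theta, lam))
    return theta, hist

theta_plain, hist_plain = train(lam=0.0)    # standard teacher
theta_smooth, hist_smooth = train(lam=0.1)  # smoothness-regularized teacher
print(f"loss drop (plain):  {hist_plain[0]:.3f} -> {hist_plain[-1]:.3f}")
print(f"loss drop (smooth): {hist_smooth[0]:.3f} -> {hist_smooth[-1]:.3f}")
print(f"mean squared slope of smooth teacher: {jacobian_penalty(theta_smooth, X):.3f}")
```

In the full pipeline, a genetic programming symbolic regressor would then be fit to the regularized teacher's predictions rather than to the raw labels; the paper's claim is that the smoother target function is easier for a parsimonious symbolic model to match.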