TreeGPT: A Novel Hybrid Architecture for Abstract Syntax Tree Processing with Global Parent-Child Aggregation
arXiv:2509.05550v1 Announce Type: new Abstract: We introduce TreeGPT, a novel neural architecture that combines transformer-based attention mechanisms with global parent-child aggregation for processing Abstract Syntax Trees (ASTs) in neural program synthesis tasks. Unlike traditional approaches that rely solely on sequential processing or graph neural networks, TreeGPT employs a hybrid design that leverages both self-attention for capturing local dependencies and a specialized Tree Feed-Forward Network (TreeFFN) for modeling hierarchical tree structures through iterative message passing. The core innovation lies in our Global Parent-Child Aggregation mechanism, formalized as: $$h_i^{(t+1)} = sigma Big( h_i^{(0)} + W_{pc} sum_{(p,c) in E_i} f(h_p^{(t)}, h_c^{(t)}) + b Big)$$ where $h_i^{(t)}$ represents the hidden state of node $i$ at iteration $t$, $E_i$ denotes all parent-child edges involving node $i$, and $f(h_p, h_c)$ is an edge aggregation function. This formulation enables each node to progressively aggregate information from the entire tree structure through $T$ iterations. Our architecture integrates optional enhancements including gated aggregation with learnable edge weights, residual connections for gradient stability, and bidirectional propagation for capturing both bottom-up and top-down dependencies. We evaluate TreeGPT on the ARC Prize 2025 dataset, a challenging visual reasoning benchmark requiring abstract pattern recognition and rule inference. Experimental results demonstrate that TreeGPT achieves 96% accuracy, significantly outperforming transformer baselines (1.3%), large-scale models like Grok-4 (15.9%), and specialized program synthesis methods like SOAR (52%) while using only 1.5M parameters. Our comprehensive ablation study reveals that edge projection is the most critical component, with the combination of edge projection and gating achieving optimal performance.
