arXiv:2509.18168v1 Announce Type: new
Abstract: Semantic parsing of long documents remains challenging due to quadratic growth in pairwise composition and memory requirements. We introduce \textbf{Hierarchical Segment-Graph Memory (HSGM)}, a novel framework that decomposes an input of length $N$ into $M$ meaningful segments, constructs \emph{Local Semantic Graphs} on each segment, and extracts compact \emph{summary nodes} to form a \emph{Global Graph Memory}. HSGM supports \emph{incremental updates} — only newly arrived segments incur local graph construction and summary-node integration — while \emph{Hierarchical Query Processing} locates relevant segments via top-$K$ retrieval over summary nodes and then performs fine-grained reasoning within their local graphs.
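The two-stage pipeline described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes (hypothetically) that each segment's summary node is a mean-pooled dense vector, that retrieval scores are dot products, and that a local semantic graph can be represented as an edge list. All names (`HSGM`, `add_segment`, `query`) are illustrative.

```python
import heapq

def dot(u, v):
    # Inner-product relevance score between two dense vectors.
    return sum(a * b for a, b in zip(u, v))

class HSGM:
    """Sketch of Hierarchical Segment-Graph Memory (illustrative only)."""

    def __init__(self, top_k=2):
        self.top_k = top_k
        self.summaries = []      # Global Graph Memory: one summary vector per segment
        self.local_graphs = []   # one local semantic graph (edge list) per segment

    def add_segment(self, token_vecs, edges):
        # Incremental update: only the newly arrived segment incurs local
        # graph construction and summary-node integration. Here the summary
        # node is a mean-pooled vector (an assumption, not the paper's method).
        dim = len(token_vecs[0])
        summary = [sum(v[d] for v in token_vecs) / len(token_vecs)
                   for d in range(dim)]
        self.summaries.append(summary)
        self.local_graphs.append(edges)

    def query(self, q_vec):
        # Stage 1: top-K retrieval over summary nodes in the global memory.
        ranked = heapq.nlargest(
            self.top_k, range(len(self.summaries)),
            key=lambda i: dot(q_vec, self.summaries[i]))
        # Stage 2: fine-grained reasoning restricted to the retrieved
        # segments' local graphs (here, simply returning their edge lists).
        return {i: self.local_graphs[i] for i in ranked}
```

Because a query only touches $K$ local graphs rather than the whole document graph, per-query work stays bounded as the document grows.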
Theoretically, HSGM reduces worst-case complexity from $O(N^2)$ to $O\!\left(Nk + (N/k)^2\right)$, with segment size $k \ll N$, and we derive Frobenius-norm bounds on the approximation error introduced by node summarization and sparsification thresholds. Empirically, on three benchmarks — long-document AMR parsing, segment-level semantic role labeling (OntoNotes), and legal event extraction — HSGM achieves \emph{2–4$\times$ inference speedup}, \emph{$>60\%$ reduction} in peak memory, and \emph{$\ge 95\%$} of baseline accuracy. Our approach unlocks scalable, accurate semantic modeling for ultra-long texts, enabling real-time and resource-constrained NLP applications.
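A back-of-envelope check makes the complexity claim concrete; the values $N = 10{,}000$ and $k = 100$ below are illustrative, not taken from the paper:

```python
# Compare O(N^2) pairwise composition against the HSGM bound O(Nk + (N/k)^2)
# for an illustrative document of N = 10,000 tokens with segment size k = 100.
N, k = 10_000, 100
quadratic = N ** 2                 # 100,000,000 pairwise operations
hsgm = N * k + (N // k) ** 2       # 1,000,000 + 10,000 = 1,010,000
print(quadratic / hsgm)            # roughly a 99x reduction in work
```

Intra-segment work ($Nk$) dominates when $k$ is small, while cross-segment work over summary nodes ($(N/k)^2$) dominates when $k$ is large; balancing the two terms at $k \approx N^{2/3}$ minimizes the bound.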
