Directed Information $\gamma$-covering: An Information-Theoretic Framework for Context Engineering

2025-10-01 19:00 GMT · aimagpro.com

arXiv:2510.00079v1 Announce Type: cross
Abstract: We introduce \textbf{Directed Information $\gamma$-covering}, a simple but general framework for redundancy-aware context engineering. Directed information (DI), a causal analogue of mutual information, measures asymmetric predictiveness between chunks. If $\operatorname{DI}_{i \to j} \ge H(C_j) - \gamma$, then $C_i$ suffices to represent $C_j$ up to $\gamma$ bits. Building on this criterion, we formulate context selection as a $\gamma$-cover problem and propose a greedy algorithm with provable guarantees: it preserves query information within bounded slack, inherits $(1+\ln n)$ and $(1-1/e)$ approximations from submodular set cover, and enforces a diversity margin. Importantly, building the $\gamma$-cover is \emph{query-agnostic}: it incurs no online cost and can be computed once offline and amortized across all queries. Experiments on HotpotQA show that $\gamma$-covering consistently improves over BM25, a competitive baseline, and provides clear advantages in hard-decision regimes such as context compression and single-slot prompt selection. These results establish DI $\gamma$-covering as a principled, self-organizing backbone for modern LLM pipelines.
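To make the covering criterion concrete, the following is a minimal sketch of greedy set cover applied to the rule $\operatorname{DI}_{i \to j} \ge H(C_j) - \gamma$. It is an illustration under stated assumptions, not the paper's implementation: the DI matrix `di` and the entropies `entropy` are assumed to be precomputed (DI estimation is not shown), the function names `covers` and `greedy_gamma_cover` are hypothetical, each chunk is assumed to cover itself, and the numbers are made up for the example. The diversity margin and the query-information bound mentioned in the abstract are not modeled here.

```python
from typing import List, Sequence, Set


def covers(di: Sequence[Sequence[float]], entropy: Sequence[float],
           gamma: float, i: int) -> Set[int]:
    """Indices j that chunk i represents up to gamma bits.

    Assumption: a chunk always covers itself, so a cover always exists.
    """
    n = len(entropy)
    return {j for j in range(n) if j == i or di[i][j] >= entropy[j] - gamma}


def greedy_gamma_cover(di: Sequence[Sequence[float]],
                       entropy: Sequence[float],
                       gamma: float) -> List[int]:
    """Standard greedy set cover: repeatedly pick the chunk that covers
    the most still-uncovered chunks (ties go to the smallest index)."""
    n = len(entropy)
    cover_sets = [covers(di, entropy, gamma, i) for i in range(n)]
    uncovered = set(range(n))
    selected: List[int] = []
    while uncovered:
        best = max(range(n),
                   key=lambda i: (len(cover_sets[i] & uncovered), -i))
        selected.append(best)
        uncovered -= cover_sets[best]
    return selected


# Toy data (illustrative values only): di[i][j] stands for DI from chunk i to j.
entropy = [4.0, 4.0, 4.0, 4.0]
di = [
    [0.0, 3.6, 3.7, 0.2],
    [1.0, 0.0, 0.5, 0.1],
    [0.3, 3.9, 0.0, 0.4],
    [0.1, 0.2, 0.3, 0.0],
]

print(greedy_gamma_cover(di, entropy, gamma=0.5))  # [0, 3]
print(greedy_gamma_cover(di, entropy, gamma=0.0))  # [0, 1, 2, 3]
```

With $\gamma = 0.5$, chunk 0 represents chunks 1 and 2, so only chunks 0 and 3 are kept. With $\gamma = 0$, no chunk meets the threshold for any other and all four are retained. Because the selection never looks at a query, it can run once offline, which matches the query-agnostic property described in the abstract.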