arXiv:2510.20028v1 Announce Type: new
Abstract: Since its 2009 genesis block, the Bitcoin network has processed >1.08 billion (B) transactions representing >8.72B BTC, offering rich potential for machine learning (ML); yet its pseudonymity and the obscured flow of funds inherent in its UTXO-based design have rendered this data largely inaccessible for ML research. Addressing this gap, we present an ML-compatible graph modeling Bitcoin’s economic topology by reconstructing the flow of funds. This temporal, heterogeneous graph encompasses the complete transaction history up to block cutoffHeight, consisting of >2.4B nodes and >39.72B edges. Additionally, we provide custom sampling methods that yield node and edge feature vectors of sampled communities, tools to load and analyze the Bitcoin graph data within specialized graph databases, and ready-to-use database snapshots. This comprehensive dataset and toolkit enable the ML community to study Bitcoin’s intricate ecosystem at scale, driving progress in applications such as anomaly detection, address classification, market analysis, and large-scale graph ML benchmarking. Dataset and code are available at https://github.com/B1AAB/EBA
