Article: Two Misconfigurations That Caused Spark OOM Failures on Kubernetes

2026-06-03 00:00 GMT · aimagpro.com

After a migration of Spark pipelines to Azure Kubernetes Service, two infrastructure settings interacted destructively. Setting spark.kubernetes.local.dirs.tmpfs=true backed shuffle spill with RAM instead of disk, and a hard podAffinity rule forced all executors onto one node. Together, they caused repeated OOM kills that were invisible to standard diagnostics.

By Pranav Bhasker
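The two settings named in the summary can be checked mechanically before a job is submitted. The following is a minimal sketch, not the author's tooling: the function name `find_risks`, the warning strings, and the assumption that the Spark configuration and the executor pod template are already loaded into plain dictionaries are all illustrative. Only the key `spark.kubernetes.local.dirs.tmpfs` and the Kubernetes field `requiredDuringSchedulingIgnoredDuringExecution` (the "hard" form of an affinity rule) are real names.

```python
from typing import Any

# Real Spark-on-Kubernetes setting: mounts executor local dirs as tmpfs.
TMPFS_KEY = "spark.kubernetes.local.dirs.tmpfs"
# Real Kubernetes field: the "hard" (mandatory) form of an affinity rule.
HARD_FIELD = "requiredDuringSchedulingIgnoredDuringExecution"


def find_risks(spark_conf: dict[str, str], pod_template: dict[str, Any]) -> list[str]:
    """Return warnings for the two settings described above (hypothetical helper)."""
    risks: list[str] = []

    # tmpfs-backed local dirs keep shuffle spill in RAM, so spilled data
    # counts against the pod's memory usage instead of going to disk.
    if spark_conf.get(TMPFS_KEY, "false").strip().lower() == "true":
        risks.append("tmpfs local dirs: shuffle spill is held in RAM")

    # Walk spec.affinity.podAffinity, tolerating missing or null keys.
    spec = pod_template.get("spec") or {}
    affinity = spec.get("affinity") or {}
    pod_affinity = affinity.get("podAffinity") or {}

    # Only the "required..." list is a hard rule; the "preferred..." list
    # is a scheduling hint and is deliberately not flagged here.
    if pod_affinity.get(HARD_FIELD):
        risks.append("hard podAffinity: executors may be packed onto one node")

    return risks


if __name__ == "__main__":
    # Illustrative inputs only; not taken from the article's cluster.
    conf = {TMPFS_KEY: "true"}
    template = {
        "spec": {
            "affinity": {
                "podAffinity": {
                    HARD_FIELD: [{"topologyKey": "kubernetes.io/hostname"}]
                }
            }
        }
    }
    for warning in find_risks(conf, template):
        print(warning)
```

A check like this flags each setting on its own. The failure described here depended on both being present at once, so a stricter variant could warn only when the returned list contains both entries.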