Session description
Blue-green and canary deploys are table stakes for application code, but they’re surprisingly hard to get right for GraphQL. Routers often just “pull the latest” schema, rollbacks mean republishing and recomposing, and it’s nearly impossible to answer a basic incident question: “What schema was this request actually hitting?” Even after testing in staging and deploying to production, we often find edge cases that break the assumptions we made during testing.
This talk is an engineering case study. I’ll walk through the design journey that led us to a blue-green deployment model for GraphQL built on immutable schema artifacts and explicit rollbacks. We’ll unpack the constraints (federation, many subgraphs, multiple environments), the dead ends we hit, and the principles that finally worked.
You’ll leave with a mental model and concrete patterns you can apply to your own GraphQL infrastructure, whatever tooling you use: how to structure blue-green router fleets, how to pin routers to exact schema versions, how to roll back instantly and safely, and what to log so you can always reconstruct “what was live where” when production gets weird.