Safe rollouts and troubleshooting

Most deployment issues are easier to prevent than to debug. This page gives you a practical operating checklist.

Safe rollout checklist

For high-risk changes, validate in a non-production environment first.

These happen before queueing, such as:

The YAML is accepted structurally but contains a change that cannot be applied as requested.

The provider accepted part of the plan but one or more resource operations failed.

Open the run details and identify the first failing resource.
Fix the root cause (config, capacity, billing, or provider dependency).
Re-apply (after pushing a new version if the YAML changed) or resume the failed or canceled run.
Confirm that all expected resources reached their final desired state.
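As a rough illustration of the first step above, the sketch below scans a run's resource operations for the earliest failure. The shape of the run details is an assumption made for this example: a list of dictionaries in execution order, each with hypothetical `resource` and `status` keys. The real run details payload may look different.

```python
from typing import Dict, List, Optional


def first_failing_resource(operations: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Return the first operation whose status is "failed", or None.

    Assumes (hypothetically) that `operations` is listed in execution order
    and that each entry carries a "status" field.
    """
    for op in operations:
        if op.get("status") == "failed":
            return op
    return None


# Hypothetical run details, for illustration only.
example_operations = [
    {"resource": "network", "status": "succeeded"},
    {"resource": "database", "status": "failed"},
    {"resource": "web", "status": "failed"},
]
first = first_failing_resource(example_operations)
```

Starting from the earliest failure matters because later failures are often consequences of the first one.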

Use resume when:

the intended change is the same,
the root cause is fixed,
the run is already failed or canceled,
you want to continue the same run context (POST /deployments/runs/{runId}/resume).

Use re-apply when:

you intentionally changed desired state,
you pushed a new version and want a fresh run for that version (POST /deployments/apply),
there is no resumable run you want to continue.
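The two lists above can be read as a small decision rule. The sketch below encodes it and returns the HTTP method and path to call. Only the two endpoint paths come from this page; the `RunSummary` type, its status strings, and the function name are hypothetical, and the code sends no request.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed status strings; the real API may spell these differently.
RESUMABLE_STATUSES = {"failed", "canceled"}


@dataclass
class RunSummary:
    run_id: str
    status: str


def choose_recovery_request(
    run: Optional[RunSummary],
    desired_state_changed: bool,
    root_cause_fixed: bool,
) -> Tuple[str, str]:
    """Pick between resuming the existing run and starting a fresh apply."""
    if not root_cause_fixed:
        # Neither option helps until the underlying problem is resolved.
        raise ValueError("fix the root cause before resuming or re-applying")
    resumable = run is not None and run.status in RESUMABLE_STATUSES
    if resumable and not desired_state_changed:
        # Same intended change, same run context.
        return ("POST", f"/deployments/runs/{run.run_id}/resume")
    # Desired state changed, or there is no run to continue.
    return ("POST", "/deployments/apply")
```

For example, a failed run with unchanged YAML maps to the resume endpoint, while any change to desired state maps to a fresh apply of the newly pushed version.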

If a rollout fails functionally (not operationally), revert the YAML, push it as a new version, diff that version against the prior one, then apply.
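If you keep both YAML files locally, a quick local diff can serve as a sanity check before applying. This is a minimal sketch using Python's standard `difflib`; it is not the platform's own diff, and the function name and sample YAML are made up for illustration.

```python
import difflib


def diff_versions(prior_yaml: str, new_yaml: str) -> str:
    """Return a unified diff between two YAML documents (empty if identical)."""
    lines = difflib.unified_diff(
        prior_yaml.splitlines(keepends=True),
        new_yaml.splitlines(keepends=True),
        fromfile="prior",
        tofile="new",
    )
    return "".join(lines)


# Hypothetical YAML contents, for illustration only.
prior = "service: web\nreplicas: 3\n"
reverted = "service: web\nreplicas: 2\n"
print(diff_versions(prior, reverted))
```

An empty result means the two versions are textually identical, so a revert that yields a non-empty diff against the last known-good version is worth a second look before applying.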