Most deployment issues are easier to prevent than to debug. This page gives you a practical operating checklist.

Safe rollout checklist

  1. Keep changes small (one logical change set per apply).
  2. Avoid changing many resource IDs at once.
  3. Push a version and review the diff before you apply.
  4. Apply during a window where you can monitor results.
  5. Confirm run completion before stacking more changes.

High-risk changes to treat carefully

  • deleting multiple resources in one change set,
  • changing the shape of a core resource in a way that may require replacement,
  • applying an empty desired state (spec.resources: []).
For high-risk changes, validate in a non-production environment first.
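
The high-risk categories above can be checked mechanically before an apply. The following is a minimal sketch, assuming the old and new desired state are available as parsed dicts with a spec.resources list whose entries carry "id" and "type" keys. Those key names, the function name risk_flags, and the use of a changed "type" as a stand-in for a shape change that may require replacement are assumptions for illustration, not documented behavior.

```python
from typing import Any


def risk_flags(old: dict[str, Any], new: dict[str, Any]) -> list[str]:
    """Return human-readable flags for high-risk changes between two desired states.

    Assumes (hypothetically) that each state looks like
    {"spec": {"resources": [{"id": ..., "type": ...}, ...]}}.
    """
    old_res = {r["id"]: r for r in old.get("spec", {}).get("resources", [])}
    new_res = {r["id"]: r for r in new.get("spec", {}).get("resources", [])}

    flags: list[str] = []

    # Empty desired state (spec.resources: []).
    if not new_res:
        flags.append("empty desired state")

    # Deleting multiple resources in one change set.
    deleted = sorted(old_res.keys() - new_res.keys())
    if len(deleted) > 1:
        flags.append(f"deletes {len(deleted)} resources: {', '.join(deleted)}")

    # A changed "type" stands in here for a shape change that may need replacement.
    reshaped = sorted(
        rid for rid in old_res.keys() & new_res.keys()
        if old_res[rid].get("type") != new_res[rid].get("type")
    )
    if reshaped:
        flags.append(f"possible replacement: {', '.join(reshaped)}")

    return flags
```

A non-empty result is a signal to validate the change in a non-production environment first; an empty list only means none of these three patterns matched.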

Common failure categories

Policy or billing gate failures

These are rejected before the run is queued. Examples:
  • subscription not active,
  • plan limits reached,
  • spending limit exceeded.

Validation/configuration issues

The YAML is accepted structurally but contains a change that cannot be applied as requested.

Provider/runtime failures

The provider accepted part of the plan but one or more resource operations failed.

Fast recovery workflow

  1. Open the run details and identify the first failing resource.
  2. Fix the root cause (config, capacity, billing, or provider dependency).
  3. Re-apply (push a new version first if the YAML changed), or resume the failed or canceled run.
  4. Confirm that all expected resources reached their final desired state.
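
Steps 1 and 4 of this workflow can be sketched in code. The example below assumes the run details expose per-resource operations in execution order, each with a resource ID and a status. The ResourceOp shape, its field names, and the status strings are hypothetical and should be mapped to whatever the run details actually return.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceOp:
    resource_id: str
    status: str          # hypothetical values: "succeeded", "failed", "skipped"
    error: str = ""


def first_failure(ops: list[ResourceOp]) -> Optional[ResourceOp]:
    """Return the earliest failed operation, or None if nothing failed (step 1)."""
    return next((op for op in ops if op.status == "failed"), None)


def unfinished(ops: list[ResourceOp]) -> list[str]:
    """Return IDs of resources that did not reach a succeeded state (step 4)."""
    return [op.resource_id for op in ops if op.status != "succeeded"]
```

Starting from the first failure is a reasonable default because later failures are often downstream of it. After the fix and the re-apply or resume, an empty result from unfinished on the new run details corresponds to step 4.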

When to use resume vs re-apply

Use resume when:
  • the intended change is the same,
  • the root cause has been fixed,
  • the run is already failed or canceled,
  • you want to continue in the same run context (POST /deployments/runs/{runId}/resume).
Use re-apply when:
  • you intentionally changed desired state,
  • you pushed a new version and want a fresh run for that version (POST /deployments/apply),
  • there is no resumable run you want to continue.
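
The decision above reduces to a small rule. The sketch below returns the method and path to call next. The two paths come from this page; the function name, the parameters, and the literal status strings "failed" and "canceled" are assumptions for illustration, and request bodies and authentication are left out.

```python
from typing import Optional

# Hypothetical status strings for runs that can be resumed.
RESUMABLE_STATUSES = {"failed", "canceled"}


def next_request(
    run_id: Optional[str],
    run_status: Optional[str],
    desired_state_changed: bool,
) -> tuple[str, str]:
    """Pick the (method, path) to call next: resume the old run or start a fresh apply."""
    can_resume = run_id is not None and run_status in RESUMABLE_STATUSES
    if can_resume and not desired_state_changed:
        # Same intended change, run is failed or canceled: continue the same run.
        return ("POST", f"/deployments/runs/{run_id}/resume")
    # Desired state changed, or nothing resumable: apply the pushed version.
    return ("POST", "/deployments/apply")
```

The function deliberately does not check whether the root cause was fixed; that stays a human judgment made before calling it.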

Rollback pattern

If a rollout fails functionally (not operationally), revert the YAML, push the reverted file as a new version, diff it against the prior version, then apply.
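
The rollback sequence can be expressed as a short function. This is a sketch under stated assumptions: DeployClient and its three methods (push_version, diff, apply) are hypothetical stand-ins for whatever client or HTTP calls you use, and their signatures and return values are invented for illustration.

```python
from typing import Protocol


class DeployClient(Protocol):
    # All three methods are hypothetical stand-ins, not a documented client API.
    def push_version(self, yaml_text: str) -> str: ...
    def diff(self, from_version: str, to_version: str) -> list[str]: ...
    def apply(self, version: str) -> str: ...


def roll_back(client: DeployClient, known_good_yaml: str, bad_version: str) -> str:
    """Push the reverted YAML as a new version, diff it, apply it, and return the run ID."""
    new_version = client.push_version(known_good_yaml)

    # Diff against the version being rolled back; an empty diff means nothing was reverted.
    changes = client.diff(bad_version, new_version)
    if not changes:
        raise ValueError("rollback version is identical to the version being rolled back")

    return client.apply(new_version)
```

Note that the rollback moves forward to a new version rather than re-applying an old one, which matches the pattern described above.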

Operational habits that improve reliability

  • Maintain a predictable naming/ID strategy.
  • Keep an internal change log for major deployment updates.
  • Pair infrastructure changes with billing awareness for large scale-ups.