<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>Argocd on Gruion</title><link>https://www.gruion.com/blog/tags/argocd/</link><description>Recent content in Argocd on Gruion</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 18 May 2026 00:20:49 +0000</lastBuildDate><atom:link href="https://www.gruion.com/blog/tags/argocd/index.xml" rel="self" type="application/rss+xml"/><item><title>Fractional DevOps: How to Build Resilient, Secure Pipelines Without a Full-Time Team</title><link>https://www.gruion.com/blog/post/2026-05-18-devops-fractional-devops/</link><pubDate>Mon, 18 May 2026 00:20:49 +0000</pubDate><dc:creator>Gruion</dc:creator><guid>https://www.gruion.com/blog/post/2026-05-18-devops-fractional-devops/</guid><description>Fractional DevOps lets teams ship faster and safer by embedding CI/CD, observability, and supply-chain security without the overhead of a full-time hire.</description><content:encoded><![CDATA[<h2 id="key-takeaways">Key Takeaways</h2>
<ul>
<li>CI/CD pipelines are active attack surfaces — the Shai-Hulud campaign abused OIDC tokens and trusted publishing paths, not code vulnerabilities.</li>
<li>Observability-integrated testing (OpenTelemetry + Flagger canary metrics) cuts production incidents by 50% compared to binary pass/fail gates.</li>
<li>Recording real API behavior for regression tests beats assumption-based scripts — capture what production does, not what you expect it to do.</li>
<li>AI coding agents (Claude Code, Grok Build) accelerate throughput but introduce hidden costs: technical debt, validation time, and cognitive load that standard metrics don&rsquo;t track.</li>
<li>A fractional DevOps partner gives you ArgoCD, Prometheus, and Grafana configured correctly from day one — without a 6-month hiring cycle.</li>
</ul>
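<p>The record-and-replay takeaway can be sketched in a few lines. This is a minimal in-memory illustration, not the API of Hoverfly or any VCR library: the <code>Cassette</code> class, the <code>find_regressions</code> helper, and the order endpoints are all hypothetical names chosen for the example.</p>

```python
import json


class Cassette:
    """In-memory store of recorded request/response pairs (hypothetical)."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(method, path, body):
        # Serialise the request into a stable string so it can be a dict key.
        return json.dumps([method.upper(), path, body], sort_keys=True)

    def record(self, method, path, body, response):
        self._entries[self._key(method, path, body)] = response

    def replay(self, method, path, body):
        return self._entries.get(self._key(method, path, body))


def find_regressions(cassette, handler, requests):
    """Call `handler` for each recorded request and list the mismatches."""
    mismatches = []
    for method, path, body in requests:
        expected = cassette.replay(method, path, body)
        actual = handler(method, path, body)
        if actual != expected:
            mismatches.append((method, path))
    return mismatches


# Responses captured from production (illustrative values).
cassette = Cassette()
cassette.record("GET", "/orders/42", {},
                {"status": 200, "body": {"id": 42, "state": "shipped"}})
cassette.record("POST", "/orders", {"sku": "A1"},
                {"status": 201, "body": {"id": 43}})


def candidate(method, path, body):
    """Stand-in for the new build; it renames a response field on POST."""
    if method == "GET":
        return {"status": 200, "body": {"id": 42, "state": "shipped"}}
    return {"status": 201, "body": {"order_id": 43}}


recorded = [("GET", "/orders/42", {}), ("POST", "/orders", {"sku": "A1"})]
regressions = find_regressions(cassette, candidate, recorded)
```

<p>Here <code>regressions</code> contains only the POST call, because the candidate build changed a field name that real clients already depend on. A production setup would also need to mask timestamps, IDs, and secrets before comparing.</p>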
<h2 id="tools--setup">Tools &amp; Setup</h2>
<p><strong>Pipeline security first.</strong> After the Mini Shai-Hulud incidents, any team using GitHub Actions or GitLab CI should audit OIDC token scopes immediately. Scope tokens to specific repos and workflows, keep their TTL short, and add Sigstore/cosign verification as a pipeline gate. Pin the expected signer identity instead of accepting any certificate: a catch-all pattern such as <code>.*</code> matches every signer and makes the check meaningless. A one-liner check in your workflow, with <code>YOUR_ORG/YOUR_REPO</code> as a placeholder: <code>cosign verify --certificate-identity-regexp=&quot;^https://github\.com/YOUR_ORG/YOUR_REPO/&quot; --certificate-oidc-issuer=&quot;https://token.actions.githubusercontent.com&quot; &quot;$IMAGE&quot;</code>.</p>
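<p>Token scoping comes down to an exact-match check on a handful of claims. The sketch below assumes the token&rsquo;s signature and expiry were already verified by a JWT library, and it uses claim names from GitHub&rsquo;s OIDC tokens (<code>iss</code>, <code>aud</code>, <code>repository</code>, <code>job_workflow_ref</code>). The <code>TrustPolicy</code> class, the audience, and the repository names are hypothetical.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustPolicy:
    """Hypothetical allowlist for one deployment target."""
    issuer: str
    audience: str
    repository: str
    workflow_ref_prefix: str


def claims_allowed(claims: dict, policy: TrustPolicy) -> bool:
    """Return True only if every scoped claim matches the policy.

    Assumes the token signature and expiry were verified beforehand;
    this function checks scope only.
    """
    workflow_ref = str(claims.get("job_workflow_ref", ""))
    return (
        claims.get("iss") == policy.issuer
        and claims.get("aud") == policy.audience
        and claims.get("repository") == policy.repository
        and workflow_ref.startswith(policy.workflow_ref_prefix)
    )


policy = TrustPolicy(
    issuer="https://token.actions.githubusercontent.com",
    audience="deploy.example.internal",   # hypothetical audience
    repository="example-org/payments",    # hypothetical repository
    # Only the release workflow, and only when run from a tag.
    workflow_ref_prefix="example-org/payments/.github/workflows/release.yml@refs/tags/",
)

release_claims = {
    "iss": "https://token.actions.githubusercontent.com",
    "aud": "deploy.example.internal",
    "repository": "example-org/payments",
    "job_workflow_ref": "example-org/payments/.github/workflows/release.yml@refs/tags/v1.4.0",
}
# Same token shape, but minted by a different workflow on a branch.
ci_claims = dict(
    release_claims,
    job_workflow_ref="example-org/payments/.github/workflows/ci.yml@refs/heads/main",
)

release_ok = claims_allowed(release_claims, policy)
ci_ok = claims_allowed(ci_claims, policy)
```

<p>The release token passes and the CI token is rejected, even though both carry a valid signature from the same issuer. In practice this logic usually lives in the cloud provider&rsquo;s trust policy rather than in application code.</p>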
<p><strong>Observability-driven delivery.</strong> Wire ArgoCD + Flagger for progressive delivery with automatic canary analysis. Instrument with OpenTelemetry and export to Grafana + Prometheus. Set RED metric baselines (Requests, Errors, Duration) per canary stage — Flagger will roll back automatically when thresholds breach. Pair this with API traffic recording (tools like Hoverfly or VCR-style capture middleware) to build regression suites from real production behavior, not developer assumptions.</p>
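<p>The rollback decision for a canary stage reduces to comparing RED metrics against thresholds. The sketch below illustrates that logic only; it is not Flagger&rsquo;s implementation, and the <code>RedSample</code> type, the <code>canary_verdict</code> function, and the threshold values are assumptions made for the example.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedSample:
    requests: int        # total requests in the analysis window
    errors: int          # failed requests in the same window
    p99_seconds: float   # 99th percentile request duration


def canary_verdict(sample: RedSample, max_error_rate: float,
                   max_p99_seconds: float, min_requests: int = 100) -> str:
    """Return 'promote', 'rollback', or 'wait' for one canary stage."""
    if sample.requests < min_requests:
        return "wait"  # too little traffic to judge either way
    error_rate = sample.errors / sample.requests
    if error_rate > max_error_rate or sample.p99_seconds > max_p99_seconds:
        return "rollback"
    return "promote"


# Illustrative thresholds: at most 1% errors and a 500 ms p99.
healthy = canary_verdict(RedSample(1000, 5, 0.30), 0.01, 0.5)
erroring = canary_verdict(RedSample(1000, 20, 0.30), 0.01, 0.5)
quiet = canary_verdict(RedSample(10, 0, 0.10), 0.01, 0.5)
```

<p>The three calls return <code>promote</code>, <code>rollback</code>, and <code>wait</code> respectively. The explicit <code>wait</code> state matters: a canary with almost no traffic has not proven anything, so it should neither advance nor fail.</p>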
<h2 id="analysis">Analysis</h2>
<p>Modern DevOps resilience is no longer just about shipping fast — it&rsquo;s about shipping safely across an increasingly hostile attack surface. The Shai-Hulud supply-chain campaign is a concrete reminder that CI/CD trust relationships are now primary targets. Organizations relying on OIDC provenance attestations learned the hard way that valid signatures don&rsquo;t equal safe content. The fix isn&rsquo;t bureaucracy — it&rsquo;s automating distrust: verify every artifact, scope every token, and treat your pipeline as a zero-trust boundary.</p>
<p>At the same time, the productivity metrics crisis surfaced by the Harness survey exposes a blind spot that fractional DevOps teams are uniquely positioned to solve. When 94% of engineering leaders admit they aren&rsquo;t tracking AI-related technical debt, validation overhead, or developer burnout, the problem isn&rsquo;t tooling — it&rsquo;s governance and instrumentation. A fractional DevOps engagement typically starts by establishing these baselines: deployment frequency, change failure rate, MTTR, and now, AI task overhead as a first-class metric.</p>
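<p>The three classic baselines can be computed from a plain list of deployment records. This is a minimal sketch under assumed data: the <code>Deployment</code> record and the <code>delivery_baseline</code> function are hypothetical, and AI task overhead is left out because it needs data that a deployment log alone does not contain.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed deploy is recovered


def delivery_baseline(deploys: list, window_days: int) -> dict:
    """Compute deployment frequency, change failure rate, and MTTR."""
    total = len(deploys)
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    mttr = sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
    return {
        "deploys_per_day": total / window_days,
        "change_failure_rate": len(failures) / total if total else 0.0,
        "mttr": mttr,
    }


# Four deployments over a two-day window, one of which failed (illustrative data).
day = datetime(2026, 5, 14, 9, 0)
history = [
    Deployment(day),
    Deployment(day + timedelta(hours=4)),
    Deployment(day + timedelta(days=1), failed=True,
               restored_at=day + timedelta(days=1, minutes=30)),
    Deployment(day + timedelta(days=1, hours=5)),
]
baseline = delivery_baseline(history, window_days=2)
```

<p>For this sample the baseline is 2.0 deploys per day, a 25% change failure rate, and a 30-minute MTTR. The point of starting here is that every later change to the pipeline can be judged against numbers rather than impressions.</p>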
<p>The convergence of AI coding agents (Grok Build&rsquo;s parallel agent arena, Claude Code&rsquo;s deep IDE integration), Kubernetes operational maturity (v1.36&rsquo;s Mixed Version Proxy graduating to beta, watch-based route reconciliation), and supply-chain standards like the EU CRA means the platform engineering surface area has never been wider. Fractional DevOps works precisely because most companies do not need a full-time specialist in all of these simultaneously, but they do need someone who has configured all of them before.</p>
<h2 id="sources">Sources</h2>
<ul>
<li><a href="https://devops.com/why-devops-is-critical-for-modern-business-resilience/">https://devops.com/why-devops-is-critical-for-modern-business-resilience/</a></li>
<li><a href="https://devops.com/widespread-mini-shai-hulud-campaign-is-a-matter-of-trust/">https://devops.com/widespread-mini-shai-hulud-campaign-is-a-matter-of-trust/</a></li>
<li><a href="https://devops.com/survey-surfaces-multiple-challenges-measuring-ai-coding-productivity/">https://devops.com/survey-surfaces-multiple-challenges-measuring-ai-coding-productivity/</a></li>
<li><a href="https://devops.com/observability-driven-continuous-testing-in-cloud-native-devops/">https://devops.com/observability-driven-continuous-testing-in-cloud-native-devops/</a></li>
<li><a href="https://devops.com/capturing-real-api-behavior-for-regression-testing-architecture-and-implementation/">https://devops.com/capturing-real-api-behavior-for-regression-testing-architecture-and-implementation/</a></li>
<li><a href="https://devops.com/xai-enters-the-coding-agent-race-with-grok-build/">https://devops.com/xai-enters-the-coding-agent-race-with-grok-build/</a></li>
<li><a href="https://platformengineering.org/blog/understanding-platform-engineering-s-role-in-staying-compliant-with-the-eus-cra">https://platformengineering.org/blog/understanding-platform-engineering-s-role-in-staying-compliant-with-the-eus-cra</a></li>
<li><a href="https://kubernetes.io/blog/2026/05/15/kubernetes-1-36-feature-mixed-version-proxy-beta/">https://kubernetes.io/blog/2026/05/15/kubernetes-1-36-feature-mixed-version-proxy-beta/</a></li>
<li><a href="https://kubernetes.io/blog/2026/05/15/ccm-new-metric-route-sync-total/">https://kubernetes.io/blog/2026/05/15/ccm-new-metric-route-sync-total/</a></li>
</ul>
<hr>
<p><strong>Need help setting this up?</strong> Gruion provides hands-on DevOps services, CI/CD automation, and platform engineering. <a href="https://www.gruion.com/#contact">Get a free consultation</a></p>
]]></content:encoded><enclosure url="https://www.gruion.com/blog/post/2026-05-18-devops-fractional-devops/cover.jpg" type="image/jpeg" length="0"/><media:content url="https://www.gruion.com/blog/post/2026-05-18-devops-fractional-devops/cover.jpg" medium="image" type="image/jpeg"/><media:thumbnail url="https://www.gruion.com/blog/post/2026-05-18-devops-fractional-devops/cover.jpg"/><category>DevOps</category></item><item><title>IaC Reliability in 2026: Trust, Identity, and the Hidden Failure Modes Nobody Plans For</title><link>https://www.gruion.com/blog/post/2026-05-17-infrastructure-as-code-deployment-reliability/</link><pubDate>Sun, 17 May 2026 06:01:36 +0000</pubDate><guid>https://www.gruion.com/blog/post/2026-05-17-infrastructure-as-code-deployment-reliability/</guid><description>Key Takeaways Expired machine identities in CI/CD pipelines — not bad code — are causing real production outages; audit your deployment tokens with tools like HashiCorp Vault or AWS IAM Access Analyzer. OpenTofu (the Linux Foundation fork of Terraform) is now a production-ready alternative if …</description><content:encoded><![CDATA[<h2 id="key-takeaways">Key Takeaways</h2>
<ul>
<li>Expired machine identities in CI/CD pipelines — not bad code — are causing real production outages; audit your deployment tokens with tools like HashiCorp Vault or AWS IAM Access Analyzer.</li>
<li>OpenTofu (the Linux Foundation fork of Terraform) is now a production-ready alternative if licensing is a constraint on your IaC adoption.</li>
<li>AWS CloudFormation&rsquo;s new <code>Fn::GetStackOutput</code> eliminates manual cross-account/cross-region output wiring — a significant quality-of-life improvement for multi-account CDK users.</li>
<li>Kubernetes v1.36&rsquo;s Mixed Version Proxy (now Beta) makes rolling upgrades safer by preventing 404s during control plane version skew.</li>
<li>Progressive delivery with ArgoCD + Flagger, backed by OpenTelemetry metrics, catches regressions that functional pass/fail checks alone miss.</li>
</ul>
<h2 id="tools--setup">Tools &amp; Setup</h2>
<p>IaC reliability isn&rsquo;t just about correct Terraform plans; it&rsquo;s about the full delivery chain. Start by auditing non-human identities across your pipelines: build runners, OIDC tokens, Kubernetes service accounts, and artifact-signing credentials. AWS IAM Access Analyzer (unused-access findings) and Teleport&rsquo;s Machine ID (short-lived certificates for machines) can surface or replace stale credentials before they expire on a Friday night, and <code>trufflesecurity/driftwood</code> can check whether an exposed private key is still in use.</p>
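<p>An identity audit can start as a simple pass over an inventory. The sketch below assumes you can export such an inventory from your own systems. The <code>MachineIdentity</code> record, the <code>audit</code> function, the 14-day warning window, and the sample names are hypothetical and not tied to any of the tools above.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class MachineIdentity:
    name: str
    owner: Optional[str]
    expires_at: Optional[datetime]  # None means no expiry is recorded


def audit(identities, now, warn_within=timedelta(days=14)):
    """Return (name, finding) pairs for identities that need attention."""
    findings = []
    for ident in identities:
        if not ident.owner:
            findings.append((ident.name, "no owner"))
        if ident.expires_at is None:
            findings.append((ident.name, "no expiry recorded"))
        elif ident.expires_at <= now:
            findings.append((ident.name, "expired"))
        elif ident.expires_at - now <= warn_within:
            findings.append((ident.name, "expires soon"))
    return findings


now = datetime(2026, 5, 15, tzinfo=timezone.utc)
inventory = [
    MachineIdentity("ci-runner-deploy", "platform",
                    datetime(2026, 5, 20, tzinfo=timezone.utc)),
    MachineIdentity("artifact-signer", None,
                    datetime(2027, 1, 1, tzinfo=timezone.utc)),
    MachineIdentity("legacy-service-account", "payments", None),
]
findings = audit(inventory, now)
```

<p>The sample inventory yields three findings: one token expiring within the warning window, one credential with no owner, and one with no recorded expiry. Running a check like this on a schedule turns a surprise outage into a routine ticket.</p>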
<p>For multi-account AWS shops, adopt <code>Fn::GetStackOutput</code> in CloudFormation/CDK to replace brittle SSM Parameter Store hand-offs between stacks. For Kubernetes clusters undergoing rolling upgrades, enable the <code>UnknownVersionInteroperabilityProxy</code> feature gate in 1.36: it proxies each request to an API server version that can serve it, which avoids spurious 404s and their garbage-collection side effects during skewed control-plane upgrades. On the delivery side, pair ArgoCD with Flagger for canary rollouts and wire OpenTelemetry spans into your pipeline so a failed integration test correlates with the downstream service it actually broke.</p>
<h2 id="analysis">Analysis</h2>
<p>The through-line in recent production incidents — Discord&rsquo;s voice outage from a hidden circular dependency, Pinterest&rsquo;s CPU zombie problem on PinCompute, late-night deployment token expiries — is that the failure wasn&rsquo;t in the IaC itself. The infrastructure was declared correctly. What failed was the operational layer surrounding it: dependency maps nobody kept current, system defaults nobody audited, machine identities nobody remembered to rotate.</p>
<p>This is where IaC maturity actually lives in 2026. Writing a Terraform module is table stakes. The harder work is building the observability and governance scaffolding around it: the <code>route_controller_route_sync_total</code> counter in the Kubernetes CCM to validate reconciliation behavior and A/B test watch-based vs. interval-based reconciliation, and supply-chain attestations that remain trustworthy even when OIDC tokens are abused (as in the Mini Shai-Hulud CI/CD pipeline attacks).</p>
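<p>Comparing two reconciliation modes with a counter such as <code>route_controller_route_sync_total</code> means comparing rates of increase, not raw values. The sketch below shows that arithmetic on two scraped samples per cluster. The <code>counter_rate_per_hour</code> helper and the sample numbers are hypothetical, and in practice a PromQL <code>rate()</code> query would do the same job.</p>

```python
def counter_rate_per_hour(first: float, last: float, elapsed_seconds: float) -> float:
    """Average increase per hour between two samples of a monotonic counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if last < first:
        # A counter that went down was reset (for example by a restart),
        # so count only what accumulated since the reset.
        increase = last
    else:
        increase = last - first
    return increase * 3600 / elapsed_seconds


# Hypothetical samples taken one hour apart on two otherwise similar clusters.
interval_based = counter_rate_per_hour(first=1200, last=1560, elapsed_seconds=3600)
watch_based = counter_rate_per_hour(first=800, last=812, elapsed_seconds=3600)
reduction_factor = interval_based / watch_based
```

<p>With these made-up samples the interval-based cluster syncs 360 times per hour against 12 for the watch-based one, a thirtyfold reduction. Real numbers depend on how often nodes and routes actually change, which is exactly why the comparison is worth measuring.</p>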
<p>The teams shipping reliably aren&rsquo;t the ones with the most sophisticated IaC; they&rsquo;re the ones treating deployment as an observability problem. Every rollout emits telemetry. Every credential has an owner and a TTL. Every cross-stack dependency is explicit, not implicit. OpenTofu, CloudFormation and CDK, ArgoCD, and Kubernetes v1.36 all move in this direction. The gap is in adopting them as a system, not as isolated tools.</p>
<h2 id="sources">Sources</h2>
<ul>
<li><a href="https://devops.com/why-devops-is-critical-for-modern-business-resilience/">https://devops.com/why-devops-is-critical-for-modern-business-resilience/</a></li>
<li><a href="https://devops.com/widespread-mini-shai-hulud-campaign-is-a-matter-of-trust/">https://devops.com/widespread-mini-shai-hulud-campaign-is-a-matter-of-trust/</a></li>
<li><a href="https://devops.com/observability-driven-continuous-testing-in-cloud-native-devops/">https://devops.com/observability-driven-continuous-testing-in-cloud-native-devops/</a></li>
<li><a href="https://devops.com/your-ci-cd-pipeline-has-non-human-identities-you-forgot-about/">https://devops.com/your-ci-cd-pipeline-has-non-human-identities-you-forgot-about/</a></li>
<li><a href="https://www.infoq.com/news/2026/05/discord-circular-dependency/">https://www.infoq.com/news/2026/05/discord-circular-dependency/</a></li>
<li><a href="https://www.infoq.com/news/2026/05/pinterest-cpu-zombies-bottleneck/">https://www.infoq.com/news/2026/05/pinterest-cpu-zombies-bottleneck/</a></li>
<li><a href="https://www.infoq.com/news/2026/05/kubernetes-1-36-released/">https://www.infoq.com/news/2026/05/kubernetes-1-36-released/</a></li>
<li><a href="https://kubernetes.io/blog/2026/05/15/ccm-new-metric-route-sync-total/">https://kubernetes.io/blog/2026/05/15/ccm-new-metric-route-sync-total/</a></li>
<li><a href="https://kubernetes.io/blog/2026/05/15/kubernetes-1-36-feature-mixed-version-proxy-beta/">https://kubernetes.io/blog/2026/05/15/kubernetes-1-36-feature-mixed-version-proxy-beta/</a></li>
<li><a href="https://kubernetes.io/blog/2026/05/14/kubernetes-v1-36-deprecation-and-removal-of-service-externalips/">https://kubernetes.io/blog/2026/05/14/kubernetes-v1-36-deprecation-and-removal-of-service-externalips/</a></li>
<li><a href="https://www.env0.com/blog/opentofu-the-open-source-terraform-alternative">https://www.env0.com/blog/opentofu-the-open-source-terraform-alternative</a></li>
<li><a href="https://aws.amazon.com/blogs/devops/simplify-cross-account-and-cross-region-stack-output-references-with-aws-cloudformation-and-cdks-new-fngetstackoutput/">https://aws.amazon.com/blogs/devops/simplify-cross-account-and-cross-region-stack-output-references-with-aws-cloudformation-and-cdks-new-fngetstackoutput/</a></li>
</ul>
]]></content:encoded><category>IaC</category></item></channel></rss>