DevOps’ish

Cloud Native, DevOps, Open Source, AI, tech industry news, culture, and the ‘ish between. A newsletter by Chris Short.

DevOps'ish 319: An NTP server time-traveled to 2006, an OpenAI model broke into Hugging Face, and more

DevOps'ish 318: Linus Makes Peace With AI, the Worst Memory Shortage Yet, Deere Bows to Repair, and More

When your Kubernetes fleet can’t drift, upgrades stop being a gamble. (SPONSOR) Kubernetes upgrades have a way of sliding to next quarter, because one drifted node is enough to derail a routine upgrade. Talos Linux is immutable, so every node runs from the same image and can’t drift apart. Talos Omni upgrades the OS and Kubernetes together, one cluster at a time across the fleet, so the upgrade you dreaded turns into a non-event.

93% of teams have already hit an AI-caused infrastructure incident, but only 30% have a policy. Spacelift surveyed 406 leaders on who’s actually ready for AI. Download the report to see what the most advanced teams do differently, and watch the event where we unpacked the data. (SPONSOR)

Building a Custom Metrics Exporter for Kubernetes (8 minute read) A hands-on tutorial for writing your own Prometheus exporter in Go, packaging it as a container, and wiring it into a cluster so you can autoscale on real signals like queue depth instead of just CPU and memory. Counters, gauges, histograms, naming conventions, and HorizontalPodAutoscaler integration, all in one walkthrough. ...

DevOps'ish 317: Januscape Turns 16, etcd Hits 3.7, and More

With Talos, most CVEs never apply, and patching the rest won’t break the fleet. Patching a CVE means knowing whether it applies to you, then getting the fix to every node without missing one. Talos Linux ships fewer than 50 binaries, because that’s all it takes to run Kubernetes, so most CVEs never apply. Omni rolls out the ones that do, staged and health-checked, with automatic rollback if a node fails.

Designing IaC Interfaces That Work for Humans, AI Agents, and Whatever Comes Next (SPONSOR) AI agents are changing who, or what, uses your Terraform modules. Join Jinger Meilani, Senior DevOps Engineer at MNTN, to learn how to design reusable, self-service IaC interfaces that reduce misuse and work for humans, AI agents, and whatever comes next.

Announcing etcd v3.7.0 (11 minute read) SIG etcd ships v3.7.0 with a new RangeStream API for streaming large result sets, keys-only range and faster lease optimizations, plus removal of the legacy v2 store. The quiet workhorse under every Kubernetes cluster gets a meaningful tune-up. ...

DevOps'ish 316: ClickHouse Eats Observability, the Father of the Internet Bows Out, Podman Breaks Things, and More

Designing IaC Interfaces That Work for Humans, AI Agents, and Whatever Comes Next (SPONSOR) AI agents are changing who, or what, uses your Terraform modules. Join Jinger Meilani, Senior DevOps Engineer at MNTN, to learn how to design reusable, self-service IaC interfaces that reduce misuse and work for humans, AI agents, and whatever comes next.

Kepler, re-architected: Improved power accuracy and a community call to action! (8 minute read) The CNCF’s Kubernetes power-monitoring project got a full rewrite. The new architecture drops eBPF, sheds a pile of required privileges, and adds dynamic hardware discovery so the energy numbers actually mean something across mixed fleets. The team is also asking for help validating accuracy, so if you care about sustainability metrics, consider this your invitation.

Akrites: The Latest Attempt to Protect Open-Source From AI Attacks Has Arrived (7 minute read) The Linux Foundation stood up Akrites, a single coordination point for finding and fixing open source vulnerabilities before attackers get there first. Jim Zemlin’s framing is bleak and accurate: the mean time to exploit is now measured in negative days. Whether another initiative moves the needle or just adds a logo to the pile is the open question. ...

DevOps'ish 315: Sub-Nanometer Chips, Supply Chain Whiplash, and the Database Nobody Could Kill, and More

We helped build Docker. Now we’re building the engineer who maintains it. (SPONSOR) Sam was Docker’s first hire. Andrea wrote Docker’s first commit. We spent a decade watching teams drown in CI maintenance. Mendral is what we wished we’d had. Three agents in your CI: Security reviews dep PRs, Reliability fixes flaky tests, Performance cuts pipeline time.

Designing IaC Interfaces That Work for Humans, AI Agents, and Whatever Comes Next (SPONSOR) AI agents are changing who, or what, uses your Terraform modules. Join Jinger Meilani, Senior DevOps Engineer at MNTN, to learn how to design reusable, self-service IaC interfaces that reduce misuse and work for humans, AI agents, and whatever comes next.

Klue Supply Chain Incident and LastPass Response (4 minute read) An unauthorized actor snagged OAuth tokens from Klue, a market intelligence platform, and used them to access LastPass customer contact and CRM data stored in Salesforce. LastPass says vaults and core infrastructure are unaffected, but this is another clean example of why your vendor’s vendor is still your problem. ...