DevOps'ish 232: seccomp's day in the Kubernetes sun, Linux at 30, burn out, Chevy Bolt bot blunder, lifelong learning, GitOps, and more

A trying week capped off by trigger point injections. Long story short, I’ve been trying to get a family out of Afghanistan for the past two weeks to no avail. I won’t bore you with info or divulge identifying details. But, the possibility for their safe passage to the US has pretty much gone to 0. It’s hard telling a 16-year-old kid that you’ve exhausted all your resources. You can only offer tidbits of info. HUGE shoutout to the team behind Ehtesab for enabling me to get SOME intel from folks on the ground. The situation itself is a failure. A failure on multiple levels. But, it’s a stark reminder that you have to experiment and sometimes try all the ways possible to get a solution into production. Can you deploy this feature as a feature flag, or do you need a canary or blue/green deployment? At what layer are you going to manage THAT? Your global load balancer? Maybe inside your application stack on a keepalived instance? Perhaps it’s better to handle this in your Kubernetes cluster by managing replica sets or ingresses. Once you get past that decision, there are many more along the way. Then it’s “go time.” Your solution is ready to handle some production traffic. ...

August 29, 2021 · 7 min · Chris Short

DevOps'ish 231: Kubernetes 1.22 release team livestream, problems in Perl, glibc, eBPF, Pod Security Admission, secure supply chains, tools galore, and more

My military service and tech worlds collided this week. I can’t say much about it yet, but I’ve been insanely busy with an array of things I never thought I’d need to do. More to come later. Join the DevOps’ish subreddit and talk about how bad the intro was. Or how dope the notes page is for this issue.
People
Cloud Tech Tuesdays: Kubernetes 1.22
Josh Berkus, Amy Marrich, and I sat down for a livestream with Savitha Raghunathan, James Laverack, Jesse Butler, and Guinevere Saenger to discuss all things Kubernetes and the Kubernetes 1.22 release.
Free eBook: Docker Security Essentials by HackerSploit
Docker is a popular platform to quickly create, deploy and host web applications, databases and other business critical solutions. Learn how to audit and secure Docker in this comprehensive guide and 9-part video series. Download instantly – no registration required. SPONSORED
Samsung’s leader is out of jail, allowing US factory plans to move forward
“Samsung heir served 18 months in prison for capital flight and perjury.” Can someone who has a great understanding of Korean business and politics please reply to this email? I have no idea how this works, and I want to understand it before I label it anything. ...

August 22, 2021 · 7 min · Chris Short

DevOps'ish 230: Complex Systems == No Single Root Cause, WFHers juggling two jobs, Service Reliability Math, eBPF Foundation, Dashboards, Tools from Black Hat and more

Another week, another bout of bad weather. Systems here in our home have gotten a bit more robust since our multi-day total blackout. I took a meeting this week in a house with no power. The meeting was short, but it demonstrated that if everything goes to hell in a handbasket, my systems are redundant enough to enable me to pass whatever batons when needed. But, lately, it’s felt like a lot. You can feel the cost of communication when a cacophony of UPSes suddenly fills the house. Luckily power was restored before we went to bed that night. But, what came later was something of a surprise. In 36 hours, Michigan received almost a quarter of its annual total of lightning strikes (a lot of them cloud-to-ground). While this didn’t seem to affect services we consume, I can only imagine the hell that played out for first responders of all stripes. One of the worst incidents I was part of was a lightning strike that hit a datacenter’s generator transfer switch. It kicked off a chaotic series of events that caused HVAC systems to go offline. The storm that night was hellacious too. A datacenter generating enough heat to make network switches act up is a miserable series of events. There was no single root cause. Multiple systems failed or malfunctioned in ways nobody had planned for or thought of. The fact we weren’t up and running once temperatures started to cool down unlocked a new mystery that ultimately led us to restart our core switches because the heat had thrown the ASICs out of whack. But, there was never a single root cause. You could say the lightning strike was the root cause. But that strike hit power systems outside the datacenter. Our systems went down because core switching had overheated. Cooling units inside the datacenter reset but didn’t start using refrigerant until they were reset again in a particular order (the cooling system was never supposed to respond the way it did).
There’s never a single root cause for a large-scale outage (John Allspaw argues the point further below). ...

August 15, 2021 · 8 min · Chris Short

DevOps'ish 229: Kubernetes 1.22, KubeCon schedule announced, security fails abound, Zoom's paltry fine, finally death to 996, NSA Kubernetes Hardening Guidance, and much more

Kubernetes 1.22 shipped this week. I suggest you, at a minimum, read the release blog post or take a gander at the CHANGELOG, and definitely read the “No, really, you MUST read this before you upgrade” section. Some of the bigger changes:
Audit log files are created with mode 0600 (owner read/write only)
Rootless mode containers moving to alpha: In my opinion, if you use Podman, you’re used to this. If you’re not, you should be using rootless containers intentionally for security reasons (more on that later).
Cgroupsv2 moving to alpha
Pod Security Policy replacement (aka Pod Security Admission Controller): Yes, PSPs are deprecated and being replaced. There are a lot of reasons why.
LoadBalancer moving to beta
Enable seccomp by default
and a whole bunch more
KubeCon NA 2021 acceptances went out this week and the schedule is live. I’m excited to say I’m teaming up with Kaslin Fields, Bart Farrell, Matthew Broberg, and Kunal Kushwaha for a panel talk about what we’ve been doing in the Kubernetes Upstream Marketing Team (which includes the @K8sContributors Twitter handle and so much more). ...

August 8, 2021 · 6 min · Chris Short

DevOps'ish 228: Natural disasters, GitOps with Codefresh, NSO Group, MeteorExpress, Linkerd, Kubernetes 1.22, TSMC’s 2nm chips, cloud outposts, and more

At 8:13 PM last Saturday, the family and I were gathered in our basement, evading a tornado warning that came through the area. The storm spawned three tornadoes. Luckily, we weren’t hit directly. But we lost power, internet, and cell service. After getting the all-clear and assessing the situation, it was clear that we would be without power for quite a few hours. Making a newsletter last week wasn’t happening. It was technically impossible, and to be honest, I had a big ole stack of higher priorities come in. Then a few hours turned into a few days without these services. Luckily, we have a gas stove and water heater. I spent Monday morning frantically trying to find a place with the trifecta of power, internet, and cell service. It didn’t exist within a twenty-minute radius of our house. We spent over 44 hours without power. We were lucky we didn’t have to wait much longer than that. The roof that I thought was damaged wasn’t (the shingles in our yard weren’t ours 😬😬😬). Cell service came back up in the morning on Tuesday. ...

August 1, 2021 · 7 min · Chris Short