DevOps'ish 234: Giving up on reopen dates, containers everywhere, Epic v. Apple, OWASP Top 10 changes, Kubernetes troubleshooting, Podman, and more

People

Microsoft gives up predicting when its US offices will fully reopen
And every other company should too. We just don’t know when they’ll reopen, and constantly re-picking dates just to change them later seems silly.

A Non-Tech Explanation of Containers and Kubernetes
Through this simple analogy by 451 Research, get a better understanding of virtualization, containers, and Kubernetes. Learn the differences between these big topics and the role of each in a multicloud future. SPONSORED

What is an SRE?
“A comprehensive definition of SREs and Site Reliability Engineering, including what SREs do and what makes SREs different from other roles.”

The Epic v. Apple verdict is out
“The Epic v. Apple lawsuit has concluded. The verdict sees Apple come out largely unscathed — but with one of its central App store policies deemed illegal.”

Meet the Little-Known Genius Who Helped Make Pixar Possible
Alvy Ray Smith helped invent computer animation as we know it—then got royally shafted by Steve Jobs. Now he’s got a vision for where the pixel will take us next. ...

September 12, 2021 · 4 min · Chris Short

DevOps'ish 233: Luke Hinds of Sigstore, three REALLY bad breaches/bugs, Docker's increasing desperation, Kubernetes mTLS, update your Operators, BGP & filesystem benchmarks, and more

I spent most of the week in a deteriorated state. Getting over the 12 injections last Friday took much longer than expected. It still amazes me how much work I can do with a disability, medications that slow me down, and a lack of sleep (Max started kindergarten this week). In a way, this is a lot like our systems: overtaxed by the increasing number of people using them, ready to be upgraded by an admin and taken down by a deluge of traffic at the same time (or worse, the opposite). Running along in a less-than-optimal state is pretty optimal for a lot of workloads. Sure, specific workloads will need certain kinds of hardware, and the software varies in those spaces. But most of us are still using an abstraction of an abstraction of an abstraction (of an abstraction). Like a top slowly losing its spin, our systems run fine until they don’t. Now, more than ever, we need to know how our systems are performing. What caused the slowdown? What sent the system sliding off the table into oblivion? Will it be spinning like a top again soon? What do you do to pick it back up and get it spinning like the top in Inception? All these questions boil down to one: How do we know if we’re doing the right thing? ...

September 5, 2021 · 7 min · Chris Short

DevOps'ish 232: seccomp's day in the Kubernetes sun, Linux at 30, burn out, Chevy Bolt bot blunder, lifelong learning, GitOps, and more

A trying week, capped off by trigger point injections. Long story short, I’ve been trying to get a family out of Afghanistan for the past two weeks, to no avail. I won’t bore you with info or divulge identifying details. But the possibility of their safe passage to the US has pretty much gone to zero. It’s hard telling a 16-year-old kid that you’ve exhausted all your resources. You can only offer tidbits of info. HUGE shoutout to the team behind Ehtesab for enabling me to get SOME intel from folks on the ground. The situation itself is a failure, a failure on multiple levels. But it’s a stark reminder that you have to experiment and sometimes try every possible way to get a solution into production. Can you deploy this feature behind a feature flag, or do you need a canary or blue/green deployment? At what layer are you going to manage THAT? Your global load balancer? Maybe inside your application stack on a keepalived instance? Perhaps it’s better to handle this in your Kubernetes cluster by managing ReplicaSets or Ingresses. Once you get past that decision, there are many more along the way. Then it’s “go time.” Your solution is ready to handle some production traffic. ...
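The feature-flag-versus-canary question above comes down to deciding which slice of traffic sees the new code. Below is a minimal Python sketch of one common approach, deterministic percentage bucketing. The function names, the `checkout-v2` feature name, and the 100-bucket scheme are hypothetical illustrations, not the API of any particular load balancer, ingress controller, or feature-flag service. Hashing the user ID keeps each user on the same side of the split from request to request, which matters for a canary: a user flapping between versions muddies the metrics you are using to decide whether to proceed.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Return True if user_id lands in the first `percent` of 100 buckets.

    Hypothetical helper: hashing feature + user_id gives each user a stable
    bucket, so the same user always sees the same variant at a given percent.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100  # 0..99, approximately uniform
    return bucket < percent


def pick_backend(user_id: str, canary_percent: int) -> str:
    """Route a request to the canary or stable deployment (illustrative names)."""
    if in_rollout(user_id, "checkout-v2", canary_percent):
        return "canary"
    return "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(pick_backend(u, 5) == "canary" for u in users) / len(users)
    print(f"canary share at 5%: {share:.2%}")
```

Raising `canary_percent` from 5 to 25 only adds users to the canary, since anyone whose bucket is below 5 is also below 25; dropping it back to 0 acts as a kill switch. A blue/green cutover is roughly the degenerate case of flipping from 0 to 100 in one step.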

August 29, 2021 · 7 min · Chris Short

DevOps'ish 231: Kubernetes 1.22 release team livestream, problems in Perl, glibc, eBPF, Pod Security Admission, secure supply chains, tools galore, and more

My military service and tech worlds collided this week. I can’t say much about it yet, but I’ve been insanely busy with an array of things I never thought I’d need to do. More to come later. Join the DevOps’ish subreddit and talk about how bad the intro was. Or how dope the notes page is for this issue.

People

Cloud Tech Tuesdays: Kubernetes 1.22
Josh Berkus, Amy Marrich, and I sat down for a livestream with Savitha Raghunathan, James Laverack, Jesse Butler, and Guinevere Saenger to discuss all things Kubernetes and the Kubernetes 1.22 release.

Free eBook: Docker Security Essentials by HackerSploit
Docker is a popular platform to quickly create, deploy, and host web applications, databases, and other business-critical solutions. Learn how to audit and secure Docker in this comprehensive guide and 9-part video series. Download instantly – no registration required. SPONSORED

Samsung’s leader is out of jail, allowing US factory plans to move forward
“Samsung heir served 18 months in prison for capital flight and perjury.” Can someone who has a great understanding of Korean business and politics please reply to this email? I have no idea how this works, and I want to understand it before I label it anything. ...

August 22, 2021 · 7 min · Chris Short

DevOps'ish 230: Complex Systems == No Single Root Cause, WFHers juggling two jobs, Service Reliability Math, eBPF Foundation, Dashboards, Tools from Black Hat, and more

Another week, another bout of bad weather. Systems here in our home have gotten a bit more robust since our multi-day total blackout. I took a meeting this week in a house with no power. The meeting was short, but it demonstrated that if everything goes to hell in a handbasket, my systems are redundant enough to let me pass whatever batons need passing. But, lately, it’s felt like a lot. You can feel the cost of communication when a cacophony of UPSes suddenly fills the house. Luckily, power was restored before we went to bed that night. But what came later was something of a surprise. In 36 hours, Michigan received almost a quarter of its annual total of lightning strikes (a lot of them cloud-to-ground). While this didn’t seem to affect services we consume, I can only imagine the hell it created for first responders of all stripes. One of the worst incidents I was part of was a lightning strike that hit a datacenter’s generator transfer switch. It kicked off a chaotic series of events that caused HVAC systems to go offline. The storm that night was hellacious too. A datacenter generating enough heat to make network switches act up is a miserable series of events. There was no single root cause. Multiple systems failed or malfunctioned in ways no one had planned for or even thought of. The fact that we weren’t back up and running once temperatures started to drop unlocked a new mystery that ultimately led us to restart our core switches, because the heat had thrown the ASICs out of whack. But there was never a single root cause. You could say the lightning strike was the root cause. But it hit power systems outside the datacenter. Our systems went down because core switching had overheated. Cooling units inside the datacenter reset but didn’t start using refrigerant until they were reset again in a particular order (the cooling system was never supposed to respond the way it did).
There’s never a single root cause for a large-scale outage (John Allspaw argues the point further below). ...

August 15, 2021 · 8 min · Chris Short