Listen, I’m going to tell you something that’ll either get me kicked out of the next SREcon or crowned its philosopher king: W. Edwards Deming figured out how to run platform teams back in the 1950s, and we’ve been actively ignoring him for seventy years while reinventing his wheels with YAML.
I stumbled onto Deming the same way I stumble onto most profound truths: at 2 AM during an incident, googling desperately for “why does everything break all the time forever.” Between Stack Overflow’s usual “have you tried turning it off and on again” and some Medium post about how Kubernetes solved everything (spoiler: it didn’t), I found this dead statistician who helped rebuild Japan’s postwar industry with common sense so radical that we still can’t implement it.
Here’s the thing that’ll twist your neurons: Deming wasn’t even thinking about computers. The man was worried about car parts and assembly lines. But his 14 points read like a manifesto for fixing every dysfunctional engineering org I’ve ever worked in, consulted for, or rage-quit on a Friday afternoon.
1. Create Constancy of Purpose (Or: Stop Chasing Shiny Objects, You Absolute Raccoons)
Deming says create a constant purpose toward improvement. You know what we do instead? We rewrite our entire infrastructure every 18 months because someone went to re:Invent and got excited about serverless. Or containers. Or serverless containers. Or whatever Google deprecated last Tuesday.
Your platform’s purpose isn’t to use the latest tech. It’s not to have the most impressive résumé-driven architecture. It’s to make your developers’ lives not suck. That’s it. That’s the tweet.
I once worked at a place that migrated from VMs to containers to serverless and back to VMs in three years. You know what stayed constant? The deployments still took 45 minutes and failed 30% of the time. But hey, at least the architects got to speak at conferences.
2. Adopt the New Philosophy (Translation: Your Old Ways Are Dead, Deal With It)
This one’s rich. Deming’s telling us to embrace quality throughout the organization. Meanwhile, we’re over here treating production like a leper colony that only the on-call engineer has to visit.
The new philosophy isn’t microservices. It isn’t GitOps. It isn’t whatever HashiCorp is selling this week. The new philosophy is this: Everyone owns reliability. Not just SREs. Not just the platform team. Everyone. Yes, even that PM who thinks HTTP is a drug.
3. Cease Dependence on Inspection (Or: Monitoring Everything Doesn’t Mean You’re Doing Anything)
Oh, this one hurts. We’ve got more dashboards than a Tesla factory, and what do we do with them? We stare at them AFTER things break. We’re like security guards watching footage of a bank robbery that already happened.
Deming knew what we refuse to admit: inspection doesn’t create quality; it only finds the defects you already built. You can’t observe your way to excellence. You know what those 10,000 Datadog metrics are telling you? That you built a broken system and now you’re really good at watching it be broken.
Build quality in from the start. Make broken builds impossible to deploy. Make bad configurations impossible to write. Stop trying to catch problems and start making them impossible. Revolutionary, I know.
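What does “make bad configurations impossible to write” look like in practice? Here’s a minimal sketch in Python, assuming a hypothetical deployment config with three made-up fields (`service`, `replicas`, `port`) and limits I picked for illustration. The idea: validation happens when the config object is constructed, so a `DeployConfig` that exists is a valid one, and a typo fails at load time instead of at 3 AM.

```python
from dataclasses import dataclass, fields


class ConfigError(ValueError):
    """Raised when a deployment config cannot be constructed."""


@dataclass(frozen=True)  # frozen: nobody can mutate a valid config into an invalid one later
class DeployConfig:
    # Hypothetical fields and limits for illustration; your schema will differ.
    service: str
    replicas: int
    port: int

    def __post_init__(self) -> None:
        # Reject invalid values at construction time, not at deploy time.
        if not isinstance(self.service, str) or not self.service.replace("-", "").isalnum():
            raise ConfigError(f"invalid service name: {self.service!r}")
        if type(self.replicas) is not int or not 1 <= self.replicas <= 50:
            raise ConfigError(f"replicas must be an int in 1..50, got {self.replicas!r}")
        if type(self.port) is not int or not 1024 <= self.port <= 65535:
            raise ConfigError(f"port must be an int in 1024..65535, got {self.port!r}")


def load_config(raw: dict) -> DeployConfig:
    """Build a DeployConfig from already-parsed YAML/JSON, refusing unknown or missing keys."""
    allowed = {f.name for f in fields(DeployConfig)}
    unknown = set(raw) - allowed
    if unknown:
        # A typo like "replica" fails loudly instead of silently falling back to a default.
        raise ConfigError(f"unknown keys: {sorted(unknown)}")
    missing = allowed - set(raw)
    if missing:
        raise ConfigError(f"missing keys: {sorted(missing)}")
    return DeployConfig(**raw)
```

Run `load_config` in CI on every config change and the broken YAML never reaches a deploy pipeline. That’s building quality in, not inspecting it afterward.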
4. End the Practice of Awarding Business on Price Tag Alone (Pick Your Cloud Vendor Like You Pick Your Tattoo Artist)
“But AWS is expensive!” Yeah, and so is divorce, but sometimes you need to pay for quality.
You know what’s more expensive than AWS? The three engineers you’ll need to hire to manage your “cost-effective” bare-metal Kubernetes cluster that you’re running because some blog post said it would save you 40%. Spoiler: it won’t. You’ll spend those savings on therapy for your on-call team.
5. Improve Constantly and Forever (Kaizen, But Make It YAML)
Here’s where Deming gets spicy. He’s not talking about big rewrites. He’s talking about continuous, small improvements. You know, like we pretend to do with our “20% time” that actually gets spent in meetings about why the sprint is behind.
Real talk: when was the last time you improved something just because it sucked, not because it was on fire? When did you last refactor without a Jira ticket? When did you last delete code instead of adding another abstraction layer?
The Japanese have a word for this: Kaizen. We have a word for what happens when we skip it: “technical debt,” which we treat like student loans. Ignore it and hope it goes away.
6. Institute Training on the Job (Your Documentation Sucks and You Know It)
“Check the wiki,” we say, knowing full well the wiki hasn’t been updated since that guy Steve left in 2019. “It’s in the README,” we lie, as if the README contains anything more than a broken build badge and installation instructions for a version that was deprecated before COVID.
Deming understood something we don’t: training isn’t a one-week onboarding where you throw AWS docs at the new hire and hope they figure it out. It’s continuous. It’s pair programming. It’s actually explaining why we do things instead of just saying “that’s how it’s always been done.”
7. Institute Leadership (Be a Multiplier, Not a Gatekeeper)
Your senior engineers aren’t senior because they can write the gnarliest Bash scripts. They’re senior because they should be making everyone else better. But what do we do? We make them code review gatekeepers who reject PRs for having 81-character lines.
Leadership in platform engineering isn’t about being the smartest person in the room. It’s about making the room smarter. It’s about building systems so good that junior engineers can’t mess them up. It’s about making the right thing the easy thing.
8. Drive Out Fear (Stop Making People Afraid to Deploy on Fridays)
You want to know if your organization has fear? Ask yourself: do people deploy on Fridays? No? Congratulations, you’ve built a culture of fear.
Fear is why we have change advisory boards. Fear is why we have deployment windows. Fear is why that critical security patch has been “waiting for the right time” for six months. Fear is why your best engineers are updating their LinkedIn profiles.
Deming knew that fear makes people hide problems. And hidden problems are like hidden dependencies. They only surface during outages, usually at 3 AM.
9. Break Down Barriers Between Departments (Conway’s Law Is Not a Suggestion)
Your platform team doesn’t talk to the app teams. The app teams don’t talk to security. Security doesn’t talk to anyone because they’re too busy saying “no” to everything.
You know why your microservices architecture looks like your org chart? Because Conway’s Law isn’t a suggestion. It’s not a mathematical theorem, but it might as well be. You’ll ship your org chart whether you want to or not.
Deming would look at our “DevOps transformations” and laugh. We didn’t break down silos; we just renamed them. Now instead of “Ops” we have “Platform Engineering,” and instead of cooperation, we have Slack channels where people passive-aggressively emoji-react to each other’s messages.
10. Eliminate Slogans and Exhortations (Your “Move Fast and Break Things” Poster Is Embarrassing)
“We’re customer obsessed!” Really? Then why does your API return 500 errors for valid requests?
“Quality is Job One!” Is it though? Is it really? Because your test coverage is 23% and your main branch has been red for a week.
Slogans are what managers use when they don’t know how to actually improve things. They’re the “thoughts and prayers” of engineering management. You want to motivate your team? Give them good tools, clear requirements, and get out of their way.
11. Eliminate Numerical Quotas (Your Velocity Metrics Are Meaningless)
Story points are astrology for engineers. There, I said it.
You know what happens when you measure deployment frequency? People deploy empty commits. You measure lines of code? You get Java. You measure incident count? People stop reporting incidents.
Deming understood what every engineer knows in their bones: when you make metrics the goal, people will game the metrics. It’s not malicious; it’s human nature. You want 99.999% uptime? Cool, we’ll just change the definition of “downtime.”
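If “we’ll just change the definition of downtime” sounds like a joke, here’s how little it takes. This is a minimal Python sketch assuming availability is computed from per-minute error rates and a minute counts as “down” once its error rate crosses some threshold. The sample data is invented for illustration, not measured from anything: 1,000 minutes, ten of which fail 5% of requests.

```python
def availability(error_rates: list[float], threshold: float) -> float:
    """Fraction of one-minute windows whose error rate stays below `threshold`.

    Each entry in `error_rates` is the fraction of failed requests in one
    minute; a minute counts as "down" when its error rate reaches `threshold`.
    """
    if not error_rates:
        raise ValueError("need at least one sample")
    down = sum(1 for rate in error_rates if rate >= threshold)
    return 1 - down / len(error_rates)


# Hypothetical sample: 990 clean minutes plus ten minutes at a 5% error rate.
minutes = [0.0] * 990 + [0.05] * 10

strict = availability(minutes, threshold=0.01)   # any minute with >= 1% errors is down
lenient = availability(minutes, threshold=0.50)  # only near-total outages count
print(f"strict: {strict:.3%}, lenient: {lenient:.3%}")
```

Same users, same failed requests, same week. One definition says 99%, the other says 100%. Nobody lied; somebody just picked the threshold after seeing the quota.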
12. Remove Barriers to Pride of Workmanship (Let People Build Things They’re Not Ashamed Of)
You know what kills an engineer’s soul? Building something you know is garbage because “we need to ship this quarter.” Implementing a half-assed solution because “we’ll fix it later” (narrator: they won’t fix it later).
Every time you make engineers ship something they’re not proud of, a little part of them dies. Eventually, you’re left with a team of zombies who’ve given up on quality because what’s the point? It’s all going to be rewritten next year anyway.
13. Institute a Vigorous Program of Education (Your Team Should Be Learning, Not Just Burning)
“We don’t have time for learning; we’re too busy fighting fires.” Yeah, you know why you’re fighting fires? Because you didn’t have time to learn how to prevent them.
The biggest lie in tech is that experience equals learning. You can have ten years of experience or one year of experience ten times. If your team isn’t learning, they’re just getting better at doing things wrong.
14. Make Transformation Everyone’s Job (Yes, Even That Contractor Who’s Been “Temporary” for Three Years)
Here’s the uncomfortable truth: transformation isn’t a project. It’s not a sprint. It’s not something you hire McKinsey to do for you. It’s what happens when everyone decides that mediocrity is more painful than change.
You want to transform your platform? Stop waiting for permission. Stop waiting for the “right time.” Stop waiting for executive buy-in. Start fixing things. Small things. Stupid things. That script that everyone copies and modifies slightly? Make it a tool. That process that takes five manual steps? Automate one of them.
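Turning “the script everyone copies and tweaks” into a tool usually means finding the bits people edit and making them flags. Here’s a minimal Python sketch, assuming a hypothetical cleanup script whose copies differ only in directory, file age, and filename pattern; the tool’s behavior and flag names are made up for illustration.

```python
from __future__ import annotations

import argparse
import sys
import time
from pathlib import Path


def find_stale(directory: Path, max_age_days: float, pattern: str) -> list[Path]:
    """Return files in `directory` matching `pattern` last modified before the cutoff."""
    cutoff = time.time() - max_age_days * 86400
    return sorted(
        p for p in directory.glob(pattern) if p.is_file() and p.stat().st_mtime < cutoff
    )


def main(argv: list[str] | None = None) -> int:
    # The values people used to edit by hand in their private copies become flags.
    parser = argparse.ArgumentParser(description="Delete stale files (hypothetical team tool).")
    parser.add_argument("directory", type=Path)
    parser.add_argument("--max-age-days", type=float, default=7.0)
    parser.add_argument("--pattern", default="*.log")
    parser.add_argument("--delete", action="store_true", help="actually delete; default is a dry run")
    args = parser.parse_args(argv)

    if not args.directory.is_dir():
        parser.error(f"not a directory: {args.directory}")

    for path in find_stale(args.directory, args.max_age_days, args.pattern):
        if args.delete:
            path.unlink()
        print(("deleted " if args.delete else "would delete ") + str(path))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Note the default: it’s a dry run unless you pass `--delete`. Making the safe thing the default is the whole trick; one shared tool with one safe default beats twelve slightly different copies with none.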
The Point Where I’m Supposed to Wrap This Up With Something Profound
Look, Deming figured this out when computers were the size of buildings and programmed with punch cards. We’ve got Kubernetes, Terraform, and GPT-4, and we’re still shipping broken software and burning out engineers.
The problem isn’t the technology. It never was. The problem is that we keep trying to solve human problems with technical solutions. We keep thinking that the next framework, the next tool, the next methodology will save us.
It won’t.
What will save us is admitting that Deming was right: quality isn’t something you add at the end. It’s not something you monitor. It’s not something you sprint toward. It’s something you build into the system from the very beginning, one small improvement at a time.
Your platform team doesn’t need another monitoring tool. It doesn’t need another framework. It doesn’t need another reorg. It needs to embrace these 14 points that a statistician figured out before your parents were born.
But hey, what do I know? I’m just another engineer who’s spent too much time thinking about this stuff at 2 AM during an outage that could have been prevented if we’d just listened to a dead guy from the 1950s.
Now if you’ll excuse me, I need to go update our runbooks that nobody reads and pretend that this time will be different.
P.S. If your response to this is “but we’re different” or “you don’t understand our constraints,” congratulations. You’re part of the problem. Deming’s points aren’t suggestions; they’re laws of organizational physics. You can ignore them, but like gravity, they’ll still apply whether you believe in them or not.
P.P.S. Yes, I know this post is too long. Steve Yegge would be proud. DHH would tell me to delete half of it. I’m keeping it all because sometimes the truth needs more than a tweet thread.