Redirecting Blame into Solutions
Travel Disasters and Finger-pointing
Twenty years ago on a trip through Europe, my brother and I made some dumb mistakes on international trains. He ended up in Warsaw, with our train tickets, and I ended up in Krakow, with all of our bags.
I feel like everybody has stories like this. You go on some adventure with friends and/or family, and somebody makes a mistake, or somebody miscommunicates. Money is wasted. Precious objects are lost. Lifelong friendships shatter. I mean, it’s not always that intense, but it is surprising how often fun goes wrong at unexpectedly high cost. The things we try to do together can drive us apart.
You’ll be pleased to know my brother and I remain brothers, and friends. Nobody had to be disowned. Nobody stopped talking for a decade. It’s because the damage was limited that I feel comfortable sharing this — but also, the damage remained limited because we accidentally made good decisions in how we approached the disaster when it happened.
After we got separated, I was sitting alone on the train, somewhere between Berlin and Krakow, with all of our baggage, when the ticket collector arrived, speaking Polish. He gathered up more and more people till the crowd of us were able to cobble together some common communication out of their weak English and my weak German and Spanish; they gave me a train ticket IOU and instructions on how to drop the payment in the mail. I got off the train in Krakow, wearing a backpack on my front and another on my back and dragging a giant roller bag in each hand. The hotel room I found that night cost me six times as much as I’d ever spent on a hotel before, but they spoke English and that was worth it to me. (They were also able to help me figure out how to pay the train ticket IOU. Expensive hotels turn out to have concierges.) Neither my brother nor I had mobile phones, but we both independently hit on the idea of making international calls to our father, 10 time zones away, and we communicated by leaving messages with him as if he were a dead drop.
My brother and I both made mistakes. We both lost a lot of hard-earned money fixing the problem. But to this day I think the best decision either of us made was that we immediately turned to fixing the problem. Many people, and I have to admit I have done this too, jump to blaming somebody instead.
Technology Disasters and Finger-pointing
A decade later, I was at Google, and an entirely different kind of disaster occurred. Without revealing any trade secrets, Google at the time had (are you ready for this) a lot of computers. (One assumes it has even more now. I left in 2021.) Those computers are distributed across many time zones, many continents. Computers, as you will know from your own life, are garbage. They crash all the time, and even when they’re not crashing, they’re downloading updates and insisting on being restarted right when you want to use them for something else.
Google has so many times more computers than humans that it can’t afford to have humans involved every time one of them crashes, or needs to reboot. Fixing crashes and rebooting are all done automatically, with software, mostly written at Google. My job at Google (along with thousands of other people) was, to a first approximation, writing and running the “management software” that made it possible to run a bajillion computers while they were constantly failing and getting software updates and rebooting and otherwise acting like a boiling ocean of chaos.
That day, one of these pieces of management software reached out to a really large fraction of Google’s computers and boom, it broke them. All across the company, systems that had relied on those computers stopped working. On the internal websites that told us what was going on with our systems, the charts and graphs disappeared, to be replaced with icons of broken images. Skytel pagers, still the rage at the time, began shrieking all over the company.
After a coworker finally convinced me this was more important than the conversation I’d been having about regulatory compliance, I looked at the situation and the evidence, and then I walked all over the building finding people from different teams who might be able to help. I dragged them all back to a small office where we locked ourselves in and tried to figure out what was going on. As we figured out bits and pieces of what was happening, we called the new information out or wrote notes on a shared whiteboard. Somebody identified the software that had gone wrong; somebody else figured out why it had done what it did. We crafted a solution – we used the very system that had broken everything to unbreak those computers.
Hours later, the emergency was over, though some of the repair work continued for days.
We started collecting information about what had happened, how it had happened, and how our safety checks had failed, but the system was complex, the full list of things that had gone wrong was huge, and it was taking a while.
Word began to get back to me that executives in far-away teams were furious. Why did this happen? What team did it? What person had made the mistake?
They were looking for someone to blame.
I worked with my teammates and built a list of all the people we needed to get all the information we needed, and we called a big meeting, and we discussed:
What are all the details of what went wrong to cause the problem?
What went wrong and when?
Once the root problem started, what are all the subsidiary problems that it caused?
How did we fix it?
What were we going to do to keep it from ever happening again?
And two critical questions:
Where did we get lucky?
What went well?
Why did we ask where we got lucky? As terrible as the incident was, it could have been worse. Some accidents had saved us from a worse outage; we might not be so lucky next time. Asking that question helped us think about solutions that would prevent not only this exact problem, but a broader set of similar but meaningfully different problems.
Why did we ask what went well? Because we could think about adding those good behaviors or characteristics to other systems. This can also be a moment to recognize the teammates who came together in a crisis, and to inspire others to rise to the challenge next time. Highlighting positivity and hope keeps us motivated through painful experiences.
We channeled the notes from this giant meeting into a retrospective document. (Within Google, the jargon for these retrospectives is “postmortem.”) We shared the document widely, and I made sure it made its way back to those angry executives I’d been warned about.
The retrospective document transformed fear and frustration into solutions. It became a roadmap for several software teams to implement deep and meaningful changes that would prevent anything like this from happening again.
And it let those executives in far-away teams know that the issue was being taken seriously and was being addressed in real and systemic ways. While I cannot know what was in their hearts, I suspect they were able to get more comfort from that than they would have by finding someone to blame.
Learn instead of blaming
When something big goes wrong, we get emotional. We may get angry or frustrated, or sad. Energy surges through us, looking for an outlet.
If the emotion is anger or frustration, we often point it at another person. Sometimes we may point it at ourselves; for some people, this is an even more destructive habit. Both are forms of blame.
It’s important to remember: the emotion comes first. Where we point that emotion – that’s a decision we make. We may have learned to point it at a person, but that’s usually not the right choice.
Emotions are a gift. They evolved to protect us; in the wake of a disaster (whether it’s at work or in our personal lives), they surge and give us energy to fight the disaster: to get us out of it, and to prevent it from happening again. Emotions are powerful and motivating. Instead of hurting our friends, or ourselves, we can choose to channel the emotions into fighting the problem:
How do we solve the problem we’re in?
What can we learn about ourselves and the world from the problem that just happened?
What can we do to prevent this, or something like this, from happening again?
What can we do so that it’s easier to recover from problems next time?
None of this is to say that people are never to blame. Sometimes people do cruel things. Sometimes it’s not our job to fix them. It’s important to distinguish these from cases where a well-meaning person makes a mistake.
But when something bad happens despite the efforts of well-meaning people, you can be allies together in overcoming the problem and learning from it.
This change in mindset has two benefits: we’re more connected and happier, and we’re more prepared for the next challenge.