Slight Reliability

Learning SRE, one day at a time.

Episodes

September 9, 2025 • 28 mins

As an #SRE how do you influence senior leadership to get support and priority for the things you care about?

To answer this question I'm joined by Nora Jones, founder of Jeli and now Head of Pricing, Product Strategy and Growth at PagerDuty. Our conversation touches on...

🀝 How understanding needs to flow both ways (between engineers and leaders)
🎨 Reliability is as much an art as a science
πŸ“ Using napki...


This week I do a retrospective on the Slight Reliability podcast.

👂 How many people listen to it?
❤️ How do I feel about the show?
🎉 What's going well?
🪴 What could be better?
❔ What's next for the show?

If you want to check out the podcast that came before Slight Reliability, you can find Performance Time archived on YouTube here:
https://www.youtube.com/@performance-time

You can find St...

August 12, 2025 • 38 mins

Have you burned out at work? What was your experience? How did you work through it?

This week I'm joined by the incredible Colette Alexander to discuss what burnout is and what it means, and we both share our personal experiences of burning out at work. We cover...

🔥 What is burnout?
❓ Why does it happen?
🫀 What are the symptoms?
🥊 Fight, flight, or freeze
🧑‍🚒 Advice on how to recover

...and much more...


This week I'm joined by the wonderful Hanson Ho to discuss the unique challenges and opportunities in making our mobile apps observable! We cover...

📱 The mobile/backend observability divide
✍️ The challenge of distributed tracing on mobile apps
🌏 The entire device runtime environment matters for your app
👀 The quest for user-centric mobile observability
✅ Advice on how to get started with mobil...


This week I'm joined once more by SRE leader Michelle Casey, who gives a broad and shallow introduction to resilience engineering. We cover...

🏋️‍♀️ Reliability VS Robustness VS Resilience
🧩 What is a complex system?
🔢 Safety one/safety two
🧠 Mental models
😩 Human error

...and so much more.

Resources from this episode:

Four concepts for resilience (paper) by Dr. David Woods https://www.rese...

June 24, 2025 • 48 mins

This week on the 100th episode I'm joined by DevOps and Resilience Engineering legend John Allspaw to talk about learning (especially from incidents). We discuss...

📒 Classroom VS situated learning
🤝 The myth of the perfect handover
ITIL as a coping strategy to try and make sense of the organic, wild, and messy
🥕 How you cannot incentivise to avoid incidents (it doesn't work that way)
...


This week I'm joined by SRE leader Trent Hornibrook who shares a story about how he improved on-call early in his career, and then we explore the broader theme of focusing on the things that matter in observability, incident response, on-call, and beyond. We discuss...

🔌 Empowering engineers to implement change in your org
🧑‍🍼 Focusing on what matters (customer & business > technology)
👀 Not jus...


This week I'm joined by SRE leader Andrew Hatch from Cisco ThousandEyes to talk about a dirty word in the resilience community... root cause. In this excellent conversation we explore...

🌌 Is the root cause of every incident the big bang?
πŸ¦– How the value of root cause degrades as complexity increases
🫣 That if the culture is not blameless, people will hide things
🌳 Alternative approaches to root ca...


This week I'm joined by David Dick from 2 Steps to (finally!) discuss synthetic monitoring. We cover...

🤖 What is synthetic monitoring?
🦾 What are the benefits and drawbacks to using it?
☢️ Non-web based synthetics (the tough stuff)
🍹 Combining RUM and synthetics
🫢 Does synthetics need an OTEL-like framework?

...and much more.

You can find David on:

LinkedIn: https://www.linkedin.com/in/david-dick...

April 23, 2025 • 31 mins

This week I'm joined by Cin7 Engineering Director Milan Brown to unpack the challenges of technology management and leadership. We discuss...

✖️ Theory X vs Theory Y management
🗣️ Intention based leadership and communication
🏢 Conditions in an org for people to thrive
😵‍💫 How do you learn to manage and lead?
🫤 Managing people when you're not an expert in what they do

...and much more.

Resou...


This week Leon Adato and I break down the state of applying for roles in tech. We cover...

📝 What a resume or CV is and is not
🤝 Leveraging your connections rather than relying on applying cold
🪄 How most job descriptions are works of fiction
🦾 White-fonting to game AI resume assessment
🧪 Experimental ways we could recruit

...and our pitch for Kubernetes the Rock Opera (and much more)

You can find Le...


This week Priyam Kumar shares his story of moving from a massive organisation to a startup and the challenges and growth that came from that. We discuss...

🪖 War stories and examples of production incidents
🩹 The "hacks" we build to keep things running (and how maybe that's just normal)
😎 Keeping it simple... YAGNI (You Ain't Gonna Need It!)
🧯 The perils of getting stuck in reactive ...


This week Michelle Casey shares her insights as a 'head of' engineering manager in the SRE context. This was one of my favourite conversations on the podcast so far. We cover topics such as...

🀷🏽 Why move into leadership?
πŸ‘οΈ Learning from other leaders
πŸ’Ž What is unique about SRE leadership?
πŸ‘‘ Women in engineering leadership

...and we go through some feedback I got as a leader recently.

Resource...


This week Adam and I get philosophical about what constitutes maturity in the field of observability. We tackle questions such as...

💸 Does your org treat observability as a cost centre or a value add?
🔥 Are you using observability reactively to solve problems? Or proactively to build better products and services?
👀 Is your observability connected to your users and business in a meaningful way?
🌐 Is mon...

January 21, 2025 • 15 mins

In this episode I explore the challenges of achieving unified observability when integrating with SaaS products and services. I cover:

🌊 The new wave of mega-complex SaaS
βš—οΈ Challenges integrating SaaS with our observability pipelines
πŸ‘©β€πŸ¦― How the lack of SaaS autonomy limits the effectiveness of OpenTelemetry
πŸ’° Paying twice to ingest, store, and search telemetry
πŸ“ˆ Monitoring and predicting SaaS obs...


This week I check in and give an update on work, life, and my attempts to bring SRE practices to life in the world of non-production environment management.

You can find the official Slight Reliability podcast website at: https://slightreliability.com/

You can find Stephen at:

LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube...


This week I'm joined by Karanveer Anand, SRE Technical Program Manager at Google to discuss blameless post-mortems. We cover:

🦅 The recent Crowdstrike outage and their public post-mortem
🚑 When do we do a blameless post-mortem?
😕 How do we do a blameless post-mortem?
✅ How do we make sure action items are followed through?
📰 The power of learning from post-mortems created by other tea...


This week Zach Michel from https://middleware.io/ and I discuss the state of OpenTelemetry and what it means to adopt it. We cover:

🌩️ Achieving observability in a SaaS world
πŸ₯« Context propagation - the magic sauce of OTEL
πŸšͺ The telemetry gateway concept and leveraging the OTEL collector
πŸͺ΅ The state of OpenTelemetry logging
πŸ«‚ Making use of the OpenTelemetry community

...and much ...


In Episode 80 Niall Murphy talked about the need for SREs to be better at articulating the value of our work. In this episode I'm joined by ex-Googler and Engineering Director (SRE) at Culture Amp Artem Yakimenko to discuss how we might achieve this.

We discuss both quantifiable and qualitative approaches including leveraging the untapped data in support tickets, customer sentiment and rankings, the relations...


In the world of SRE we constantly talk about defining SLOs, but what about evolving them over time? This week I chat with SRE Tech Lead Dom Finn about just that. We cover the relationship between reliability and user analytics, latency classes as a way to speak SLOs with business stakeholders, the role of NFRs and how the thresholds differ from SLOs, and much more.

Books mentioned in the episode:

The...
