Slight Reliability

Learning SRE, one day at a time.

Episodes

July 15, 2025 • 39 mins

This week I'm joined once more by SRE leader Michelle Casey, who gives a broad and shallow introduction to resilience engineering. We cover...

Reliability VS Robustness VS Resilience
🧩 What is a complex system?
Safety one/safety two
🧠 Mental models
😩 Human error

...and so much more.

Resources from this episode:

Four concepts for resilience (paper) by Dr. David Woods https://www.rese...

This week on the 100th episode I'm joined by DevOps and Resilience Engineering legend John Allspaw to talk about learning (especially from incidents). We discuss...

Classroom VS situated learning
🀝 The myth of the perfect handover
ITIL as a coping strategy to try and make sense of the organic, wild, and messy
How you cannot incentivise to avoid incidents (it doesn't work that way)
...

This week I'm joined by SRE leader Trent Hornibrook who shares a story about how he improved on-call early in his career, and then we explore the broader theme of focusing on the things that matter in observability, incident response, on-call, and beyond. We discuss...

Empowering engineers to implement change in your org
Focusing on what matters (customer & business > technology)
Not jus...

This week I'm joined by SRE leader Andrew Hatch from Cisco ThousandEyes to talk about a dirty word in the resilience community... root cause. In this excellent conversation we explore...

🌌 Is the root cause of every incident the big bang?
How the value of root cause degrades as complexity increases
🫣 That if the culture is not blameless, people will hide things
🌳 Alternative approaches to root ca...

This week I'm joined by David Dick from 2 Steps to (finally!) discuss synthetic monitoring. We cover...

What is synthetic monitoring?
🦾 What are the benefits and drawbacks to using it?
Non-web based synthetics (the tough stuff)
🍹 Combining RUM and synthetics
🫒 Does synthetics need an OTEL-like framework?

...and much more.

You can find David on:

LinkedIn: https://www.linkedin.com/in/david-dick...

April 23, 2025 • 31 mins

This week I'm joined by Cin7 Engineering Director Milan Brown to unpack the challenges of technology management and leadership. We discuss...

Theory X vs Theory Y management
Intention based leadership and communication
🏒 Conditions in an org for people to thrive
How do you learn to manage and lead?
🫀 Managing people when you're not an expert in what they do

...and much more.

Resou...

This week Leon Adato and I break down the state of applying for roles in tech. We cover...

What a resume or CV is and is not
🀝 Leveraging your connections rather than relying on applying cold
How most job descriptions are works of fiction
🦾 White-fonting to game AI resume assessment
Experimental ways we could recruit

...and our pitch for Kubernetes the Rock Opera (and much more)

You can find Le...

This week Priyam Kumar shares his story of moving from a massive organisation to a startup and the challenges and growth that came from that. We discuss...

War stories and examples of production incidents
🩹 The "hacks" we build to keep things running (and how maybe that's just normal)
😎 Keeping it simple... YAGNI (You Ain't Gonna Need It!)
🧯 The perils of getting stuck in reactive ...

This week Michelle Casey shares her insights as a 'head of' engineering manager in the SRE context. This was one of my favourite conversations on the podcast so far. We cover topics such as...

🀷🏽 Why move into leadership?
Learning from other leaders
What is unique about SRE leadership?
Women in engineering leadership

...and we go through some feedback I got as a leader recently.

Resource...

This week Adam and I get philosophical about what constitutes maturity in the field of observability. We tackle questions such as...

Does your org treat observability as a cost centre or a value add?
Are you using observability reactively to solve problems? Or proactively to build better products and services?
Is your observability connected to your users and business in a meaningful way?
🌐 Is mon...

January 21, 2025 • 15 mins

In this episode I explore the challenges of achieving unified observability when integrating with SaaS products and services. I cover:

🌊 The new wave of mega-complex SaaS
Challenges integrating SaaS with our observability pipelines
How the lack of SaaS autonomy limits the effectiveness of OpenTelemetry
Paying twice to ingest, store, and search telemetry
Monitoring and predicting SaaS obs...

This week I check in and give an update on work, life, and my attempts at bringing SRE practices to life in the world of non-production environment management.

You can find the official Slight Reliability podcast website at: https://slightreliability.com/

You can find Stephen at:

LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube...

This week I'm joined by Karanveer Anand, SRE Technical Program Manager at Google to discuss blameless post-mortems. We cover:

The recent Crowdstrike outage and their public post-mortem
When do we do a blameless post-mortem?
How do we do a blameless post-mortem?
How do we make sure action items are followed through?
The power of learning from post-mortems created by other tea...

This week Zach Michel from https://middleware.io/ and I discuss the state of OpenTelemetry and what it means to adopt it. We cover:

🌩️ Achieving observability in a SaaS world
Context propagation - the magic sauce of OTEL
The telemetry gateway concept and leveraging the OTEL collector
The state of OpenTelemetry logging
πŸ«‚ Making use of the OpenTelemetry community

...and much ...

In Episode 80 Niall Murphy talked about the need for SREs to be better at articulating the value of our work. In this episode I'm joined by ex-Googler and Engineering Director (SRE) at Culture Amp Artem Yakimenko to discuss how we might achieve this.

We discuss both quantifiable and qualitative approaches including leveraging the untapped data in support tickets, customer sentiment and rankings, the relations...

In the world of SRE we constantly talk about defining SLOs, but what about evolving them over time? This week I chat with SRE Tech Lead Dom Finn about just that. We cover the relationship between reliability and user analytics, latency classes as a way to speak SLOs with business stakeholders, the role of NFRs and how the thresholds differ from SLOs, and much more.

Books mentioned in the episode:

The...

This week I talk about the impact of SaaS-first technology strategies on the work of an SRE. I pose questions about observability, ownership, on-call, and how much control we have over reliability.

You can find the Bleeding Tech blog on Medium: https://medium.com/@stownshend

You can find Stephen at:

LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_...

This week I chat with Dan Slimmon about applying the approach doctors use to treat patient symptoms during incident response.

You can find Dan's blog at https://blog.danslimmon.com/ or connect with him on LinkedIn here: https://www.linkedin.com/in/danslimmon/

You can find the official Slight Reliability podcast website at: https://slightreliability.com/

You can find Stephen at:

LinkedIn: https://www.linkedin.com/in/s...

This week I hear about all things Kubernetes from Komodor CTO and co-founder Itiel Shwartz. We chat about the promise that was made when Kubernetes first entered the industry, the challenge of getting developers engaged and capable of working in Kubernetes, my hate/hate relationship with Helm but its important contribution to the Kubernetes project, Kubernetes observability, and so much more.

You can find the...

This week I sit down and have a discussion with Amin Astaneh (from Certo Modo) about CI/CD. We cover the power of the standard change as a way to navigate ITIL while still implementing DevOps practices, what to monitor to make your CI/CD observable, single piece flow, testing in production, and so much more.

You can find Amin on his company website https://certomodo.io, LinkedIn: https://www.linkedin.com/in/a...
