Learning SRE, one day at a time.
What is chaos engineering and how is it being used in 2025?
This week I'm joined by Gremlin CEO and founder Kolton Andrus to discuss...
What is chaos engineering and what are its origins?
How has it evolved over the years?
The role of AI agents in SRE work
Justifying the value of chaos engineering
How do I get started?
...and much more.
You can find Kolton on:
LinkedIn: https://www....
What are Team Topologies? How can they be used to deliver value more simply and effectively (and in a more humane way)?
This week I'm joined by Luke McManus to discuss...
What are the four team topologies?
Can we have too much collaboration?
Team interaction models
Cognitive load
Value dynamics mapping
...and much more.
You can find Luke on:
LinkedIn: https://www.linkedin.com/in/lu...
How do you begin contributing to an open source project? What's it like? What do you get out of it?
This week I'm joined by Wendy Ha, who shares her unique story of joining the Kubernetes project and becoming a contributor. We explore...
What it's like working on one of the biggest open source projects in the world
The benefits of contributing to open source
How much time and effort does it...
As an #SRE, how do you influence senior leadership to get support and priority for the things you care about?
To answer this question, I'm joined by Nora Jones, founder of Jeli and now Head of Pricing, Product Strategy and Growth at PagerDuty. Our conversation touches on...
How understanding needs to flow both ways (between engineers and leaders)
Reliability is as much an art as a science
Using napki...
This week I do a retrospective on the Slight Reliability podcast.
How many people listen to it?
How do I feel about the show?
What's going well?
What could be better?
What's next for the show?
If you want to check out the podcast that came before Slight Reliability, you can find Performance Time archived on YouTube here:
https://www.youtube.com/@performance-time
You can find St...
Have you burned out at work? What was your experience? How did you work through it?
This week I'm joined by the incredible Colette Alexander to discuss what burnout is and what it means, and we both share our personal experiences of burning out at work. We cover...
What is burnout?
Why does it happen?
What are the symptoms?
Fight, flight, or freeze
Advice on how to recover
...and much more...
This week I'm joined by the wonderful Hanson Ho to discuss the unique challenges and opportunities in making our mobile apps observable! We cover...
The mobile/backend observability divide
The challenge of distributed tracing on mobile apps
The entire device runtime environment matters for your app
The quest for user-centric mobile observability
Advice on how to get started with mobil...
This week I'm joined once more by SRE leader Michelle Casey, who gives a broad and shallow introduction to resilience engineering. We cover...
Reliability VS Robustness VS Resilience
What is a complex system?
Safety one/safety two
Mental models
Human error
...and so much more.
Resources from this episode:
Four concepts for resilience (paper) by Dr. David Woods https://www.rese...
This week on the 100th episode I'm joined by DevOps and Resilience Engineering legend John Allspaw to talk about learning (especially from incidents). We discuss...
Classroom VS situated learning
The myth of the perfect handover
ITIL as a coping strategy to try and make sense of the organic, wild, and messy
How you cannot incentivise to avoid incidents (it doesn't work that way)
...
This week I'm joined by SRE leader Trent Hornibrook who shares a story about how he improved on-call early in his career, and then we explore the broader theme of focusing on the things that matter in observability, incident response, on-call, and beyond. We discuss...
Empowering engineers to implement change in your org
Focusing on what matters (customer & business > technology)
Not jus...
This week I'm joined by SRE leader Andrew Hatch from Cisco ThousandEyes to talk about a dirty word in the resilience community... root cause. In this excellent conversation we explore...
Is the root cause of every incident the big bang?
How the value of root cause degrades as complexity increases
That if the culture is not blameless, people will hide things
Alternative approaches to root ca...
This week I'm joined by David Dick from 2 Steps to (finally!) discuss synthetic monitoring. We cover...
What is synthetic monitoring?
What are the benefits and drawbacks of using it?
Non-web-based synthetics (the tough stuff)
Combining RUM and synthetics
Do synthetics need an OTEL-like framework?
...and much more.
You can find David on:
This week I'm joined by Cin7 Engineering Director Milan Brown to unpack the challenges of technology management and leadership. We discuss...
Theory X vs Theory Y management
Intention-based leadership and communication
Conditions in an org for people to thrive
How do you learn to manage and lead?
Managing people when you're not an expert in what they do
...and much more.
Resou...
This week Leon Adato and I break down the state of applying for roles in tech. We cover...
What a resume or CV is and is not
Leveraging your connections rather than relying on applying cold
How most job descriptions are works of fiction
White-fonting to game AI resume assessment
Experimental ways we could recruit
...and our pitch for Kubernetes the Rock Opera (and much more)
This week Priyam Kumar shares his story of moving from a massive organisation to a startup and the challenges and growth that came from that. We discuss...
War stories and examples of production incidents
The "hacks" we build to keep things running (and how maybe that's just normal)
Keeping it simple... YAGNI (You Ain't Gonna Need It!)
The perils of getting stuck in reactive ...
This week Michelle Casey shares her insights as a 'head of' engineering manager in the SRE context. This was one of my favourite conversations on the podcast so far. We cover topics such as...
Why move into leadership?
Learning from other leaders
What is unique about SRE leadership?
Women in engineering leadership
...and we go through some feedback I got as a leader recently.
Resource...
This week Adam and I get philosophical about what constitutes maturity in the field of observability. We tackle questions such as...
Does your org treat observability as a cost centre or a value add?
Are you using observability reactively to solve problems? Or proactively to build better products and services?
Is your observability connected to your users and business in a meaningful way?
Is mon...
In this episode I explore the challenges of achieving unified observability when integrating with SaaS products and services. I cover:
The new wave of mega-complex SaaS
Challenges integrating SaaS with our observability pipelines
How the lack of SaaS autonomy limits the effectiveness of OpenTelemetry
Paying twice to ingest, store, and search telemetry
Monitoring and predicting SaaS obs...
This week I check in and give an update on work, life, and my attempts at bringing SRE practices to life in the world of non-production environment management.
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube...
This week I'm joined by Karanveer Anand, SRE Technical Program Manager at Google, to discuss blameless post-mortems. We cover:
The recent CrowdStrike outage and their public post-mortem
When do we do a blameless post-mortem?
How do we do a blameless post-mortem?
How do we make sure action items are followed through?
The power of learning from post-mortems created by other tea...