Learning SRE, one day at a time.
This week I'm joined once more by SRE leader Michelle Casey, who gives a broad and shallow introduction to resilience engineering. We cover...
Reliability VS Robustness VS Resilience
What is a complex system?
Safety one/safety two
Mental models
Human error
...and so much more.
Resources from this episode:
Four concepts for resilience (paper) by Dr. David Woods https://www.rese...
This week on the 100th episode I'm joined by DevOps and Resilience Engineering legend John Allspaw to talk about learning (especially from incidents). We discuss...
Classroom VS situated learning
The myth of the perfect handover
ITIL as a coping strategy to try and make sense of the organic, wild, and messy
How you cannot incentivise to avoid incidents (it doesn't work that way)
...
This week I'm joined by SRE leader Trent Hornibrook who shares a story about how he improved on-call early in his career, and then we explore the broader theme of focusing on the things that matter in observability, incident response, on-call, and beyond. We discuss...
Empowering engineers to implement change in your org
Focusing on what matters (customer & business > technology)
Not jus...
This week I'm joined by SRE leader Andrew Hatch from Cisco ThousandEyes to talk about a dirty word in the resilience community... root cause. In this excellent conversation we explore...
Is the root cause of every incident the big bang?
How the value of root cause degrades as complexity increases
That if the culture is not blameless, people will hide things
Alternative approaches to root ca...
This week I'm joined by David Dick from 2 Steps to (finally!) discuss synthetic monitoring. We cover...
What is synthetic monitoring?
What are the benefits and drawbacks of using it?
Non-web based synthetics (the tough stuff)
Combining RUM and synthetics
Do synthetics need an OTEL-like framework?
...and much more.
You can find David on:
This week I'm joined by Cin7 Engineering Director Milan Brown to unpack the challenges of technology management and leadership. We discuss...
Theory X vs Theory Y management
Intention based leadership and communication
Conditions in an org for people to thrive
How do you learn to manage and lead?
Managing people when you're not an expert in what they do
...and much more.
Resou...
This week Leon Adato and I break down the state of applying for roles in tech. We cover...
What a resume or CV is and is not
Leveraging your connections rather than relying on applying cold
How most job descriptions are works of fiction
White-fonting to game AI resume assessment
Experimental ways we could recruit
...and our pitch for Kubernetes the Rock Opera (and much more)
This week Priyam Kumar shares his story of moving from a massive organisation to a startup and the challenges and growth that came from that. We discuss...
War stories and examples of production incidents
The "hacks" we build to keep things running (and how maybe that's just normal)
Keeping it simple... YAGNI (You Ain't Gonna Need It!)
The perils of getting stuck in reactive ...
This week Michelle Casey shares her insights as a 'head of' engineering manager in the SRE context. This was one of my favourite conversations on the podcast so far. We cover topics such as...
Why move into leadership?
Learning from other leaders
What is unique about SRE leadership?
Women in engineering leadership
...and we go through some feedback I got as a leader recently.
Resource...
This week Adam and I get philosophical about what constitutes maturity in the field of observability. We tackle questions such as...
Does your org treat observability as a cost centre or a value add?
Are you using observability reactively to solve problems? Or proactively to build better products and services?
Is your observability connected to your users and business in a meaningful way?
Is mon...
In this episode I explore the challenges of achieving unified observability when integrating with SaaS products and services. I cover:
The new wave of mega-complex SaaS
Challenges integrating SaaS with our observability pipelines
How the lack of SaaS autonomy limits the effectiveness of OpenTelemetry
Paying twice to ingest, store, and search telemetry
Monitoring and predicting SaaS obs...
This week I check in with an update on work, life, and my attempts to bring SRE practices to life in the world of non-production environment management.
You can find the official Slight Reliability podcast website at: https://slightreliability.com/
You can find Stephen at:
LinkedIn: https://www.linkedin.com/in/stephentownshend/
Twitter: https://twitter.com/the_kiwi_sre
YouTube...
This week I'm joined by Karanveer Anand, SRE Technical Program Manager at Google, to discuss blameless post-mortems. We cover:
The recent CrowdStrike outage and their public post-mortem
When do we do a blameless post-mortem?
How do we do a blameless post-mortem?
How do we make sure action items are followed through?
The power of learning from post-mortems created by other tea...
This week Zach Michel from https://middleware.io/ and I discuss the state of OpenTelemetry and what it means to adopt it. We cover:
Achieving observability in a SaaS world
Context propagation - the magic sauce of OTEL
The telemetry gateway concept and leveraging the OTEL collector
The state of OpenTelemetry logging
Making use of the OpenTelemetry community
...and much ...
In Episode 80 Niall Murphy talked about the need for SREs to be better at articulating the value of our work. In this episode I'm joined by Artem Yakimenko, ex-Googler and Engineering Director (SRE) at Culture Amp, to discuss how we might achieve this.
We discuss both quantifiable and qualitative approaches including leveraging the untapped data in support tickets, customer sentiment and rankings, the relations...
In the world of SRE we constantly talk about defining SLOs, but what about evolving them over time? This week I chat with SRE Tech Lead Dom Finn about just that. We cover the relationship between reliability and user analytics, latency classes as a way to speak SLOs with business stakeholders, the role of NFRs and how the thresholds differ from SLOs, and much more.
Books mentioned in the episode:
The...
This week I talk about the impact of SaaS-first technology strategies on the work of an SRE. I pose questions about observability, ownership, on-call, and how much control we have over reliability.
You can find the Bleeding Tech blog on Medium: https://medium.com/@stownshend
This week I chat with Dan Slimmon about applying the approach doctors use to treat patient symptoms during incident response.
You can find Dan's blog at https://blog.danslimmon.com/ or connect with him on LinkedIn here: https://www.linkedin.com/in/danslimmon/
This week I hear about all things Kubernetes from Komodor CTO and co-founder Itiel Shwartz. We chat about the promise that was made when Kubernetes first entered the industry, the challenge of getting developers engaged and capable of working in Kubernetes, my hate/hate relationship with Helm but its important contribution to the Kubernetes project, Kubernetes observability, and so much more.
You can find the...
This week I sit down and have a discussion with Amin Astaneh (from Certo Modo) about CI/CD. We cover the power of the standard change as a way to navigate ITIL while still implementing DevOps practices, what to monitor to make your CI/CD observable, single piece flow, testing in production, and so much more.
You can find Amin on his company website https://certomodo.io, LinkedIn: https://www.linkedin.com/in/a...