Users’ expectations for cloud service availability are sky-high. When we want to use these apps, we expect them to work as reliably as water flowing from the tap. But while systems are up and running, few people focus on reliability. Guaranteeing 100% availability is cost-prohibitive and counterproductive, since it pulls resources away from the innovation and development that drive business growth. These tradeoffs are hard.
In this live webinar, we’ve assembled a panel of experts who led teams through these challenges during rapid cloud expansion at businesses including AWS, Amazon.com, Google, Microsoft, and IBM. They’re here to share their experiences and perspectives from the trenches. They’ll discuss how to identify potential reliability issues before they become customers’ concerns; how to implement proactive monitoring, alerting, and remediation systems; and best practices for designing cloud-based services for maximum availability. They aim to share some hard-won wisdom on improving your cloud services’ reliability.
This panel deeply believes in investing in reliability. They show this not only through their day jobs but also by helping launch the reliability.org community as founding members. Their shared vision of a community of like-minded folks who can continue the reliability discussion sparked this special CNCF webinar, which marks the launch of that community.
Founder & CEO
Co-founder & CEO
Host of the Slight Reliability Podcast
Anurag is the founder of Shoreline.io, a DevOps company focused on incident automation: making it easy to automate away commonly occurring incidents and possible to quickly and safely debug and repair new ones. Before Shoreline, Anurag was a VP at AWS, where he was responsible for transactional database and analytic services, growing this business a thousand-fold over his time there. He has also been an early member of three startups, with one IPO and two acquisitions.
Niall has worked in Internet infrastructure since the mid-1990s, specialising in large online services. He has worked with all of the major cloud providers from their Dublin, Ireland offices, most recently at Microsoft, where he was global head of Azure Site Reliability Engineering (SRE). His books have sold approximately a quarter of a million copies worldwide, most notably the award-winning Site Reliability Engineering, and he is probably one of the few people in the world to hold degrees in Computer Science, Mathematics, and Poetry Studies. He lives in Dublin, Ireland, with his wife and two children.
Stephen is currently working as a Reliability Advocate for unified dashboarding company SquaredUp. Previously, he worked as a performance engineer for many years before switching to SRE more recently. Stephen is passionate about making complex ideas easy to understand and implement, and about promoting empathy and psychological safety in technology. He shares his SRE learning journey in his podcast, Slight Reliability.
Cloud Native Computing Foundation
Linux Foundation (CNCF)