🔗 Post on Kitopi Blog

Description

Site Reliability Engineering (SRE) has its roots at Google, where the idea was started in 2003 by Ben Treynor Sloss, who was tasked with improving the performance, availability, and stability of the company's systems. If we were to summarize the aim of SRE in modern organizations in one sentence, it would be to provide all the means necessary for a system to achieve its required availability (for example, being available 99.99% of the time). A Site Reliability Engineer is someone who comes from a very wide background covering system administration, application development, software testing, and business analysis. Their main goal is to work closely with the development and operations teams (a pairing often referred to as DevOps) to improve the resiliency, observability, and overall reliability of the system by applying software engineering methods.
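To make a target such as 99.99% more tangible, it can help to convert it into the amount of downtime it allows over a given period. The snippet below is only a minimal sketch of that arithmetic; the function name `allowed_downtime` and the choice of a 365-day and a 30-day period are my own assumptions for illustration, not something prescribed by SRE practice.

```python
from datetime import timedelta


def allowed_downtime(availability_percent: float, period: timedelta) -> timedelta:
    """Return how long a system may be unavailable within `period`
    while still meeting the given availability target.

    `allowed_downtime` is a hypothetical helper used for illustration only.
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    # The fraction of the period during which the system may be down.
    unavailable_fraction = 1 - availability_percent / 100
    return period * unavailable_fraction


# Assumed example periods: one 365-day year and one 30-day month.
yearly_budget = allowed_downtime(99.99, timedelta(days=365))
monthly_budget = allowed_downtime(99.99, timedelta(days=30))

print(f"Per year:  about {yearly_budget.total_seconds() / 60:.1f} minutes")
print(f"Per month: about {monthly_budget.total_seconds() / 60:.1f} minutes")
```

Under these assumptions, a 99.99% target leaves a little under an hour of downtime per year, which shows how little room for failure such a requirement gives a team.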

In this post, I present the origins of Site Reliability Engineering and the rules behind it.