Welcome to our exploration of observability (o11y)! Whether you’re a newcomer or a seasoned software engineer looking for a refresher, you’re in the right place. Grab a cup of tea, settle in, and let’s dive into the realms of observability, site reliability engineering, alerting, and incident management. This mini-series will break down these complex topics into manageable parts, ensuring a smooth and informative read.
Introduction
So, what exactly is observability engineering? Simply put, it’s the practice of building systems whose state you can monitor, measure, and understand through their external outputs.
Let’s use an analogy to illustrate. Consider a commercial aircraft. An aeronautical engineer relies on data from the aircraft, such as the oil pressure of the jet engines, to make informed maintenance decisions. Pilots, on the other hand, need real-time metrics like altitude and cabin pressure to ensure safe flights. The data pilots need is displayed on the flight deck’s instruments, providing a comprehensive view of the aircraft’s performance. Manufacturers also analyse this data to plan improvements. To put it in perspective, by some often-quoted industry estimates a single commercial aircraft can generate up to 20 terabytes of data per engine, per hour of flight – a staggering amount of information!
In the same way, observability is crucial in modern software engineering. Mastering it means you can confidently answer how your system or application is performing without constantly checking if it’s still running. Instead, you’ll have a robust alerting and incident response platform, giving you peace of mind. After all, you want to make sure you know about an issue before your customers do!
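To make "knowing before your customers do" a little more concrete, here is a minimal sketch of a threshold alert on one external output, the request error rate. Everything in it is an assumption for illustration: the `Window` shape, the function names, and the 5% threshold are hypothetical and not taken from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts seen in one evaluation window (hypothetical shape)."""
    total_requests: int
    failed_requests: int


def error_rate(window: Window) -> float:
    """Return the fraction of failed requests, or 0.0 when there was no traffic."""
    if window.total_requests == 0:
        return 0.0
    return window.failed_requests / window.total_requests


def should_alert(window: Window, threshold: float = 0.05) -> bool:
    """Return True when the error rate is strictly above the assumed threshold."""
    return error_rate(window) > threshold


if __name__ == "__main__":
    healthy = Window(total_requests=1000, failed_requests=3)
    degraded = Window(total_requests=1000, failed_requests=80)
    print("healthy window alerts:", should_alert(healthy))
    print("degraded window alerts:", should_alert(degraded))
```

A real alerting platform would evaluate rules like this continuously and route the result to whoever is on call, but the core idea is the same: the system tells you it is unhealthy, rather than you having to go and look.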
As for my background, I’ve been in the observability field for years, working with network devices, server hardware, monolithic applications, microservices, and event-driven architectures. I’ve designed and implemented platforms capable of handling tens of thousands of data points per second. My experience includes building “observability in a box” solutions: platform-as-a-product services that software engineering teams can easily integrate with. I bring a wealth of knowledge and strong opinions on the subject.