The DevSecOps Expert

What is O11y? - Tracing

O11y Series

Mark Pashby
Jul 17, 2025

Welcome to the third part of my series on observability engineering! The first part can be found here, and the second part here. I hope this series proves useful to you, and I’m excited to continue working on the upcoming posts!

Introduction

You know the score by now, so I won’t delve into a full explanation or analogy of observability engineering again (for a refresher, please check out the first post in the series), but let’s quickly recap. Observability engineering involves measuring the internal states of a system or application by examining its outputs. It’s straightforward – no secret sauce, and no hocus pocus! In this series, we’ll cover important topics within observability engineering that should benefit newcomers and seasoned engineers alike.

I enjoy having my opinions challenged and changed through healthy discussion.

Pillar Three - Tracing

What is tracing?

In modern distributed systems, particularly those built on microservices or serverless architectures, different services often need to interact with each other to fulfil a single user request. This interconnectedness makes it incredibly challenging to identify performance bottlenecks, diagnose issues, and analyse overall system behaviour. The difficulty is amplified when these services span multiple domains and are managed by different teams.
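What makes tracing possible across those boundaries is context propagation: each request carries a trace ID, and every service that handles it records its own span under that same ID, so the pieces can be stitched back together later. The sketch below hand-rolls the W3C Trace Context `traceparent` header in Python purely for illustration. The helper names `new_traceparent` and `child_traceparent` are my own hypothetical ones; in practice an instrumentation library such as OpenTelemetry would create and forward these headers for you.

```python
import re
import secrets

# W3C Trace Context "traceparent" header: version-traceid-parentid-flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Hypothetical helper: start a new trace where a user request first arrives."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    return f"00-{trace_id}-{span_id}-01"  # flags 01 = sampled


def child_traceparent(incoming: str) -> str:
    """Hypothetical helper: continue the caller's trace with a new span ID."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        raise ValueError(f"malformed traceparent: {incoming!r}")
    trace_id, _caller_span_id, flags = match.groups()
    # Same trace ID, fresh span ID for the hop this service is about to make
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


# The first service to see the user's request starts the trace; each
# downstream call forwards a header carrying the same trace ID.
at_orders = new_traceparent()
to_inventory = child_traceparent(at_orders)
to_logistics = child_traceparent(to_inventory)
for header in (at_orders, to_inventory, to_logistics):
    print(header)
```

All three printed headers share the same 32-character trace ID in the second field, which is what lets a tracing backend group spans from different services, owned by different teams, into one request.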

Consider a simple example of an online bookstore. Zod, the senior engineer on the orders team, notices that requests are timing out. His team is responsible for the orders microservice, which interacts with the inventory microservice to check the stock of an item during checkout. After receiving the stock information, the orders service places an order by sending another request to the inventory service. The inventory service then contacts the logistics service to get an estimated delivery time. However, unbeknownst to Zod and his team, the logistics service recently made some changes to their backend queries and inadvertently missed some crucial index updates in their NoSQL database. As a result, queries to the logistics service’s database are taking much longer than usual, ultimately causing the orders service to time out because it can’t provide the user with a delivery ETA.

I know, I know: it’s clear I’ve never worked in the orders or logistics domains before, but you get the point. In distributed systems supported by multiple teams, often across different domains, pinpointing the root cause of an issue like this would be nearly impossible without tracing. In my opinion, of the three main pillars of observability, tracing is the most crucial.
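To make that concrete, here’s a minimal Python model of what a trace of Zod’s checkout request might look like. The `Span` class, endpoints and timings are all hypothetical, made up for illustration (they aren’t real measurements or any tracing tool’s API), and self time is computed on the simplifying assumption that each span’s child calls run one after another.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Span:
    service: str
    name: str
    duration_ms: int
    children: list[Span] = field(default_factory=list)

    def self_time_ms(self) -> int:
        # Time spent in this span itself, assuming children ran sequentially
        return self.duration_ms - sum(c.duration_ms for c in self.children)


def walk(span: Span, depth: int = 0) -> Iterator[tuple[Span, int]]:
    """Depth-first traversal yielding each span with its nesting depth."""
    yield span, depth
    for child in span.children:
        yield from walk(child, depth + 1)


def slowest_self_time(root: Span) -> Span:
    """The span doing the most work of its own: the likely bottleneck."""
    return max((s for s, _ in walk(root)), key=lambda s: s.self_time_ms())


# Hypothetical trace of one checkout request (illustrative timings only)
trace = Span("orders", "POST /checkout", 5000, [
    Span("inventory", "GET /stock", 40),
    Span("inventory", "POST /orders", 4930, [
        Span("logistics", "GET /eta", 4900, [
            Span("logistics-db", "query delivery_slots", 4870),
        ]),
    ]),
])

for span, depth in walk(trace):
    print(f"{'  ' * depth}{span.service}: {span.name} "
          f"{span.duration_ms} ms (self {span.self_time_ms()} ms)")

culprit = slowest_self_time(trace)
print(f"bottleneck: {culprit.service} / {culprit.name}")
```

Reading the tree top-down, every hop looks slow, but only the database query has significant self time. That points straight at the logistics team’s missing index rather than at the orders or inventory services Zod’s team can see.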

© 2025 Mark Pashby