<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The DevSecOps Expert]]></title><description><![CDATA[UK Cloud/Security Architect and Platform Engineering Evangelist. CISSP & AWS Security Specialist. Build it and they will come. Previously at Sage, Worldfirst, DAZN and Cazoo.]]></description><link>https://www.thedevsecopsexpert.com</link><image><url>https://substackcdn.com/image/fetch/$s_!ZM5t!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb7fb268-7c20-4627-b422-ac7d13c03edf_500x500.png</url><title>The DevSecOps Expert</title><link>https://www.thedevsecopsexpert.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 06 May 2026 11:34:02 GMT</lastBuildDate><atom:link href="https://www.thedevsecopsexpert.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Mark Pashby]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[thedevsecopsexpert@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[thedevsecopsexpert@substack.com]]></itunes:email><itunes:name><![CDATA[Mark Pashby]]></itunes:name></itunes:owner><itunes:author><![CDATA[Mark Pashby]]></itunes:author><googleplay:owner><![CDATA[thedevsecopsexpert@substack.com]]></googleplay:owner><googleplay:email><![CDATA[thedevsecopsexpert@substack.com]]></googleplay:email><googleplay:author><![CDATA[Mark Pashby]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Encrypting data with AWS KMS]]></title><description><![CDATA[AWS KMS Envelope Encryption]]></description><link>https://www.thedevsecopsexpert.com/p/encrypting-data-with-aws-kms</link><guid 
isPermaLink="false">https://www.thedevsecopsexpert.com/p/encrypting-data-with-aws-kms</guid><dc:creator><![CDATA[Mark Pashby]]></dc:creator><pubDate>Thu, 17 Jul 2025 19:36:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0dnL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Introduction</h1><p>A little while ago, in a previous role, an engineer posted a question in Slack about using AWS KMS to encrypt data. They&#8217;re a bright and thoughtful person, and their product manager had set a requirement that the data itself be encrypted at rest, at the data level. I was really pleased to see this being discussed as a non-functional requirement, especially given the sensitivity of the data. This kind of thinking is crucial for defence-in-depth.</p><p>However, when it came to implementation, the engineer found that AWS KMS didn&#8217;t seem to be working as expected: encryption and decryption were taking longer than anticipated. After a quick refresher on envelope encryption in the AWS Encryption SDK Developer Guide, I joined the discussion. It struck me that this exact question had come up in previous roles too, so I decided to document it properly, giving me and others something to refer back to in the future.</p><h1>What are AWS KMS and the AWS Encryption SDK?</h1><p>Straight from the <a href="https://docs.aws.amazon.com/kms/latest/developerguide/overview.html">horse&#8217;s mouth</a>: AWS KMS is an AWS managed service that makes it easy for you to create and control the encryption keys used to encrypt your data. 
The KMS keys you create are protected by hardware security modules (HSMs) validated under <a href="https://en.wikipedia.org/wiki/FIPS_140-3">FIPS 140-3</a> Security Level 3, the US Government standard for cryptographic modules.</p><p>When encrypting data, it&#8217;s essential to protect the key used to produce the ciphertext. If that key is itself encrypted, the key protecting it must also be safeguarded, and so on. Eventually, this chain leads back to a root key, which never leaves AWS KMS unencrypted. This matters because if the root key were ever compromised, every key it protects would be compromised as well.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p84L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed586e6d-f16d-4694-b3d6-4db7ec0c9102_699x193.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p84L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed586e6d-f16d-4694-b3d6-4db7ec0c9102_699x193.gif 424w, https://substackcdn.com/image/fetch/$s_!p84L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed586e6d-f16d-4694-b3d6-4db7ec0c9102_699x193.gif 848w, https://substackcdn.com/image/fetch/$s_!p84L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed586e6d-f16d-4694-b3d6-4db7ec0c9102_699x193.gif 1272w, https://substackcdn.com/image/fetch/$s_!p84L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed586e6d-f16d-4694-b3d6-4db7ec0c9102_699x193.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!p84L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed586e6d-f16d-4694-b3d6-4db7ec0c9102_699x193.gif" width="699" height="193" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed586e6d-f16d-4694-b3d6-4db7ec0c9102_699x193.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:193,&quot;width&quot;:699,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:65830,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.thedevsecopsexpert.com/i/168584904?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed586e6d-f16d-4694-b3d6-4db7ec0c9102_699x193.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p84L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed586e6d-f16d-4694-b3d6-4db7ec0c9102_699x193.gif 424w, https://substackcdn.com/image/fetch/$s_!p84L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed586e6d-f16d-4694-b3d6-4db7ec0c9102_699x193.gif 848w, https://substackcdn.com/image/fetch/$s_!p84L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed586e6d-f16d-4694-b3d6-4db7ec0c9102_699x193.gif 1272w, https://substackcdn.com/image/fetch/$s_!p84L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed586e6d-f16d-4694-b3d6-4db7ec0c9102_699x193.gif 1456w" sizes="100vw" 
fetchpriority="high"></picture></div></a></figure></div><p>Next, let&#8217;s quickly cover the three KMS key types:</p><ul><li><p><strong>Customer Managed Keys</strong> - This key type is the de facto choice for customers who want full control over key usage and lifecycle policy. The customer is responsible for configuring rotation, deletion and the Regional location of keys. Auditing/logging is available for CMKs via AWS CloudTrail, including CloudTrail Lake event data stores. You pay a monthly fee for each key (prorated hourly), plus a charge for key usage.</p></li><li><p><strong>AWS Managed Keys</strong> - This key type is a KMS key that exists in your account but can only be used in limited circumstances: only through the AWS service it was created for, and only by principals in the account where the key exists. AWS managed keys are a legacy key type, and as of 2021 they are no longer being created for new AWS services. You get the same auditing/logging capabilities as customer managed keys. The difference is that there is no monthly fee, although the caller is charged for API usage on these keys. The main difference from customer managed keys is that AWS manages rotation, deletion and Regional location for you. This significantly reduces the overhead of managing keys, so you can concentrate on encrypting the data and forget about managing the root keys.</p></li><li><p><strong>AWS Owned Keys</strong> - We mentioned that AWS Managed Keys are being phased out, because the new key on the block (sorry, I couldn&#8217;t resist!) is AWS Owned Keys. So, what do they do? An AWS owned key is a KMS key that lives in an account managed by the AWS service, so the service operators can manage its lifecycle and usage permissions. 
By using AWS owned keys, AWS services can transparently encrypt your data and allow easy cross-account or cross-Region sharing of data without you needing to worry about key permissions. This is an incredibly important feature of AWS Owned Keys, which are effectively AWS Managed Keys with a reduced blast radius. They are controlled exclusively by, and visible only to, the AWS service that encrypts your data, so you also lose auditing/logging capabilities, for now. The AWS service manages rotation, deletion and Regional location, exactly as it does for AWS Managed Keys. There is no charge for using AWS Owned Keys, which is by far one of their main advantages!</p></li></ul><p>The last thing to cover in this section is the AWS Encryption SDK and envelope encryption. Straight from the <a href="https://docs.aws.amazon.com/encryption-sdk/latest/developer-guide/introduction.html">horse&#8217;s mouth</a> again: the AWS Encryption SDK is a client-side encryption library designed to make it easy for everyone to encrypt and decrypt data using industry standards and best practices. It enables you to focus on the core functionality of your application, rather than on how best to encrypt and decrypt your data. The AWS Encryption SDK is provided free of charge under the Apache 2.0 license. This is a nice move by AWS, but it is also in their interest to give customers this functionality, given the <a href="https://aws.amazon.com/compliance/shared-responsibility-model/">shared responsibility model</a>.</p><p>The security of your encrypted data depends in part on protecting the data key that can decrypt it. One accepted best practice for protecting the data key is to encrypt it. To do this, you need another encryption key, known as a key-encryption key or wrapping key. The practice of using a wrapping key to encrypt data keys is known as envelope encryption. 
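</p><p>Calling KMS directly for every piece of data means one network round trip per call, and the KMS <code>Encrypt</code> API only accepts up to 4 KB of plaintext. Envelope encryption avoids both problems. Only the small data key is ever wrapped or unwrapped, and the bulk data is encrypted locally.</p><p>Here is a minimal Python sketch of the idea. It relies on some loudly stated assumptions:</p><ul><li><p>It uses the third-party <code>cryptography</code> package.</p></li><li><p>Two locally generated AES keys stand in for KMS keys. With real KMS, the wrapping key material never leaves the service.</p></li><li><p>AES key wrap (RFC 3394) stands in for the KMS call.</p></li><li><p><code>WRAPPING_KEYS</code>, <code>envelope_encrypt</code> and <code>envelope_decrypt</code> are hypothetical names.</p></li></ul><p>It is not the Encryption SDK&#8217;s message format, just the shape of the technique. That includes wrapping one data key under more than one wrapping key.</p><pre><code>import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.keywrap import aes_key_unwrap, aes_key_wrap

# Hypothetical stand-ins for two KMS keys. With real KMS these would be key IDs,
# and the key material would never leave KMS.
WRAPPING_KEYS = {
    "primary": AESGCM.generate_key(bit_length=256),
    "backup": AESGCM.generate_key(bit_length=256),
}


def envelope_encrypt(plaintext: bytes, wrapping_keys: dict):
    """Encrypt plaintext with a fresh data key, then wrap that key under every wrapping key."""
    # 1. A new 256-bit data key for this message only (KMS GenerateDataKey plays this role).
    data_key = AESGCM.generate_key(bit_length=256)
    # 2. Encrypt the payload locally; the 96-bit nonce must never repeat for this key.
    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(nonce, plaintext, None)
    # 3. Wrap the data key under each wrapping key (stand-in for a KMS Encrypt call).
    wrapped_keys = {name: aes_key_wrap(kek, data_key) for name, kek in wrapping_keys.items()}
    # Only the wrapped copies are returned; the plaintext data key is not.
    return {"nonce": nonce, "ciphertext": ciphertext, "wrapped_keys": wrapped_keys}


def envelope_decrypt(message: dict, key_name: str, wrapping_key: bytes):
    """Unwrap the data key with any one wrapping key, then decrypt the payload locally."""
    data_key = aes_key_unwrap(wrapping_key, message["wrapped_keys"][key_name])
    return AESGCM(data_key).decrypt(message["nonce"], message["ciphertext"], None)


message = envelope_encrypt(b"Hello KMS Learners", WRAPPING_KEYS)
# Either wrapping key recovers the same data key, so losing one does not lose the data.
print(envelope_decrypt(message, "backup", WRAPPING_KEYS["backup"]))  # b'Hello KMS Learners'</code></pre><p>In real code, let the AWS Encryption SDK do this for you. It also takes care of the message format, key commitment and the encryption context.</p><p>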
See the images below for a visual explanation of envelope encryption:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0dnL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0dnL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif 424w, https://substackcdn.com/image/fetch/$s_!0dnL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif 848w, https://substackcdn.com/image/fetch/$s_!0dnL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif 1272w, https://substackcdn.com/image/fetch/$s_!0dnL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0dnL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif" width="1099" height="422" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:422,&quot;width&quot;:1099,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114333,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thedevsecopsexpert.com/i/168584904?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0dnL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif 424w, https://substackcdn.com/image/fetch/$s_!0dnL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif 848w, https://substackcdn.com/image/fetch/$s_!0dnL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif 1272w, https://substackcdn.com/image/fetch/$s_!0dnL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca57b8c6-74bf-481a-a3b3-68a3eb65aca6_1099x422.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Multiple wrapping keys can encrypt the same data key, so the data can still be decrypted if one wrapping key becomes unavailable. This adds some really good fault tolerance/disaster recovery controls:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QspK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbd3502-63f7-4752-9655-07965571f508_748x528.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QspK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbd3502-63f7-4752-9655-07965571f508_748x528.gif 424w, 
https://substackcdn.com/image/fetch/$s_!QspK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbd3502-63f7-4752-9655-07965571f508_748x528.gif 848w, https://substackcdn.com/image/fetch/$s_!QspK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbd3502-63f7-4752-9655-07965571f508_748x528.gif 1272w, https://substackcdn.com/image/fetch/$s_!QspK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbd3502-63f7-4752-9655-07965571f508_748x528.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QspK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbd3502-63f7-4752-9655-07965571f508_748x528.gif" width="748" height="528" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dbbd3502-63f7-4752-9655-07965571f508_748x528.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:528,&quot;width&quot;:748,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:126968,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thedevsecopsexpert.com/i/168584904?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbd3502-63f7-4752-9655-07965571f508_748x528.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QspK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbd3502-63f7-4752-9655-07965571f508_748x528.gif 
424w, https://substackcdn.com/image/fetch/$s_!QspK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbd3502-63f7-4752-9655-07965571f508_748x528.gif 848w, https://substackcdn.com/image/fetch/$s_!QspK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbd3502-63f7-4752-9655-07965571f508_748x528.gif 1272w, https://substackcdn.com/image/fetch/$s_!QspK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdbbd3502-63f7-4752-9655-07965571f508_748x528.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>OK, that wraps up what I wanted to cover about KMS and envelope encryption for now. The AWS documentation is fantastic for further reading if you want to know more. Now we&#8217;ll run through an example in Python and call it a day.</p><h1>Let&#8217;s play with Envelope Encryption</h1><p>First of all, we need to import a bunch of providers and helpers from the AWS Cryptographic Material Providers Library (MPL), along with the AWS Encryption SDK itself.</p><pre><code><code># Assumes aws-encryption-sdk 4.x installed with its MPL extra: pip install "aws-encryption-sdk[MPL]"
import boto3
from aws_cryptographic_material_providers.mpl import AwsCryptographicMaterialProviders
from aws_cryptographic_material_providers.mpl.config import MaterialProvidersConfig
from aws_cryptographic_material_providers.mpl.models import CreateAwsKmsKeyringInput
from aws_cryptographic_material_providers.mpl.references import IKeyring
from typing import Dict 

import aws_encryption_sdk
from aws_encryption_sdk import CommitmentPolicy</code></code></pre><p>Next, we create a global variable holding a byte string, purely as an example. Let&#8217;s also create the Encryption SDK and KMS clients, and the encryption context:</p><pre><code><code>EXAMPLE_DATA: bytes = b"Hello KMS Learners"


def encrypt_and_decrypt_with_keyring(kms_key_id: str):
    client = aws_encryption_sdk.EncryptionSDKClient(
        commitment_policy=CommitmentPolicy.REQUIRE_ENCRYPT_REQUIRE_DECRYPT
    )

    # The KMS client's Region must match the Region of the key passed in as kms_key_id
    kms_client = boto3.client("kms", region_name="us-west-2")

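    # Non-secret key-value pairs that are authenticated along with the ciphertext;
    # any encryption context supplied on decrypt must match these values.
    encryption_context: Dict[str, str] = {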
        "encryption": "example context",
        "is not": "secret",
        "but adds": "useful metadata",
        "that can help you": "be confident that",
        "the data you are handling": "is what you think it is",
    }</code></code></pre><p>Next, we create the AWS KMS keyring, which tells the SDK which KMS key to use as the wrapping key:</p><pre><code><code>    material_provider: AwsCryptographicMaterialProviders = AwsCryptographicMaterialProviders(
        config=MaterialProvidersConfig()
    )

    keyring_input: CreateAwsKmsKeyringInput = CreateAwsKmsKeyringInput(
        kms_key_id=kms_key_id,
        kms_client=kms_client
    )

    kms_keyring: IKeyring = material_provider.create_aws_kms_keyring(
        input=keyring_input
    )</code></code></pre><p>Next up, we encrypt our example data and do a quick assertion to confirm it&#8217;s now ciphertext (an encrypted message):</p><pre><code><code>    ciphertext, _ = client.encrypt(
        source=EXAMPLE_DATA,
        keyring=kms_keyring,
        encryption_context=encryption_context
    )

    assert ciphertext != EXAMPLE_DATA, \
        "Ciphertext and plaintext data are the same. Invalid encryption"</code></code></pre><p>And lastly, do a quick decryption test:</p><pre><code><code>    plaintext_bytes, _ = client.decrypt(
        source=ciphertext,
        keyring=kms_keyring,
        # Provide the encryption context that was supplied to the encrypt method
        encryption_context=encryption_context,
    )

    assert plaintext_bytes == EXAMPLE_DATA, \
        "Decrypted plaintext should be identical to the original plaintext. Invalid decryption"</code></code></pre><p>This playground example uses a single-key keyring. To wrap the data key under several KMS keys, have a look at the AWS KMS multi-keyring (<code>create_aws_kms_multi_keyring</code>). The <code>discovery_multi_keyring</code> example covers decrypting when you don&#8217;t know the key IDs in advance.</p><h1>Letssss Goooo</h1><p>Thank you for reading this post, and hopefully you found some value in it! Some quick takeaway points:</p><ul><li><p>We learnt what KMS, the AWS Encryption SDK and envelope encryption are.</p></li><li><p>We learnt about the KMS key types, and a little about each one.</p></li><li><p>You can have multiple wrapping keys for one data key, which is a really nice fault tolerance feature in the event of a disaster.</p></li><li><p>Do you still need additional controls on top of encryption? Yes, I highly recommend adding these extra layers of defence. For example, if you also hash sensitive values, use a <a href="https://en.wikipedia.org/wiki/Salt_(cryptography)">salt</a>. Remember, your security is only as good as the layers of controls you put in place to protect your data.</p></li><li><p>I recommend considering Customer Managed Keys over AWS Managed/Owned keys, and also thinking about multi-Region designs and multiple wrapping keys. Think about how to reduce the blast radius in the event of a major security incident.</p></li></ul><p>Thanks all, and see you next time!</p><p><a href="https://www.pash.by/categories/tech/">tech</a>, <a href="https://www.pash.by/categories/devops/">devops</a>, <a href="https://www.pash.by/categories/aws/">aws</a>, <a href="https://www.pash.by/categories/security/">security</a>, <a href="https://www.pash.by/categories/software-engineering/">software_engineering</a><br><br></p>]]></content:encoded></item><item><title><![CDATA[What is O11y? 
- Tracing]]></title><description><![CDATA[O11y Series]]></description><link>https://www.thedevsecopsexpert.com/p/what-is-o11y-tracing</link><guid isPermaLink="false">https://www.thedevsecopsexpert.com/p/what-is-o11y-tracing</guid><dc:creator><![CDATA[Mark Pashby]]></dc:creator><pubDate>Thu, 17 Jul 2025 19:14:33 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/bef673ea-888a-401a-a682-fa1eb6190e2a_707x670.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the third part of my series on observability engineering! The first part can be found <a href="https://substack.com/@thedevsecopsexpert/note/p-168578007?utm_source=notes-share-action&amp;r=g7rls">here</a>, and the second part <a href="https://substack.com/@thedevsecopsexpert/note/p-168579982?utm_source=notes-share-action&amp;r=g7rls">here</a>. I hope this series proves useful to you, and I&#8217;m excited to continue working on the upcoming posts!</p><h1>Introduction</h1><p>You know the score by now and I won&#8217;t delve into a full explanation or analogy of observability engineering again (for a refresher, please check out the first post in the series), but let&#8217;s quickly recap. Observability engineering involves measuring the internal states of a system or application by examining its outputs. It&#8217;s straightforward &#8211; no secret sauce, and no hocus pocus! In this series, we&#8217;ll cover important topics within observability engineering that should benefit both newcomers and seasoned engineers alike.</p><p>I enjoy having my opinions challenged and changed through healthy discussion.</p><h1>Pillar Three - Tracing</h1><h2>What is tracing?</h2><p>In modern distributed systems, particularly those built on microservices or serverless architectures, different services often need to interact with each other to fulfil a single user request. 
This interconnectedness makes it incredibly challenging to identify performance bottlenecks, diagnose issues, and analyse overall system behaviour. The difficulty is amplified when these services span multiple domains and are managed by different teams.</p><p>Consider a simple example of an online bookstore. Zod, the senior engineer on the orders team, notices that requests are timing out. His team is responsible for the orders microservice, which calls the inventory microservice to check an item&#8217;s stock during checkout. After receiving the stock information, the orders service places the order by sending another request to the inventory service. The inventory service then asks the logistics service for an estimated delivery time. However, unbeknownst to Zod and his team, the logistics team recently changed their backend queries and inadvertently missed some crucial index updates in their NoSQL database. As a result, queries to the logistics database are taking much longer than usual. The orders service ultimately times out because it can&#8217;t give the user a delivery ETA.</p><p>I know, I know: it&#8217;s clear I&#8217;ve never worked in the orders or logistics domains, but you get the point. In distributed systems supported by multiple teams, often across different domains, pinpointing the root cause of an issue like this would be nearly impossible without tracing. 
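</p><p>To make Zod&#8217;s problem concrete, here is a minimal, in-process Python sketch of trace-context propagation along a simplified orders &#8594; inventory &#8594; logistics chain. Each service records a span. A shared trace ID, plus the caller&#8217;s span ID as the parent, is passed along in a header.</p><p>The sketch makes a few assumptions:</p><ul><li><p>The header follows the W3C Trace Context <code>traceparent</code> format (<code>version-trace_id-parent_id-flags</code>).</p></li><li><p>The service functions, the <code>SPANS</code> list and the helper names are hypothetical.</p></li><li><p>A 50 ms sleep stands in for the logistics team&#8217;s unindexed query.</p></li></ul><p>In practice, a tracing library such as OpenTelemetry creates the spans and injects the headers for you.</p><pre><code>import secrets
import time

SPANS = []  # hypothetical in-memory span store; a real tracer exports spans to a backend


def start_span(name, traceparent=None):
    """Start a span, continuing the caller's trace if a traceparent header was received."""
    if traceparent:
        _version, trace_id, parent_id, _flags = traceparent.split("-")
    else:
        trace_id, parent_id = secrets.token_hex(16), None  # request enters here: new trace
    span = {"name": name, "trace_id": trace_id, "span_id": secrets.token_hex(8),
            "parent_id": parent_id, "start": time.perf_counter()}
    SPANS.append(span)
    return span


def end_span(span):
    span["duration_ms"] = (time.perf_counter() - span["start"]) * 1000


def outgoing_headers(span):
    """Inject the trace context into an outgoing request: 00-trace_id-span_id-sampled."""
    return {"traceparent": f"00-{span['trace_id']}-{span['span_id']}-01"}


def logistics(headers):
    span = start_span("logistics: delivery ETA", headers["traceparent"])
    time.sleep(0.05)  # the slow query on the missing index
    end_span(span)


def inventory(headers):
    span = start_span("inventory: check stock", headers["traceparent"])
    logistics(outgoing_headers(span))
    end_span(span)


def orders():
    span = start_span("orders: checkout")
    inventory(outgoing_headers(span))
    end_span(span)


def self_time_ms(span):
    """Time spent in this span itself, excluding its direct child spans."""
    children = [s for s in SPANS if s["parent_id"] == span["span_id"]]
    return span["duration_ms"] - sum(c["duration_ms"] for c in children)


orders()
culprit = max(SPANS, key=self_time_ms)
print(culprit["name"])  # logistics: delivery ETA</code></pre><p>Every span shares one trace ID and records its parent. Subtracting each span&#8217;s children from its own duration therefore points straight at the logistics query, even though the symptom shows up in the orders service.</p><p>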
In my opinion, of the three main pillars of observability, tracing is the most crucial.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!94tT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6090f1f2-a371-48a9-9882-d275adc72a20_450x302.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!94tT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6090f1f2-a371-48a9-9882-d275adc72a20_450x302.gif 424w, https://substackcdn.com/image/fetch/$s_!94tT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6090f1f2-a371-48a9-9882-d275adc72a20_450x302.gif 848w, https://substackcdn.com/image/fetch/$s_!94tT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6090f1f2-a371-48a9-9882-d275adc72a20_450x302.gif 1272w, https://substackcdn.com/image/fetch/$s_!94tT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6090f1f2-a371-48a9-9882-d275adc72a20_450x302.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!94tT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6090f1f2-a371-48a9-9882-d275adc72a20_450x302.gif" width="450" height="302" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6090f1f2-a371-48a9-9882-d275adc72a20_450x302.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:302,&quot;width&quot;:450,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:751414,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thedevsecopsexpert.com/i/168581726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6090f1f2-a371-48a9-9882-d275adc72a20_450x302.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!94tT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6090f1f2-a371-48a9-9882-d275adc72a20_450x302.gif 424w, https://substackcdn.com/image/fetch/$s_!94tT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6090f1f2-a371-48a9-9882-d275adc72a20_450x302.gif 848w, https://substackcdn.com/image/fetch/$s_!94tT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6090f1f2-a371-48a9-9882-d275adc72a20_450x302.gif 1272w, https://substackcdn.com/image/fetch/$s_!94tT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6090f1f2-a371-48a9-9882-d275adc72a20_450x302.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h2>The anatomy of traces and spans</h2><p>To keep it very simple, a <strong>trace</strong> consists of a series of interconnected spans. Each <strong>span</strong> represents an individual operation or activity within a specific service or component, for example a database query such as <code>SELECT product_description, product_stock_count FROM Inventory WHERE product_id='xxxxx'</code>. The crucial thing about tracing is that when a request passes from one service or component to the next, the trace context is propagated along with it. This usually involves injecting trace headers (including the trace_id) into the request, allowing downstream services to participate in the same trace.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DU3q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ec5af7-fc44-40ad-83cc-20910efed95e_999x464.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DU3q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ec5af7-fc44-40ad-83cc-20910efed95e_999x464.png 424w, https://substackcdn.com/image/fetch/$s_!DU3q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ec5af7-fc44-40ad-83cc-20910efed95e_999x464.png 848w, https://substackcdn.com/image/fetch/$s_!DU3q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ec5af7-fc44-40ad-83cc-20910efed95e_999x464.png 1272w, https://substackcdn.com/image/fetch/$s_!DU3q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ec5af7-fc44-40ad-83cc-20910efed95e_999x464.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DU3q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ec5af7-fc44-40ad-83cc-20910efed95e_999x464.png" width="999" height="464" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/36ec5af7-fc44-40ad-83cc-20910efed95e_999x464.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:464,&quot;width&quot;:999,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:65192,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thedevsecopsexpert.com/i/168581726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ec5af7-fc44-40ad-83cc-20910efed95e_999x464.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DU3q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ec5af7-fc44-40ad-83cc-20910efed95e_999x464.png 424w, https://substackcdn.com/image/fetch/$s_!DU3q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ec5af7-fc44-40ad-83cc-20910efed95e_999x464.png 848w, https://substackcdn.com/image/fetch/$s_!DU3q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ec5af7-fc44-40ad-83cc-20910efed95e_999x464.png 1272w, https://substackcdn.com/image/fetch/$s_!DU3q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F36ec5af7-fc44-40ad-83cc-20910efed95e_999x464.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Right, I am going to do my obligatory thing in this series and mention <a href="https://opentelemetry.io/docs/concepts/signals/traces/">OpenTelemetry</a> again. Deal with it! But seriously, the easiest and most standardised way to get started is to pick an APM that supports OpenTelemetry and off you go. Most modern vendors (e.g. Datadog, Sumo Logic, Honeycomb) provide exporters and well-tested documentation.</p><p>Let&#8217;s quickly cover the makeup of spans, the core component of a trace. A <strong>span</strong> represents an operation (or a unit of work) in a trace. A span could be a database query, an in-process function call, or even a remote procedure call (RPC). 
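</p><p>To make that concrete, here is a minimal, stdlib-only Go sketch of a span as plain data. It is illustrative only: the <code>Span</code> type and the <code>Children</code> helper are hypothetical and are not the OpenTelemetry Go API. It shows how a shared trace ID plus a parent span ID is enough to rebuild the tree of spans in a trace.</p><pre><code><code>package main

import "time"

// Span is a hypothetical, simplified model of the data a span carries.
// It is not the OpenTelemetry Go API.
type Span struct {
  TraceID      string            // shared by every span in the same trace
  SpanID       string            // unique to this span within the trace
  ParentSpanID string            // empty for the root span
  Name         string            // short, low-cardinality operation name
  Kind         string            // e.g. "server", "client", "internal"
  Start        time.Time         // when the operation began
  End          time.Time         // when the operation finished
  Status       string            // "unset", "ok" or "error"
  Attributes   map[string]string // key-value pairs describing the operation
}

// IsRoot reports whether this is the first span in its trace (no parent).
func (s Span) IsRoot() bool {
  return s.ParentSpanID == ""
}

// Duration returns how long the operation took.
func (s Span) Duration() time.Duration {
  return s.End.Sub(s.Start)
}

// Children returns the spans in the same trace whose parent is the given span.
func Children(spans []Span, parent Span) []Span {
  var children []Span
  for _, s := range spans {
    if s.TraceID == parent.TraceID {
      if s.ParentSpanID == parent.SpanID {
        children = append(children, s)
      }
    }
  }
  return children
}</code></code></pre><p>For example, a root <code>server</code> span for an incoming HTTP request and a child <code>client</code> span for the database call would share a <code>TraceID</code>, and the child&#8217;s <code>ParentSpanID</code> would hold the root&#8217;s <code>SpanID</code>, so calling <code>Children</code> with the root returns the database span.</p><p>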
A span has the following:</p><ul><li><p>A span name (operation name).</p></li><li><p>A parent span ID (the root span has none).</p></li><li><p>A span kind.</p></li><li><p>A start and end time.</p></li><li><p>A status that reports whether the operation succeeded or failed.</p></li><li><p>A set of key-value attributes describing the operation.</p></li><li><p>A timeline of events.</p></li><li><p>A list of links to other spans.</p></li><li><p>A span context that propagates the trace ID and other data between services.</p></li></ul><p>A trace is a tree of spans that shows the path that a request makes through an app. The root span is the first span in a trace. An example:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ITTF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49685f53-564a-4282-9a1f-e67a627702e6_707x670.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ITTF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49685f53-564a-4282-9a1f-e67a627702e6_707x670.png 424w, https://substackcdn.com/image/fetch/$s_!ITTF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49685f53-564a-4282-9a1f-e67a627702e6_707x670.png 848w, https://substackcdn.com/image/fetch/$s_!ITTF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49685f53-564a-4282-9a1f-e67a627702e6_707x670.png 1272w, https://substackcdn.com/image/fetch/$s_!ITTF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49685f53-564a-4282-9a1f-e67a627702e6_707x670.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ITTF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49685f53-564a-4282-9a1f-e67a627702e6_707x670.png" width="707" height="670" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/49685f53-564a-4282-9a1f-e67a627702e6_707x670.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:707,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:73302,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thedevsecopsexpert.com/i/168581726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49685f53-564a-4282-9a1f-e67a627702e6_707x670.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ITTF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49685f53-564a-4282-9a1f-e67a627702e6_707x670.png 424w, https://substackcdn.com/image/fetch/$s_!ITTF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49685f53-564a-4282-9a1f-e67a627702e6_707x670.png 848w, https://substackcdn.com/image/fetch/$s_!ITTF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49685f53-564a-4282-9a1f-e67a627702e6_707x670.png 1272w, https://substackcdn.com/image/fetch/$s_!ITTF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49685f53-564a-4282-9a1f-e67a627702e6_707x670.png 
1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Now, time for some sage advice. OpenTelemetry backends typically use span names, along with some attributes, to group similar spans together. To group spans properly, I highly recommend giving them short, concise names that do not embed variable values such as IDs. You should aim to have fewer than 1,000 unique span names, for performance reasons. 
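</p><p>One common way to stay under that limit is to name spans after the route template rather than the raw URL. The Go sketch that follows is a simplified, hypothetical helper (<code>SpanNameForRoute</code> and <code>UniqueNames</code> are not part of any OpenTelemetry library) that swaps numeric path segments for <code>:id</code>. In practice, your HTTP router or its OpenTelemetry instrumentation usually knows the route template already, which is more reliable.</p><pre><code><code>package main

import (
  "strconv"
  "strings"
)

// SpanNameForRoute is a hypothetical helper that builds a low-cardinality span
// name from an HTTP method and a raw request path by replacing purely numeric
// path segments with ":id". Prefer your router's route template where available.
func SpanNameForRoute(method, path string) string {
  segments := strings.Split(path, "/")
  for i, seg := range segments {
    // ParseUint fails on empty and non-numeric segments, so only IDs are replaced.
    if _, err := strconv.ParseUint(seg, 10, 64); err == nil {
      segments[i] = ":id"
    }
  }
  return method + " " + strings.Join(segments, "/")
}

// UniqueNames counts the distinct span names in a sample, which is the number
// to keep well below the 1,000 suggested in this post.
func UniqueNames(names []string) int {
  seen := make(map[string]struct{})
  for _, name := range names {
    seen[name] = struct{}{}
  }
  return len(seen)
}</code></code></pre><p>With this approach, <code>/orders/1</code>, <code>/orders/2</code> and <code>/orders/3</code> all produce the single span name <code>GET /orders/:id</code>, whereas naming spans after the raw path would create one unique name per order. Non-numeric identifiers such as UUIDs are not handled by this sketch.</p><p>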
Let&#8217;s look at some good and bad span names:</p><h3>Good</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eOgb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537f3e90-caba-4470-b4f4-e0327afc497b_1240x460.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eOgb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537f3e90-caba-4470-b4f4-e0327afc497b_1240x460.png 424w, https://substackcdn.com/image/fetch/$s_!eOgb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537f3e90-caba-4470-b4f4-e0327afc497b_1240x460.png 848w, https://substackcdn.com/image/fetch/$s_!eOgb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537f3e90-caba-4470-b4f4-e0327afc497b_1240x460.png 1272w, https://substackcdn.com/image/fetch/$s_!eOgb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537f3e90-caba-4470-b4f4-e0327afc497b_1240x460.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eOgb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537f3e90-caba-4470-b4f4-e0327afc497b_1240x460.png" width="1240" height="460" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/537f3e90-caba-4470-b4f4-e0327afc497b_1240x460.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:460,&quot;width&quot;:1240,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54478,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thedevsecopsexpert.com/i/168581726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537f3e90-caba-4470-b4f4-e0327afc497b_1240x460.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eOgb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537f3e90-caba-4470-b4f4-e0327afc497b_1240x460.png 424w, https://substackcdn.com/image/fetch/$s_!eOgb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537f3e90-caba-4470-b4f4-e0327afc497b_1240x460.png 848w, https://substackcdn.com/image/fetch/$s_!eOgb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537f3e90-caba-4470-b4f4-e0327afc497b_1240x460.png 1272w, https://substackcdn.com/image/fetch/$s_!eOgb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F537f3e90-caba-4470-b4f4-e0327afc497b_1240x460.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Bad</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!MPa3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1d6c8e-b51c-4608-86f2-aa40780e5636_1240x460.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!MPa3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1d6c8e-b51c-4608-86f2-aa40780e5636_1240x460.png 424w, 
https://substackcdn.com/image/fetch/$s_!MPa3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1d6c8e-b51c-4608-86f2-aa40780e5636_1240x460.png 848w, https://substackcdn.com/image/fetch/$s_!MPa3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1d6c8e-b51c-4608-86f2-aa40780e5636_1240x460.png 1272w, https://substackcdn.com/image/fetch/$s_!MPa3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1d6c8e-b51c-4608-86f2-aa40780e5636_1240x460.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!MPa3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1d6c8e-b51c-4608-86f2-aa40780e5636_1240x460.png" width="1240" height="460" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8a1d6c8e-b51c-4608-86f2-aa40780e5636_1240x460.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:460,&quot;width&quot;:1240,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:52456,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thedevsecopsexpert.com/i/168581726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1d6c8e-b51c-4608-86f2-aa40780e5636_1240x460.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!MPa3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1d6c8e-b51c-4608-86f2-aa40780e5636_1240x460.png 424w, https://substackcdn.com/image/fetch/$s_!MPa3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1d6c8e-b51c-4608-86f2-aa40780e5636_1240x460.png 848w, https://substackcdn.com/image/fetch/$s_!MPa3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1d6c8e-b51c-4608-86f2-aa40780e5636_1240x460.png 1272w, https://substackcdn.com/image/fetch/$s_!MPa3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8a1d6c8e-b51c-4608-86f2-aa40780e5636_1240x460.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Lastly on spans, every span has a kind, which must be one of these values:</p><ul><li><p><code>server</code> for server operations, for example, an HTTP server handler.</p></li><li><p><code>client</code> for client operations, for example, HTTP client requests.</p></li><li><p><code>producer</code> for message producers, for example, a Kafka producer.</p></li><li><p><code>consumer</code> for message consumers and async functions, for example, a Kafka consumer.</p></li><li><p><code>internal</code> for internal operations.</p></li></ul><p>&#8230;and every span has a status code, which is one of the following values:</p><ul><li><p><code>ok</code> - success.</p></li><li><p><code>error</code> - failure.</p></li><li><p><code>unset</code> - the default value, which leaves it up to the backend to interpret the status.</p></li></ul><h2>Some additional features of traces and spans</h2><h3>Attributes</h3><p>If you wish to record contextual information, you can annotate spans with attributes that carry information specific to the operation. For example, an HTTP endpoint may have attributes like <code>http.request.method = GET</code> and <code>http.route = /orders/:id</code> (older versions of the semantic conventions used <code>http.method</code>).</p><p>You are free to name attributes however you want, but for common operations you should follow the OpenTelemetry <a href="https://opentelemetry.io/docs/specs/semconv/general/trace/">semantic conventions</a>.</p><h3>Events</h3><p>You can also annotate spans with events, each of which has a single timestamp and an arbitrary number of attributes. 
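</p><p>As a rough illustration, an event is just a named point in time attached to a span, while the span itself has both a start and an end. The <code>Event</code> and <code>TimedSpan</code> types in this stdlib-only Go sketch are hypothetical and are not the OpenTelemetry API; with the real Go SDK you would call <code>span.AddEvent</code> on the span returned by <code>tracer.Start</code> instead.</p><pre><code><code>package main

import "time"

// Event is a hypothetical, simplified span event: a name, a single timestamp
// and some attributes. Unlike a span, it has no end time.
type Event struct {
  Name       string
  Time       time.Time
  Attributes map[string]string
}

// TimedSpan is a hypothetical, cut-down span with a start time, an end time
// and the events recorded while it was open.
type TimedSpan struct {
  Name   string
  Start  time.Time
  End    time.Time
  Events []Event
}

// AddEvent records a named point in time on the span.
func (s *TimedSpan) AddEvent(name string, at time.Time, attrs map[string]string) {
  s.Events = append(s.Events, Event{Name: name, Time: at, Attributes: attrs})
}

// Duration only exists for spans, because only spans have an end time.
func (s *TimedSpan) Duration() time.Duration {
  return s.End.Sub(s.Start)
}

// OffsetOf reports how far into the span an event occurred.
func (s *TimedSpan) OffsetOf(e Event) time.Duration {
  return e.Time.Sub(s.Start)
}</code></code></pre><p>Recording an <code>exception</code> event 20ms into a 50ms span, for example, gives the event a position within the span but no duration of its own.</p><p>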
The main difference between events and spans is that events don&#8217;t have an end time (and therefore no duration).</p><p>Events typically represent exceptions, errors, logs, and messages (such as in RPC), but you can also create custom events if you wish. For example, you may have a telemetry wrapper that your engineering team uses to annotate spans in a standardised way, including sending your logs via tracing as well.</p><p>Observant readers may remember from my first post in this series that the first pillar of observability is <strong>Logs</strong>. So, if you can send event-logs via spans in a trace, why do you need logs as a separate pillar? Well, the simple answer is, you probably don&#8217;t. However, not all system architectures are created equal, and through acquisitions or divergent paths in your tech choices, you may still want to collect logs from different components. So, if your tech stack is fairly simple and you instrument your code to send traces to your backend/observability platform of choice, you can likely stick with this approach.</p><h3>Context</h3><p>Context is an important feature of spans. The span context carries information about the span as it propagates through different components and services in the tree.</p><p>The trace/span context is a request-scoped data object containing:</p><ul><li><p><code>Trace ID</code>. A globally unique identifier that represents the entire trace, i.e. the end-to-end request. All spans within a trace have the same trace ID.</p></li><li><p><code>Span ID</code>. A unique identifier for the specific span within a trace. Each span within a trace has a different span ID.</p></li><li><p><code>Trace flags</code>. Flags that indicate various properties of the trace, such as whether it is sampled or not. Sampling refers to the process of determining which spans should be recorded and reported to the observability backend.</p></li><li><p><code>Trace State</code>. 
An optional field that contains additional vendor- or application-specific data related to the trace.</p></li></ul><p>The span context is incredibly important for maintaining the continuity and correlation of spans within a distributed system. It allows different services and components to associate their spans with the correct trace and provides true end-to-end visibility into the flow of requests or transactions. The span context is typically propagated in the headers or metadata of the communication protocol between services (commonly the W3C Trace Context <code>traceparent</code> and <code>tracestate</code> HTTP headers, which OpenTelemetry supports), similar to how baggage data is propagated, which we will cover in a minute. This ensures that when a service receives a request, it can extract the span context and associate the ingress span with the correct trace.</p><p>You can use data from the context for span correlation or sampling. For example, you can use the <code>trace_id</code> to know which spans belong to which traces, which is obviously incredibly important during troubleshooting or sampling!</p><p>Lastly on context, read up on context propagation in the OpenTelemetry docs <a href="https://opentelemetry.io/docs/concepts/context-propagation/">here</a>. That page also has a helpful section on the supported serialisation and deserialisation protocols.</p><h3>Baggage</h3><p>We all come with baggage, and fortunately so does tracing! <a href="https://opentelemetry.io/docs/concepts/signals/baggage/">Baggage</a> allows you to propagate custom key-value pairs from one service to another. The example in the OpenTelemetry documentation is fantastic, so I won&#8217;t give another - just have a read.</p><h2>What should we instrument then and how?</h2><p>You really do not need to instrument every operation in your code to get the most out of tracing - it would be very time-consuming, and it is not really necessary, or even valuable, for your observability practices. 
Consider prioritising these operations:</p><ul><li><p><code>Network operations</code>, for example, HTTP requests or RPC calls.</p></li><li><p><code>Filesystem operations</code>, for example, reading from or writing to files.</p></li><li><p><code>Database queries</code>, which combine network and filesystem operations.</p></li><li><p><code>Errors and logs</code>, for example, using structured logging, which I covered in my <a href="https://www.pash.by/posts/observability-logs/">first post</a> in this series.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2ZDV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e847db-e452-4300-9f01-66382353151d_480x480.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2ZDV!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e847db-e452-4300-9f01-66382353151d_480x480.gif 424w, https://substackcdn.com/image/fetch/$s_!2ZDV!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e847db-e452-4300-9f01-66382353151d_480x480.gif 848w, https://substackcdn.com/image/fetch/$s_!2ZDV!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e847db-e452-4300-9f01-66382353151d_480x480.gif 1272w, https://substackcdn.com/image/fetch/$s_!2ZDV!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e847db-e452-4300-9f01-66382353151d_480x480.gif 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!2ZDV!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e847db-e452-4300-9f01-66382353151d_480x480.gif" width="480" height="480" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5e847db-e452-4300-9f01-66382353151d_480x480.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:480,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5523018,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.thedevsecopsexpert.com/i/168581726?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e847db-e452-4300-9f01-66382353151d_480x480.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2ZDV!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e847db-e452-4300-9f01-66382353151d_480x480.gif 424w, https://substackcdn.com/image/fetch/$s_!2ZDV!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e847db-e452-4300-9f01-66382353151d_480x480.gif 848w, https://substackcdn.com/image/fetch/$s_!2ZDV!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e847db-e452-4300-9f01-66382353151d_480x480.gif 1272w, https://substackcdn.com/image/fetch/$s_!2ZDV!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5e847db-e452-4300-9f01-66382353151d_480x480.gif 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"></div></div></a></figure></div><p>OK, now for a quick example in Go before we wrap up!</p><p><strong>Step 1</strong>. Let&#8217;s instrument the following example function, which inserts a new order:</p><pre><code><code>func insertOrder(ctx context.Context, order *Order) error {
  if _, err := db.NewInsert().Model(order).Exec(ctx); err != nil {
    return err
  }
  return nil
}</code></code></pre><p><strong>Step 2</strong>. Let&#8217;s wrap the operation with a span:</p><pre><code><code>import "go.opentelemetry.io/otel"

var tracer = otel.Tracer("app_or_package_name")

func insertOrder(ctx context.Context, order *Order) error {
  ctx, span := tracer.Start(ctx, "insert-order")
  defer span.End()

  if _, err := db.NewInsert().Model(order).Exec(ctx); err != nil {
    return err
  }
  return nil
}</code></code></pre><p><strong>Step 3</strong>. Let&#8217;s record errors and set an error status on the span:</p><pre><code><code>import (
  "go.opentelemetry.io/otel"
  "go.opentelemetry.io/otel/codes"
)

var tracer = otel.Tracer("app_or_package_name")

func insertOrder(ctx context.Context, order *Order) error {
  ctx, span := tracer.Start(ctx, "insert-order")
  defer span.End()

  if _, err := db.NewInsert().Model(order).Exec(ctx); err != nil {
    span.RecordError(err)
    span.SetStatus(codes.Error, err.Error())
    return err
  }
  return nil
}</code></code></pre><p><strong>Step 4</strong>. We should also record some contextual information with attributes:</p><pre><code><code>import (
  "go.opentelemetry.io/otel"
  "go.opentelemetry.io/otel/attribute"
  "go.opentelemetry.io/otel/codes"
)

var tracer = otel.Tracer("app_or_package_name")

func insertOrder(ctx context.Context, order *Order) error {
  ctx, span := tracer.Start(ctx, "insert-order")
  defer span.End()

  if _, err := db.NewInsert().Model(order).Exec(ctx); err != nil {
    span.RecordError(err)
    span.SetStatus(codes.Error, err.Error())
    return err
  }

  if span.IsRecording() {
    span.SetAttributes(
      attribute.Int64("endorder.id", order.ID),
      attribute.String("endorder.description", order.Description),
    )
  }

  return nil
}</code></code></pre><h1>Let&#8217;s wrap it up</h1><p>I hope you found this blog post helpful and that you have a few pointers to take away into your own observability practices. Here&#8217;s a quick list of the most important takeaways from this post:</p><ul><li><p><strong>Use OpenTelemetry for your tracing!</strong> You will not regret it one bit, and most reputable vendors support the OpenTel standards, which is fantastic!</p></li><li><p><strong>You probably won&#8217;t need logs as well as traces!</strong> As mentioned in the events section above, you can attach event logs to your spans, which means you often won&#8217;t need to send logs separately. This is why I always recommend collaborating within your engineering team to agree on solid observability patterns, and even to build wrapper libraries that keep things consistent across all of your services and systems.</p></li><li><p><strong>Context</strong> is really important when your request paths traverse many different services and components within your distributed architecture. Always propagate the context, and remember to check out the propagation standards that OpenTel supports, such as W3C Trace Context!</p></li><li><p><strong>Only instrument key components!</strong> Our example covered a database insert, which is a key component of the service.</p></li><li><p><strong>Use Semantic Conventions!</strong> When you add instrumentation to your code, it is important to follow semantic conventions. This means using the standardised attribute names and span naming guidelines defined in the OpenTel semantic conventions. Doing so ensures consistency and interoperability across different instrumentation libraries and backends.</p></li></ul><p>Thanks again for reading, and I hope you&#8217;re looking forward to the next post in this series, which will be about centralised observability platforms (also known as backends).</p>]]></content:encoded></item><item><title><![CDATA[What is O11y? 
- Metrics]]></title><description><![CDATA[O11y Series]]></description><link>https://www.thedevsecopsexpert.com/p/what-is-o11y-metrics</link><guid isPermaLink="false">https://www.thedevsecopsexpert.com/p/what-is-o11y-metrics</guid><dc:creator><![CDATA[Mark Pashby]]></dc:creator><pubDate>Thu, 17 Jul 2025 18:46:59 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0472b75a-67b6-4683-a594-1901197ea2ec_480x270.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to the second part of my series on observability engineering! The first part can be found <a href="https://thedevsecopsexpert.substack.com/p/what-is-o11y-logs?r=g7rls">here</a>. I hope this series proves helpful to you, and I&#8217;m excited to continue working on the upcoming posts.</p><h2>Introduction</h2><p>I won&#8217;t delve into a full explanation or analogy of observability engineering again (for a refresher, please refer to the first post in the series); however, let&#8217;s quickly recap. Observability engineering involves measuring the internal states of a system or application by examining its outputs. It&#8217;s straightforward &#8211; no secret sauce, and no hocus pocus! In this series, we&#8217;ll cover essential topics within observability engineering that should benefit newcomers and seasoned engineers alike.</p><p>As always, my strong opinions are based on extensive experience, but I hold them lightly. I take great pride in my ability to change my views when presented with new information or different experiences.</p><h2>Pillar Two - Metrics</h2><h3>What are metrics?</h3><p>First, what are metrics? Metrics are simply the numeric representation of data measured over intervals of time. Using my pilot analogy from the first post, a commercial pilot needs real-time metrics to make informed decisions throughout a flight. This data is crucial for the pilot&#8217;s situational awareness. 
If it were in log format, the pilot wouldn&#8217;t have time to sift through detailed logs to get the necessary information, which could be disastrous!</p><p>For the aeroplane manufacturer, having historical numeric data points is essential for hypothesis-based investigations. For instance, if a new engine version is introduced, manufacturers might hypothesise that these engines run significantly hotter than previous models. By analysing historical data points, they can mark when the new engines were installed and validate or refute their hypothesis. Metrics might not explain why the engines run hotter, but they provide a clear picture of when and what.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_wWF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe281dca0-7622-4011-9196-dcb07b35e8e5_480x270.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_wWF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe281dca0-7622-4011-9196-dcb07b35e8e5_480x270.gif 424w, https://substackcdn.com/image/fetch/$s_!_wWF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe281dca0-7622-4011-9196-dcb07b35e8e5_480x270.gif 848w, https://substackcdn.com/image/fetch/$s_!_wWF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe281dca0-7622-4011-9196-dcb07b35e8e5_480x270.gif 1272w, https://substackcdn.com/image/fetch/$s_!_wWF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe281dca0-7622-4011-9196-dcb07b35e8e5_480x270.gif 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_wWF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe281dca0-7622-4011-9196-dcb07b35e8e5_480x270.gif" width="480" height="270" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e281dca0-7622-4011-9196-dcb07b35e8e5_480x270.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:270,&quot;width&quot;:480,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1482640,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://thedevsecopsexpert.substack.com/i/168579982?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe281dca0-7622-4011-9196-dcb07b35e8e5_480x270.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_wWF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe281dca0-7622-4011-9196-dcb07b35e8e5_480x270.gif 424w, https://substackcdn.com/image/fetch/$s_!_wWF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe281dca0-7622-4011-9196-dcb07b35e8e5_480x270.gif 848w, https://substackcdn.com/image/fetch/$s_!_wWF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe281dca0-7622-4011-9196-dcb07b35e8e5_480x270.gif 1272w, 
https://substackcdn.com/image/fetch/$s_!_wWF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe281dca0-7622-4011-9196-dcb07b35e8e5_480x270.gif 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>The incredible value of metrics</h2><p>My first piece of advice is, if anyone ever tells you that metrics are unnecessary in modern observability engineering, take a deep breath and firmly disagree. 
While it might not warrant drastic actions (unless they&#8217;ve committed equally heinous acts like stealing your lunch from the office fridge), it&#8217;s essential to remind them of the immense value numerical data points provide for single-pane-of-glass troubleshooting. Metrics offer an additional data dimension, giving you immediate, data-based insights.</p><p>For instance, imagine a service or application reporting numerous errors at 9am on a Monday morning. Your service might emit a metric such as <code>python_gc_objects_uncollectable_total</code> through a framework like the Prometheus Python client, and your cloud provider might collect a <code>memory_utilised</code> metric. These numerical data points are unique to metrics and cannot be easily derived from other observability pillars.</p><p>Now, consider Bob, the senior engineer, strolling into the office at 10am with his frappamochachino iced coffee and a croissant from his favourite bakery. Bob recalls a Python version bump on Friday afternoon, &#8220;but it had been tested and approved,&#8221; he mentions, mid-bite. What if Bob wasn&#8217;t there to provide that critical piece of information? No worries &#8211; you&#8217;re a super observability engineer who thought ahead and emitted a metric indicating the Python version of your service, such as <code>python_version</code>. Brilliant!</p><p>With this metric, you can easily correlate the version bump with the rise in uncollectable garbage collection objects as request throughput increased on Monday morning. The team can then roll back the Python version, averting any major issues. Disaster mitigated.</p><p>I acknowledge this is a basic example, and something you might also catch in your logs, such as out-of-memory errors. However, not all systems or services operate at the same scale, and in a high-volume service those log lines can easily be buried among millions of others. Engineers shouldn&#8217;t have to hunt for a needle in a haystack. 
So, always strive for clarity and precision in your observability practices.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zEjK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3683281b-ebc8-4365-9823-938c7dd40482_1781x475.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zEjK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3683281b-ebc8-4365-9823-938c7dd40482_1781x475.png 424w, https://substackcdn.com/image/fetch/$s_!zEjK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3683281b-ebc8-4365-9823-938c7dd40482_1781x475.png 848w, https://substackcdn.com/image/fetch/$s_!zEjK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3683281b-ebc8-4365-9823-938c7dd40482_1781x475.png 1272w, https://substackcdn.com/image/fetch/$s_!zEjK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3683281b-ebc8-4365-9823-938c7dd40482_1781x475.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zEjK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3683281b-ebc8-4365-9823-938c7dd40482_1781x475.png" width="1456" height="388" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3683281b-ebc8-4365-9823-938c7dd40482_1781x475.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:388,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:48141,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thedevsecopsexpert.substack.com/i/168579982?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3683281b-ebc8-4365-9823-938c7dd40482_1781x475.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zEjK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3683281b-ebc8-4365-9823-938c7dd40482_1781x475.png 424w, https://substackcdn.com/image/fetch/$s_!zEjK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3683281b-ebc8-4365-9823-938c7dd40482_1781x475.png 848w, https://substackcdn.com/image/fetch/$s_!zEjK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3683281b-ebc8-4365-9823-938c7dd40482_1781x475.png 1272w, https://substackcdn.com/image/fetch/$s_!zEjK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3683281b-ebc8-4365-9823-938c7dd40482_1781x475.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3N7Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd601d837-f4a9-491c-887f-e4b0122ebb2a_500x280.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3N7Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd601d837-f4a9-491c-887f-e4b0122ebb2a_500x280.gif 424w, 
https://substackcdn.com/image/fetch/$s_!3N7Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd601d837-f4a9-491c-887f-e4b0122ebb2a_500x280.gif 848w, https://substackcdn.com/image/fetch/$s_!3N7Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd601d837-f4a9-491c-887f-e4b0122ebb2a_500x280.gif 1272w, https://substackcdn.com/image/fetch/$s_!3N7Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd601d837-f4a9-491c-887f-e4b0122ebb2a_500x280.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3N7Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd601d837-f4a9-491c-887f-e4b0122ebb2a_500x280.gif" width="500" height="280" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d601d837-f4a9-491c-887f-e4b0122ebb2a_500x280.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:280,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3386211,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thedevsecopsexpert.substack.com/i/168579982?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd601d837-f4a9-491c-887f-e4b0122ebb2a_500x280.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!3N7Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd601d837-f4a9-491c-887f-e4b0122ebb2a_500x280.gif 424w, https://substackcdn.com/image/fetch/$s_!3N7Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd601d837-f4a9-491c-887f-e4b0122ebb2a_500x280.gif 848w, https://substackcdn.com/image/fetch/$s_!3N7Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd601d837-f4a9-491c-887f-e4b0122ebb2a_500x280.gif 1272w, https://substackcdn.com/image/fetch/$s_!3N7Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd601d837-f4a9-491c-887f-e4b0122ebb2a_500x280.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>Get collecting</h2><p>Great stuff! We have covered how important metrics are, but what is the anatomy of a modern metric?</p><p>First of all, my strong opinion here is that you should pick OpenTelemetry to instrument and collect your metrics. Trust me: you will save a significant amount of time by making that choice early on. The OpenTel standards for metrics collection are supported by most major centralised observability platforms, which we will cover in a future blog post.</p><p>The OpenTel metrics data model is described <a href="https://opentelemetry.io/docs/specs/otel/metrics/data-model/#opentelemetry-protocol-data-model">here</a>. It defines three models:</p><ul><li><p>The <strong>Event model</strong>, which represents how you, as the software engineer, instrument and report metrics</p></li><li><p>The <strong>Metric Stream model</strong>, which defines how OpenTelemetry transports metric data (via OTLP)</p></li><li><p>The <strong>Timeseries model</strong>, which represents how backends store metric data</p></li></ul><p>To record OpenTelemetry metrics, you use a <strong>MeterProvider</strong> (often the global one) to create a <strong>Meter</strong>, and then use that Meter to create one or more instruments. An instrument is a specific type of metric (e.g., a counter, gauge or histogram) that you use to collect data about a particular aspect of your service or application&#8217;s behaviour. 
You capture measurements with instruments, each of which is made up of:</p><ul><li><p>A unique name, for example, <code>http.proxy.request.duration</code></p></li><li><p>An instrument type, for example, <strong>Histogram</strong></p></li><li><p>An optional unit of measure, for example, <code>ms</code> (milliseconds) or <code>By</code> (bytes), using UCUM unit codes</p></li><li><p>An optional description for the instrument</p></li></ul><p>A single instrument can produce multiple timeseries, because each unique combination of attribute values becomes its own timeseries. For example, if you run a Kubernetes cluster and record the same metric on every host, each host (distinguished by its host attribute) gets a separate timeseries under the same metric name.</p><p>It&#8217;s very important to mention additive instruments at this stage. Additive, or summable, instruments produce timeseries that can be added together to give another meaningful and accurate timeseries. Additive instruments that measure non-decreasing numbers are also called <em>monotonic</em>. For example, <code>http.server.requests</code> is additive: summing it across all of your hosts gives the actual total number of requests your service received (and you should be load balancing requests across those hosts, of course!). Instruments are also either synchronous or asynchronous. Synchronous instruments are invoked inline with the operation they are measuring, for example adding 1 to a counter each time your service or application handles a request. Asynchronous instruments instead register a callback function that is invoked periodically to collect measurements. 
Asynchronous instruments are also known as observers (the OpenTelemetry API calls them observable instruments), and they are well suited to periodically measuring things like system memory or CPU usage.</p><h2>When to use what type and some examples!</h2><p>Some simple guidance on when to use what:</p><ol><li><p>If you need to measure <code>request_latency</code> or <code>request_size</code>, pick a <strong>Histogram</strong>.</p></li><li><p>If you need to measure things like <code>processed_requests</code>, <code>errors</code>, <code>received_bytes</code> or <code>disk_reads</code>, pick a <strong>Counter</strong> if the value is monotonic; otherwise, use an <strong>UpDownCounter</strong>.</p></li><li><p>If you need to measure things like <code>cpu_time</code>, <code>memory_usage_bytes</code> or <code>memory_utilisation_percentage</code> from a periodic callback, use a <strong>CounterObserver</strong> if the value is additive/summable and monotonic, or an <strong>UpDownCounterObserver</strong> if it is additive but can decrease. If the value is <strong>NOT</strong> additive/summable, use a <strong>GaugeObserver</strong>. In the current OpenTelemetry API specification, these asynchronous instruments are named <strong>ObservableCounter</strong>, <strong>ObservableUpDownCounter</strong> and <strong>ObservableGauge</strong>.</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yI2y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d33bbc-ead2-48e9-b379-0b7c10448cdf_480x270.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yI2y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d33bbc-ead2-48e9-b379-0b7c10448cdf_480x270.gif 424w, https://substackcdn.com/image/fetch/$s_!yI2y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d33bbc-ead2-48e9-b379-0b7c10448cdf_480x270.gif 848w, 
https://substackcdn.com/image/fetch/$s_!yI2y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d33bbc-ead2-48e9-b379-0b7c10448cdf_480x270.gif 1272w, https://substackcdn.com/image/fetch/$s_!yI2y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d33bbc-ead2-48e9-b379-0b7c10448cdf_480x270.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yI2y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d33bbc-ead2-48e9-b379-0b7c10448cdf_480x270.gif" width="480" height="270" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/51d33bbc-ead2-48e9-b379-0b7c10448cdf_480x270.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:270,&quot;width&quot;:480,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1723204,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thedevsecopsexpert.substack.com/i/168579982?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d33bbc-ead2-48e9-b379-0b7c10448cdf_480x270.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yI2y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d33bbc-ead2-48e9-b379-0b7c10448cdf_480x270.gif 424w, 
https://substackcdn.com/image/fetch/$s_!yI2y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d33bbc-ead2-48e9-b379-0b7c10448cdf_480x270.gif 848w, https://substackcdn.com/image/fetch/$s_!yI2y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d33bbc-ead2-48e9-b379-0b7c10448cdf_480x270.gif 1272w, https://substackcdn.com/image/fetch/$s_!yI2y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F51d33bbc-ead2-48e9-b379-0b7c10448cdf_480x270.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Lets make this fun by doing a few examples! I&#8217;ve picked golang in the following examples, but there are well supported SDK&#8217;s/API&#8217;s in most popular programming languages. We want to measure the number of requests our startup app sends to social media API&#8217;s, so we should pick a Counter, and increment whenever a request is sent:</p><pre><code><code>import "go.opentelemetry.io/otel/metric"

socialMediaAPIRequestCounter, err := meter.Int64Counter(
&#9;"some.prefix.api.requests",
&#9;metric.WithDescription("Number of sent Social Media API requests"),
)
if err != nil {
&#9;panic(err)
}
// Your secret sauce code does things here
socialMediaAPIRequestCounter.Add(ctx, 1)</code></code></pre><p>A simple example, but you can see we increment the counter each time our secret sauce code runs. Let&#8217;s do another very quick example, this time with a <strong>Histogram</strong>:</p><pre><code><code>import (
&#9;"time"

&#9;"go.opentelemetry.io/otel/metric"
)

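opHistogram, err := meter.Int64Histogram(
&#9;"some.prefix.process.image.duration",
&#9;metric.WithDescription("Duration of image enhancement"),
&#9;metric.WithUnit("us"), // durations are recorded in microseconds below
)
if err != nil {
&#9;panic(err)
}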

t1 := time.Now()
op(ctx)
dur := time.Since(t1)

opHistogram.Record(ctx, dur.Microseconds())</code></code></pre><p>That example records how long an image enhancement takes in our CSI: New York image enhancement app.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-0Jp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8bcb611-4f0f-43ba-bec3-b275ec70ea74_480x270.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-0Jp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8bcb611-4f0f-43ba-bec3-b275ec70ea74_480x270.gif 424w, https://substackcdn.com/image/fetch/$s_!-0Jp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8bcb611-4f0f-43ba-bec3-b275ec70ea74_480x270.gif 848w, https://substackcdn.com/image/fetch/$s_!-0Jp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8bcb611-4f0f-43ba-bec3-b275ec70ea74_480x270.gif 1272w, https://substackcdn.com/image/fetch/$s_!-0Jp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8bcb611-4f0f-43ba-bec3-b275ec70ea74_480x270.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-0Jp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8bcb611-4f0f-43ba-bec3-b275ec70ea74_480x270.gif" width="480" height="270" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8bcb611-4f0f-43ba-bec3-b275ec70ea74_480x270.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:270,&quot;width&quot;:480,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3026170,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thedevsecopsexpert.substack.com/i/168579982?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8bcb611-4f0f-43ba-bec3-b275ec70ea74_480x270.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-0Jp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8bcb611-4f0f-43ba-bec3-b275ec70ea74_480x270.gif 424w, https://substackcdn.com/image/fetch/$s_!-0Jp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8bcb611-4f0f-43ba-bec3-b275ec70ea74_480x270.gif 848w, https://substackcdn.com/image/fetch/$s_!-0Jp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8bcb611-4f0f-43ba-bec3-b275ec70ea74_480x270.gif 1272w, https://substackcdn.com/image/fetch/$s_!-0Jp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8bcb611-4f0f-43ba-bec3-b275ec70ea74_480x270.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Let&#8217;s cover one last example: cache hit rates. For this we use a <strong>CounterObserver</strong> (<code>Int64ObservableCounter</code> in the Go API), which, if you remember, is used to instrument <em>monotonic</em> numbers that are additive/summable:</p><pre><code><code>import (
&#9;"context"

&#9;"go.opentelemetry.io/otel/attribute"
&#9;"go.opentelemetry.io/otel/metric"
)

counter, err := meter.Int64ObservableCounter("some.prefix.request.cache")
if err != nil {
&#9;panic(err)
}

// Arbitrary key/value attributes, used to split the counter by result type.
hits := []attribute.KeyValue{attribute.String("type", "hits")}
misses := []attribute.KeyValue{attribute.String("type", "misses")}
errors := []attribute.KeyValue{attribute.String("type", "errors")}

if _, err := meter.RegisterCallback(
&#9;func(ctx context.Context, o metric.Observer) error {
&#9;&#9;stats := cache.Stats()

&#9;&#9;o.ObserveInt64(counter, stats.Hits, metric.WithAttributes(hits...))
&#9;&#9;o.ObserveInt64(counter, stats.Misses, metric.WithAttributes(misses...))
&#9;&#9;o.ObserveInt64(counter, stats.Errors, metric.WithAttributes(errors...))

&#9;&#9;return nil
&#9;},
&#9;counter,
); err != nil {
&#9;panic(err)
}</code></code></pre><p>Note that <strong>CounterObserver</strong> is an asynchronous instrument. You don&#8217;t call it inline. Instead, the SDK invokes the registered callback each time metrics are collected (for example, on every export cycle of a periodic reader). These were very basic examples, but hopefully in the future I can link a repository with more fleshed-out, real-world examples.</p><p>The last step is to choose a backend to send your time-series metrics to, but as mentioned previously, we will cover centralised observability platforms in a later post.</p><h1>Wrap Up Part Deux</h1><p>A recap of the advice and guidance in this post:</p><ul><li><p>Metrics are incredibly useful, and don&#8217;t let anyone tell you otherwise. I could share countless stories about the importance of dashboarded data points during late-night, on-call troubleshooting sessions.</p></li><li><p>Choose OpenTelemetry to model and transport your metrics. You can&#8217;t go wrong by making this choice early on.</p></li><li><p>Learn the basic anatomy of an OpenTelemetry metric, and familiarise yourself with the instrument types so you can make informed decisions about which instruments fit your requirements.</p></li></ul><p>Thank you so much for reading, and as always, I hope this post was helpful! Keep an eye out for my next post on Tracing and APM!</p>]]></content:encoded></item><item><title><![CDATA[What is O11y? 
- Logs]]></title><description><![CDATA[O11y Series]]></description><link>https://www.thedevsecopsexpert.com/p/what-is-o11y-logs</link><guid isPermaLink="false">https://www.thedevsecopsexpert.com/p/what-is-o11y-logs</guid><dc:creator><![CDATA[Mark Pashby]]></dc:creator><pubDate>Thu, 17 Jul 2025 18:24:04 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!WexX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome to our exploration of observability (o11y)! Whether you&#8217;re a newcomer or a seasoned software engineer looking for a refresher, you&#8217;re in the right place. Grab a cup of tea, settle in, and let&#8217;s dive into the realms of observability, site reliability engineering, alerting, and incident management. This mini-series will break down these complex topics into manageable parts, ensuring a smooth and informative read.</p><h1>Introduction</h1><p>So, what exactly is observability engineering? Simply put, it&#8217;s the ability to monitor, measure, and understand the state of a system or application through its external outputs.</p><p>Let&#8217;s use an analogy to illustrate. Consider a commercial aircraft. An aeronautical engineer relies on data from the aircraft to make informed maintenance decisions, such as monitoring the oil pressure of the jet engines. Pilots, on the other hand, need real-time metrics like altitude and cabin pressure to ensure safe flights. All this data is displayed on a dashboard, providing a comprehensive view of the aircraft&#8217;s performance. Manufacturers also analyse this data to plan improvements. 
To put it in perspective, a single commercial aircraft can generate up to 20 terabytes of data per engine, per hour of flight &#8211; a staggering amount of information!</p><p>In the same way, observability is crucial in modern software engineering. Mastering it means you can confidently answer how your system or application is performing without constantly checking if it&#8217;s still running. Instead, you&#8217;ll have a robust alerting and incident response platform, giving you peace of mind. After all, you want to make sure you know about an issue before your customers do!</p><p>As for my background, I&#8217;ve been in the observability field for years, working with network devices, server hardware, monolithic applications, micro-services, and event-driven architecture. I&#8217;ve designed and implemented platforms capable of handling tens of thousands of data points per second. My experience spans creating &#8220;observability in a box&#8221; solutions, offering platform-as-a-product services that software engineering teams can easily integrate with. I bring a wealth of knowledge and strong opinions on the subject.</p><h1>The Three Pillars of O11y - Logs</h1><p>Warning: there be strong opinions here.</p><p>What are these three pillars of observability, then? Great question! They are logs, metrics and traces. This isn&#8217;t a term I coined; it was created by super smart engineers in the observability space. I&#8217;m only going to cover logs in this post, because the draft was getting quite long and I want to keep this series fairly digestible.</p><h2>Pillar One - Logs/Event-Logs</h2><p>Wait a minute, why Logs and/or Event-Logs? What&#8217;s the difference? Don&#8217;t panic, we will get into that in a minute or two. Essentially, logs are human-readable, flat text records that engineers use to capture useful data about their systems/services. 
Log messages occur when a developer deems it important to tell the system or application owner that something happened that they should probably know about. For example, your service could be dropping requests, and you should probably know why sooner rather than later.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WexX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WexX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif 424w, https://substackcdn.com/image/fetch/$s_!WexX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif 848w, https://substackcdn.com/image/fetch/$s_!WexX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif 1272w, https://substackcdn.com/image/fetch/$s_!WexX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WexX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif" width="499" height="499" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:499,&quot;width&quot;:499,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3211182,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thedevsecopsexpert.substack.com/i/168578007?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WexX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif 424w, https://substackcdn.com/image/fetch/$s_!WexX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif 848w, https://substackcdn.com/image/fetch/$s_!WexX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif 1272w, https://substackcdn.com/image/fetch/$s_!WexX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6a22b87-975a-4575-aa8c-8c2db2c34707_499x499.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Let&#8217;s look at a good example of a log line:</p><p><code>[2021-02-23T13:26:23.505892 #22473] INFO -- : [6459ffe1-ea53-4044-aaa3-bf902868f730] Started GET "/" for ::1 at 2021-02-23 13:26:23 -0800</code></p><p><em>Source: The Path from Logs to Traces, by Alex Vondrak</em></p><p>The example log line starts with a timestamp and a PID (Process ID), <code>[2021-02-23T13:26:23.505892 #22473]</code>. The timestamp is incredibly important for time-series investigation: if the logging host&#8217;s clock is misconfigured, this data is effectively useless. Precise timing is crucial in modern software engineering. 
Consider that my first piece of sage advice: write <strong>timezone-aware code</strong>, and make sure the systems you host on are synchronised with accurate, centralised time servers (for example, via NTP).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!T8X2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dc9387-038a-4801-88eb-16523df6d89a_480x360.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!T8X2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dc9387-038a-4801-88eb-16523df6d89a_480x360.gif 424w, https://substackcdn.com/image/fetch/$s_!T8X2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dc9387-038a-4801-88eb-16523df6d89a_480x360.gif 848w, https://substackcdn.com/image/fetch/$s_!T8X2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dc9387-038a-4801-88eb-16523df6d89a_480x360.gif 1272w, https://substackcdn.com/image/fetch/$s_!T8X2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dc9387-038a-4801-88eb-16523df6d89a_480x360.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!T8X2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dc9387-038a-4801-88eb-16523df6d89a_480x360.gif" width="480" height="360" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/33dc9387-038a-4801-88eb-16523df6d89a_480x360.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:360,&quot;width&quot;:480,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:975488,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thedevsecopsexpert.substack.com/i/168578007?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dc9387-038a-4801-88eb-16523df6d89a_480x360.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!T8X2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dc9387-038a-4801-88eb-16523df6d89a_480x360.gif 424w, https://substackcdn.com/image/fetch/$s_!T8X2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dc9387-038a-4801-88eb-16523df6d89a_480x360.gif 848w, https://substackcdn.com/image/fetch/$s_!T8X2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dc9387-038a-4801-88eb-16523df6d89a_480x360.gif 1272w, https://substackcdn.com/image/fetch/$s_!T8X2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33dc9387-038a-4801-88eb-16523df6d89a_480x360.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Next up we have the logging level, <code>INFO</code> in the example log line. The logging level indicates the &#8220;importance&#8221; of the log message to the system owner or operator. The logging level should <strong>ALWAYS</strong> be set through configuration (application/service config or runtime environment variables) rather than hard-coded. That way, if you need to change it out in the wild for investigative purposes, e.g. switching to DEBUG level, you won&#8217;t have to trawl through code in the early hours of the morning, updating every place the logging level is set.</p>
<div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mls1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefa33109-72f3-460f-826a-87829e65c467_330x204.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mls1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefa33109-72f3-460f-826a-87829e65c467_330x204.gif 424w, https://substackcdn.com/image/fetch/$s_!Mls1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefa33109-72f3-460f-826a-87829e65c467_330x204.gif 848w, https://substackcdn.com/image/fetch/$s_!Mls1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefa33109-72f3-460f-826a-87829e65c467_330x204.gif 1272w, https://substackcdn.com/image/fetch/$s_!Mls1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefa33109-72f3-460f-826a-87829e65c467_330x204.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mls1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefa33109-72f3-460f-826a-87829e65c467_330x204.gif" width="330" height="204" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/efa33109-72f3-460f-826a-87829e65c467_330x204.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:204,&quot;width&quot;:330,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2767438,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thedevsecopsexpert.substack.com/i/168578007?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefa33109-72f3-460f-826a-87829e65c467_330x204.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mls1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefa33109-72f3-460f-826a-87829e65c467_330x204.gif 424w, https://substackcdn.com/image/fetch/$s_!Mls1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefa33109-72f3-460f-826a-87829e65c467_330x204.gif 848w, https://substackcdn.com/image/fetch/$s_!Mls1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefa33109-72f3-460f-826a-87829e65c467_330x204.gif 1272w, https://substackcdn.com/image/fetch/$s_!Mls1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fefa33109-72f3-460f-826a-87829e65c467_330x204.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Next up we have a Universally Unique Identifier (UUID) v4, <code>6459ffe1-ea53-4044-aaa3-bf902868f730</code>, which is a randomly generated ID used as the request ID. The request ID matters because it lets you chain events/units of work together for a given request.</p><p>We have a <code>GET</code> request next, which is one of the <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Methods">HTTP verbs</a>. It is useful to know which HTTP method was used when making a request to a service or web app.</p><p>We then see the request path, <code>/</code>. In this case, this is the root path of the application/service. We then see <code>::1</code>, which is the address the request came from; here it is the <a href="https://en.wikipedia.org/wiki/IPv6">IPv6</a> localhost (loopback) address.</p><p>Finally, we have the request start time with its UTC offset, <code>2021-02-23 13:26:23 -0800</code>. Knowing when the request hit your service or web app is also important, so that you can accurately piece together the sequence of events up to a given point.</p><p>This was just a random example of how an application or service might log a message. There are many other examples out there, and a lot of logging libraries make relatively sensible formatting choices out of the box. My next piece of advice: always find a good logging library for your chosen programming language, unless the standard library logger is already really good. You don&#8217;t want to reinvent the wheel for something like logging; just pick the most popular, easy-to-use open source solution. If you want to make modifications, fork it and create a new package, but always <a href="https://en.wikipedia.org/wiki/Inner_source">inner source</a> it for your engineering team to encourage internal contributions. 
For example, the software engineers in your org might want to add standardised fields, or even mask sensitive data before it is sent over the wire.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bvBF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6db73fd3-36bb-4f2d-9793-eb3ba16b628c_479x352.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bvBF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6db73fd3-36bb-4f2d-9793-eb3ba16b628c_479x352.gif 424w, https://substackcdn.com/image/fetch/$s_!bvBF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6db73fd3-36bb-4f2d-9793-eb3ba16b628c_479x352.gif 848w, https://substackcdn.com/image/fetch/$s_!bvBF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6db73fd3-36bb-4f2d-9793-eb3ba16b628c_479x352.gif 1272w, https://substackcdn.com/image/fetch/$s_!bvBF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6db73fd3-36bb-4f2d-9793-eb3ba16b628c_479x352.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bvBF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6db73fd3-36bb-4f2d-9793-eb3ba16b628c_479x352.gif" width="479" height="352" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6db73fd3-36bb-4f2d-9793-eb3ba16b628c_479x352.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:352,&quot;width&quot;:479,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1614286,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://thedevsecopsexpert.substack.com/i/168578007?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6db73fd3-36bb-4f2d-9793-eb3ba16b628c_479x352.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bvBF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6db73fd3-36bb-4f2d-9793-eb3ba16b628c_479x352.gif 424w, https://substackcdn.com/image/fetch/$s_!bvBF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6db73fd3-36bb-4f2d-9793-eb3ba16b628c_479x352.gif 848w, https://substackcdn.com/image/fetch/$s_!bvBF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6db73fd3-36bb-4f2d-9793-eb3ba16b628c_479x352.gif 1272w, https://substackcdn.com/image/fetch/$s_!bvBF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6db73fd3-36bb-4f2d-9793-eb3ba16b628c_479x352.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Now, the observant amongst you may remember that we were going to discuss the difference between logs and event-logs. Don&#8217;t worry, I haven&#8217;t forgotten! I don&#8217;t want to cover this topic in too much detail, because otherwise the post will bloat quite significantly, but I will point you to some excellent views from the superstar of Observability Engineering, <a href="https://twitter.com/mipsytipsy">Charity Majors</a>. <a href="https://charity.wtf/2019/02/05/logs-vs-structured-events/">Event-logs</a> are just better than logs. That&#8217;s right, to the annals of history with you, logs! &#8220;But before we stand on that hill with you, Mark, can you tell us what the difference is, please?&#8221; Alright, I hear you. 
Event-logs are structured logs: each entry follows a consistent, machine-readable format (typically JSON), which makes them trivial to parse and query in most centralised observability platforms. Let&#8217;s take the log example from before and turn it into an event-log:</p><pre><code>{
  "name": "request",
  "timestamp": "2021-02-23T13:26:23.505892",
  "pid": 22743,
  "level": "info",
  "request_id": "6459ffe1-ea53-4044-aaa3-bf902868f730",
  "request.method": "GET",
  "request.path": "/",
  "request.ip": "::1",
  "request.start_at": "2021-02-23 13:26:23 -0800"
}</code></pre><p><em>Source: The Path from Logs to Traces, by Alex Vondrak</em></p><p>I think you&#8217;ll agree it&#8217;s easier to read, even in its raw format. Lastly on event-logs: always emit a single event per request, for every service the request passes through. Use the <a href="https://opentelemetry.io/docs/specs/otel/logs/event-api/#event-data-model">OpenTelemetry conventions</a> and <strong>ALWAYS</strong> fire off the event before the request errors or exits the service; otherwise you have no breadcrumbs for your investigation! Also, if you run distributed systems, include a <code>trace_id</code> and pass it on to the other services in the stream, so you can stitch together every event that belongs to one request.</p><h1>Wrap Up</h1><p>Actually, that wasn&#8217;t my last piece of advice on logs. My real last piece of advice is to avoid putting Personally Identifiable Information (PII) into your logs. You don&#8217;t need it. If you have an anonymised user id in your data model, you <strong>do not</strong> need to log PII. Your security/compliance team will keep their hair, and you will be a lot happier too. But let&#8217;s say you absolutely must include the event payloads from publisher event buses in your event-logs: then please, please think about your event schemas. Here is a very basic example, but hopefully it highlights the point:</p><pre><code>{
  "name": "request",
  "timestamp": "2021-02-23T13:26:23.505892",
  "pid": 22743,
  "level": "info",
  "request_id": "6459ffe1-ea53-4044-aaa3-bf902868f730",
  "request.method": "GET",
  "request.path": "/",
  "request.ip": "::1",
  "request.start_at": "2021-02-23 13:26:23 -0800",
  "request.payload": {
    "event": {
      "user.data" : {
        "id": "d6279b68-3460-4799-b26d-ea87a865f7fc",
        "private": {
          "full_name": "Joe Bloggs",
          "email": "joe.bloggs@example.com",
          "address": "1 Mount Olympus"
        }
      }
    }
  }
}</code></pre><p>Because the payload sits under <code>request.payload</code>, with the private fields under the full path <code>request.payload.event.user.data.private</code>, I know that everything under that path is private user data, and I can filter or mask it at ingestion time. This is why standardisation is so important in modern engineering orgs. Agree with your fellow engineers on how to design your schemas, so you avoid problems further down the line: once PII has leaked into your observability platforms it is very difficult to mitigate, and it will force you down paths you will not want to go. I am talking about cutting data retention down to 30 days of warm data; that kind of craziness.</p><p>Here is a quick summary of all the sage advice and tips we have covered in this post:</p><ul><li><p>Be smart, be cool, and write timezone-aware code, hosted on infrastructure with accurate time servers to hand.</p></li><li><p>Always centralise your logging level settings, in config or runtime variables.</p></li><li><p>Always pick a good logging package or module for your chosen programming language. Open-source solutions are perfect, and if you must customise one, inner-source it so that you and your colleagues can modify it as needed.</p></li><li><p>Event-logs are structured logs and are just way better than bog-standard logs. If you had seen some of the regex parsing patterns I have had to write over the years to make logs indexable in observability platforms, you would understand why I stand on that hill.</p></li><li><p>Always emit a single event per request to your service before it exits or errors, and stream it over the wire to your observability platform of choice using OpenTelemetry standards.</p></li><li><p>Lastly, do not put PII in your logs. 
If you absolutely must, work with your fellow engineers to standardise your logging structure and/or your event schemas if you run an Event-Driven Architecture.</p></li></ul><p>I hope I didn&#8217;t miss anything important about logging in the world of observability, but if you feel I did, please tweet me and we can chat (my handle is linked on the left). Thanks for reading!</p>]]></content:encoded></item></channel></rss>