SE Radio 556: Alex Boten on OpenTelemetry : Software Engineering Radio


Software engineer Alex Boten, author of Cloud Native Observability with OpenTelemetry, joins host Robert Blumen for a conversation about software telemetry and the OpenTelemetry project. After a brief overview of the topic and the OpenTelemetry project's origins, rooted in the need for interoperability between telemetry sources and back ends, they discuss the OpenTelemetry Collector and its features, including transforms, filtering, sampling, and rate limiting. They consider a range of topics, starting with alternative topologies with and without the Collector, Collector pipelines, and scaling out the Collector, as well as a detailed look at extension points and extensions; authentication; adoption; and migration.

Transcript brought to you by IEEE Software magazine. This transcript was automatically generated. To suggest improvements in the text, please contact content@computer.org and include the episode number and URL.

Robert Blumen 00:00:16 For Software Engineering Radio. This is Robert Blumen. Today I have with me Alex Boten. Alex is a senior staff software engineer at Lightstep. Prior to that, he was at Cisco. He's contributed to open-source projects in the telemetry area, including the OpenTelemetry project. He's the author of the book, Cloud Native Observability with OpenTelemetry, and that will be the subject of our conversation today. Alex, welcome to Software Engineering Radio.

Alex Boten 00:00:50 Hello. Thanks for having me. It's great to be here.

Robert Blumen 00:00:52 Would you like to add anything about your background that I didn't mention?

Alex Boten 00:00:57 I think you captured most of it. I've been contributing to OpenTelemetry for a little bit over three years. I've worked on various components of the project as well as the specification, and I'm currently a maintainer on the OpenTelemetry Collector.

Robert Blumen 00:01:11 Great. Now on Software Engineering Radio, we've covered a number of telemetry-related topics, including Logging in episode 220; High Cardinality Monitoring, which was 429; Prometheus; Distributed Tracing; and episode 455, which was called Software Telemetry. So, listeners can definitely listen to some of those in our back catalog to get more general background. In this conversation we'll be focusing more on what OpenTelemetry brings to the table that we have not already covered. Let's start out with: within the telemetry space, where would you situate OpenTelemetry? What is it similar to? How is it different? What problem does it solve?

Alex Boten 00:02:02 That's a great question. So, I think the problem that OpenTelemetry aims to solve, and we've already seen it happen in the industry today, is that it changes how application developers instrument their application, how telemetry is generated, and how it's collected and then transmitted across systems. And if I were to think about what it's similar to, the first things that come to mind are the projects that really caused it to emerge, which are OpenCensus and OpenTracing, two other open-source projects that were formed a little bit earlier. I think it started in maybe 2017, 2016, to provide a standard around generating distributed tracing. And then OpenCensus also addressed a little bit around metrics and log collection.

Robert Blumen 00:02:50 What was going on in the telemetry area prior to these projects that created the need for them, and what did they do?

Alex Boten 00:02:57 Yeah, so I think, if you think of telemetry as a domain in software, it's been around for a really long time, right? Like, people as early as the earliest computer scientists wanted to know what their computers were doing. And early in the days of having a single machine, it was fairly easy to print some log statements and look at what your machine was doing. But as the industry grew, as the Internet of Things picked up, as systems became larger and larger to handle the increasing demand, I think systems became inherently more complex. And we've seen an evolution of what software telemetry really became. So, earlier we were able to log data on a single system. As people started to deploy multiple systems, a need for centralized logging came along so that you could aggregate and do aggregate searches on logs.

Alex Boten 00:03:54 And that became really costly. Then we saw an increase in folks wanting to capture more meaningful metrics from their systems, where they could create dashboards and do queries, which was cheaper than going through and analyzing log data. And I think the thing that I've seen happen in the last 20 years is that every time there was a new paradigm around the type of telemetry that systems should emit, there was a chance for innovation to occur, which is great to see. But if you're an end user who's just trying to get telemetry out of a system, out of an application, it's a really frustrating process to have to go and re-instrument your code every few months or every few years, depending on what the flavor of the day is. And I think what OpenCensus and OpenTracing and OpenTelemetry tried to capture is addressing the pain that users have when it comes to instrumenting their code.

Robert Blumen 00:04:49 What is the relationship of OpenTelemetry to other systems out there, such as Zipkin, Jaeger, Graylog, Prometheus?

Alex Boten 00:05:00 So the relationship that OpenTelemetry has with the Zipkins, the Jaegers, and the Prometheuses of the world is really around providing interoperability between those systems. So, an application developer would instrument their code using OpenTelemetry, and then they can emit that telemetry data to whatever backend systems they want. So, if you wanted to continue using Jaeger, you could definitely do that with an application that's instrumented with OpenTelemetry. The other thing that OpenTelemetry tries to do is provide a translation layer, so that folks who today are maybe emitting data to Zipkin or to Jaeger or to Prometheus can deploy a Collector within their environments and then translate the data from the specific format of those other systems into the OpenTelemetry format. That way they can emit the data to whatever backend they choose by simply updating the configuration on their Collector, without having to go back to their applications, which may be legacy systems that nobody wants to modify anymore, and still be able to send their data to different destinations.

Robert Blumen 00:06:06 Is OpenTelemetry then an interoperability standard, a system, or both?

Alex Boten 00:06:13 It's really the standard to instrument your applications and to provide the interoperability between the different systems. OpenTelemetry doesn't offer a backend; there's no log database or metrics database that OpenTelemetry provides. Maybe at some point in the future that will happen. We're already seeing people who support the OpenTelemetry format starting to provide these backend options for folks who are emitting only OpenTelemetry data. But that's not something the project is interested in solving at this point. It's really about the instrumentation piece and the collection and transmission of the data.

Robert Blumen 00:06:52 In reading about this, I came across discussion of a protocol called OTLP. Can you explain what that is?

Alex Boten 00:07:00 So the OpenTelemetry protocol is a protocol that's generated from protobuf definitions. Every implementation of OpenTelemetry supports it; its goal is to provide high-performance data transmission in a format that's standardized across all the implementations. It's also supported by the OpenTelemetry Collector. And what it really means is that this format supports all the different signals that OpenTelemetry supports. So, logs, traces, metrics, and maybe down the road, events and profiling, which is currently being developed in the project. And the idea is, if you support the OpenTelemetry protocol, this is the protocol that you would use to transmit the data, or, if you're a vendor or a backend provider, the protocol you would use to receive the data. And it's actually been really good to see even projects like Prometheus starting to support the OTLP protocol for transmitting data.

Robert Blumen 00:07:56 So, let me summarize what we have so far, and you can tell me if I've understood. I'm building an application; I would instrument it in a way that's compatible with this standard. I might not even know where my logs or metrics are going to end up. And then whoever uses my system (which may be people in the same organization, or maybe I'm shipping an open-source project with many users) can then plug in their backend of choice, and they aren't necessarily tied to any decisions I made about how I think the telemetry will be collected. It creates the ability for users to plug and play between the applications and the backends. Is that more or less correct?

Alex Boten 00:08:42 Yeah, that's exactly right. I think it really decouples the instrumentation piece, which historically has been the most expensive aspect of organizations gaining observability into their systems, from the decision of where am I going to send that data. And the nice thing about this is that it really frees the end users from the idea of vendor lock-in, which I think a lot of us who've worked in systems for a long time have always found difficult. The conversation about trying out a new vendor, if you wanted to test some new feature or whatever, usually would mean that you would have to go back and re-instrument your code. Whereas now with OpenTelemetry, once you've instrumented your application, hopefully this is the last time you have to worry about instrumenting your application, because you can just point that data at different backends.

Robert Blumen 00:09:34 A short while ago you did mention the Collector, and we will be spending some time on that, but I want to understand what the possible configurations of the system are. What I think we're talking about now is: if the code is instrumented with the OpenTelemetry standard, it can talk directly to backends. The other option being you have a Collector in between them. Are those the two main configurations?

Alex Boten 00:10:02 Yeah, that's right. It's possible to configure your instrumented application to send data to backends directly: if you wanted to send the data to Jaeger, I think most implementations that officially support OpenTelemetry have a Jaeger exporter, for example. So there are options if you wanted to send data from your application straight to your backend, but ideally you'd send that data in a protocol that you can then configure using an OpenTelemetry Collector later down the line.

Robert Blumen 00:10:31 Let's come back to the Collector in a bit, but I want to talk about instrumentation. Typically if I want to talk to a certain backend, I need to use their library to emit the telemetry. How does that change with OpenTelemetry?

Alex Boten 00:10:49 Yeah, so with the OpenTelemetry standard, you have two parts of the instrumentation. There's the OpenTelemetry API, which is really what most developers would interact with. There's a very limited amount of surface area that the API covers. For example, for tracing, essentially you can get a tracer, start a span, and end a span. That's roughly the surface area it's trying to cover. And the idea we wanted to push forward with our limited API is to reduce the cognitive load that users have to take on to adopt OpenTelemetry. The other piece of the instrumentation that folks have to interact with is the SDK, which really allows end users to configure how the telemetry is produced and where it's sent. If you're thinking about this in the context of how it's different from a particular backend and its instrumentation, the difference is that with OpenTelemetry you'd only ever use the OpenTelemetry API and configure the SDK to send data to the backend of choice.

Alex Boten 00:11:55 But the API that you would use for instrumenting the code wouldn't be any different depending on which backend you send it to. And there's that clean separation between the API and the SDK that allows you to instrument with only that minimal interface and worry about the details of how and where that data is sent using the SDK configuration, which in my book I refer to as telemetry pipelines.

Robert Blumen 00:12:17 In that discussion you mentioned tracing. I've seen a lot of logging systems where you can log whatever you want, and then it puts the burden on a collector to pick up the logs and format them. And then for metrics, you'd have to use a library. If I'm adopting OpenTelemetry, how does it handle logs and metrics?

Alex Boten 00:12:40 Yeah, so for metrics, there's an API that calls out specific instruments. So OpenTelemetry has a list of, I believe it's six instruments currently that it supports, to provide roughly the same functionality as a metrics library would. And I think a lot of those instruments were developed in collaboration with both the OpenMetrics and the Prometheus communities to ensure that we're compatible with those folks. For the logging library, it's a little bit different in OpenTelemetry, or at least it was at the time of writing my book, which was written mostly in 2021. The idea behind logging in OpenTelemetry was: we were already aware there were so many different APIs for logging. Every language has like a dozen logging APIs, and we didn't necessarily want to create a new logging API that people would have to adopt. And so, the idea was to hook into those existing APIs. It's been an interesting transition though. I think in the past six or eight months or so, there's been almost an ask for an API and an SDK in the logging signal as well. That's still currently in development. So, stay tuned for what's going to happen there.

Robert Blumen 00:13:51 In what languages are the OpenTelemetry SDKs available?

Alex Boten 00:13:57 Yeah, so there are currently 11 officially supported languages. I'm probably going to miss some of them, but there's definitely one in C++, in Go, in Rust, in Python, Ruby, PHP, Java, JavaScript; all those languages are covered officially by OpenTelemetry. And what this means is that the implementations were reviewed by someone on the technical committee, and the implementations themselves live within the OpenTelemetry organization on GitHub and follow the same process. We have maintainers and approvers for each one of those languages. There are a couple of additional implementations that aren't officially supported yet, but that's really just because there haven't been enough contributors to them yet. So, I think there's one in Lua, and maybe Julia is the other one?

Robert Blumen 00:14:46 I've found when instrumenting code I wind up spending a lot of time doing things like writing a message that a certain method has been called, and here are the parameters: very boilerplate steps. I understand that OpenTelemetry can to some extent automate that? How does that work?

Alex Boten 00:15:08 Yeah, so one of the very first OTEPs (the OpenTelemetry Enhancement Proposals) that was created in the early stages of the project was to support auto-instrumentation out of the box. The auto-instrumentation effort in different languages is at different stages. I know the Java and the Python auto-instrumentation efforts are a little bit further along. I think .NET is coming along nicely, and I think JavaScript is as well. But the idea behind auto-instrumentation with OpenTelemetry specifically is very similar to what we've seen in other efforts before, where it ties instrumentation to existing third-party open-source libraries. Right? And the idea being, for example, if you're using the Python SDK (I'm using that as an example because I spent a decent amount of time writing some code there).

Alex Boten 00:16:02 If you're using the Python SDK and you wanted to use, for example, the Python Redis library, well, you could use the instrumentation library that's provided by OpenTelemetry, which allows you to call to this library, which monkey patches the Redis library that it then makes a call to. In that intermediate step, it acts as a middle layer that instruments the calls to the library that you would be making. So, if you were calling connect, for example, it would call connect on the instrumentation library, start a span, maybe record some sort of metric about the operation, make the call to the Redis library, and then on the return it would end the span and produce some telemetry there with some semantic-convention attributes.

Robert Blumen 00:16:49 Explain the term monkey patching.

Alex Boten 00:16:52 So monkey patching is when a library intercepts a call and replaces the call with itself instead of the original call. So, in the case of the Redis example I was using, the Redis instrumentation library intercepts the call to connect to Redis, and then it replaces it with its own connect call, which does the instrumentation as well.
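The mechanism can be shown with a toy example that uses only the standard library. The class below is a stand-in for a third-party client (it is not the real Redis library), and the wrapper only records timing, where real instrumentation would start and end a span.

```python
import functools
import time

class FakeRedisClient:
    """Stand-in for a third-party client library we don't want to edit."""
    def connect(self):
        return "connected"

def instrumented(fn):
    """Wrap a method so timing is recorded around the original call."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(self, *args, **kwargs)  # call through to the original
        finally:
            # Real instrumentation would end a span / emit a metric here.
            wrapper.last_duration = time.perf_counter() - start
    return wrapper

# The monkey patch: replace the library's method with the wrapper.
FakeRedisClient.connect = instrumented(FakeRedisClient.connect)

client = FakeRedisClient()
result = client.connect()  # behaves exactly as before, but is now timed
```

The caller's code is unchanged; only the attribute on the class was swapped, which is why this works on libraries you cannot or do not want to modify.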

Robert Blumen 00:17:17 This I could see being very useful in that if you've got a library and something's going wrong inside the library, I don't know where, then the previous option has been that I would have to get the source code of the library, and if I want logging, I would have to go and insert log statements or insert metrics or whatever kind of telemetry I'm trying to capture into someone else's source code and rebuild it. So, does this let you get visibility into what's happening inside third-party libraries that you've downloaded with your package manager and you're not interested in modifying the code?

Alex Boten 00:17:57 Right. I think that's a key benefit of it: you're finally able to see what those libraries are doing. Maybe you're not familiar with the code, or you're not really sure of the path through the code, and now you're able to see all of the library calls that are instrumented underneath the original call of your application. A lot of the time you'll find problems there, but it's really hard to identify them because you don't necessarily know what's happening without reading the underlying source code at all.

Robert Blumen 00:18:24 I've used some of those 11 languages. I'm aware that every language is different as far as what access it gives you to intercept things at runtime, or maybe generate byte code and inject it into the library. I would think that the ability to do this is going to vary considerably based on the language, with C++ maybe being rather unfriendly to it. Do you expect to achieve parity across all the languages in the extent to which you can offer this feature? Or will it always work better on some than others?

Alex Boten 00:19:02 That's a great question. I think, ideally, I imagine that instrumentation libraries are a temporary fix. I really believe that what everybody's hoping for within the community (and we've seen some open-source projects already reach out and start instrumenting their applications) is that these libraries will use the OpenTelemetry API to instrument themselves and remove the need for these instrumentation libraries altogether. For example, if an HTTP server framework were to instrument its calls to its endpoints using OpenTelemetry, the end user wouldn't even need this instrumentation library. And we would achieve parity across all the languages because each one of those libraries would just use the standard, rather than relying on either byte-code manipulation or monkey patching, which works for what it is, but it's not always the best option.

Alex Boten 00:20:01 With monkey patching, maybe the underlying library's call changes its parameters, and you have to keep track of those changes within the instrumentation libraries. And so that always poses a challenge. But ideally, like I said, those libraries will go away as the project continues to gain traction within the industry. And we've already seen, I think, a few Python open-source projects that reached out. I know the Spring folks in Java had a project to instrument using OpenTelemetry. Envoy and a few other proxies have also started using OpenTelemetry. So I think as a mechanism instrumentation libraries are great for the short term, but in the long term it would be ideal if libraries instrumented themselves.

Robert Blumen 00:20:45 That would be great. But there are always going to be some older libraries that maybe are not under as active development, where there's not really anyone around to modify them. Then you always have this to fall back on in those cases. I wouldn't see it going away.

Alex Boten 00:21:02 Right. Ideally, the norm would become: instrument your libraries with OpenTelemetry, and for those libraries that aren't being modified, absolutely continue to use the mechanisms that we have in place today.

Robert Blumen 00:21:16 Now I think it's time to start talking about the Collector. We've talked about the source and how this data gets published. A short while ago we mentioned you can send data directly from a publisher to a backend, or you can have a Collector in between. What is the Collector, what does it do, and why might I want one?

Alex Boten 00:21:36 Yeah, so the Collector is a separate process that would be running inside your environment. It's published as a standalone binary, or a Docker image if you're interested in that. There are also packages for, I think, Debian and Red Hat. And the Collector is really a destination for your telemetry that can then act as a router. So, it has a series of receivers (I believe it's over 100) which support different formats and can also scrape metric data from different systems. And it has exporters, and again, I lose track of it, but I think it's over 100 formats of exporters that the OpenTelemetry Collector supports. So you can send data to it in one format and export it using a different format if you're so inclined. You can also use processors within the Collector, which allow you to manipulate the data, whether it's for things like redacting PII that you might have, or if you wanted to enrich the data with some additional attributes, maybe about your environment, that only the Collector would know about.

Alex Boten 00:22:44 And that's the Collector in a nutshell. It's available to deploy, as I said, as an image or as a package. You can also deploy it using Helm charts, or using the OpenTelemetry Operator if you're in a Kubernetes environment.
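The receiver/processor/exporter structure described above maps directly onto the Collector's YAML configuration. The fragment below is a minimal illustrative sketch, with placeholder endpoints: data arrives over OTLP, is batched, and leaves in a different format (here Jaeger) to show the format-translation role. Component names and exporter availability vary by Collector version and distribution.

```yaml
# Illustrative Collector config; endpoints are placeholders.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:

exporters:
  jaeger:
    endpoint: jaeger-collector:14250
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [jaeger]
```

Swapping the backend later means editing only the `exporters` section and the pipeline's exporter list.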

Robert Blumen 00:22:59 I'm going to delve into some of those internal components. I want to talk first a little bit about the networking. It can be simpler if I have N sources and some number K of backends to have an N-by-1 and 1-by-K topology, instead of an N-by-K topology. Do you have any thoughts on whether simplifying your networking and everything that goes with it is a motivator for adopting a Collector?

Alex Boten 00:23:30 Yeah, I think so. I think the Collector makes it very appealing for a variety of reasons. One being that your egress out of your network may only be coming from one point. So, from a security-auditing kind of perspective, you can see where all the data is really going out, rather than having a bunch of different endpoints that have to be connected to some external systems. From that point alone, I think it's definitely worth deploying a Collector inside a network. There's also the ability to throttle the data that's going out, which is important. If you have N endpoints that are sending data, it's really difficult to throttle how much data is actually leaving your network, which can end up being costly. So, if you wanted to do things like sampling, you'd probably want to have a Collector in place, so that you could adjust it as needed.

Robert Blumen 00:24:22 How much telemetry can one instance of the Collector handle?

Alex Boten 00:24:30 Yeah, I mean, I think that always depends on the size of the instance that you're running. On the OpenTelemetry Collector repository, there are fairly comprehensive benchmarks that have been run against the Collector for traces, logs, and metrics. And I believe the instance sizes that were used, if memory serves right, were EC2 instances for the benchmark testing. And I believe that's all listed on the website for folks who are interested in finding out.

Robert Blumen 00:25:01 If I wanted either to run more workload than what I could put through one instance, or, for high-availability reasons, to have a clustered implementation with multiple Collectors, is it possible, say, to put a load balancer in front of them and distribute the load? Or what are the options for a more clustered implementation?

Alex Boten 00:25:24 Yeah, so the way you'd probably want to deploy this is with some kind of load balancer. Depending on the telemetry you're sending out, you may want to use something like a routing processor that allows you to be more specific about which data each one of the Collectors will be receiving. So for example, if you had a bunch of Collectors deployed close to your applications, which then route through a Collector acting as a gateway, and you wanted to send only certain traces to that gateway Collector, you could fork the data using the routing processor based on the trace IDs or something like that, if you wanted to.

Robert Blumen 00:26:06 So, with stateless servers you can set up a fairly dumb load balancer and every request would get routed essentially to a random instance. Are there any reasons to have a bit more sharding or pinning of certain workloads in a clustered implementation?

Alex Boten 00:26:27 I think some of this depends on what you're doing with the Collectors. So for example, if you're doing sampling on traces, you wouldn't want your sampling decision being split across Collectors; there's no way to share that sampling decision across Collectors. And so, you'd want to be able to make that decision on a single instance of the Collector. So you'd really want all the data for a specific trace to go to the same Collector to be able to make the decision on the sample.
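One way to get that trace-to-Collector pinning is the load-balancing exporter from the Collector contrib distribution, which hashes on trace ID so all spans of a trace reach the same downstream Collector. The fragment below is a sketch; the hostnames are placeholders, and the exact fields may differ across contrib versions.

```yaml
# Sketch: fan out to a pool of sampling Collectors, keyed by trace ID,
# using the contrib load-balancing exporter. Hostnames are placeholders.
exporters:
  loadbalancing:
    routing_key: "traceID"
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        hostnames:
          - sampling-collector-1:4317
          - sampling-collector-2:4317
```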

Robert Blumen 00:26:56 You used the word gateway, which is a common word, but I understand it means something specific in OpenTelemetry, where you have a gateway model and an agent model. Explain those two models and the difference between them.

Alex Boten 00:27:11 Yeah, so in the agent deployment for the OpenTelemetry Collector, you'd be running your OpenTelemetry Collector on the same host or the same node, maybe as part of a DaemonSet in Kubernetes. So, you'd have a separate instance of the Collector for each one of the nodes running inside your environment, and you'd have your application sending data to the local agent before it then sends it up to wherever your destination is. In the gateway deployment model, you'd have the Collector act as a standalone application with its own deployment. Maybe you'd have one per data center or one per region, and that would act as the egress out of your network. That's roughly the gateway deployment.

Robert Blumen 00:28:02 What you described as an agent model sounds very similar to what I've seen called a sidecar with some other services. Is an agent the same as a sidecar?

Alex Boten 00:28:14 Yes and no. It can be like a sidecar. When I think of a sidecar, I'd assume that it would be attached to every application, with a sidecar running alongside each one, which would mean that you might end up with multiple instances of the Collector running on the same node, for example. That may be necessary in specific cases, or it may not be; it really depends on your use case, and on whether there's accessibility from your application to the host at all. That depends on what your policies are and how your policies are defined. So, it could be the same as a sidecar, but it doesn't necessarily have to be.

Robert Blumen 00:28:52 Delving more into the internals of the Collector and what you can do with it, you mentioned processors and exporters, and you've covered some of this before, but why don't you start with what are some of the major types of processors that you might want to use?

Alex Boten 00:29:11 Yeah, so I think the two processors recommended by the community are, first, the batch processor, which tries to take your data and batch it rather than sending it every time telemetry comes in. This is trying to optimize the compression and reduce the amount of data that gets sent out. So that's one of the recommended processors. The other one is the memory limiter processor, which limits the upper bound of memory that you would allow a Collector to use. You'd probably want to use that where you have a specific instance type with some amount of memory defined; you'd configure your memory limiter processor to be below that threshold, so that when the Collector hits the memory limit, it can start returning error messages to all of its receivers, and the senders of the data can then back off on the amount of data being sent, or something like that.
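As a concrete sketch, those two recommended processors might be configured like this; the numbers are assumptions to be tuned to the instance size, not recommendations. In a pipeline, the memory limiter is conventionally placed first so it can refuse data before other work is done.

```yaml
# Illustrative settings for the two recommended processors;
# all numbers are placeholders to tune per instance size.
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 1500        # keep below the instance's available memory
    spike_limit_mib: 300
  batch:
    send_batch_size: 8192
    timeout: 5s
```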

Alex Boten 00:30:02 One of the other processors that's really interesting to a lot of folks is the transform processor, which lets you use the OpenTelemetry Transformation Language to modify data. So maybe you want to strip particular attributes, or maybe you want to change some values within your telemetry data, and you can do that with the transform processor, which is still currently under development. But in the early days of processors there was a lot of excitement around what could be achieved with them, and so people started developing filtering processors and an attributes processor for metrics and all these other kinds of processors, which made it a little bit complicated to know which processors folks should be using because there are so many of them. And sometimes one may support one signal but not another, whereas the transform processor really tries to unify this into a single processor that can be used to do all of that.

Robert Blumen 00:30:55 You said there's a lot of excitement around this feature. What was it that people found so exciting about it?

Alex Boten 00:31:01 Yeah, I think from the maintainer and contributor standpoint, we were looking forward to deprecating some of the other processors that could be combined into a single one. It reduces the cognitive load that people have to deal with when ramping up on OpenTelemetry. Knowing that if you want to modify your telemetry, all you have to do is use this one processor and learn the language you need to transform the data, versus going through and searching the repository for five or six different processors. I think it's generally great to consolidate that a little bit.

Robert Blumen 00:31:39 Tell me more about the language that's used to do these transforms.

Alex Boten 00:31:43 Yeah, so for folks who are interested in finding the full definition, the OpenTelemetry Transformation Language is all available inside the OpenTelemetry Collector contrib repository. It really allows folks to define, in a language that's signal agnostic, what they would like to do with their data. So it lets you get particular attributes, set particular attributes, and modify data within your Collector.
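As a sketch, a transform processor configuration using OTTL statements might strip or rewrite span attributes like this. The attribute names and values are made up for illustration, and the exact configuration keys have changed between Collector versions, so check the contrib repository for the syntax matching your release:

```yaml
processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Remove a sensitive attribute before export (hypothetical key)
          - delete_key(attributes, "user.email")
          # Rewrite a value conditionally (hypothetical route normalization)
          - set(attributes["http.route"], "/users/{id}") where attributes["http.route"] == "/users/123"
```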

Robert Blumen 00:32:09 The other internal component of Collectors I want to spend some time on is exporters. What do those do?

Alex Boten 00:32:17 Yeah, so the exporter takes the data that's been ingested by the OpenTelemetry Collector. The Collector uses receivers to receive the data in a format that's specific to whichever receiver is configured. It then transforms the data into an internal data format within the Collector, and then it exports it using whichever exporter is configured. So the exporter's job is to take the data in the internal data format and format it to the specification of the exporter's destination.

Robert Blumen 00:32:50 Okay. So, what are some examples of different exporters that are available?

Alex Boten 00:32:54 Yeah, so there's a bunch of vendor-specific exporters that live in the repository today. Also, many of the open-source projects have their own exporters: Jaeger has its own, Prometheus has its own exporter. There are a few different logging options as well. Yeah.

Robert Blumen 00:33:12 So data comes in, it goes through some number of processors, and then goes out through an exporter. Is there a concept of a pipeline that maps the path data takes through the Collector?

Alex Boten 00:33:26 Yeah, so the best place to see this is really inside the Collector configuration. The Collector is configured using YAML, and at the very essence of it, you'd configure your exporters, your receivers, and your processors, and then you would define the path through those components in the pipelines section of the configuration, which allows you to specify the pipelines you want to configure for tracing, for logs, and for metrics going through the Collector. So you'd configure your receivers there, then your processors, and then your exporters within each one of those definitions. And you can configure multiple pipelines for each signal, giving them individual names.
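Putting those pieces together, a minimal sketch of a Collector configuration with named pipelines might look like this; the endpoint and component choices are placeholders for illustration:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 400
  batch:
exporters:
  logging:            # writes telemetry to the Collector's own log output
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [logging]
    metrics/internal: # a second, individually named pipeline for the same signal type
      receivers: [otlp]
      processors: [batch]
      exporters: [logging]
```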

Robert Blumen 00:34:07 And how does incoming data select, or get mapped onto, a particular pipeline?

Alex Boten 00:34:14 Yeah, so the way the data gets mapped onto each pipeline is via the specific receiver that's used to receive the data. So for example, if you've configured a Jaeger receiver on one pipeline and a Zipkin receiver on a different pipeline and you're sending data via Zipkin, then the pipeline with the Zipkin endpoint would be the destination of that data, and that's the pipeline the data would go through.

Robert Blumen 00:34:40 So, does each endpoint listen on a different port, or does it have a path? What's the mapping?

Alex Boten 00:34:47 Yeah, so that depends on the specific receiver. Some receivers have the ability to configure different paths; some only configure different ports. It also depends on the protocol that you're using for the receiver and whether it supports it or not. And as I mentioned, there are also these things known as scrapers, which are receivers that can go out and scrape different endpoints for metrics, for example. Those can also be configured as receivers, which would then take their own path to the Collector.

Robert Blumen 00:35:17 I think we've mostly been talking under the assumption of a push model, but this scraper sounds like it also supports pull. Did I understand that correctly?

Alex Boten 00:35:28 Yeah, that's correct. And if you think of the Prometheus receiver, for example, the Prometheus receiver uses the pull model as well. So you'd define the targets that you wish to scrape, and then the data would be pulled into the Collector rather than pushed to it.
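As an example of that pull model, the Prometheus receiver embeds standard Prometheus scrape configuration. A sketch, with a hypothetical job name and target:

```yaml
receivers:
  prometheus:
    config:
      # Standard Prometheus scrape_config syntax is embedded here
      scrape_configs:
        - job_name: 'example-app'      # hypothetical job
          scrape_interval: 15s
          static_configs:
            - targets: ['localhost:9090']
```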

Robert Blumen 00:35:43 So to wrap this all up, then: I'd instrument or configure my sources to point them toward the OTel Collector or Collectors on my network; they would have a domain name or an IP address and a port, and maybe a path that comes after that. They're instrumented, they push data out, it goes to the Collector, the Collector processes it and then exports it to the backend of choice. Is that a good description of the whole process?

Alex Boten 00:36:17 Yeah, that's exactly right.

Robert Blumen 00:36:18 How do the sources authenticate themselves to the Collector?

Alex Boten 00:36:23 Yeah, so for authenticating to the OpenTelemetry Collector, there are several extensions available for authentication. There's the OIDC authentication extension, there's the bearer token authentication extension, and you can also use the basic auth extension if you'd like. So there are a few different extensions available for that.
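As a hedged sketch, wiring the basic auth extension into an OTLP receiver might look roughly like this; the credentials file path is hypothetical, and the exact keys should be checked against the extension's README in the contrib repository:

```yaml
extensions:
  basicauth/server:
    htpasswd:
      file: /etc/otelcol/.htpasswd   # hypothetical htpasswd file
receivers:
  otlp:
    protocols:
      grpc:
        auth:
          authenticator: basicauth/server
service:
  extensions: [basicauth/server]
```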

Robert Blumen 00:36:43 Yeah, okay. Well, let's talk about extensions. What are the extension points that are provided?

Alex Boten 00:36:49 Yeah, so extensions are essentially components in the Collector that don't necessarily have anything to do with the pipeline of telemetry going through the Collector. Some of the extensions that are available are the pprof extension, which allows you to get profiling data out of the Collector; the health check extension, which allows you to run health checks against the Collector; and a few others that are all available in the Collector repositories.
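A minimal sketch enabling those two extensions; the ports shown are the conventional defaults, but verify them against your Collector version's documentation:

```yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133   # liveness/readiness HTTP endpoint
  pprof:
    endpoint: 127.0.0.1:1777  # Go pprof profiling endpoint
service:
  extensions: [health_check, pprof]
```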

Robert Blumen 00:37:20 Okay. So, we've pretty much covered most of what I had planned about what it does and how it works. Suppose you have a project that wasn't built with this in mind and is interested in migrating. What's a possible migration path to OTel for a project that might have been built several years ago, before this was available?

Alex Boten 00:37:45 I'd say the first path I'd recommend to folks is really to think about: is there a way I can drop in a Collector and receive data in a format that's maybe already being emitted by an application? That's really the very first step I'd suggest taking. I know there are a few different mechanisms for collecting telemetry that predate the Collector. Telegraf is an example of one of those. If you have Telegraf running in your environment and you're interested in seeing if you can connect it to the Collector, maybe that's a good place to start: looking at connecting the two. And I know Telegraf, for example, emits OTLP, so that's already something that's somewhat supported. So that's really the first step I'd take: can I just get away with dropping in a Collector and emitting a format that's maybe already supported?

Alex Boten 00:38:30 One thing to note is that if you have a format out there that's not currently supported in the Collector, you can always go to the community and ask, 'hey, is this a component that folks are interested in adopting?' That's always a good avenue to take. If you've got commitment from your organization to maybe change the instrumentation libraries you're using within your code, then great; I'd start looking at resources. I know there are a few different use cases that have been documented, I think on OpenTelemetry.io, around migrating away from either OpenTracing or OpenCensus. So I'd definitely start looking for those resources.

Robert Blumen 00:39:07 So we've talked about the history and what it does. What's on the roadmap?

Alex Boten 00:39:12 Yeah, so the roadmap for OpenTelemetry was actually published very recently. Up until earlier this year there wasn't an official roadmap published by the community, but we're finally starting to change the process a little bit to try to really focus the efforts of the community. So, currently on the roadmap we have five projects going on. Some of the work is being done around client-side instrumentation, so either web browser-based or mobile clients, and around profiling. That is, profiling data being emitted using an existing format, though there's some discussion around whether or not there's going to be an additional signal called profiles in OpenTelemetry. There's also a lot of effort being put into trying to stabilize semantic conventions. If you've seen the semantic conventions inside the OpenTelemetry specification, you'll probably know that a lot of them are marked as experimental.

Alex Boten 00:40:10 And that's just because we haven't had the chance to really focus the community on trying to come to agreement on what stable semantic conventions should look like. So there's a lot of effort to bring in experts in each one of the domains to make sure they make sense. The other effort that I'm excited about, because I'm part of the work, is to put together a configuration layer for OpenTelemetry as a whole, so that users can configure using some kind of configuration file, take that configuration file across any implementation, and know that the same results will occur. For example, if you're configuring your Jaeger exporter in Python using this configuration format, you'd be able to take that same configuration to your .NET implementation or Java and not have to write code manually to translate that configuration. And then there's some effort around function-as-a-service support in OpenTelemetry. The community is currently focused on Lambdas because that's the primary serverless, or function-as-a-service, model that's come to us, but there's also effort to bring in folks from Azure and GCP as well, to round that out.

Robert Blumen 00:41:19 We're at time; we've covered everything. Where can listeners find your book?

Alex Boten 00:41:25 Yeah, so you can find the book on Amazon. You can also buy it directly from Packt Publishing. And yeah, it's also available at your local bookstore.

Robert Blumen 00:41:35 If listeners would like to find your presence anywhere on the internet, where should they look?

Alex Boten 00:41:40 Yeah, so they can find me on LinkedIn, a little bit on Mastodon, or on Twitter, though not as much anymore. And they can find me on the CNCF Slack instance. I'm pretty active there.

Robert Blumen 00:41:55 Alex Boten, thank you very much for speaking to Software Engineering Radio.

Alex Boten 00:41:59 Yeah, thank you very much. It's been great.

Robert Blumen 00:42:01 This has been Robert Blumen for Software Engineering Radio. Thank you for listening. [End of Audio]
