Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

X-Ray exporter implementation #4

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

mrdziuban
Copy link

Closes #3 and trace4cats/trace4cats#118.

Adds initial implementation of an exporter that sends spans to AWS X-Ray. Some notes and questions:

  • I'm using fs2.io.net.DatagramSocket for writing to the UDP socket, but I'm open to changing this if there's a different library you'd prefer
  • X-Ray annotation keys can't have periods, so the keys used by SemanticTags.<kind|status>Tags aren't valid. For now I've just replaced periods with underscores
  • I'm not sure what the best way is to write tests for the exporter. Unfortunately LocalStack's X-Ray support requires LocalStack Pro, so maybe the best option would be to use fs2's DatagramSocket#reads to verify that data is sent properly?

@catostrophe catostrophe added the enhancement New feature or request label Apr 11, 2022
@catostrophe
Copy link
Member

Thanks! Will check tomorrow

@catostrophe
Copy link
Member

Matt, first of all, thanks for the contribution!

I went through your implementation and found a suspicious logic: you generate and use your own random xrayTraceId instead of using the one from CompletedSpan. In this case you cannot propagate trace context across service boundaries via HTTP headers -- you will lose the link between parts of a distributed trace.

I thought and realized that we had a design flaw in the model module. Instead of using Random[F] for generating both TraceIds and SpanIds, we need to introduce new typeclasses, e.g. TraceIdGen and SpanIdGen. And we shouldn't autoderive them via Sync as we do now. Instead, we should propagate F[_] : TraceIdGen : SpanIdGen context bound to SpanContext to Span itself to EntryPoint. So that when one creates an EntryPoint, they provide instances for TraceIdGen and SpanIdGen. By default the generators may be based on Random, but in case of AWS X-Ray it may have a different implementation via Clock + Random. TraceId/SpanId will stay the same byte-arrays (their size may be dynamic). Also their constructors and factory methods must be reviewed and changed.

I did a draft POC in the trace4cats repo (not pushed yet), but still need some time to think on ergonomics for end users.

I also need @janstenpickle's opinion on what I am proposing.

@catostrophe
Copy link
Member

P.S. And this PR should also include ToHeaders instance for X-Ray.

@janstenpickle
Copy link

Thank you Matt, this is great!

RE the new typeclasses for ID generation, yes I completely agree. Having specific, derivable typeclasses for this makes a lot of sense.

RE testing: it's a shame that LocalStack doesn't support X-Ray, but I agree with the best effort apporach of listening for the data.

@catostrophe
Copy link
Member

I spent some more time thinking on the changes in the trace4cats repo. There are a few things that make my first idea not quite viable:

  1. Our model implies that TraceId is a universal entity that fits any tracing standard t4c integrates with. That is, once generated, a traceId must be correct for and exportable to any system, be it OT, Datadog, Zipkin, X-Ray, <you name it>.
  2. It is EntryPoint that creates the root span (and hence generates a traceId).
  3. Exporters are combinable (there's a Monoid instance). That is, a combined SpanExporter may send spans to many systems and we don't know to which. SpanExporter is passed to the EntryPoint factory method.

Given these constraints, if we even try to make generation of TraceId customizable at the level of construction of EntryPoint, we can't guarantee that the SpanExporter instance passed to it is compatible with this custom TraceId generator.

What we still can do:

  1. Make the proposed changes regarding removing the Random context bound and introducing ***Gen typeclasses instead. Add ***Gen bounds to all higher levels discussed above.
  2. Fix the universal traceId internal format as a 16-byte array, as it is now. It can't be of arbitrary size, since it's universal.
  3. Provide a TraceIdGen instance that generates 16-byte traceIds via Clock+Random - this will satisfy all existing exporters, since all systems use 16-byte traceIds, including X-Ray that just requires first 4 bytes to be epoch time rather than random

It is still not clear, when and how we should provide alternative generators and how to make sure they are compatible with the exporters.

Also, if tomorrow we want to integrate with a tracing system that uses 100-char alphanumeric traceIds, we won't be able to do it, unfortunately.

@catostrophe
Copy link
Member

catostrophe commented Apr 13, 2022

With the proposed changes, the old code written for previously supported systems will be source compatible.

Users of X-Ray will have to import a custom instance to the local scope that will be passed to EntryPoint.apply. This also ca be done via a custom object with apply that uses the special traceId generator.

@catostrophe
Copy link
Member

@mrdziuban let's wait for the changes in t4c-core. We will cut a snapshot release and you can proceed with it.

@catostrophe
Copy link
Member

You can use v0.13.1+12-491cff7e

@catostrophe catostrophe marked this pull request as draft April 18, 2022 06:59
@catostrophe
Copy link
Member

With the recent changes in trace4cats, the generator for TraceId in trace4cats-xray may look as follows:

object xray extends XRayCompatInstances0

trait XRayCompatInstances0 extends XRayCompatInstances1 {
  implicit def fromRandomClockTraceIdGen[F[_]: Apply: Random: Clock]: TraceId.Gen[F] = TraceId.Gen.from {
    val timeSize = Integer.BYTES
    val rndSize = TraceId.size - timeSize
    (Clock[F].realTime, Random[F].nextBytes(rndSize)).mapN { (t, r) =>
      val bytes = ByteBuffer.allocate(TraceId.size).putInt(t.toSeconds.toInt).put(r).array
      TraceId.unsafe(bytes)
    }
  }
}

trait XRayCompatInstances1 { this: XRayCompatInstances0 =>
  implicit def fromSyncTraceIdGen[F[_]: Sync]: TraceId.Gen[F] = {
    implicit val rnd: Random[F] = Random.javaUtilConcurrentThreadLocalRandom[F]
    fromRandomClockTraceIdGen[F]
  }
}

// maybe use different name
def toHex(traceId: TraceId): String = {
  val timeSize = Integer.BYTES
  val rndSize = TraceId.size - timeSize
  val arr = traceId.value
  val t = new String(Hex.encodeHex(arr, 0, timeSize, true))
  val r = new String(Hex.encodeHex(arr, timeSize, rndSize, true))
  s"1-$t-$r"
}

@catostrophe
Copy link
Member

Please use v0.13.1+16-78f06e87

@catostrophe
Copy link
Member

@mrdziuban do you have any blockers that prevent you from proceeding with this PR?


import scala.util.Try

case class XRayExceptionId private (value: Array[Byte]) extends AnyVal {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just copied TraceId from model

private def toEpochSeconds(i: Instant): Double =
ChronoUnit.MICROS.between(Instant.EPOCH, i).toDouble / 1000_000

private def spanStatusFaultJson[F[_]: Applicative: XRayExceptionId.Gen](status: SpanStatus): F[JsonObject] =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requires XRayExceptionId.Gen (which is autoderivable) instead of Random

JsonObject(
"name" := goodName,
"id" := span.context.spanId.show,
"trace_id" := xRayTraceIdExportShow.show(span.context.traceId),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explicitly use an alternative Show instance

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use traceId direct from the span

* 1. the first 4 bytes are epoch time;
* 1. the rest bytes are random.
*/
def xRayTraceIdGen[F[_]: Apply: Random: Clock]: TraceId.Gen[F] = TraceId.Gen.from {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative generator of traceId that is compatible with the rest trace4cats integrations.

@mrdziuban
Copy link
Author

@catostrophe I don't have any blockers other than time, work and life got in the way of me being able to commit time to this. Feel free to take it from here if you'd like, otherwise I can circle back to it when I have time (though I can't say for sure when that might be)

@catostrophe
Copy link
Member

@mrdziuban I just pushed some changes related to traceId generation. Next, I will ask you some questions regarding the implementation. Take your time, we're not in a hurry.

private def json[F[_]: Monad: XRayExceptionId.Gen](
process: Option[TraceProcess],
span: CompletedSpan,
childSpans: Ref[F, Map[TraceId, NonEmptyList[CompletedSpan]]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I saw in the docs, child spans can be sent prior to the parent if they are typed as "subsegment". Is it possible when the parent is not yet sent, even as "in_progress"?

/** Initialize a `Deferred` to hold the target `IpAddress` so we only need to resolve it at most once for
* each call to `exportBatch`
*/
Deferred[F, IpAddress].flatMap { ipRef =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to resolve IpAddress every time we send another batch? This is not what we do in other implementations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement AWS X-Ray Exporter
3 participants