Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Release 2.0.0/1.63.0. This PR is created from CI and is part of the release process. Signed-off-by: jaegertracingbot <[email protected]> Co-authored-by: jaegertracingbot <[email protected]>
1 parent
4625086
commit 5d1dddf
Showing 108 changed files with 15,276 additions and 6 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
--- | ||
title: Introduction | ||
weight: 1 | ||
children: | ||
- title: Features | ||
url: features | ||
--- | ||
|
||
Welcome to Jaeger's documentation portal! Below, you'll find information for beginners and experienced Jaeger users. | ||
|
||
If you cannot find what you are looking for, or have an issue not covered here, we'd love to [hear from you](/get-in-touch). | ||
|
||
If you are new to distributed tracing, please take a look at the [Related Links](#related-links) section below. | ||
|
||
## About | ||
|
||
Jaeger is a distributed tracing platform released as open source by [Uber Technologies][ubeross]. | ||
With Jaeger you can: | ||
|
||
* Monitor and troubleshoot distributed workflows | ||
* Identify performance bottlenecks | ||
* Track down root causes | ||
* Analyze service dependencies | ||
|
||
Uber published a blog post, [Evolving Distributed Tracing at Uber](https://eng.uber.com/distributed-tracing/), that explains the history of Jaeger and the reasons for its architectural choices. [Yuri Shkuro](https://shkuro.com), creator of Jaeger, also published a book, [Mastering Distributed Tracing](https://shkuro.com/books/2019-mastering-distributed-tracing/), that covers many aspects of Jaeger design and operation in depth, as well as distributed tracing in general. | ||
|
||
## Features | ||
|
||
* [OpenTracing](https://opentracing.io/)-inspired data model | ||
* [OpenTelemetry](https://opentelemetry.io/) compatible | ||
* Multiple built-in storage backends: Cassandra, Elasticsearch and in-memory | ||
* Community supported external storage backends via the gRPC plugin: [ClickHouse](https://github.com/jaegertracing/jaeger-clickhouse) | ||
* System topology graphs | ||
* Adaptive sampling | ||
* Service Performance Monitoring (SPM) | ||
* Post-collection data processing | ||
|
||
See the [Features](./features/) page for more details. | ||
|
||
## Technical Specs | ||
|
||
* Backend components implemented in Go | ||
* React/JavaScript UI | ||
* Supported storage backends: | ||
* [Cassandra 3.4+](./deployment/#cassandra) | ||
* [Elasticsearch 7.x, 8.x](./deployment/#elasticsearch) | ||
* [Badger](./deployment/#badger---local-storage) | ||
* [Kafka](./deployment/#kafka) - as an intermediate buffer | ||
* memory storage | ||
* Custom backends via [Remote Storage API](./deployment/#remote-storage) | ||
|
||
## Quick Start | ||
|
||
See [Getting Started](./getting-started). | ||
|
||
## Screenshots | ||
|
||
### Traces View | ||
[![Traces View](/img/traces-ss.png)](/img/traces-ss.png) | ||
|
||
### Trace Detail View | ||
[![Detail View](/img/trace-detail-ss.png)](/img/trace-detail-ss.png) | ||
|
||
### Service Performance Monitoring View | ||
![Service Performance Monitoring](/img/frontend-ui/spm.png) | ||
|
||
## Related links | ||
- [Take Jaeger for a HotROD ride](https://medium.com/jaegertracing/take-jaeger-for-a-hotrod-ride-233cf43e46c2) (blog) | ||
- [Evolving Distributed tracing At Uber Engineering](https://eng.uber.com/distributed-tracing/) (blog) | ||
- [Mastering Distributed Tracing](https://shkuro.com/books/2019-mastering-distributed-tracing/) (book) | ||
- [OpenTracing Tutorial (Java, Go, Python, Node.js, C#)](https://github.com/yurishkuro/opentracing-tutorial/) (tutorials) | ||
- [Learning Distributed Tracing 101](https://tracing.cloudnative101.dev/docs/index.html) (tutorials) | ||
- [Tracing HTTP request latency in Go with OpenTracing](https://medium.com/opentracing/tracing-http-request-latency-in-go-with-opentracing-7cc1282a100a) (blog) | ||
- [Distributed Tracing with Jaeger & Prometheus on Kubernetes](https://blog.openshift.com/openshift-commons-briefing-82-distributed-tracing-with-jaeger-prometheus-on-kubernetes/) (blog) | ||
- [Using Jaeger with Istio](https://istio.io/latest/docs/tasks/observability/distributed-tracing/jaeger/) (docs) | ||
- [Using Jaeger with Envoy](https://www.envoyproxy.io/docs/envoy/latest/start/sandboxes/jaeger_tracing.html) (docs) | ||
|
||
[ubeross]: http://uber.github.io |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,128 @@ | ||
--- | ||
title: APIs | ||
hasparent: true | ||
--- | ||
|
||
Jaeger components implement various APIs for saving or retrieving trace data. | ||
|
||
The following labels are used to describe API compatibility guarantees. | ||
|
||
* **stable** - the API guarantees backwards compatibility. If breaking changes are going to be made in the future, they will result in a new API version, e.g. `/api/v2` URL prefix or a different namespace in the IDL. | ||
* **internal** - the APIs are intended for internal communications between Jaeger components and are not recommended for use by external components. | ||
* **deprecated** - the APIs are maintained only for legacy reasons and will be phased out in the future. | ||
|
||
Since Jaeger v1.32, **jaeger-collector** and **jaeger-query** Service ports that serve gRPC endpoints enable [gRPC reflection][grpc-reflection]. Unfortunately, the internally used `gogo/protobuf` has a [compatibility issue][gogo-reflection] with the official `golang/protobuf`, and as a result only the `list` reflection command is currently working properly. | ||
|
||
## Span reporting APIs | ||
|
||
**jaeger-collector** is the component of the Jaeger backend that can receive spans. At this time it supports two sets of non-overlapping APIs. | ||
|
||
### OpenTelemetry Protocol (stable) | ||
|
||
Since v1.35, the Jaeger backend can receive trace data from the OpenTelemetry SDKs in their native [OpenTelemetry Protocol (OTLP)][otlp]. It is no longer necessary to configure the OpenTelemetry SDKs with Jaeger exporters, nor deploy the OpenTelemetry Collector between the OpenTelemetry SDKs and the Jaeger backend. | ||
|
||
The OTLP data is accepted in these formats: (1) binary gRPC, (2) Protobuf over HTTP, (3) JSON over HTTP. For more details on the OTLP receiver see the [official documentation][otlp-rcvr]. Note that not all configuration options are supported in **jaeger-collector** (see `--collector.otlp.*` [CLI Flags](../cli/#jaeger-collector)), and only tracing data is accepted, since Jaeger does not store other telemetry types. | ||
|
||
| Port | Protocol | Endpoint | Format | ||
| ----- | ------- | ------------ | ---- | ||
| 4317 | gRPC | n/a | Protobuf | ||
| 4318 | HTTP | `/v1/traces` | Protobuf or JSON | ||
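As a rough illustration of the JSON-over-HTTP row in the table above, the following Python sketch builds (but does not send) a POST request that carries a single span in OTLP/JSON encoding. The host name `jaeger-collector`, the service name, the scope name, and the span values are placeholders chosen for this example. The field names follow the OTLP JSON mapping as we understand it, so please verify them against the OTLP specification before relying on them.

```python
import json
import urllib.request

# Hypothetical collector address; port 4318 and /v1/traces come from the table above.
OTLP_HTTP_URL = "http://jaeger-collector:4318/v1/traces"


def build_otlp_json_request(service_name: str, span_name: str,
                            start_ns: int, end_ns: int) -> urllib.request.Request:
    """Build an OTLP/JSON export request containing a single span."""
    payload = {
        "resourceSpans": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeSpans": [{
                "scope": {"name": "example-instrumentation"},  # placeholder scope name
                "spans": [{
                    # IDs are hex strings: 16 bytes for the trace, 8 bytes for the span.
                    "traceId": "5b8efff798038103d269b633813fc60c",
                    "spanId": "eee19b7ec3c1b174",
                    "name": span_name,
                    "kind": 1,  # SPAN_KIND_INTERNAL
                    "startTimeUnixNano": str(start_ns),
                    "endTimeUnixNano": str(end_ns),
                }],
            }],
        }]
    }
    return urllib.request.Request(
        OTLP_HTTP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_otlp_json_request("demo-service", "demo-operation",
                                  1_700_000_000_000_000_000,
                                  1_700_000_000_050_000_000)
# urllib.request.urlopen(request) would submit it to a running collector.
```

In practice an OpenTelemetry SDK with an OTLP exporter produces this payload for you; the sketch only shows what travels over the wire.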
|
||
[otlp-rcvr]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/receiver/otlpreceiver/README.md | ||
[otlp]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md | ||
|
||
### Protobuf via gRPC (stable) | ||
|
||
**Deprecated**: we recommend the OpenTelemetry protocol. | ||
|
||
Since Jaeger v1.11, the official protocol between user applications and **jaeger-collector**s is the `jaeger.api_v2.CollectorService` gRPC endpoint defined in the [collector.proto] IDL file. The same endpoint can be used to submit trace data from SDKs directly to **jaeger-collector**. | ||
|
||
### Thrift over HTTP (stable) | ||
|
||
**Deprecated**: we recommend the OpenTelemetry protocol. | ||
|
||
The payload in [jaeger.thrift] format can be submitted in an HTTP POST request to the `/api/traces` endpoint, for example, `https://jaeger-collector:14268/api/traces`. The `Batch` struct needs to be encoded using Thrift's `binary` encoding, and the HTTP request should specify the content type header: | ||
|
||
``` | ||
Content-Type: application/vnd.apache.thrift.binary | ||
``` | ||
|
||
### JSON over HTTP (n/a) | ||
|
||
There is no official Jaeger JSON format that can be accepted by **jaeger-collector**. | ||
Jaeger does accept the OpenTelemetry protocol via JSON (see [above](#opentelemetry-protocol-stable)). | ||
|
||
### Zipkin Formats (stable) | ||
|
||
**jaeger-collector** can also accept spans in several Zipkin data formats, namely JSON v1/v2 and Thrift. **jaeger-collector** needs to be configured to enable Zipkin HTTP server, e.g. on port 9411 used by Zipkin collectors. The server enables two endpoints that expect POST requests: | ||
|
||
* `/api/v1/spans` for submitting spans in Zipkin JSON v1 or Zipkin Thrift format. | ||
* `/api/v2/spans` for submitting spans in Zipkin JSON v2. | ||
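The mapping from Zipkin format to endpoint described above can be sketched as a small lookup. This is only an illustration: the format labels (`json-v1`, `json-v2`, `thrift`) and the default host name are names invented for this example, and the port assumes the Zipkin HTTP server was enabled on 9411.

```python
# Format labels are hypothetical; the paths come from the list above.
ZIPKIN_ENDPOINTS = {
    "json-v1": "/api/v1/spans",
    "thrift": "/api/v1/spans",
    "json-v2": "/api/v2/spans",
}


def zipkin_url(fmt: str, host: str = "jaeger-collector", port: int = 9411) -> str:
    """Return the URL to POST spans to for the given Zipkin format label."""
    try:
        path = ZIPKIN_ENDPOINTS[fmt]
    except KeyError:
        raise ValueError(f"unsupported Zipkin format: {fmt!r}") from None
    return f"http://{host}:{port}{path}"


print(zipkin_url("json-v2"))
```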
|
||
## Trace retrieval APIs | ||
|
||
Traces saved in storage can be retrieved by calling the **jaeger-query** Service. | ||
|
||
### gRPC/Protobuf (stable) | ||
|
||
The recommended way to retrieve traces and other data programmatically is via the `jaeger.api_v2.QueryService` gRPC endpoint defined in the [query.proto] IDL file. In the default configuration this endpoint is accessible from `jaeger-query:16685`. | ||
|
||
### HTTP JSON (internal) | ||
|
||
Jaeger UI communicates with the **jaeger-query** Service via a JSON API. For example, a trace can be retrieved via a GET request to `https://jaeger-query:16686/api/traces/{trace-id-hex-string}`. This JSON API is intentionally undocumented and subject to change. | ||
|
||
## Remote Storage API (stable) | ||
|
||
When using the `grpc` storage type (a.k.a. [remote storage](../deployment/#remote-storage)), Jaeger components can use custom storage backends as long as those backends implement the gRPC [Remote Storage API][storage.proto]. | ||
|
||
## Remote Sampling Configuration (stable) | ||
|
||
This API supports Jaeger's [Remote Sampling](../sampling/#remote-sampling) protocol, defined in the [sampling.proto] IDL file. | ||
|
||
**jaeger-collector** implements this API. See [Remote Sampling](../sampling/#remote-sampling) for details on how to configure the Collector with sampling strategies. | ||
|
||
The following table lists different endpoints and formats that can be used to query for sampling strategies. The official HTTP/JSON endpoints use standard [Protobuf-to-JSON mapping](https://developers.google.com/protocol-buffers/docs/proto3#json). | ||
|
||
Component | Port | Endpoint | Format | Notes | ||
--------- | ----- | ----------------- | --------- | ----- | ||
Collector | 14268 | `/api/sampling` | HTTP/JSON | Recommended for most SDKs | ||
Collector | 14250 | [sampling.proto] | gRPC | For SDKs that want to use gRPC (e.g. OpenTelemetry Java SDK) | ||
|
||
**Examples** | ||
|
||
Run all-in-one in one terminal: | ||
```shell | ||
$ go run ./cmd/all-in-one \ | ||
--sampling.strategies-file=cmd/all-in-one/sampling_strategies.json | ||
``` | ||
|
||
Query the endpoint in another terminal: | ||
```shell | ||
$ curl "http://localhost:14268/api/sampling?service=foo" | ||
{"strategyType":"PROBABILISTIC","probabilisticSampling":{"samplingRate":1}} | ||
``` | ||
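The same query can be issued from code. The sketch below, using only the Python standard library, builds the request URL and parses a response body shaped like the one shown above. The helper names are made up for this example, and only the two fields visible in the sample response are read; other strategy types may carry different fields.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlencode


def sampling_url(service: str, base: str = "http://localhost:14268") -> str:
    """Build the /api/sampling URL for a service name."""
    return f"{base}/api/sampling?{urlencode({'service': service})}"


def parse_strategy(body: str) -> Tuple[Optional[str], Optional[float]]:
    """Extract the strategy type and, if present, the probabilistic rate."""
    doc = json.loads(body)
    strategy_type = doc.get("strategyType")
    rate = doc.get("probabilisticSampling", {}).get("samplingRate")
    return strategy_type, rate


# Response body copied from the curl example above.
example = '{"strategyType":"PROBABILISTIC","probabilisticSampling":{"samplingRate":1}}'
print(sampling_url("foo"))
print(parse_strategy(example))
```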
|
||
## Service dependencies graph (internal) | ||
|
||
The service dependencies graph can be retrieved from the **jaeger-query** Service at the `/api/dependencies` endpoint. The GET request expects two parameters: | ||
|
||
* `endTs` (number of milliseconds since epoch) - the end of the time interval | ||
* `lookback` (in milliseconds) - the length of the time interval (i.e. start-time + lookback = end-time). | ||
|
||
The returned JSON is a list of edges represented as tuples `(caller, callee, count)`. | ||
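To make the two parameters concrete, here is a minimal Python sketch that converts an end time and a lookback window into the millisecond values the endpoint expects. The base URL reuses the `jaeger-query:16686` address from the HTTP JSON example and is an assumption about your deployment; because this API is internal, the response is deliberately not parsed here.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def dependencies_url(end: datetime, lookback: timedelta,
                     base: str = "https://jaeger-query:16686") -> str:
    """Build the /api/dependencies URL for the interval [end - lookback, end]."""
    params = {
        "endTs": int(end.timestamp() * 1000),             # milliseconds since epoch
        "lookback": int(lookback.total_seconds() * 1000),  # interval length in ms
    }
    return f"{base}/api/dependencies?{urlencode(params)}"


# Dependencies observed during the hour before 2024-01-01T00:00:00Z.
end = datetime(2024, 1, 1, tzinfo=timezone.utc)
url = dependencies_url(end, timedelta(hours=1))
print(url)
```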
|
||
For programmatic access to the service graph, the recommended API is gRPC/Protobuf described above. | ||
|
||
## Service Performance Monitoring (internal) | ||
|
||
Please refer to the [SPM Documentation](../spm#api). | ||
|
||
[jaeger-idl]: https://github.com/jaegertracing/jaeger-idl/ | ||
[jaeger.thrift]: https://github.com/jaegertracing/jaeger-idl/blob/main/thrift/jaeger.thrift | ||
[sampling.thrift]: https://github.com/jaegertracing/jaeger-idl/blob/main/thrift/sampling.thrift | ||
[collector.proto]: https://github.com/jaegertracing/jaeger-idl/blob/main/proto/api_v2/collector.proto | ||
[query.proto]: https://github.com/jaegertracing/jaeger-idl/blob/main/proto/api_v2/query.proto | ||
[sampling.proto]: https://github.com/jaegertracing/jaeger-idl/blob/main/proto/api_v2/sampling.proto | ||
[grpc-reflection]: https://github.com/grpc/grpc-go/blob/master/Documentation/server-reflection-tutorial.md#enable-server-reflection | ||
[gogo-reflection]: https://jbrandhorst.com/post/gogoproto/#reflection | ||
[storage.proto]: https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/grpc/proto/storage.proto |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
--- | ||
title: Architecture | ||
weight: 3 | ||
children: | ||
- title: APIs | ||
url: apis | ||
- title: Sampling | ||
url: sampling | ||
--- | ||
|
||
## Terminology | ||
|
||
Jaeger represents tracing data in a data model inspired by the [OpenTracing Specification](https://github.com/opentracing/specification/blob/master/specification.md). The data model is logically very similar to [OpenTelemetry Traces](https://opentelemetry.io/docs/concepts/signals/traces/), with some naming differences: | ||
|
||
| Jaeger | OpenTelemetry | Notes | | ||
| -------------------- | --------------- | ----------------------------------------------------------------------- | | ||
| Tags | Attributes | Both support typed values, but nested tags are not supported in Jaeger. | | ||
| Span Logs | Span Events | Point-in-time events on the span recorded in a structured form. | | ||
| Span References | Span Links | Jaeger's Span References have a required type (`child-of` or `follows-from`) and always refer to predecessor spans; OpenTelemetry's Span Links have no type, but allow attributes. | | ||
| Process | Resource | A struct describing the entity that produces the telemetry. | | ||
|
||
### Span | ||
|
||
A **span** represents a logical unit of work that has an operation name, the start time of the operation, and the duration. Spans may be nested and ordered to model causal relationships. | ||
|
||
![Traces And Spans](/img/spans-traces.png) | ||
|
||
### Trace | ||
|
||
A **trace** represents the data or execution path through the system. It can be thought of as a directed acyclic graph of spans. | ||
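The relationship between spans and traces can be sketched with a small data structure. This is a simplified illustration, not Jaeger's actual data model: the `Span` class and its field names are invented for this example, and references are reduced to a list of predecessor span IDs.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Span:
    span_id: str
    operation_name: str
    start_time_us: int
    duration_us: int
    # IDs of predecessor spans (empty for a root span).
    parent_ids: List[str] = field(default_factory=list)


def children_by_parent(spans: List[Span]) -> Dict[str, List[str]]:
    """Return the edges of the trace graph as parent ID -> child IDs."""
    children = defaultdict(list)
    for span in spans:
        for parent_id in span.parent_ids:
            children[parent_id].append(span.span_id)
    return dict(children)


def root_spans(spans: List[Span]) -> List[str]:
    """Spans with no predecessor are the entry points of the trace."""
    return [span.span_id for span in spans if not span.parent_ids]


trace = [
    Span("a", "HTTP GET /checkout", 0, 900),
    Span("b", "auth.verify", 50, 100, ["a"]),
    Span("c", "db.query", 200, 600, ["a"]),
]
print(root_spans(trace), children_by_parent(trace))
```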
|
||
### Baggage | ||
|
||
**Baggage** is arbitrary user-defined metadata (key-value pairs) that can be attached to distributed context and propagated by the tracing SDKs. See [W3C Baggage](https://www.w3.org/TR/baggage/) for more information. | ||
|
||
## Architecture | ||
|
||
Jaeger can be deployed either as an **all-in-one** binary, where all Jaeger backend components | ||
run in a single process, or as a scalable distributed system. There are two main deployment options discussed below. | ||
|
||
### Direct to storage | ||
|
||
In this deployment the collectors receive the data from traced applications and write it directly to storage. The storage must be able to handle both average and peak traffic. Collectors use an in-memory queue to smooth short-term traffic peaks, but a sustained traffic spike may result in dropped data if the storage is not able to keep up. | ||
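The queueing behavior described above can be illustrated with a bounded queue that drops new items once it is full. This is a conceptual sketch only; the class and method names are hypothetical and do not reflect the collector's real implementation.

```python
import queue


class BoundedSpanQueue:
    """A fixed-capacity buffer that drops spans when storage cannot keep up."""

    def __init__(self, max_size: int):
        self._queue = queue.Queue(maxsize=max_size)
        self.dropped = 0

    def offer(self, span) -> bool:
        """Enqueue without blocking; count and drop the span if the queue is full."""
        try:
            self._queue.put_nowait(span)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, limit: int) -> list:
        """Take up to `limit` spans, as a storage writer would."""
        batch = []
        while len(batch) < limit:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch


q = BoundedSpanQueue(max_size=3)
accepted = [q.offer(f"span-{i}") for i in range(5)]  # burst larger than capacity
print(accepted, q.dropped)
```

A short burst fits in the buffer and is written out later, while a sustained overload fills the buffer and every further span is dropped until the writer catches up.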
|
||
Collectors are able to centrally serve sampling configuration to the SDKs, known as [remote sampling mode](../sampling/#remote-sampling). They can also enable automatic sampling configuration calculation, known as [adaptive sampling](../sampling/#adaptive-sampling). | ||
|
||
![Architecture](/img/architecture-v1-2023.png) | ||
|
||
### Via Kafka | ||
|
||
To prevent data loss between collectors and storage, Kafka can be used as an intermediary, persistent queue. An additional component, **jaeger-ingester**, needs to be deployed to read data from Kafka and save to the database. Multiple **jaeger-ingester**s can be deployed to scale up ingestion; they will automatically partition the load across them. | ||
|
||
![Architecture](/img/architecture-v2-2023.png) | ||
|
||
### With OpenTelemetry Collector | ||
|
||
You **do not need to use the OpenTelemetry Collector**, because **jaeger-collector** can receive OpenTelemetry data directly from the OpenTelemetry SDKs (using OTLP exporters). However, if you already use the OpenTelemetry Collector, such as for gathering other types of telemetry or for pre-processing / enriching the tracing data, it __can be placed between__ the SDKs and **jaeger-collector**. The OpenTelemetry Collector can be run as an application sidecar, as a host agent / daemon, or as a central cluster. | ||
|
||
The OpenTelemetry Collector supports Jaeger's Remote Sampling protocol and can either serve static configurations from config files directly, or proxy the requests to the Jaeger backend (e.g., when using adaptive sampling). | ||
|
||
![Architecture](/img/architecture-otel.png) | ||
|
||
#### OpenTelemetry Collector as a sidecar / host agent | ||
|
||
Benefits: | ||
|
||
* The SDK configuration is simplified, because both the trace export endpoint and the sampling config endpoint can point to the local host, with no need to discover where those services run remotely. | ||
* The Collector may provide data enrichment by adding environment information, like the k8s pod name. | ||
* Resource usage for data enrichment can be distributed across all application hosts. | ||
|
||
Downsides: | ||
|
||
* An extra layer of marshaling/unmarshaling the data. | ||
|
||
#### OpenTelemetry Collector as a remote cluster | ||
|
||
Benefits: | ||
* Sharding capabilities, e.g., when using [tail-based sampling](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/tailsamplingprocessor/README.md). | ||
|
||
Downsides: | ||
|
||
* An extra layer of marshaling/unmarshaling the data. | ||
|
||
## Components | ||
|
||
This section details the constituent parts of Jaeger and how they relate to each other. It is arranged by the order in which spans from your application interact with them. | ||
|
||
### Tracing SDKs | ||
|
||
{{< warning >}} | ||
The Jaeger project historically provided a collection of tracing SDKs, called [Jaeger clients](../client-libraries). These libraries have been retired in favor of the [OpenTelemetry SDKs](https://opentelemetry.io). | ||
{{< /warning >}} | ||
|
||
In order to generate tracing data, the applications must be instrumented. An instrumented application creates spans when receiving new requests and attaches context information (trace id, span id, and baggage) to outgoing requests. Only the ids and baggage are propagated with requests; all other profiling data, like operation name, timing, tags and logs, is not propagated. Instead, it is exported out of process to the Jaeger backend asynchronously, in the background. | ||
|
||
![Context propagation explained](/img/context-prop-2023.png) | ||
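As a simplified sketch of what is propagated, the following Python functions write and read the context as HTTP headers. It assumes W3C Trace Context style `traceparent` and `baggage` headers; the function names are invented for this example, and a real SDK propagator handles validation, `tracestate`, and baggage metadata that are omitted here.

```python
from typing import Dict, Tuple
from urllib.parse import quote, unquote


def inject(trace_id: str, span_id: str, sampled: bool,
           baggage: Dict[str, str]) -> Dict[str, str]:
    """Write the context into outgoing request headers (IDs and baggage only)."""
    flags = "01" if sampled else "00"
    headers = {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}
    if baggage:
        # Simplification: only values are percent-encoded.
        headers["baggage"] = ",".join(
            f"{key}={quote(value, safe='')}" for key, value in baggage.items()
        )
    return headers


def extract(headers: Dict[str, str]) -> Tuple[str, str, bool, Dict[str, str]]:
    """Read the context back on the receiving side."""
    _version, trace_id, span_id, flags = headers["traceparent"].split("-")
    baggage = {}
    for item in headers.get("baggage", "").split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            baggage[key.strip()] = unquote(value.strip())
    sampled = bool(int(flags, 16) & 1)  # lowest bit of the flags byte
    return trace_id, span_id, sampled, baggage


outgoing = inject("5b8efff798038103d269b633813fc60c", "eee19b7ec3c1b174",
                  True, {"tenant": "acme corp"})
print(outgoing)
print(extract(outgoing))
```

Note that operation names, timings, tags, and logs never appear in these headers; they are exported to the backend separately.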
|
||
There are many ways to instrument an application: | ||
* manually, using the tracing APIs directly, | ||
* relying on instrumentation already created for a variety of existing open source frameworks, | ||
* automatically, via byte code manipulation, monkey-patching, eBPF, and similar techniques. | ||
|
||
Instrumentation typically should not depend on specific tracing SDKs, but only on abstract tracing APIs like the OpenTelemetry API. The tracing SDKs implement the tracing APIs and take care of data export. | ||
|
||
The instrumentation is designed to be always on in production. To minimize overhead, the SDKs employ various sampling strategies. When a trace is sampled, the profiling span data is captured and transmitted to the Jaeger backend. When a trace is not sampled, no profiling data is collected at all, and the calls to the tracing API are short-circuited to incur a minimal amount of overhead. For more information, please refer to the [Sampling](../sampling/) page. | ||
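The short-circuiting described above can be sketched as follows. This is an illustrative model only: the threshold comparison on the trace ID is one common way to make a probabilistic decision and is not claimed to be the exact algorithm used by any particular SDK, and the class names are hypothetical.

```python
def is_sampled(trace_id_hex: str, sampling_rate: float) -> bool:
    """Decide from the trace ID, so every span of a trace gets the same answer."""
    low_64_bits = int(trace_id_hex, 16) & ((1 << 64) - 1)
    return low_64_bits < sampling_rate * (1 << 64)


class RecordingSpan:
    """Captures span data for export when the trace is sampled."""

    def __init__(self, name: str):
        self.name = name
        self.tags = {}

    def set_tag(self, key: str, value) -> None:
        self.tags[key] = value


class NoopSpan:
    """Accepts the same calls but stores nothing, keeping overhead minimal."""

    def set_tag(self, key: str, value) -> None:
        pass


def start_span(name: str, trace_id_hex: str, sampling_rate: float):
    if is_sampled(trace_id_hex, sampling_rate):
        return RecordingSpan(name)
    return NoopSpan()


span = start_span("db.query", "00" * 16, sampling_rate=0.5)
span.set_tag("db.system", "cassandra")  # same call works for either span type
```

Application code calls the same API in both cases, so sampling can be adjusted without touching the instrumentation.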
|
||
### Collector | ||
|
||
**jaeger-collector** receives traces, runs them through a processing pipeline for validation and clean-up/enrichment, and stores them in a storage backend. Jaeger comes with built-in support for several storage backends (see [Deployment](../deployment)), as well as an extensible plugin framework for implementing custom storage plugins. | ||
|
||
### Query | ||
|
||
**jaeger-query** is a service that exposes the [APIs](../apis) for retrieving traces from storage and hosts a Web UI for searching and analyzing traces. | ||
|
||
### Ingester | ||
|
||
**jaeger-ingester** is a service that reads traces from Kafka and writes them to a storage backend. Effectively, it is a stripped-down version of the Jaeger collector that supports Kafka as the only input protocol. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
--- | ||
title: CLI flags | ||
widescreen: true | ||
hasparent: true | ||
--- | ||
|
||
This is auto-generated documentation for CLI flags supported by Jaeger binaries. | ||
|
||
* CLI flags for some binaries change depending on the `SPAN_STORAGE_TYPE` environment variable. Relevant variations are included below. | ||
* Some binaries support _commands_ (mostly informational), such as `env`, `docs`, `version`, and `status`. These commands are not included here. | ||
* All parameters can be provided via environment variables, by changing all letters to upper-case and replacing all punctuation characters with the underscore `_`. For example, the value for the flag `--cassandra.connections-per-host` can be provided via the `CASSANDRA_CONNECTIONS_PER_HOST` environment variable. | ||
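The flag-to-variable rule in the last bullet can be expressed as a short function. This is a sketch of the naming rule as stated above, not code taken from Jaeger; the function name is made up, and "punctuation" is interpreted here as any character that is not a letter or digit.

```python
import re


def flag_to_env(flag: str) -> str:
    """Convert a CLI flag such as --cassandra.connections-per-host to its env var name."""
    name = flag.lstrip("-")                       # drop the leading dashes
    name = re.sub(r"[^A-Za-z0-9]", "_", name)     # punctuation becomes underscore
    return name.upper()


print(flag_to_env("--cassandra.connections-per-host"))
```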
|
||
{{< cli/tools-list >}} |