Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v2][query] Create archive reader/writer using regular factory methods #6519

Merged
merged 15 commits into from
Jan 17, 2025

Conversation

mahadzaryab1
Copy link
Collaborator

@mahadzaryab1 mahadzaryab1 commented Jan 10, 2025

Which problem is this PR solving?

Description of the changes

  • This PR changes the jaegerquery extension to remove usages of CreateArchiveSpanReader and CreateArchiveSpanWriter and replace them with their primary storage counterparts from the v2 storage API CreateTraceReader and CreateTraceWriter.
  • 🛑 This PR contains a breaking changes for users of jaeger-v2 that have an ElasticSearch storage configured for traces_archive 🛑
    • The distinction between primary and archive storage was removed which causes some breaking changes that can be remediated. Refer to Step 9 of the test report below for remediation steps.

Test Report

1. Establish Ground Truth on main

To begin, configure the storage and query settings in the all-in-one.yaml file as follows:

jaeger_storage:
  backends:
    some_storage:
      elasticsearch:
        indices:
          index_prefix: "jaeger-main"
          spans:
            date_layout: "2006-01-02"
            rollover_frequency: "day"
            shards: 5
            replicas: 1
          services:
            date_layout: "2006-01-02"
            rollover_frequency: "day"
            shards: 5
            replicas: 1
          dependencies:
            date_layout: "2006-01-02"
            rollover_frequency: "day"
            shards: 5
            replicas: 1
          sampling:
            date_layout: "2006-01-02"
            rollover_frequency: "day"
            shards: 5
            replicas: 1
    another_storage:
      elasticsearch:
        indices:
          index_prefix: "jaeger-archive"

2. Spin Up Elasticsearch Using Docker

To bring up Elasticsearch, run the following commands:

jaeger % cd docker-compose/elasticsearch/v8 
v8 % docker compose up

3. Start Jaeger

Start the Jaeger service:

jaeger % go run ./cmd/jaeger

4. Archive a Trace

From the Jaeger UI, select a trace and archive it.

traceID: 0dc3e460bd9b8e0dddfa29a2f751cfb9

Trace Screenshot

5. Update index_prefix

Stop Jaeger and modify the index_prefix in the primary configuration to jaeger-main-1. This ensures the query for the same trace is no longer found in the primary storage but will be found in the archive storage.

6. Query for the Same Trace

curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.data[].spans[] | {traceID, operationName}'
{
  "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "operationName": "/api/services"
}
{
  "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "operationName": "GetService"
}
curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.result.resourceSpans[].scopeSpans[].spans[] | {traceId, name}'
{
  "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "name": "GetService"
}
{
  "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "name": "/api/services"
}

7. Test Changes from This PR

Stop Jaeger, checkout this PR, and restart Jaeger:

gh pr checkout 6519
go run ./cmd/jaeger

8. Query for the Same Trace After PR Changes

curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq .
{
  "data": null,
  "total": 0,
  "limit": 0,
  "offset": 0,
  "errors": [
    {
      "code": 404,
      "msg": "trace not found"
    }
  ]
}
curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq .
{
  "error": {
    "httpCode": 404,
    "message": "No traces found"
  }
}

🛑 This is where the breaking change occurs 🛑

9. Mitigation for Users

Set use_aliases for Archive Storage

To mitigate the issue, set the use_aliases configuration for your archive storage to true. Update the configuration as follows:

another_storage:
  elasticsearch:
    indices:
      index_prefix: "jaeger-archive"
    use_aliases: true

Add Alias from Old Index to New Index

To ensure backwards compatibility, add an alias from the old index to the new index. You can query the current set of aliases in Elasticsearch with:

curl -X GET "http://localhost:9200/_aliases?pretty"
{
  "jaeger-main-1-jaeger-span-2025-01-16": { "aliases": {} },
  "jaeger-main-2-jaeger-span-2025-01-16": { "aliases": {} },
  "jaeger-archive-jaeger-span-archive": { "aliases": {} },
  "jaeger-main-1-jaeger-service-2025-01-16": { "aliases": {} }
}

To link the new index (jaeger-archive-jaeger-span-read) with the old index (jaeger-archive-jaeger-span-archive), run the following:

curl -X POST "http://localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    {
      "add": {
        "index": "jaeger-archive-jaeger-span-archive",
        "alias": "jaeger-archive-jaeger-span-read"
      }
    }
  ]
}'

Confirm that the alias has been added:

curl -X GET "http://localhost:9200/_aliases?pretty"
{
  "jaeger-main-1-jaeger-span-2025-01-16": { "aliases": {} },
  "jaeger-archive-jaeger-span-archive": { "aliases": { "jaeger-archive-jaeger-span-read": {} } },
  "jaeger-main-2-jaeger-span-2025-01-16": { "aliases": {} },
  "jaeger-main-2-jaeger-service-2025-01-16": { "aliases": {} },
  "jaeger-main-1-jaeger-service-2025-01-16": { "aliases": {} }
}

Note that if you already had use_aliases set to true for your archive storage before the upgrade, then Jaeger would've been using an index name with -read suffix: jaeger-archive-jaeger-span-archive-read. Then for the example above, you would create an alias jaeger-archive-jaeger-span-read pointing to jaeger-archive-jaeger-span-archive-read.

10. Restart Jaeger and Retry the Query

Finally, restart Jaeger and run the same trace queries again:

curl -s http://localhost:16686/api/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.data[].spans[] | {traceID, operationName}'
{
  "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "operationName": "/api/services"
}
{
  "traceID": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "operationName": "GetService"
}
curl -s http://localhost:16686/api/v3/traces/0dc3e460bd9b8e0dddfa29a2f751cfb9 | jq '.result.resourceSpans[].scopeSpans[].spans[] | {traceId, name}'
{
  "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "name": "/api/services"
}
{
  "traceId": "0dc3e460bd9b8e0dddfa29a2f751cfb9",
  "name": "GetService"
}

Checklist

@mahadzaryab1
Copy link
Collaborator Author

mahadzaryab1 commented Jan 10, 2025

@yurishkuro should this be marked as a breaking change? the es storage wont use the -archive alias anymore for jaeger-v2

Copy link

codecov bot commented Jan 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.21%. Comparing base (463f5f3) to head (e0cdb4a).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6519      +/-   ##
==========================================
- Coverage   96.23%   96.21%   -0.02%     
==========================================
  Files         373      373              
  Lines       21378    21400      +22     
==========================================
+ Hits        20574    20591      +17     
- Misses        612      616       +4     
- Partials      192      193       +1     
Flag Coverage Δ
badger_v1 10.64% <0.00%> (ø)
badger_v2 2.77% <0.00%> (ø)
cassandra-4.x-v1-manual 16.54% <0.00%> (ø)
cassandra-4.x-v2-auto 2.71% <0.00%> (ø)
cassandra-4.x-v2-manual 2.71% <0.00%> (-0.03%) ⬇️
cassandra-5.x-v1-manual 16.54% <0.00%> (ø)
cassandra-5.x-v2-auto 2.71% <0.00%> (ø)
cassandra-5.x-v2-manual 2.71% <0.00%> (ø)
elasticsearch-6.x-v1 20.31% <0.00%> (-0.01%) ⬇️
elasticsearch-7.x-v1 20.39% <0.00%> (ø)
elasticsearch-8.x-v1 20.56% <0.00%> (+<0.01%) ⬆️
elasticsearch-8.x-v2 2.77% <0.00%> (+<0.01%) ⬆️
grpc_v1 12.15% <0.00%> (-0.01%) ⬇️
grpc_v2 9.03% <0.00%> (ø)
kafka-3.x-v1 10.33% <0.00%> (ø)
kafka-3.x-v2 2.77% <0.00%> (ø)
memory_v2 2.77% <0.00%> (+<0.01%) ⬆️
opensearch-1.x-v1 20.44% <0.00%> (+<0.01%) ⬆️
opensearch-2.x-v1 20.44% <0.00%> (ø)
opensearch-2.x-v2 2.77% <0.00%> (+<0.01%) ⬆️
tailsampling-processor 0.51% <0.00%> (ø)
unittests 95.10% <100.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Mahad Zaryab <[email protected]>
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a breaking change implied here, in the sense that if the users do not set the config options appropriately the new storage objects may not behave the same way as archive objects did previously. I recommend handling it with a feature gate from OTEL. We can declare the feature with Beta state (i.e. enabled right away, mark as breaking change), but the users have the option to turn it off manually and fall back onto the old behavior that we should preserve for now. Then in the next release we set the feature to Stable where turning it off will give a runtime error (could also be labeled a breaking change) so the new behavior is the only one possible. And then in the following release we remove the feature altogether along with the legacy code.

example in cmd/jaeger/internal/extension/remotesampling/config.go

yurishkuro
yurishkuro previously approved these changes Jan 15, 2025
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@yurishkuro yurishkuro changed the title [WIP][v2][query] Change jaegerquery extension to use primary reader/writer for archive storage [WIP][v2][query] Create archive reader/writer using regular factory methods Jan 15, 2025
@yurishkuro
Copy link
Member

I have a feeling we will still need to address the question of removing create-archive methods later because as we start LFX project and being implementing v2 API in the storage directly, we will run into issue with Elasticsearch implementation.

@yurishkuro yurishkuro dismissed their stale review January 15, 2025 02:47

We need to implement a feature gate

@yurishkuro
Copy link
Member

I think this is breaking backwards compatibility, is it not? I.e. to get the equivalent behavior as the previous version the user needs to set some parameters (which ones, what combination?)

@mahadzaryab1 mahadzaryab1 added changelog:breaking-change Change that is breaking public APIs or established behavior and removed changelog:bugfix-or-minor-feature labels Jan 16, 2025
mahadzaryab1 added a commit that referenced this pull request Jan 17, 2025
…6555)

## Which problem is this PR solving?
- Towards #6065 

## Description of the changes
- This PR implements a reverse adapter (SpanWriter) that wraps a native
v2 storage writer (tracestore.Writer) and downgrades it to implement the
v1 writer (spanstore.Writer).
- This will be used by #6519
to downgrade a native v2 writer so that it can be used by the v1 query
service

## How was this change tested?
- Added unit tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `npm run lint` and `npm run test`

---------

Signed-off-by: Mahad Zaryab <[email protected]>
@mahadzaryab1 mahadzaryab1 changed the title [WIP][v2][query] Create archive reader/writer using regular factory methods [v2][query] Create archive reader/writer using regular factory methods Jan 17, 2025
Signed-off-by: Mahad Zaryab <[email protected]>
@mahadzaryab1 mahadzaryab1 marked this pull request as ready for review January 17, 2025 01:10
Signed-off-by: Mahad Zaryab <[email protected]>
@@ -140,26 +142,58 @@ func (s *server) addArchiveStorage(
return nil
}

f, err := jaegerstorage.GetStorageFactory(s.config.Storage.TracesArchive, host)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this method anywhere? If yes, can that code be upgraded as well? If no, I would remove it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok let's book child tickets to address that

Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move the test report into the description, so that it's more discoverable

@mahadzaryab1 mahadzaryab1 enabled auto-merge (squash) January 17, 2025 15:33
@mahadzaryab1 mahadzaryab1 merged commit 7f16f49 into jaegertracing:main Jan 17, 2025
55 checks passed
@mahadzaryab1 mahadzaryab1 deleted the query-archive branch January 17, 2025 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/storage changelog:breaking-change Change that is breaking public APIs or established behavior v2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants