Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure OTEL Collector to observe Internal Telemetry #5752

Merged
merged 38 commits into from
Jul 25, 2024

Conversation

Wise-Wizard
Copy link
Contributor

@Wise-Wizard Wise-Wizard commented Jul 16, 2024

Which problem is this PR solving?

This PR addresses a part of the issue #5633

Description of the changes
This is a Draft PR to achieve Observability Parity between V1 and V2 components by configuring OTEL Collector config files to initialise internal tracer and metrics
How was this change tested?

The changes were tested by running the following command:

make test
CI actions

Checklist

  • I have read CONTRIBUTING_GUIDELINES.md
  • I have signed all commits
  • I have added unit tests for the new functionality
  • I have run lint and test steps successfully
    • for jaeger: make lint test
    • for jaeger-ui: yarn lint and yarn test

Signed-off-by: Wise-Wizard <[email protected]>
@Wise-Wizard Wise-Wizard requested a review from a team as a code owner July 16, 2024 16:19
@Wise-Wizard Wise-Wizard requested a review from albertteoh July 16, 2024 16:19
@yurishkuro yurishkuro added the changelog:exprimental Change to an experimental part of the code label Jul 16, 2024
@yurishkuro
Copy link
Member

The changes were tested by running the following command:
make test

How does doing that demonstrate that the changes result in desired behavior?

Copy link

codecov bot commented Jul 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.66%. Comparing base (e437971) to head (9a3536d).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5752   +/-   ##
=======================================
  Coverage   96.66%   96.66%           
=======================================
  Files         342      342           
  Lines       16515    16519    +4     
=======================================
+ Hits        15964    15968    +4     
  Misses        362      362           
  Partials      189      189           
Flag Coverage Δ
badger_v1 8.05% <ø> (ø)
badger_v2 1.81% <ø> (ø)
cassandra-3.x-v1 16.61% <ø> (ø)
cassandra-3.x-v2 1.74% <ø> (ø)
cassandra-4.x-v1 16.61% <ø> (ø)
cassandra-4.x-v2 1.74% <ø> (ø)
elasticsearch-6.x-v1 18.78% <ø> (ø)
elasticsearch-7.x-v1 18.84% <ø> (ø)
elasticsearch-8.x-v1 19.02% <ø> (ø)
elasticsearch-8.x-v2 1.80% <ø> (ø)
grpc_v1 9.52% <ø> (+0.01%) ⬆️
grpc_v2 7.14% <ø> (+0.01%) ⬆️
kafka 9.74% <ø> (ø)
memory_v2 1.81% <ø> (ø)
opensearch-1.x-v1 18.89% <ø> (+0.01%) ⬆️
opensearch-2.x-v1 18.89% <ø> (ø)
opensearch-2.x-v2 1.80% <ø> (-0.02%) ⬇️
unittests 95.09% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Wise-Wizard
Copy link
Contributor Author

The changes were tested by running the following command:
make test

How does doing that demonstrate that the changes result in desired behavior?

It dosen't, just created a PR to check for github actions

Signed-off-by: Wise-Wizard <[email protected]>
Signed-off-by: Wise-Wizard <[email protected]>
@Wise-Wizard
Copy link
Contributor Author

Wise-Wizard commented Jul 16, 2024

@yurishkuro I think I have configured the tracer successfully as now when I am passing telset.TracerProvider instead of jTracer in Jaeger Query, the tests aren't failing anymore.

@Wise-Wizard Wise-Wizard changed the title Test configuration changes Configure OTEL Collector to observe Internal Telemetry Jul 16, 2024
@yurishkuro
Copy link
Member

Can you provide evidence that the changes are working? Just passing CI only means it didn't break things, but we don't explicitly test for internally generated traces (we should , it would be a good e2e test, e.g. as part of all-in-one e2e we could check that the internal traces are generated for API queries)

@Wise-Wizard
Copy link
Contributor Author

Can you provide evidence that the changes are working? Just passing CI only means it didn't break things, but we don't explicitly test for internally generated traces (we should , it would be a good e2e test, e.g. as part of all-in-one e2e we could check that the internal traces are generated for API queries)

When I had configured telset with OTEL.TracerProvider instead of JTracer without any changes in configuration of yaml file the all-in-one V2 test getAPITrace fails because I think a no-op tracer is being provided but after the changes in the configuration file the test seems to be passing.
Apart from that I have no POC

@Wise-Wizard
Copy link
Contributor Author

@Wise-Wizard a screenshot is good as anecdotal evidence, but we need to be more rigorous than that. Jaeger has a lot of different metrics, we cannot validate parity by eyeballing screenshots, we need a scripted diffing solution.

Yes, the screenshot was just a small evidence.
i have compared the metrics names emitted by Jaeger Query as mentioned in the Doc
All of them are emitted in V2 scope as well.

@yurishkuro
Copy link
Member

i have compared the metrics names emitted by Jaeger Query ... All of them are emitted in V2 scope as well.

How did you compare them all?

@Wise-Wizard
Copy link
Contributor Author

i have compared the metrics names emitted by Jaeger Query ... All of them are emitted in V2 scope as well.

How did you compare them all?

I just compared the names of the metrics side by side.
The difference is in the number of operations per metric name.

Signed-off-by: Wise-Wizard <[email protected]>
@yurishkuro
Copy link
Member

I just compared the names of the metrics side by side.

Can we use something like this instead? https://github.com/prometheus/prom2json
I would like to have scripted diffs, not just eyeballing verification.

@Wise-Wizard
Copy link
Contributor Author

Wise-Wizard commented Jul 24, 2024

I just compared the names of the metrics side by side.

Can we use something like this instead? https://github.com/prometheus/prom2json I would like to have scripted diffs, not just eyeballing verification.

Hi, I have created a tmp folder and added JSON file for each using prom2Json.
PTAL at it and lmk if I need to filter out and make changes in the JSON file.
After that I can delete the folder and store the JSON locally to use it later for the migration guide.

@yurishkuro
Copy link
Member

We don't need to see the actual json files, but their diffs. Any thoughts on how to compare them? We don't care about the values, only names and labels. There are probably json diffing tools.

@Wise-Wizard
Copy link
Contributor Author

Wise-Wizard commented Jul 24, 2024

We don't need to see the actual json files, but their diffs. Any thoughts on how to compare them? We don't care about the values, only names and labels. There are probably json diffing tools.

Ok, will research about that and look for some tool to see the differences.
By differences you also mean the jaeger_collector and jaeger_agent metrics names to be included right?

Signed-off-by: Wise-Wizard <[email protected]>
@Wise-Wizard
Copy link
Contributor Author

Hi, so I wrote a Python script to read the difference in metrics names in both the files and written the differences in names in difference.json file for comparison.
Please do take a look at the file.

@Wise-Wizard Wise-Wizard requested a review from yurishkuro July 24, 2024 12:37
tmp/differences.json Outdated Show resolved Hide resolved
tmp/differences.json Outdated Show resolved Hide resolved
Signed-off-by: Wise-Wizard <[email protected]>
Signed-off-by: Wise-Wizard <[email protected]>
tmp/differences.json Outdated Show resolved Hide resolved
tmp/differences.json Outdated Show resolved Hide resolved
Signed-off-by: Wise-Wizard <[email protected]>
Copy link
Member

@yurishkuro yurishkuro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to merge this PR. The following changes are needed:

  • commit your metrics comparison script
  • remove differences.json
  • fix unit test

tmp/differences.json Outdated Show resolved Hide resolved
@Wise-Wizard
Copy link
Contributor Author

Will do the changes

Wise-Wizard and others added 2 commits July 25, 2024 08:51
tmp/compare_metrics.py Outdated Show resolved Hide resolved
@yurishkuro
Copy link
Member

make sure to get the CI green, you have unit test failures, and I see new linter failures

Signed-off-by: Wise-Wizard <[email protected]>
Signed-off-by: Wise-Wizard <[email protected]>
@Wise-Wizard
Copy link
Contributor Author

make sure to get the CI green, you have unit test failures, and I see new linter failures

Done.

@yurishkuro yurishkuro merged commit 0f687f4 into jaegertracing:main Jul 25, 2024
44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/otel changelog:exprimental Change to an experimental part of the code v2
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants