Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transform v1 sanitizers into processors or something else #5545

Closed
yurishkuro opened this issue Jun 7, 2024 · 10 comments · Fixed by #6078
Closed

Transform v1 sanitizers into processors or something else #5545

yurishkuro opened this issue Jun 7, 2024 · 10 comments · Fixed by #6078
Labels
good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement v2

Comments

@yurishkuro
Copy link
Member

yurishkuro commented Jun 7, 2024

In the v1 collector all received spans are passed through the sanitizers framework cmd/collector/app/sanitizer/. This is currently missing in jaeger-v2. Since the jaegerexporter in v2 operates on OTLP data, the existing sanitizers won't work directly, and will need to be rewritten to operate on OTLP format directly.

There are two options how the sanitizers can be invoked. One is to invoke them automatically in the exporter. Another is by defining them as span processors in the OTEL collector pipeline. This allows users to cherry-pick which sanitizers they want to use, but also moves the burden of configuration onto the user, whereas in collector-v1 the default set of sanitizers was always used automatically. I would suggest invoking them automatically in the exporter - in the future if configurability is desirable it can be implemented.

The main requirements are

  1. bad data should be sanitized so that it's now "good", e.g. no malformed utf8 strings present anywhere, since they can break transmission
  2. the original bad data should still be preserved in some encoded (safe) form for debugging purposes.
@yurishkuro yurishkuro converted this from a draft issue Jun 7, 2024
@yurishkuro yurishkuro added help wanted Features that maintainers are willing to accept but do not have cycles to implement good first issue Good for beginners v2 labels Jun 7, 2024
@varshith257
Copy link
Contributor

Automatically invoking the sanitizers in the exporter seems like a good approach, ensuring consistent data sanitization without burdening users with additional configuration. This approach maintains simplicity while addressing the immediate need for sanitization in the v2 collector.

@arujjval
Copy link

arujjval commented Jun 9, 2024

@yurishkuro So for now, we have to work on rewriting existing sanitizers for operating on OTLP right?

@yurishkuro
Copy link
Member Author

Yes.

@mahadzaryab1
Copy link
Collaborator

mahadzaryab1 commented Oct 11, 2024

@yurishkuro what's the status of this? is there anything I can help with?

@yurishkuro
Copy link
Member Author

Yes , I think this stalled

@mahadzaryab1
Copy link
Collaborator

mahadzaryab1 commented Oct 11, 2024

@yurishkuro what can I help with? are there subtasks for this issue?

@yurishkuro
Copy link
Member Author

@mahadzaryab1 see comments in #5551 - it's a good direction, but needs completion.

yurishkuro pushed a commit that referenced this issue Oct 12, 2024
## Which problem is this PR solving?
- Towards #5545

## Description of the changes
- Created a new `Sanitizer` type for sanitizing traces in OTLP format 
- Implemented the empty service name sanitizer; I'll open a separate PR
for the UTF-8 sanitizer

## How was this change tested?
- Unit tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Mahad Zaryab <[email protected]>
@mahadzaryab1
Copy link
Collaborator

mahadzaryab1 commented Oct 12, 2024

@yurishkuro do you know what the reasoning behind https://github.com/varshith257/jaeger/blob/v2-sanitizer/cmd/collector/app/sanitizer/utf8_sanitizer.go#L76-L78 is? if the value isn't a valid utf-8 string why is it added to the map? isn't this just a no-op?

@mahadzaryab1
Copy link
Collaborator

@yurishkuro do you know what the reasoning behind https://github.com/varshith257/jaeger/blob/v2-sanitizer/cmd/collector/app/sanitizer/utf8_sanitizer.go#L76-L78 is? if the value isn't a valid utf-8 string why is it added to the map? isn't this just a no-op?

Oh I see. They get converted to bytes. Do we want the same behaviour for the v2 sanitizer ?

@yurishkuro
Copy link
Member Author

Yes. The motivation is to make the data well formed, but also to preserve original invalid data for debugging.

Saumya40-codes pushed a commit to Saumya40-codes/jaeger that referenced this issue Oct 14, 2024
…acing#6077)

## Which problem is this PR solving?
- Towards jaegertracing#5545

## Description of the changes
- Created a new `Sanitizer` type for sanitizing traces in OTLP format 
- Implemented the empty service name sanitizer; I'll open a separate PR
for the UTF-8 sanitizer

## How was this change tested?
- Unit tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Saumya Shah <[email protected]>
Saumya40-codes pushed a commit to Saumya40-codes/jaeger that referenced this issue Oct 14, 2024
…acing#6077)

## Which problem is this PR solving?
- Towards jaegertracing#5545

## Description of the changes
- Created a new `Sanitizer` type for sanitizing traces in OTLP format
- Implemented the empty service name sanitizer; I'll open a separate PR
for the UTF-8 sanitizer

## How was this change tested?
- Unit tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Mahad Zaryab <[email protected]>
Signed-off-by: Saumya Shah <[email protected]>
@github-project-automation github-project-automation bot moved this from Todo to Done in Jaeger V2 Oct 14, 2024
chahatsagarmain pushed a commit to chahatsagarmain/jaeger that referenced this issue Oct 23, 2024
…acing#6077)

## Which problem is this PR solving?
- Towards jaegertracing#5545

## Description of the changes
- Created a new `Sanitizer` type for sanitizing traces in OTLP format 
- Implemented the empty service name sanitizer; I'll open a separate PR
for the UTF-8 sanitizer

## How was this change tested?
- Unit tests

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Mahad Zaryab <[email protected]>
chahatsagarmain pushed a commit to chahatsagarmain/jaeger that referenced this issue Oct 23, 2024
## Which problem is this PR solving?
- Resolves jaegertracing#5545 

## Description of the changes
- Continuation of jaegertracing#6077 to
implement the UTF-8 sanitizer to operate on OTLP data.

## How was this change tested?
- Unit tests 

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [x] I have added unit tests for the new functionality
- [x] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Mahad Zaryab <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for beginners help wanted Features that maintainers are willing to accept but do not have cycles to implement v2
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants