-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v2] Implement sanitizers to operate on OTLP data #5551
Conversation
Signed-off-by: Vamshi Maskuri <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5551 +/- ##
==========================================
- Coverage 96.65% 96.27% -0.38%
==========================================
Files 342 345 +3
Lines 16519 16583 +64
==========================================
Hits 15966 15966
- Misses 363 427 +64
Partials 190 190
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@yurishkuro @albertteoh PTAL. I have rewritten the initial configuration of existing sanitizers to operate on OTLP data format. I have implemented them based on my understanding of OTLP data. Correct me, if I am wrong with understanding of OTLP data |
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
// sanitizeEmptyServiceName sanitizes the service names in the span attributes. | ||
func sanitizeEmptyServiceName(span ptrace.Span) ptrace.Span { | ||
attributes := span.Attributes() | ||
serviceNameAttr, ok := attributes.Get("service.name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
service name is stored in the Resource attributes, not in the span.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yurishkuro span.Resource
undefined (type ptrace.Span has no field or method Resource)compilerMissingFieldOrMethod
This means we need to access the resource from the ptrace.ResourceSpans
. I think we need to work with ptrace.Traces
instead of individual ptrace.Span
and access the ResourceSpans which contains the Resource and associated ScopeSpans. Then we need to iterate through each ScopeSpans and their Spans. Am I in the right direction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to work with ptrace.Traces
@james-ryans this is an interesting situation. In v1 pipeline all non-span data (like batch.Process) was first denormalized into each span so that the unit of work in the pipeline was a single span. In v2 storage API we want to support batching for better efficiency, but it would also mean that unless storage supports storing normalized data (like storing resource / scope records separately, which is unlikely in k/v stores world), each storage would need to do its own denormalization. It also creates challenges to shared capabilities like sanitizers here, since they need to operate on data uniformly before the storage is called.
We can provide a shared lib to do the flattening / denormalization of resources[]/scopes[]/spans[] into just spans[] (essentially otlp2jaeger does that today already), but I don't think it's a good solution considering that OTLP spans cannot distinguish between resource/scope/span attributes without some magic prefixes / naming. In v1 model we had a clear slot Span.Process to maintain that distinction. I am thinking that in v2 storage implementations their internal data model should also be Span{ Resource=..., Scope=..., rest of span }
, to avoid unnecessary flattening.
Coming back to sanitizers, I think Sanitize(pdata.Traces)
is a good starting point signature. Some sanitizers would only look at Resources in that input, while others might look at everything. The alternative would be to have distinct sanitizers for Resource, Scope, and Span, but I don't think we need this complexity at this point.
@varshith257 does this answer your question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this answer your question?
Yes. Thanks for it
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
@yurishkuro PTAL and if it's ok, I will move ahead to add tests for them |
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
@yurishkuro PTAL |
cmd/jaeger/internal/exporters/storageexporter/sanitizer/sanitizer.go
Outdated
Show resolved
Hide resolved
cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go
Outdated
Show resolved
Hide resolved
…zer.go Co-authored-by: Yuri Shkuro <[email protected]> Signed-off-by: Vamshi Maskuri <[email protected]>
) | ||
|
||
// NewUTF8Sanitizer creates a UTF8 sanitizer. | ||
func NewUTF8Sanitizer() SanitizeTraces { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a test
type SanitizeTraces func(traces ptrace.Traces) ptrace.Traces | ||
|
||
// NewStandardSanitizers are automatically applied by SpanProcessor. | ||
func NewStandardSanitizers() []SanitizeTraces { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to call this from the exporter
cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go
Outdated
Show resolved
Hide resolved
cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go
Outdated
Show resolved
Hide resolved
cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go
Outdated
Show resolved
Hide resolved
cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go
Outdated
Show resolved
Hide resolved
for i := 0; i < resourceSpans.Len(); i++ { | ||
resourceSpan := resourceSpans.At(i) | ||
attributes := resourceSpan.Resource().Attributes() | ||
serviceNameAttr, ok := attributes.Get("service.name") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"service.name"
better to import semantic conventions and use a constant from there
if !utf8.ValidString(k) { | ||
originalKey := k | ||
attributes.PutStr(invalidTagKey, k) | ||
byteSlice := attributes.PutEmptyBytes(badUTF8Prefix + originalKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens to attributes.Range
if you add attributes in the lambda function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modifying the map while iterating over it can lead to undefined behavior. Better, we should first collect the keys and values that need to be sanitized and then apply these changes after the iteration is complete.
Signed-off-by: Vamshi Maskuri <[email protected]>
…into v2-sanitizer
Co-authored-by: Yuri Shkuro <[email protected]> Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
for i := 0; i < resourceSpans.Len(); i++ { | ||
resourceSpan := resourceSpans.At(i) | ||
|
||
// Sanitize resource attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments like this provide no value
for _, k := range invalidKeys { | ||
originalKey := k | ||
attributes.PutStr(invalidTagKey, k) | ||
binaryAttr := attributes.PutEmptyBytes(badUTF8Prefix + originalKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the originalKey is a bad UTF8 string, you are still creating a new key with the same bad string
|
||
for k, v := range invalidValues { | ||
originalValue := v | ||
attributes.PutStr(k, invalidTagKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this achieve? I suggest you write a specification in English in the comments to the sanitizer what exactly it's trying to do with attributes. And I think starting with the unit tests in this case would be much more useful, because you can see exactly how input and output data are related. For example, I expect the following to be true:
traces1 := inputData // assume it has bad UTF8 strings
traces2 := sanitizeUTF8(traces1)
assert traces1 != traces2 // definitely changed something
traces3 := sanitizeUTF8(traces2)
assert traces2 == traces3 // the 2nd pass is no-op, since no more invalid data is present
Will complete this PR very soon( I had exams, so i hadn't done progress on this) |
Hi @varshith257 If you are not working on this can I take this from here? |
Signed-off-by: Vamshi Maskuri <[email protected]>
fixed in #6078 |
Which problem is this PR solving?
Description of the changes
How was this change tested?
Checklist
jaeger
:make lint test
jaeger-ui
:yarn lint
andyarn test