[v2] Implement sanitizers to operate on OTLP data #5551

Closed
wants to merge 22 commits into from
Changes from 14 commits
6 changes: 6 additions & 0 deletions cmd/jaeger/README.md
@@ -2,3 +2,9 @@

This is experimental Jaeger V2 based on OpenTelemetry collector.
See https://github.com/jaegertracing/jaeger/issues/4843.

## Compatibility

### Service Name Sanitizer

In v1, there was a `serviceNameSanitizer` that sanitized service names in span annotations using a source-of-truth cache that maps aliases to service names. This functionality has been removed in v2. If your deployment relies on this sanitizer, you will need to provide the functionality another way, for example by implementing a custom processor.
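
For anyone who depended on the v1 behavior, a custom processor could apply a similar alias lookup to the `service.name` resource attribute. The following is only a rough sketch under stated assumptions: attributes are modeled as a plain `map[string]string` instead of the collector's `pcommon.Map`, and `aliasCache` and `sanitizeServiceName` are hypothetical names that do not exist in Jaeger or the OpenTelemetry Collector.

```go
package main

// serviceNameKey is the resource attribute that carries the service name.
const serviceNameKey = "service.name"

// aliasCache is a hypothetical stand-in for the v1 alias-to-service-name
// cache. A real implementation would likely refresh it from an external
// source of truth.
type aliasCache map[string]string

// sanitizeServiceName rewrites service.name when the cache has an entry for
// the current value. It reports whether the attribute was changed.
func sanitizeServiceName(attrs map[string]string, cache aliasCache) bool {
	alias, ok := attrs[serviceNameKey]
	if !ok {
		return false
	}
	canonical, known := cache[alias]
	if !known || canonical == alias {
		return false
	}
	attrs[serviceNameKey] = canonical
	return true
}
```

In a real processor the same lookup would run once per resource, since `service.name` is a resource-level attribute.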
cmd/jaeger/internal/exporters/storageexporter/sanitizer/empty_service_name_sanitizer.go
@@ -0,0 +1,38 @@
// Copyright (c) 2024 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package sanitizer

import (
"go.opentelemetry.io/collector/pdata/ptrace"
)

// Constants for the replacement names
const (
serviceNameReplacement = "empty-service-name"
nullProcessServiceName = "null-process-and-service-name"
)

// NewEmptyServiceNameSanitizer returns a function that replaces empty service names
// with a predefined string.
func NewEmptyServiceNameSanitizer() SanitizeTraces {
return sanitizeEmptyServiceName

}

// sanitizeEmptyServiceName sanitizes the service names in the resource attributes.
func sanitizeEmptyServiceName(traces ptrace.Traces) ptrace.Traces {
resourceSpans := traces.ResourceSpans()
for i := 0; i < resourceSpans.Len(); i++ {
resourceSpan := resourceSpans.At(i)
attributes := resourceSpan.Resource().Attributes()
serviceNameAttr, ok := attributes.Get("service.name")
Member commented on "service.name":
Better to import semantic conventions and use a constant from there.

if !ok {

// If service.name is missing, set it to nullProcessServiceName
attributes.PutStr("service.name", nullProcessServiceName)
} else if serviceNameAttr.Str() == "" {

// If service.name is empty, replace it with serviceNameReplacement
attributes.PutStr("service.name", serviceNameReplacement)

}
}
return traces

}
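
The function above distinguishes three cases: `service.name` missing, present but empty, and present with a value. One way to make those cases easy to cover in a unit test is to isolate the decision in a pure function. The sketch below does that with the standard library only; `resolveServiceName` is a hypothetical helper that is not part of this PR, and its `present` flag stands in for the boolean returned by `attributes.Get`.

```go
package main

// Replacement names, copied from the constants in this file.
const (
	serviceNameReplacement = "empty-service-name"
	nullProcessServiceName = "null-process-and-service-name"
)

// resolveServiceName returns the value that service.name should have after
// sanitization. present mirrors the boolean returned by a map lookup.
func resolveServiceName(value string, present bool) string {
	switch {
	case !present:
		// The attribute does not exist at all.
		return nullProcessServiceName
	case value == "":
		// The attribute exists but carries no name.
		return serviceNameReplacement
	default:
		// A non-empty name is left untouched.
		return value
	}
}
```

With the decision separated like this, a table-driven test needs only three rows, and the loop over `ResourceSpans` reduces to a lookup followed by a conditional `PutStr`.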
@@ -0,0 +1,4 @@
// Copyright (c) 2024 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package sanitizer
@@ -0,0 +1,14 @@
// Copyright (c) 2024 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package sanitizer

import (
"testing"

"github.com/jaegertracing/jaeger/pkg/testutils"
)

func TestMain(m *testing.M) {
testutils.VerifyGoLeaks(m)
}
cmd/jaeger/internal/exporters/storageexporter/sanitizer/sanitizer.go
@@ -0,0 +1,33 @@
// Copyright (c) 2024 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package sanitizer

import (
"go.opentelemetry.io/collector/pdata/ptrace"
)

// SanitizeTraces is a function that performs enrichment, clean-up, or normalization of trace data.
type SanitizeTraces func(traces ptrace.Traces) ptrace.Traces

// NewStandardSanitizers returns the standard sanitizers, which are intended to be applied by the storage exporter.
func NewStandardSanitizers() []SanitizeTraces {
Member commented:
You need to call this from the exporter.

return []SanitizeTraces{
NewEmptyServiceNameSanitizer(),
NewUTF8Sanitizer(),

}
}

// NewChainedSanitizer creates a Sanitizer from the variadic list of passed Sanitizers.
// If the list only has one element, it is returned directly to minimize indirection.
func NewChainedSanitizer(sanitizers ...SanitizeTraces) SanitizeTraces {
if len(sanitizers) == 1 {
return sanitizers[0]

}
return func(traces ptrace.Traces) ptrace.Traces {
for _, s := range sanitizers {
traces = s(traces)

}
return traces

}
}
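
The chaining pattern above does not depend on `ptrace.Traces`. As a rough illustration, the sketch below expresses the same idea with a type parameter so that it can be exercised without the collector's data model. `chain` is a hypothetical name used only here, and the example requires Go 1.18 or later for generics.

```go
package main

// chain composes fns left to right into a single function.
// With exactly one function it returns that function directly, mirroring
// NewChainedSanitizer. With none, the loop body never runs, so the returned
// function hands its input back unchanged.
func chain[T any](fns ...func(T) T) func(T) T {
	if len(fns) == 1 {
		return fns[0]
	}
	return func(v T) T {
		for _, f := range fns {
			v = f(v)
		}
		return v
	}
}
```

Order matters: `chain(a, b)` applies `a` first and then `b`, so in `NewStandardSanitizers` the empty-service-name sanitizer runs before the UTF-8 sanitizer.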
cmd/jaeger/internal/exporters/storageexporter/sanitizer/utf8_sanitizer.go
@@ -0,0 +1,78 @@
// Copyright (c) 2024 The Jaeger Authors.
// SPDX-License-Identifier: Apache-2.0

package sanitizer

import (
"unicode/utf8"

"go.opentelemetry.io/collector/pdata/pcommon"
"go.opentelemetry.io/collector/pdata/ptrace"
)

const (
invalidOperation = "InvalidOperationName"
invalidService = "InvalidServiceName"
invalidTagKey = "InvalidTagKey"
badUTF8Prefix = "bad_utf8_"
)

// NewUTF8Sanitizer creates a UTF8 sanitizer.
func NewUTF8Sanitizer() SanitizeTraces {
Member commented:
Please add a test.

return sanitizeUTF8
}

// sanitizeUTF8 replaces invalid UTF-8 in span names and attributes with placeholders
// and keeps the original bytes under attributes prefixed with badUTF8Prefix.
func sanitizeUTF8(traces ptrace.Traces) ptrace.Traces {
resourceSpans := traces.ResourceSpans()
for i := 0; i < resourceSpans.Len(); i++ {
resourceSpan := resourceSpans.At(i)
scopeSpans := resourceSpan.ScopeSpans()
for j := 0; j < scopeSpans.Len(); j++ {
spans := scopeSpans.At(j).Spans()
for k := 0; k < spans.Len(); k++ {
span := spans.At(k)

// Sanitize operation name
if !utf8.ValidString(span.Name()) {
originalName := span.Name()
span.SetName(invalidOperation)
byteSlice := span.Attributes().PutEmptyBytes(badUTF8Prefix + "operation_name")
byteSlice.FromRaw([]byte(originalName))

}

// Sanitize service name attribute
attributes := span.Attributes()
serviceNameAttr, ok := attributes.Get("service.name")
if ok && !utf8.ValidString(serviceNameAttr.Str()) {
originalServiceName := serviceNameAttr.Str()
attributes.PutStr("service.name", invalidService)
byteSlice := attributes.PutEmptyBytes(badUTF8Prefix + "service.name")
byteSlice.FromRaw([]byte(originalServiceName))

}

sanitizeAttributes(attributes)

}
}
}
return traces

}
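
A reviewer asked for a test of this sanitizer. As one possible starting point, the sketch below isolates the operation-name rule in a small function that needs only the standard library, which makes the valid and invalid cases easy to assert. `sanitizeName` is a hypothetical helper that is not part of this PR; in the PR the raw bytes are stored in a span attribute rather than returned.

```go
package main

import "unicode/utf8"

// invalidOperation is the placeholder used in this file for bad span names.
const invalidOperation = "InvalidOperationName"

// sanitizeName returns name unchanged when it is valid UTF-8. Otherwise it
// returns the placeholder plus a copy of the original bytes, so the caller
// can preserve them (for example in a bytes attribute).
func sanitizeName(name string) (clean string, raw []byte) {
	if utf8.ValidString(name) {
		return name, nil
	}
	return invalidOperation, []byte(name)
}
```

A test for the full sanitizer would follow the same shape: build a `ptrace.Traces` with one span whose name contains an invalid byte such as `\xff`, run the sanitizer, and assert on both the new name and the preserved bytes.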

// sanitizeAttributes sanitizes attributes to ensure UTF8 validity.
func sanitizeAttributes(attributes pcommon.Map) {
attributes.Range(func(k string, v pcommon.Value) bool {

// Handle invalid UTF8 in attribute keys
if !utf8.ValidString(k) {
originalKey := k
attributes.PutStr(invalidTagKey, k)
byteSlice := attributes.PutEmptyBytes(badUTF8Prefix + originalKey)
Member commented:
What happens to attributes.Range if you add attributes in the lambda function?

Contributor Author replied:
Modifying the map while iterating over it can lead to undefined behavior. It would be better to first collect the keys and values that need to be sanitized, then apply the changes after the iteration is complete.

byteSlice.FromRaw([]byte(originalKey))
} else if v.Type() == pcommon.ValueTypeStr && !utf8.ValidString(v.Str()) {

// Handle invalid UTF8 in attribute values
originalValue := v.Str()
attributes.PutStr(k, invalidTagKey)
byteSlice := attributes.PutEmptyBytes(badUTF8Prefix + k)
byteSlice.FromRaw([]byte(originalValue))

}
return true

})
}
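
The collect-then-apply approach described in the review thread might look roughly like the sketch below. It is an illustration under several assumptions rather than the PR's implementation: a `map[string]any` holding `string` or `[]byte` values stands in for `pcommon.Map`, `invalidTagValue` is a hypothetical placeholder that does not appear in the PR, and the handling of invalid keys (raw key bytes stored under a fixed key, value dropped) is a simplification chosen for brevity.

```go
package main

import "unicode/utf8"

const (
	invalidTagKey   = "InvalidTagKey"
	invalidTagValue = "InvalidTagValue" // hypothetical placeholder, not in the PR
	badUTF8Prefix   = "bad_utf8_"
)

// sanitizeAttributes makes attrs safe in two passes so that the map is never
// modified while it is being iterated.
func sanitizeAttributes(attrs map[string]any) {
	// Pass 1: read only, and remember what needs fixing.
	var badKeys []string
	badValues := map[string]string{} // valid key -> original invalid value
	for k, v := range attrs {
		if !utf8.ValidString(k) {
			badKeys = append(badKeys, k)
			continue
		}
		if s, ok := v.(string); ok && !utf8.ValidString(s) {
			badValues[k] = s
		}
	}

	// Pass 2: apply the changes now that iteration has finished.
	for _, k := range badKeys {
		// An invalid key cannot be embedded in another key, so its raw bytes
		// go under a fixed, valid key. Simplification: the value is dropped
		// and several invalid keys would overwrite one another.
		delete(attrs, k)
		attrs[invalidTagKey] = []byte(k)
	}
	for k, raw := range badValues {
		attrs[k] = invalidTagValue
		attrs[badUTF8Prefix+k] = []byte(raw)
	}
}
```

Separating the passes also sidesteps a subtle property of Go's built-in maps: an entry created during a `range` loop may or may not be visited by that same loop, so results could otherwise vary between runs.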