Fix dlsym issue (#6048 => hotfix) #6049

Merged 1 commit on Sep 18, 2024

andrewlock
Member

Summary of changes

This PR addresses the issue
#6045

Reason for change

When code calls the dlsym function, the compiler records dlsym as an undefined symbol in the library's import symbol table.
Before the wrapper became a universal binary (the same binary is used for glibc-based Linux and musl-libc-based Linux), the build also added a DT_NEEDED entry for libdl.so (the library containing dlsym). When the wrapper was loaded, the loader searched the libraries named in the DT_NEEDED entries to find one that provides the dlsym symbol. Now that the wrapper is a universal binary, the DT_NEEDED entries are removed (part of being universal) and we have to resolve the symbols we need (dlsym, pthread_once ..) by hand.
If we call dlsym (or any other such symbol) directly, we hit this issue.
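To make the DT_NEEDED point concrete, here is a minimal sketch of how one might list the libraries a shared object declares as needed. The function name `needed_libraries` is hypothetical and not part of this PR; it only parses the text that `readelf -d` prints.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the PR.
# Reads `readelf -d <library>` output on stdin and prints the name of each
# library listed in a DT_NEEDED entry, one per line.
needed_libraries() {
  sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Example invocation (assumes readelf is installed and the file exists):
#   readelf -d Datadog.Linux.ApiWrapper.x64.so | needed_libraries
```

Going by the description above, the universal wrapper should print nothing here, which is why every imported symbol has to be resolved by hand.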

Implementation details

  • use __dd_dlsym instead of calling dlsym directly

Test coverage

Added a snapshot test using nm that verifies that the undefined symbols in the universal binary haven't changed. It's equivalent to running

nm -D Datadog.Linux.ApiWrapper.x64.so | grep ' U ' | awk '{print $2}' | sed 's/@.*//' | sort

but done using Nuke instead. It would probably make sense for this to be a "normal" test in the native tests, but because it depends on nm, which is definitely available in the universal build dockerfile, it was quicker and easier to get it running there directly. When it fails, it prints the diff and throws an exception, e.g.

System.Exception: Found differences in undefined symbols (dlsym) in the Native Wrapper library. Verify that these changes are expected, and will not cause problems. Removing symbols is generally a safe operation, but adding them could cause crashes. If the new symbols are safe to add, update the snapshot file at C:\repos\dd-trace-dotnet\tracer\test\snapshots\native-wrapper-symbols-x64.verified.txt with the new values
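As a rough illustration of what the snapshot check does, the sketch below wraps the `nm` pipeline in a function and reports symbols that are in the current list but not in the snapshot. The function names and the file names `snapshot.txt` and `current.txt` are hypothetical; the real check is implemented in the Nuke build, not in shell.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not the actual Nuke implementation.

# Reads `nm -D` output on stdin and prints the undefined symbols,
# with version suffixes (e.g. @GLIBC_2.2.5) stripped, sorted in the C locale.
undefined_symbols() {
  grep ' U ' | awk '{print $2}' | sed 's/@.*//' | LC_ALL=C sort
}

# Prints symbols that appear in the current list ($2) but not in the
# snapshot ($1). Both files must already be sorted in the C locale.
added_symbols() {
  LC_ALL=C comm -13 "$1" "$2"
}

# Example invocation (assumes nm is installed and both files exist):
#   nm -D Datadog.Linux.ApiWrapper.x64.so | undefined_symbols > current.txt
#   added_symbols snapshot.txt current.txt
```

Only added symbols are reported, since the exception message above notes that removing symbols is generally safe while adding them could cause crashes.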

Other details

This is a hotfix for

This will be hotfixed onto 3.3.1 and 2.59.1

---------

Co-authored-by: Andrew Lock <[email protected]>
@andrewlock added the type:bug and area:profiler (Issues related to the continuous-profiler) labels Sep 18, 2024
@andrewlock requested review from a team as code owners September 18, 2024 10:44
Contributor

@bouwkast bouwkast left a comment

Awesome thanks!

datadog-ddstaging bot commented Sep 18, 2024

Datadog Report

Branch report: andrew/dlsym-hotfix-v3
Commit report: 4fbd8a5
Test service: dd-trace-dotnet

✅ 0 Failed, 368261 Passed, 2368 Skipped, 16h 33m 50.74s Total Time
⌛ 1 Performance Regression

⌛ Performance Regressions vs Default Branch (1)

  • StartStopWithChild - Benchmarks.Trace.ActivityBenchmark 16.92µs (+929ns, +6%) - Details

@andrewlock
Member Author

Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing the following branches/commits:

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure one-off costs. Cases where the execution-time results for the PR are worse than the latest master results are shown in red. The following thresholds were used for comparing the execution times:

  • Welch's t-test at a 5% significance level
  • Only results indicating a difference greater than 5% and 5 ms are considered.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard.

Graphs show the p99 interval based on the mean and StdDev of the test run, as well as the mean value of the run (shown as a diamond below the graph).

Execution time (ms). Each cell shows the mean, with the plotted interval in parentheses.

| Sample | Runtime | Scenario | This PR (6049) | master |
| --- | --- | --- | --- | --- |
| FakeDbCommand | .NET Framework 4.6.2 | Baseline | 70 (66-74) | 70 (66-74) |
| FakeDbCommand | .NET Framework 4.6.2 | CallTarget+Inlining+NGEN | 1,110 (1081-1139) | 1,119 (1092-1145) |
| FakeDbCommand | .NET Core 3.1 | Baseline | 109 (105-112) | 108 (105-112) |
| FakeDbCommand | .NET Core 3.1 | CallTarget+Inlining+NGEN | 798 (776-821) | 809 (792-827) |
| FakeDbCommand | .NET 6 | Baseline | 92 (89-96) | 91 (89-94) |
| FakeDbCommand | .NET 6 | CallTarget+Inlining+NGEN | 752 (725-779) | 761 (739-783) |
| HttpMessageHandler | .NET Framework 4.6.2 | Baseline | 192 (188-195) | 192 (186-198) |
| HttpMessageHandler | .NET Framework 4.6.2 | CallTarget+Inlining+NGEN | 1,199 (1174-1224) | 1,201 (1175-1228) |
| HttpMessageHandler | .NET Core 3.1 | Baseline | 277 (274-281) | 278 (273-282) |
| HttpMessageHandler | .NET Core 3.1 | CallTarget+Inlining+NGEN | 966 (942-990) | 968 (947-988) |
| HttpMessageHandler | .NET 6 | Baseline | 266 (262-270) | 266 (261-271) |
| HttpMessageHandler | .NET 6 | CallTarget+Inlining+NGEN | 946 (922-970) | 945 (917-973) |

Contributor

@chrisnas chrisnas left a comment

LGTM

@andrewlock andrewlock merged commit 69932b3 into hotfix/3.3.1 Sep 18, 2024
76 of 80 checks passed
@andrewlock andrewlock deleted the andrew/dlsym-hotfix-v3 branch September 18, 2024 13:04
andrewlock added a commit that referenced this pull request Sep 18, 2024
## Other details

This is a hotfix for 
- #6048

Co-authored-by: Gregory LEOCADIE <[email protected]>