Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update feature/perf with latest blocker fixes #6237

Open
wants to merge 23 commits into
base: feature/perf
Choose a base branch
from
Open

Conversation

edwintorok
Copy link
Contributor

No description provided.

psafont and others added 21 commits December 31, 2024 12:34
This has two advantages:
1. Always non-negative: they represent absolute differences in time
2. Forces users to define the units of time, allowing to read the time in
   minutes, when appropriate

Signed-off-by: Pau Ruiz Safont <[email protected]>
Be explicit about the file system of an update ISO. Remove dead code.

Signed-off-by: Christian Lindig <[email protected]>
The new clustering interface uses a constructor `Extended` address when
a cluster hots is trying to join a cluster. This causes problems during
upgrades as a newly upgraded host might send out the new address format
to old hosts, which do not understand this format, causing the new hosts
not able to join.

The fix would be to use a new cluster_address feature flag/pool
restrictions to control the use of the new clustering interface. This makes
sure that the new interface would only be used if all of the hosts
understand this new interface, i.e. have this feature enabled. The
cluster_address feature is controlled by v6d and is pool-wide, therefore
the new interface would only be enabled if all v6ds are updated to the
correct level, which also implies that the accompanying xapi are updated
to the correct level.

Signed-off-by: Vincent Liu <[email protected]>
The new clustering interface uses a constructor `Extended` address when
a cluster hots is trying to join a cluster. This causes problems during
upgrades as a newly upgraded host might send out the new address format
to old hosts, which do not understand this format, causing the new hosts
not able to join.

The fix would be to use the cluster_health feature flag/pool
restrictions to gate the use of the new clustering interface. This makes
sure that the new interface would only be used if all of the hosts
understand this new interface, i.e. have this feature enabled. The
cluster_health feature is controlled by v6d and is pool-wide, therefore
the new interface would only be enabled if all v6ds are updated to the
correct level, which also implies that the accompanying xapi are updated
to the correct level.
Be explicit about the file system of an update ISO. Remove dead code.
This has two advantages:
1. Always non-negative: they represent absolute differences in time
2. Forces users to define the units of time, allowing to read the time
in minutes, when appropriate

Passed the toolstack's smoke and validation tests: Job 209833
The package dune has been replaced with ocaml-dune

Signed-off-by: Pau Ruiz Safont <[email protected]>
The package dune has been replaced with ocaml-dune

> Package dune is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
  ocaml-dune
Ubuntu's dune is way too old for the version generally used. On top of that the
command doesn't fail when this happens, making the setup brittle. Instead write
the version variable to config.mk and run make.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Ubuntu's dune is way too old for the version generally used. On top of
that the command doesn't fail when this happens, making the setup
brittle. Instead write the version variable to config.mk and run make.

You can see this change running as expected in this run:
https://github.com/xapi-project/xen-api/actions/runs/12792091034/job/35661725368
We have so far hard-coded the exectation that in a PEM file a private
key is followed by a certficate but this is actually not required by the
PEM standard and let to a failure in XSI-1781.

This is a simple fix first collects all keys and certificates while
skipping over other content and the uses the first key and certificate.

Signed-off-by: Christian Lindig <[email protected]>
Add back a unit test.

Signed-off-by: Christian Lindig <[email protected]>
The integer values that OCaml uses for signals should never be printed as they
are. They can cause confusion because they don't match the C POSIX values.

Change the unixext function that converts them to string to stop building a
list and finding a value in the list to instead use pattern-matching. Also
added some more values that got introduced in OCaml 4.03, and return a more
compact value for unknown signals, following the same format as Fmt.Dump.signal

Signed-off-by: Pau Ruiz Safont <[email protected]>
When signals are are written to logs, the POSIX name should be used to minimize
confusion. It makes sense that the function that does this is in the logging
library instead of the unix one, as most users will be already be using the
logging library, but not all the unix one.

Moving it there also allows for a more ergonomic usage with the logging
functions.

Signed-off-by: Pau Ruiz Safont <[email protected]>
We have so far hard-coded the expectation that in a PEM file a private
key is followed by a certificate but this is actually not required by
the PEM standard and let to a failure in XSI-1781.

This is a simple fix that permits both. However, it would still fail if
we have a certificate with multiple certificates followed by a key.
Another strategy could be to parse all blocks in a PEM file and to pick
out a key and certificate later rather than expecting a particular order
up front.

Added new test data for unit tests; fail-01.pem now becomes a passing
test case.
…es correctly

Other unit tests only verify the interoperability of the RRDs - dumping them to
JSON/XML and reading back in, verifying that the same data was decoded. We're
now seeing a problem where Gauge data sources, which should be absolute values
provided by the plugin, fluctuate wildly when processed by the RRD library.

Ensure we have an easy way to test this for both Gauge and Absolute data
sources - these values should be passed as-is by the RRD library, without any
time-based transformations.

This test currently fails and will be passing with the fix commits.

Signed-off-by: Andrii Sultanov <[email protected]>
The integer values that OCaml uses for signals should never be printed
as integers. They can cause confusion because they don't match the C
POSIX values.

Change the unixext function that converts them to string to stop
building a list and finding a value in the list to instead use
pattern-matching. Also added some more values that got introduced in
OCaml 4.03, and return a more compact value for unknown signals,
following the same format as Fmt.Dump.signal.

Typically, engineers see signal -11 and assume it's SIGSEGV, when it's
SIGTERM.

Fixes #6225
Some recent changes related to RRDs likely exposed a long-standing latent issue
where the RRD library would process the passed-in values for Gauge and Absolute
data sources incorrectly leading to constant values changing from update to
update, for example:
```
$ rrd2csv memory_total_kib
timestamp, AVERAGE:host:8b533333-91e1-4698-bd17-95b9732ffbb6:memory_total_kib
2025-01-15T08:41:40Z, 33351000
2025-01-15T08:41:45Z, 33350000
2025-01-15T08:41:50Z, 33346000
2025-01-15T08:41:55Z, 33352000
```

Instead of treating Gauge and Absolute data sources as a
variation on the rate-based Derive data source type, expecting time-based
calculations to cancel each other out, do not undertake any calculations on
non-rate data sources at all.

This makes the unit test added in the previous commit pass.

Signed-off-by: Andrii Sultanov <[email protected]>
…sources (#6233)

Some recent changes related to RRDs likely exposed a long-standing
latent issue
where the RRD library would process the passed-in values for Gauge and
Absolute
data sources incorrectly leading to constant values changing from update
to
update, for example:
```
$ rrd2csv memory_total_kib
timestamp, AVERAGE:host:8b533333-91e1-4698-bd17-95b9732ffbb6:memory_total_kib
2025-01-15T08:41:40Z, 33351000
2025-01-15T08:41:45Z, 33350000
2025-01-15T08:41:50Z, 33346000
2025-01-15T08:41:55Z, 33352000
```

Instead of treating Gauge and Absolute data sources as a
variation on the rate-based Derive data source type, expecting
time-based
calculations to cancel each other out, do not undertake any calculations
on
non-rate data sources at all.

First commit adds a failing unit test, second makes it pass.

===

I've verified these changes through manual testing, they've also passed
the testcases that discovered this issue: SNMP memory testcases (JobIDs
4197305, 4196759, 4196744) and ShimMemory testcase (4197050). This
branch also passed Ring3 BST+BVT (210577)
@edwintorok edwintorok requested a review from mg12 January 17, 2025 12:11
last-genius and others added 2 commits January 17, 2025 14:18
lastupdate field in the XML RRD blob file, in particular, was getting truncated
floats representing the number of seconds, which loses A LOT of precision,
meaning RRDs could not be checked to have been produced with the 5-second
frequency.

Signed-off-by: Andrii Sultanov <[email protected]>
…ngs (#6238)

`lastupdate` field in the XML RRD blob file, in particular, was getting
truncated floats representing the number of seconds, which loses A LOT
of precision, meaning RRDs could not be checked to have been produced
with the 5-second frequency.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants