
Add gauss-mix validation #448

Merged

Conversation

lovisek
Contributor

@lovisek lovisek commented Jul 17, 2024

The benchmark computation was originally unstable due to the method of generating training data points. The benchmark generated random points such that each dimension of each point was a random double between zero and one, rounded to one decimal place. This approach resulted in points representing random noise within a cuboid defined by [0, 0, ..., 0] and [1, 1, ..., 1], rather than Gaussian distributions. Additionally, the Spark GMM model fitting returns inconsistent results when different thread counts are used, making it impossible to achieve benchmark stability even with a fixed seed for the random generator.
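To make the flaw concrete, here is a minimal Python sketch of the *original* generation scheme as described above (the actual benchmark is Scala/Spark; names here are illustrative):

```python
import random

# Old (flawed) approach: each coordinate is an independent uniform double
# in [0, 1], rounded to one decimal place. The result is uniform noise in
# the unit cuboid, with no Gaussian cluster structure for a GMM to find.
def old_point(dims, rng):
    return [round(rng.random(), 1) for _ in range(dims)]
```

Because every coordinate is uniform and independent, a Gaussian mixture model fitted to such data has no meaningful components to recover, which is why the fit was unstable.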

Currently, the benchmark generates input data points grouped into clusters. These clusters are positioned with increasing offset from the vertices of a hypercube. To enhance model fit quality, the clusters have varying deviations. After generation, the points are split into three sets: training, validation, and test. The training data points are used to train multiple models, each with different training parameters: iteration count, seed, and K.
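The generation and splitting steps above can be sketched as follows (a hedged Python illustration; the offsets, deviations, and split ratios are assumptions, not the benchmark's actual constants):

```python
import random

# Sketch of the new scheme: clusters centred near hypercube vertices with
# an increasing offset, each cluster drawn with its own deviation.
def make_clusters(dims, n_clusters, points_per_cluster, rng):
    points = []
    for c in range(n_clusters):
        # Pick a hypercube vertex (coordinates 0 or 1), then shift it.
        vertex = [(c >> d) & 1 for d in range(dims)]
        offset = 0.1 * c                  # increasing offset per cluster
        sigma = 0.05 * (c + 1)            # varying deviation per cluster
        centre = [v + offset for v in vertex]
        for _ in range(points_per_cluster):
            points.append([rng.gauss(m, sigma) for m in centre])
    return points

def split(points, rng):
    # Illustrative 60/20/20 split into training, validation, and test sets.
    rng.shuffle(points)
    n = len(points)
    a, b = int(0.6 * n), int(0.8 * n)
    return points[:a], points[a:b], points[b:]
```

Seeding the generator makes the input data deterministic across runs, which is a precondition for the validation described below being stable.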

Each model is trained with a different seed to explore variations in model initialization, which can lead to better fits. To ensure consistency, the benchmark uses an initial seed and increments it by one for each subsequent trained model.
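The seed handling can be summarised in one line of Python (a sketch of the scheme described above, not the benchmark's code):

```python
# Each trained model gets initial_seed + i: runs stay reproducible while
# each model still explores a different random initialization.
def model_seeds(initial_seed, model_count):
    return [initial_seed + i for i in range(model_count)]
```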

The K parameter represents the number of Gaussian clusters the model should fit. In the benchmark, K is set to either 1.5 or 2.0 times the actual number of generated centroids. Using a higher K improves the model fit quality.
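In code form (a minimal sketch; the factors 1.5 and 2.0 come from the text above, the function name is hypothetical):

```python
# K is a multiple of the true centroid count; a higher K gives the model
# extra components, which improves the fit quality.
def k_values(centroid_count):
    return [int(centroid_count * factor) for factor in (1.5, 2.0)]
```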

After training models with all defined configurations, the best model is selected. For each model, cluster membership is predicted for every point in the validation set, and the prediction accuracy is computed; a prediction counts as correct when the distance between the expected and predicted Gaussian distribution mean (mu) is 0.25 or less. The best model is then used to predict the points in the test set, and its prediction accuracy is computed. The benchmark validation requires the prediction accuracy of the best model on the test set to exceed 99%.
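The selection and validation flow can be sketched as follows (hedged Python; `accuracy` stands in for the real prediction-and-scoring step, which this sketch does not implement):

```python
# Thresholds taken from the description above.
MU_TOLERANCE = 0.25       # max distance between expected and predicted mean
MIN_TEST_ACCURACY = 0.99  # required accuracy of the best model on test data

def select_best(models, accuracy, validation_set):
    # Keep the model with the highest validation accuracy.
    return max(models, key=lambda m: accuracy(m, validation_set))

def validate(best_model, accuracy, test_set):
    # The benchmark passes only if the best model generalizes to test data.
    return accuracy(best_model, test_set) > MIN_TEST_ACCURACY
```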

I believe the duration of the gauss-mix benchmark will vary due to modifications in its computation. Specifically, I reduced the point count and dimension count while increasing the number of trained models from 1 to 8 to enhance validation stability. To document the change in benchmark duration, I measured 15 runs (each with 150 repetitions) using JDK 21 on a 6-core (12-thread) system, both before and after the modifications. These measurements include JVM warmup, and no outliers were filtered. The collected measurements are visualized in the graphs below:

[Figures: sample, histogram, and violin plots of the gauss-mix benchmark duration before and after validation.]

The average benchmark duration before validation was: 532.923ms
The average benchmark duration after validation is: 4608.173ms

@lbulej lbulej marked this pull request as ready for review August 1, 2024 13:54
@lbulej
Member

lbulej commented Aug 2, 2024

It would be nice if @axel22, @farquet, or @ceresek found some time to have a look. I exclude myself because I was already involved behind the scenes, but maybe we (@lovisek and I) missed something :-)

Copy link
Collaborator

@farquet farquet left a comment


Very nice rework and analysis!

This makes the benchmark more reasonable and reliable for the future.

@lbulej lbulej merged commit 7b33d3b into renaissance-benchmarks:master Aug 7, 2024
13 checks passed