Hyper log log plus plus(HLL++) #2522

Open
wants to merge 27 commits into base: branch-25.02
Conversation

@res-life (Collaborator) commented Oct 21, 2024

@res-life requested a review from ttnghia on October 21, 2024 12:45
@res-life force-pushed the hll branch 3 times, most recently from b6f5cf5 to 526a61f on October 31, 2024 11:34
@res-life changed the title from "[Do not review] Hyper log log plus plus(HLL++)" to "Hyper log log plus plus(HLL++)" on Oct 31, 2024
src/main/cpp/src/HLLPP.cu (resolved review thread, outdated)
auto input_cols = std::vector<int64_t const*>(input_iter, input_iter + input.num_children());
auto d_inputs = cudf::detail::make_device_uvector_async(input_cols, stream, mr);
auto result = cudf::make_numeric_column(
cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::ALL_VALID, stream);
Collaborator:

Do we need such an all-valid null mask? How about cudf::mask_state::UNALLOCATED?

Collaborator Author:

Tested Spark behavior: approx_count_distinct(null) returns 0, so the values in the result column are always non-null.

Collaborator:

I meant: if all rows are valid, we don't need to allocate a null mask.
BTW, we need to pass mr to the returned column (but do not pass it to the intermediate vector/column).

Suggested change
cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::ALL_VALID, stream);
cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::UNALLOCATED, stream, mr);

Collaborator Author:

Done.
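For illustration, a minimal sketch of the allocation convention being discussed (reusing the variable names from the snippet above; not necessarily the final PR code): intermediate device buffers use the default resource, and only the returned column is allocated from mr.

  // Intermediate buffer: default resource, async copy on `stream`.
  auto d_inputs = cudf::detail::make_device_uvector_async(
    input_cols, stream, cudf::get_current_device_resource_ref());

  // Returned column: all rows are valid, so no null mask; allocate from `mr`.
  auto result = cudf::make_numeric_column(
    cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::UNALLOCATED, stream, mr);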

auto result = cudf::make_numeric_column(
cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::ALL_VALID, stream);
// evaluate from struct<long, ..., long>
thrust::for_each_n(rmm::exec_policy(stream),
Collaborator:

Try to use exec_policy_nosync as much as possible.

Suggested change
thrust::for_each_n(rmm::exec_policy(stream),
thrust::for_each_n(rmm::exec_policy_nosync(stream),

Collaborator Author:

Done.

Comment on lines 34 to 36
* The input sketch values must be given in the format `LIST<INT8>`.
*
* @param input The sketch column which constains `LIST<INT8> values.
Collaborator:

INT8 or INT64?

Collaborator (@ttnghia, Nov 1, 2024):

In addition, in estimate_from_hll_sketches I see that the input is STRUCT<LONG, LONG, ....> instead of LIST<>. Why?

Collaborator Author:

It's STRUCT<LONG, LONG, ...>, consistent with Spark. The input is columnar data, e.g. sketch 0 is composed of the data at index 0 of every child.
Updated the function comments; refer to the commit.
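For illustration, a minimal host-side sketch of that layout, assuming `input` is the STRUCT<LONG, ...> sketch column view (names are illustrative, not the PR's actual kernel code):

  // Sketch `row` is spread across the struct's INT64 children:
  // its c-th packed long is child c at index `row`.
  auto const num_longs = input.num_children();
  std::vector<int64_t const*> child_ptrs(num_longs);
  for (int c = 0; c < num_longs; ++c) {
    child_ptrs[c] = input.child(c).data<int64_t>();  // device pointer to child c
  }
  // On the device, sketch `row` = { child_ptrs[0][row], ..., child_ptrs[num_longs - 1][row] }.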

@res-life (Collaborator Author):

Ready for review, except for test cases.

src/main/cpp/src/HLLPP.cu (7 resolved review threads, outdated)
@res-life (Collaborator Author):

build

@res-life (Collaborator Author):

TODO: Add test cases

  • Java cases
  • CPP cases

JNIEnv* env, jclass class_object, jlong ptr)
{
try {
cudf::jni::auto_set_device(env);
Collaborator:

Suggested change
cudf::jni::auto_set_device(env);

Collaborator Author:

done

int precision)
{
try {
cudf::jni::auto_set_device(env);
Collaborator:

I don't think we need to set the device when creating the UDF instance, as we don't start the computation yet.

Suggested change
cudf::jni::auto_set_device(env);

Collaborator Author:

Not sure about this.
If it's removed, there is no set-device call left anywhere in this file; in general, there should be at least one.
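For context, a rough sketch of the usual cudf JNI shape (the class/method name is hypothetical, and the raw-pointer factory matches a change discussed later in this review); the open question is only whether the set-device call is needed in a factory that does no GPU work:

  JNIEXPORT jlong JNICALL Java_com_nvidia_spark_rapids_jni_HLLPPHostUDF_createUDF(
    JNIEnv* env, jclass, jint precision)
  {
    try {
      // No GPU work happens here, so auto_set_device is arguably optional,
      // but keeping it preserves "at least one set-device call in this file".
      cudf::jni::auto_set_device(env);
      auto* udf_ptr = create_hllpp_groupby_host_udf(precision);
      return reinterpret_cast<jlong>(udf_ptr);
    }
    CATCH_STD(env, 0);
  }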

Comment on lines 16 to 38
#include "hyper_log_log_plus_plus.hpp"
#include "hyper_log_log_plus_plus_const.hpp"
#include "hyper_log_log_plus_plus_host_udf.hpp"

#include <cudf/aggregation.hpp>
#include <cudf/aggregation/host_udf.hpp>
#include <cudf/column/column_device_view.cuh>
#include <cudf/column/column_factories.hpp>
#include <cudf/detail/aggregation/aggregation.hpp>
#include <cudf/detail/iterator.cuh>
#include <cudf/detail/valid_if.cuh>
#include <cudf/groupby.hpp>
#include <cudf/reduction.hpp>
#include <cudf/scalar/scalar_factories.hpp>
#include <cudf/utilities/type_dispatcher.hpp>

#include <rmm/device_uvector.hpp>
#include <rmm/exec_policy.hpp>

#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/transform.h>
#include <thrust/transform_reduce.h>
Collaborator:

There are a lot of unused headers here.

Collaborator Author:

done

#include "hyper_log_log_plus_plus.hpp"
#include "hyper_log_log_plus_plus_const.hpp"

#include <cudf/column/column.hpp>
Collaborator:

Please make sure there are no unused headers.

Collaborator Author:

done

rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref());

/**
* Estimate count distinct values for the input which contains
Collaborator:

Contain what?

Collaborator Author:

done

Comment on lines 27 to 31
/**
* Compute hash codes for the input, generate HyperLogLogPlusPlus(HLLPP)
* sketches from hash codes, and merge the sketches in the same group. Output is
* a struct column with multiple long columns which is consistent with Spark.
*/
Collaborator:

Nit: Add @brief to C++ docs in this file.

Collaborator Author:

done


namespace {

struct hllpp_agg_udf : cudf::groupby_host_udf {
Collaborator:

Since we have hllpp_reduce_udf, let's call this hllpp_groupby_udf.

Suggested change
struct hllpp_agg_udf : cudf::groupby_host_udf {
struct hllpp_groupby_udf : cudf::groupby_host_udf {

Collaborator Author:

done

rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) const override
{
// groupby
auto const& group_values = get_grouped_values();
Collaborator:

The get_ functions return rvalues, so we should not bind the output to a reference.

Suggested change
auto const& group_values = get_grouped_values();
auto const group_values = get_grouped_values();

Collaborator Author:

done

[[nodiscard]] std::unique_ptr<cudf::column> operator()(
rmm::cuda_stream_view stream, rmm::device_async_resource_ref mr) const override
{
// groupby
Collaborator:

Suggested change
// groupby

Collaborator Author:

done

hllpp_agg_udf(int precision_, bool is_merge_) : precision(precision_), is_merge(is_merge_) {}

/**
* Perform the main groupby computation for HLLPP UDF
Collaborator:

Suggested change
* Perform the main groupby computation for HLLPP UDF
* @brief Perform the main groupby computation for HLLPP UDF.

Collaborator Author:

done

Comment on lines +104 to +105
int precision;
bool is_merge;
Collaborator:

Suggested change
int precision;
bool is_merge;
private:
int precision;
bool is_merge;

Collaborator Author:

done

hllpp_reduct_udf(int precision_, bool is_merge_) : precision(precision_), is_merge(is_merge_) {}

/**
* Perform the main reduce computation for HLLPP UDF
Collaborator:

Suggested change
* Perform the main reduce computation for HLLPP UDF
* @brief Perform the main reduce computation for HLLPP UDF.

Collaborator Author:

done

Comment on lines 121 to 122
CUDF_EXPECTS(input.size() > 0,
"Hyper Log Log Plus Plus reduction requires input is not empty!");
Collaborator:

Why? Can we return an empty output instead?

Collaborator Author:

done, added a function get_empty_scalar
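A hedged sketch of what such a helper could look like; the name get_empty_scalar comes from the reply above, while the use of cudf::make_empty_scalar_like is only an assumption about how it might be implemented:

  // Hypothetical helper: return an empty (invalid) scalar of the sketch's struct type
  // when the reduction input has zero rows.
  std::unique_ptr<cudf::scalar> get_empty_scalar(cudf::column_view const& input,
                                                 rmm::cuda_stream_view stream,
                                                 rmm::device_async_resource_ref mr)
  {
    return cudf::make_empty_scalar_like(input, stream, mr);
  }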

Comment on lines 146 to 148

int precision;
bool is_merge;
Collaborator:

Suggested change
int precision;
bool is_merge;
private:
int precision;
bool is_merge;

Collaborator Author:

done

Comment on lines 23 to 29
std::unique_ptr<cudf::host_udf_base> create_hllpp_reduction_host_udf(int precision);

std::unique_ptr<cudf::host_udf_base> create_hllpp_reduction_merge_host_udf(int precision);

std::unique_ptr<cudf::host_udf_base> create_hllpp_groupby_host_udf(int precision);

std::unique_ptr<cudf::host_udf_base> create_hllpp_groupby_merge_host_udf(int precision);
Collaborator (@ttnghia, Jan 15, 2025):

Hmm, I think these functions can return the raw pointer directly, because we call .release() on them immediately in the JNI function. Doing this reduces the overhead of using smart pointers.

Suggested change
std::unique_ptr<cudf::host_udf_base> create_hllpp_reduction_host_udf(int precision);
std::unique_ptr<cudf::host_udf_base> create_hllpp_reduction_merge_host_udf(int precision);
std::unique_ptr<cudf::host_udf_base> create_hllpp_groupby_host_udf(int precision);
std::unique_ptr<cudf::host_udf_base> create_hllpp_groupby_merge_host_udf(int precision);
cudf::host_udf_base* create_hllpp_reduction_host_udf(int precision);
cudf::host_udf_base* create_hllpp_reduction_merge_host_udf(int precision);
cudf::host_udf_base* create_hllpp_groupby_host_udf(int precision);
cudf::host_udf_base* create_hllpp_groupby_merge_host_udf(int precision);

Collaborator Author:

done

Comment on lines 153 to 171
std::unique_ptr<cudf::host_udf_base> create_hllpp_reduction_host_udf(int precision)
{
return std::make_unique<hllpp_reduct_udf>(precision, /*is_merge*/ false);
}

std::unique_ptr<cudf::host_udf_base> create_hllpp_reduction_merge_host_udf(int precision)
{
return std::make_unique<hllpp_reduct_udf>(precision, /*is_merge*/ true);
}

std::unique_ptr<cudf::host_udf_base> create_hllpp_groupby_host_udf(int precision)
{
return std::make_unique<hllpp_agg_udf>(precision, /*is_merge*/ false);
}

std::unique_ptr<cudf::host_udf_base> create_hllpp_groupby_merge_host_udf(int precision)
{
return std::make_unique<hllpp_agg_udf>(precision, /*is_merge*/ true);
}
Collaborator:

Suggested change
std::unique_ptr<cudf::host_udf_base> create_hllpp_reduction_host_udf(int precision)
{
return std::make_unique<hllpp_reduct_udf>(precision, /*is_merge*/ false);
}
std::unique_ptr<cudf::host_udf_base> create_hllpp_reduction_merge_host_udf(int precision)
{
return std::make_unique<hllpp_reduct_udf>(precision, /*is_merge*/ true);
}
std::unique_ptr<cudf::host_udf_base> create_hllpp_groupby_host_udf(int precision)
{
return std::make_unique<hllpp_agg_udf>(precision, /*is_merge*/ false);
}
std::unique_ptr<cudf::host_udf_base> create_hllpp_groupby_merge_host_udf(int precision)
{
return std::make_unique<hllpp_agg_udf>(precision, /*is_merge*/ true);
}
cudf::host_udf_base* create_hllpp_reduction_host_udf(int precision)
{
return new hllpp_reduct_udf(precision, /*is_merge*/ false);
}
cudf::host_udf_base* create_hllpp_reduction_merge_host_udf(int precision)
{
return new hllpp_reduct_udf(precision, /*is_merge*/ true);
}
cudf::host_udf_base* create_hllpp_groupby_host_udf(int precision)
{
return new hllpp_agg_udf(precision, /*is_merge*/ false);
}
cudf::host_udf_base* create_hllpp_groupby_merge_host_udf(int precision)
{
return new hllpp_agg_udf(precision, /*is_merge*/ true);
}

Collaborator Author:

done

}();
CUDF_EXPECTS(udf_ptr != nullptr, "Invalid HyperLogLogPlusPlus(HLLPP) UDF instance.");

return reinterpret_cast<jlong>(udf_ptr.release());
Collaborator:

udf_ptr is a raw pointer.

Suggested change
return reinterpret_cast<jlong>(udf_ptr.release());
return reinterpret_cast<jlong>(udf_ptr);

Collaborator Author:

done

Comment on lines 130 to 141
/**
* TODO: move this to cuDF HostUDFWrapper
*/
@Override
public void close() throws Exception {
close(udfNativeHandle);
}

/**
* TODO: move this to cuDF HostUDFWrapper
*/
static native void close(long ptr);
Collaborator (@ttnghia, Jan 15, 2025):

Moving these functions to cudf JNI is possible. However, host UDF implementations can vary significantly, and some of them may have extra data members to close beyond the native UDF pointer. Thus, I think it's better to have the close() method implemented here (in the derived class) instead of in the base HostUDFWrapper class.

Collaborator Author:

OK to me, but I have concerns:

  • If we have more derived classes in the future, every derived class will need its own JNI close implementation.
  • udfNativeHandle lives in HostUDFWrapper, so it seems reasonable to close udfNativeHandle in HostUDFWrapper; but udfNativeHandle is created in the derived class. I'm not sure which place is better.
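For illustration, a rough sketch of the native side of such a per-implementation close (the JNI class/method name is hypothetical), assuming the jlong handle is the raw cudf::host_udf_base* returned by the create_* factories:

  JNIEXPORT void JNICALL Java_com_nvidia_spark_rapids_jni_HLLPPHostUDF_close(
    JNIEnv* env, jclass, jlong ptr)
  {
    try {
      // Reclaim the UDF instance owned by the Java-side wrapper.
      delete reinterpret_cast<cudf::host_udf_base*>(ptr);
    }
    CATCH_STD(env, );
  }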

* sketch. Input is a struct column with multiple long columns which is
* consistent with Spark. Output is a struct scalar with multiple long values.
*/
ReductionMERGE(1),
Collaborator:

Suggested change
ReductionMERGE(1),
ReductionMerge(1),

Collaborator Author:

done

Comment on lines 105 to 117
* The value of num_registers_per_sketch = 2^precision
* The children num of this Struct is: num_registers_per_sketch / 10 + 1,
* Here 10 means a INT64 contains 10 register values,
* each register value is 6 bits.
* Register value is the number of leading zero bits in xxhash64 hash code.
* xxhash64 hash code is 64 bits, Register value is 6 bits,
* 6 bits is enough to hold the max value 64.
*
* @param input The sketch column which constains Struct<INT64, INT64, ...>
* values.
* @param precision The num of bits for HLLPP register addressing.
* @return A INT64 column with each value indicates the approximate count
* distinct value.
Collaborator (@ttnghia, Jan 15, 2025):

Need to reformat + polish docs.

Collaborator Author:

done
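For illustration, a minimal sketch of the packing arithmetic described in the quoted doc comment (helper names and the bit ordering within each long are assumptions, not the PR's actual code):

  constexpr int REGISTER_BITS      = 6;   // a register holds a leading-zero count, max 64, which fits in 6 bits
  constexpr int REGISTERS_PER_LONG = 10;  // 10 * 6 = 60 bits used per INT64

  // With precision p there are 2^p registers, stored in 2^p / 10 + 1 longs
  // (e.g. precision 9 -> 512 registers -> 52 longs per sketch).
  constexpr int num_longs_per_sketch(int precision)
  {
    return (1 << precision) / REGISTERS_PER_LONG + 1;
  }

  // Read register `r` from a sketch stored as packed longs.
  inline int get_register(int64_t const* packed_longs, int r)
  {
    auto const word  = packed_longs[r / REGISTERS_PER_LONG];
    auto const shift = (r % REGISTERS_PER_LONG) * REGISTER_BITS;
    return static_cast<int>((word >> shift) & 0x3F);  // low 6 bits of the slot
  }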

Comment on lines 70 to 75
* `reduce_by_key` uses num_rows_input intermidate cache:
* https://github.com/NVIDIA/thrust/blob/2.1.0/thrust/system/detail/generic/reduce_by_key.inl#L112
*
* // scan the values by flag
* thrust::detail::temporary_array<ValueType,ExecutionPolicy>
* scanned_values(exec, n);
Collaborator:

How do these APIs affect us?

Collaborator Author:

Tried to use reduce_by_key, but it uses too much memory, so gave up on using reduce_by_key.
And updated the comments to:

 * Tried to use `reduce_by_key`, but it uses too much memory, so we gave up on `reduce_by_key`.
 * More details:
 * `reduce_by_key` uses a num_rows_input-sized intermediate cache:
 * https://github.com/NVIDIA/thrust/blob/2.1.0/thrust/system/detail/generic/reduce_by_key.inl#L112
 * // scan the values by flag
 * thrust::detail::temporary_array<ValueType,ExecutionPolicy>
 * scanned_values(exec, n);
 * Each sketch contains multiple integers, by default 512 integers (precision is
 * 9), so num_rows_input * 512 is huge.
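To make the memory concern concrete (illustrative arithmetic, not numbers measured in this PR): with precision 9 a sketch is 512 registers packed into 512 / 10 + 1 = 52 INT64s, about 416 bytes, so a scanned_values temporary holding one sketch per input row would need roughly 416 B × 100,000,000 ≈ 42 GB for 100 million input rows.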

Comment on lines +733 to +742
auto d_results = [&] {
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
return cudf::detail::make_device_uvector_async(
host_results_pointers, stream, cudf::get_current_device_resource_ref());
}();
Collaborator:

host_results_pointers can be destroyed before data is copied to device. Since we don't like stream sync, just move it out of the code block:

Suggested change
auto d_results = [&] {
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
return cudf::detail::make_device_uvector_async(
host_results_pointers, stream, cudf::get_current_device_resource_ref());
}();
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
auto d_results = cudf::detail::make_device_uvector_async(
host_results_pointers, stream, cudf::get_current_device_resource_ref());

Comment on lines +437 to +445
auto d_results = [&] {
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
return cudf::detail::make_device_uvector_async(host_results_pointers, stream, mr);
}();
Collaborator:

Similarly, host_results_pointers can be destroyed before data is copied to device. Just remove this code block.

Suggested change
auto d_results = [&] {
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
return cudf::detail::make_device_uvector_async(host_results_pointers, stream, mr);
}();
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
auto d_results = cudf::detail::make_device_uvector_async(host_results_pointers, stream, mr);

Comment on lines +631 to +639
auto d_sketches_output = [&] {
auto host_results_pointer_iter =
thrust::make_transform_iterator(results.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + results.size());
return cudf::detail::make_device_uvector_async(host_results_pointers, stream, mr);
}();
Collaborator:

Similarly, host_results_pointers can be destroyed before data is copied to the device. Move the code out of the block.

Comment on lines +803 to +811
auto d_results = [&] {
auto host_results_pointer_iter =
thrust::make_transform_iterator(children.begin(), [](auto const& results_column) {
return results_column->mutable_view().template data<int64_t>();
});
auto host_results_pointers =
std::vector<int64_t*>(host_results_pointer_iter, host_results_pointer_iter + children.size());
return cudf::detail::make_device_uvector_async(host_results_pointers, stream, mr);
}();
Collaborator:

Similar issue: host_results_pointers can be destroyed before data is copied to the device.
