Merge branch 'bug_biased_sampling' of https://github.com/seunghwak/cugraph into biased-dgl
alexbarghi-nv committed Aug 9, 2024
2 parents 49470ff + 9d566e8 commit d0c6920
Showing 5 changed files with 301 additions and 115 deletions.
85 changes: 85 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,88 @@
# cugraph 24.08.00 (7 Aug 2024)

## 🚨 Breaking Changes

- Use MNMG version of ECG in python layer instead, and remove legacy ECG and Louvain ([#4514](https://github.com/rapidsai/cugraph/pull/4514)) [@naimnv](https://github.com/naimnv)

## 🐛 Bug Fixes

- add setuptools to host requirements for conda packages that need it ([#4582](https://github.com/rapidsai/cugraph/pull/4582)) [@jameslamb](https://github.com/jameslamb)
- Add pylibcugraph dependency on pylibraft. ([#4570](https://github.com/rapidsai/cugraph/pull/4570)) [@bdice](https://github.com/bdice)
- Fix build error with NO_CUGRAPH_OPS ([#4563](https://github.com/rapidsai/cugraph/pull/4563)) [@seunghwak](https://github.com/seunghwak)
- [BUG] Fix Failing WholeGraph Tests ([#4560](https://github.com/rapidsai/cugraph/pull/4560)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- Temporarily Disable Feature Store Tests with WholeGraph ([#4559](https://github.com/rapidsai/cugraph/pull/4559)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- Fix MG `katz_centrality`: Check if DataFrame Arg is Not None ([#4555](https://github.com/rapidsai/cugraph/pull/4555)) [@nv-rliu](https://github.com/nv-rliu)
- nx-cugraph: fix `from_pandas_edgekey` given edgekey but not edgeattr ([#4550](https://github.com/rapidsai/cugraph/pull/4550)) [@eriknw](https://github.com/eriknw)
- Fix triangle count test bug ([#4549](https://github.com/rapidsai/cugraph/pull/4549)) [@jnke2016](https://github.com/jnke2016)
- [BUG] Use the Correct WG Communicator ([#4548](https://github.com/rapidsai/cugraph/pull/4548)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- Add Additional Check For SSSP Source Vertex & Fix SSSP Benchmark ([#4541](https://github.com/rapidsai/cugraph/pull/4541)) [@nv-rliu](https://github.com/nv-rliu)
- Fix OOM Bug for Jaccard, Sorensen, and Overlap benchmarks ([#4524](https://github.com/rapidsai/cugraph/pull/4524)) [@nv-rliu](https://github.com/nv-rliu)
- Distribute start_list across ranks ([#4519](https://github.com/rapidsai/cugraph/pull/4519)) [@jnke2016](https://github.com/jnke2016)
- [FIX] Skip Distributed Sampler Tests if PyTorch with CUDA is not Available ([#4518](https://github.com/rapidsai/cugraph/pull/4518)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- [BUG] Fix a hang issue in MG triangle counts (when invoked with a small number of vertices to update triangle counts) ([#4517](https://github.com/rapidsai/cugraph/pull/4517)) [@seunghwak](https://github.com/seunghwak)
- Update MG Benchmark List ([#4516](https://github.com/rapidsai/cugraph/pull/4516)) [@nv-rliu](https://github.com/nv-rliu)
- Fix TensorProductConv test and improve docs ([#4480](https://github.com/rapidsai/cugraph/pull/4480)) [@tingyu66](https://github.com/tingyu66)
- Test nx-cugraph package instead of editable install ([#4442](https://github.com/rapidsai/cugraph/pull/4442)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)

## 📖 Documentation

- DOC: typo in nx_transition.rst ([#4491](https://github.com/rapidsai/cugraph/pull/4491)) [@raybellwaves](https://github.com/raybellwaves)
- Doc cleanup for nx-cugraph: fixed typos, cleaned up various descriptions, renamed notebook to match naming convention. ([#4478](https://github.com/rapidsai/cugraph/pull/4478)) [@rlratzel](https://github.com/rlratzel)
- [DOC] Minor Improvements to cuGraph-PyG Documentation ([#4460](https://github.com/rapidsai/cugraph/pull/4460)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)

## 🚀 New Features

- Use MNMG version of ECG in python layer instead, and remove legacy ECG and Louvain ([#4514](https://github.com/rapidsai/cugraph/pull/4514)) [@naimnv](https://github.com/naimnv)
- c_api and plc binding for lookup src dst using edge ids and type(s) ([#4494](https://github.com/rapidsai/cugraph/pull/4494)) [@naimnv](https://github.com/naimnv)
- Forward merge branch-24.06 into branch-24.08 ([#4489](https://github.com/rapidsai/cugraph/pull/4489)) [@nv-rliu](https://github.com/nv-rliu)
- [FEA] New Graph Interface and Loaders for Distributed Sampling in DGL ([#4486](https://github.com/rapidsai/cugraph/pull/4486)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- compute cosine similarity for vertex pairs ([#4482](https://github.com/rapidsai/cugraph/pull/4482)) [@naimnv](https://github.com/naimnv)
- Define heterogeneous renumbering API ([#4463](https://github.com/rapidsai/cugraph/pull/4463)) [@seunghwak](https://github.com/seunghwak)
- Lookup edge src dst using edge id and type ([#4449](https://github.com/rapidsai/cugraph/pull/4449)) [@naimnv](https://github.com/naimnv)
- Biased sampling ([#4443](https://github.com/rapidsai/cugraph/pull/4443)) [@seunghwak](https://github.com/seunghwak)

## 🛠️ Improvements

- nx-cugraph: check networkx version ([#4571](https://github.com/rapidsai/cugraph/pull/4571)) [@eriknw](https://github.com/eriknw)
- nx-cugraph: add `G.__networkx_cache__` to enable graph conversion caching ([#4567](https://github.com/rapidsai/cugraph/pull/4567)) [@eriknw](https://github.com/eriknw)
- split up CUDA-suffixed dependencies in dependencies.yaml ([#4552](https://github.com/rapidsai/cugraph/pull/4552)) [@jameslamb](https://github.com/jameslamb)
- Use workflow branch 24.08 again ([#4544](https://github.com/rapidsai/cugraph/pull/4544)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Support non p2p configuration when initializing the comms ([#4543](https://github.com/rapidsai/cugraph/pull/4543)) [@jnke2016](https://github.com/jnke2016)
- Fix Warning from `simpleDistributedGraph.py` ([#4540](https://github.com/rapidsai/cugraph/pull/4540)) [@nv-rliu](https://github.com/nv-rliu)
- Create a graph from the edge list in multiple chunks ([#4539](https://github.com/rapidsai/cugraph/pull/4539)) [@seunghwak](https://github.com/seunghwak)
- nx-cugraph: add dijkstra sssp functions ([#4538](https://github.com/rapidsai/cugraph/pull/4538)) [@eriknw](https://github.com/eriknw)
- nx-cugraph: add `from_dict_of_lists` and `to_dict_of_lists` ([#4537](https://github.com/rapidsai/cugraph/pull/4537)) [@eriknw](https://github.com/eriknw)
- Ensure `get_test_data.sh` doesn't re-download datasets ([#4536](https://github.com/rapidsai/cugraph/pull/4536)) [@trxcllnt](https://github.com/trxcllnt)
- Define and Implement C API for biased sampling ([#4535](https://github.com/rapidsai/cugraph/pull/4535)) [@ChuckHastings](https://github.com/ChuckHastings)
- Build and test with CUDA 12.5.1 ([#4534](https://github.com/rapidsai/cugraph/pull/4534)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Refactor C++ unit tests to allow finer grained filtering ([#4533](https://github.com/rapidsai/cugraph/pull/4533)) [@ChuckHastings](https://github.com/ChuckHastings)
- [IMP] Set the Default WG Memory Type to 'distributed' for the MNMG PyG Example ([#4532](https://github.com/rapidsai/cugraph/pull/4532)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- nx-cugraph: add `relabel_nodes` and `convert_node_labels_to_integers` ([#4531](https://github.com/rapidsai/cugraph/pull/4531)) [@eriknw](https://github.com/eriknw)
- Add `-cuXX` suffixed versions of cugraph-service-client dependency to pyproject.toml's project.dependencies list ([#4530](https://github.com/rapidsai/cugraph/pull/4530)) [@trxcllnt](https://github.com/trxcllnt)
- Further optimize `from_pandas_edgelist` with cudf ([#4528](https://github.com/rapidsai/cugraph/pull/4528)) [@eriknw](https://github.com/eriknw)
- Performance optimize BFS (including direction optimizing BFS implementation, mainly for single-GPU) ([#4527](https://github.com/rapidsai/cugraph/pull/4527)) [@seunghwak](https://github.com/seunghwak)
- Add CUDA_STATIC_MATH_LIBRARIES ([#4526](https://github.com/rapidsai/cugraph/pull/4526)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Better handle cudf.pandas in `from_pandas_edgelist` ([#4525](https://github.com/rapidsai/cugraph/pull/4525)) [@eriknw](https://github.com/eriknw)
- Skip the benchmark ctests within CI ([#4522](https://github.com/rapidsai/cugraph/pull/4522)) [@ChuckHastings](https://github.com/ChuckHastings)
- remove thriftpy2 ceiling ([#4521](https://github.com/rapidsai/cugraph/pull/4521)) [@jameslamb](https://github.com/jameslamb)
- Avoid --find-links in wheel jobs ([#4509](https://github.com/rapidsai/cugraph/pull/4509)) [@jameslamb](https://github.com/jameslamb)
- Refactor code base to reduce memory requirement for building libcugraph ([#4506](https://github.com/rapidsai/cugraph/pull/4506)) [@naimnv](https://github.com/naimnv)
- Tweak rmm configuration for C++ unit tests ([#4503](https://github.com/rapidsai/cugraph/pull/4503)) [@ChuckHastings](https://github.com/ChuckHastings)
- Expose new all-pairs Similarity algorithms ([#4502](https://github.com/rapidsai/cugraph/pull/4502)) [@jnke2016](https://github.com/jnke2016)
- remove openmpi ceiling ([#4496](https://github.com/rapidsai/cugraph/pull/4496)) [@jameslamb](https://github.com/jameslamb)
- Cut peak memory footprint in per_v_transform_reduce_dst_key_aggregated_outgoing_e ([#4484](https://github.com/rapidsai/cugraph/pull/4484)) [@seunghwak](https://github.com/seunghwak)
- Skip MG `dgl_uniform_sampler` test in nightlies ([#4479](https://github.com/rapidsai/cugraph/pull/4479)) [@nv-rliu](https://github.com/nv-rliu)
- Remove text builds of documentation ([#4468](https://github.com/rapidsai/cugraph/pull/4468)) [@vyasr](https://github.com/vyasr)
- [IMP] Limit the Test Data Size when Running CI in `gcn_dist_sg.py` ([#4461](https://github.com/rapidsai/cugraph/pull/4461)) [@alexbarghi-nv](https://github.com/alexbarghi-nv)
- Forward Merge branch-24.06 into branch-24.08 ([#4454](https://github.com/rapidsai/cugraph/pull/4454)) [@nv-rliu](https://github.com/nv-rliu)
- Properly clean up python directories ([#4453](https://github.com/rapidsai/cugraph/pull/4453)) [@ChuckHastings](https://github.com/ChuckHastings)
- Fixes for On-Going MG Test Failures ([#4450](https://github.com/rapidsai/cugraph/pull/4450)) [@nv-rliu](https://github.com/nv-rliu)
- remove unnecessary 'setuptools' and 'wheel' dependencies ([#4448](https://github.com/rapidsai/cugraph/pull/4448)) [@jameslamb](https://github.com/jameslamb)
- MG Implementation K-Truss ([#4438](https://github.com/rapidsai/cugraph/pull/4438)) [@jnke2016](https://github.com/jnke2016)
- Overhaul ops-codeowners ([#4409](https://github.com/rapidsai/cugraph/pull/4409)) [@raydouglass](https://github.com/raydouglass)
- Use rapids-build-backend ([#4393](https://github.com/rapidsai/cugraph/pull/4393)) [@KyleFromNVIDIA](https://github.com/KyleFromNVIDIA)
- Optimize K-Truss ([#4375](https://github.com/rapidsai/cugraph/pull/4375)) [@jnke2016](https://github.com/jnke2016)

# cugraph 24.06.00 (5 Jun 2024)

## 🚨 Breaking Changes
@@ -2041,7 +2041,7 @@ biased_sample_and_compute_local_nbr_indices(
     zero_bias_frontier_indices.resize(zero_bias_count_inclusive_sums.back(),
                                       handle.get_stream());
     zero_bias_frontier_indices.shrink_to_fit(handle.get_stream());
-    zero_bias_local_nbr_indices.resize(frontier_indices.size(), handle.get_stream());
+    zero_bias_local_nbr_indices.resize(zero_bias_frontier_indices.size(), handle.get_stream());
     zero_bias_local_nbr_indices.shrink_to_fit(handle.get_stream());
     std::vector<size_t> zero_bias_counts(zero_bias_count_inclusive_sums.size());
     std::adjacent_difference(zero_bias_count_inclusive_sums.begin(),
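The one-line change above sizes `zero_bias_local_nbr_indices` by the number of zero-bias frontier entries rather than by the full frontier, keeping the two zero-bias vectors in lockstep. The surrounding context then converts the per-rank inclusive sums into per-rank counts with `std::adjacent_difference`. A minimal host-side sketch of that counts-from-inclusive-sums step, with hypothetical values (not taken from the library):

```cuda
#include <algorithm>
#include <cstddef>
#include <vector>

int main() {
  // Hypothetical inclusive sums of zero-bias entries per rank: ranks
  // contributing 3, 0, 5, and 2 entries respectively.
  std::vector<std::size_t> inclusive_sums{3, 3, 8, 10};

  // adjacent_difference keeps the first element and writes each later element
  // as the difference from its predecessor, recovering the per-rank counts.
  std::vector<std::size_t> counts(inclusive_sums.size());
  std::adjacent_difference(inclusive_sums.begin(), inclusive_sums.end(), counts.begin());

  // counts == {3, 0, 5, 2}; inclusive_sums.back() (== 10) is the total used to
  // size the zero-bias buffers in the hunk above.
  return 0;
}
```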
55 changes: 28 additions & 27 deletions cpp/src/prims/detail/transform_v_frontier_e.cuh
@@ -209,9 +209,6 @@ __global__ static void transform_v_frontier_e_mid_degree(
   auto const lane_id = tid % raft::warp_size();
   size_t idx = static_cast<size_t>(tid / raft::warp_size());

-  using WarpScan = cub::WarpScan<edge_t, raft::warp_size()>;
-  __shared__ typename WarpScan::TempStorage temp_storage;
-
   while (idx < static_cast<size_t>(thrust::distance(edge_partition_frontier_key_index_first,
                                                     edge_partition_frontier_key_index_last))) {
     auto key_idx = *(edge_partition_frontier_key_index_first + idx);
@@ -224,16 +221,15 @@ __global__ static void transform_v_frontier_e_mid_degree(
     thrust::tie(indices, edge_offset, local_degree) = edge_partition.local_edges(major_offset);
     auto this_key_value_first = value_first + edge_partition_frontier_local_degree_offsets[key_idx];
     if (edge_partition_e_mask) {
-      // FIXME: it might be faster to update in warp-sync way
-      edge_t counter{0};
-      for (edge_t i = lane_id; i < local_degree; i += raft::warp_size()) {
-        if ((*edge_partition_e_mask).get(edge_offset + i)) { ++counter; }
-      }
-      edge_t offset_within_warp{};
-      WarpScan(temp_storage).ExclusiveSum(counter, offset_within_warp);
-      counter = 0;
-      for (edge_t i = lane_id; i < local_degree; i += raft::warp_size()) {
-        if ((*edge_partition_e_mask).get(edge_offset + i)) {
+      auto rounded_up_local_degree =
+        ((static_cast<size_t>(local_degree) + (raft::warp_size() - 1)) / raft::warp_size()) *
+        raft::warp_size();
+      edge_t base_offset{0};
+      for (edge_t i = lane_id; i < rounded_up_local_degree; i += raft::warp_size()) {
+        auto valid = (i < local_degree) && (*edge_partition_e_mask).get(edge_offset + i);
+        auto ballot = __ballot_sync(raft::warp_full_mask(), valid ? uint32_t{1} : uint32_t{0});
+        if (valid) {
+          auto intra_warp_offset = __popc(ballot & ~(raft::warp_full_mask() << lane_id));
           transform_v_frontier_e_update_buffer_element<key_t, GraphViewType>(
             edge_partition,
             key,
@@ -244,9 +240,9 @@ __global__ static void transform_v_frontier_e_mid_degree(
             edge_partition_dst_value_input,
             edge_partition_e_value_input,
             e_op,
-            this_key_value_first + offset_within_warp + counter);
-          ++counter;
+            this_key_value_first + base_offset + intra_warp_offset);
         }
+        base_offset += __popc(ballot);
       }
     } else {
       for (edge_t i = lane_id; i < local_degree; i += raft::warp_size()) {
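The rewritten mid-degree path above replaces the old two-pass count-then-`WarpScan` scheme with a single pass: every lane evaluates a `valid` flag, `__ballot_sync` gathers the flags across the warp, `__popc` over the lanes below the current one gives each valid lane its output slot, and the popcount of the whole ballot advances the running base offset. A stand-alone sketch of this warp-level stream-compaction pattern (a toy kernel with illustrative names, not the cuGraph code):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Toy kernel: a single 32-thread warp compacts the even elements of `in` into
// `out`, using the same ballot/popcount offset computation as the hunk above.
__global__ void warp_compact_even(const int* in, int n, int* out, int* out_count) {
  int lane        = threadIdx.x;              // assumes one 32-thread block
  int base_offset = 0;
  int rounded_up_n = ((n + 31) / 32) * 32;    // every lane must reach the ballot
  for (int i = lane; i < rounded_up_n; i += 32) {
    bool valid      = (i < n) && (in[i] % 2 == 0);
    unsigned ballot = __ballot_sync(0xffffffffu, valid ? 1u : 0u);
    if (valid) {
      // number of valid lanes below this lane = slot within this iteration
      int intra_warp_offset = __popc(ballot & ~(0xffffffffu << lane));
      out[base_offset + intra_warp_offset] = in[i];
    }
    base_offset += __popc(ballot);            // all lanes see the same ballot
  }
  if (lane == 0) { *out_count = base_offset; }
}

int main() {
  int h_in[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  int *d_in, *d_out, *d_count;
  cudaMalloc(&d_in, sizeof(h_in));
  cudaMalloc(&d_out, sizeof(h_in));
  cudaMalloc(&d_count, sizeof(int));
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
  warp_compact_even<<<1, 32>>>(d_in, 8, d_out, d_count);
  int h_out[8], h_count;
  cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
  cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  for (int i = 0; i < h_count; ++i) { printf("%d ", h_out[i]); }  // prints: 2 4 6 8
  printf("\n");
  cudaFree(d_in); cudaFree(d_out); cudaFree(d_count);
  return 0;
}
```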
@@ -300,6 +296,7 @@ __global__ static void transform_v_frontier_e_high_degree(

   using BlockScan = cub::BlockScan<edge_t, transform_v_frontier_e_kernel_block_size>;
   __shared__ typename BlockScan::TempStorage temp_storage;
+  __shared__ edge_t increment;

   while (idx < static_cast<size_t>(thrust::distance(edge_partition_frontier_key_index_first,
                                                     edge_partition_frontier_key_index_last))) {
@@ -313,16 +310,16 @@ __global__ static void transform_v_frontier_e_high_degree(
     thrust::tie(indices, edge_offset, local_degree) = edge_partition.local_edges(major_offset);
     auto this_key_value_first = value_first + edge_partition_frontier_local_degree_offsets[key_idx];
     if (edge_partition_e_mask) {
-      // FIXME: it might be faster to update in block-sync way
-      edge_t counter{0};
-      for (edge_t i = threadIdx.x; i < local_degree; i += blockDim.x) {
-        if ((*edge_partition_e_mask).get(edge_offset + i)) { ++counter; }
-      }
-      edge_t offset_within_block{};
-      BlockScan(temp_storage).ExclusiveSum(counter, offset_within_block);
-      counter = 0;
-      for (edge_t i = threadIdx.x; i < local_degree; i += blockDim.x) {
-        if ((*edge_partition_e_mask).get(edge_offset + i)) {
+      auto rounded_up_local_degree =
+        ((static_cast<size_t>(local_degree) + (transform_v_frontier_e_kernel_block_size - 1)) /
+         transform_v_frontier_e_kernel_block_size) *
+        transform_v_frontier_e_kernel_block_size;
+      edge_t base_offset{0};
+      for (size_t i = threadIdx.x; i < rounded_up_local_degree; i += blockDim.x) {
+        auto valid = (i < local_degree) && (*edge_partition_e_mask).get(edge_offset + i);
+        edge_t intra_block_offset{};
+        BlockScan(temp_storage).ExclusiveSum(valid ? edge_t{1} : edge_t{0}, intra_block_offset);
+        if (valid) {
           transform_v_frontier_e_update_buffer_element<key_t, GraphViewType>(
             edge_partition,
             key,
@@ -333,9 +330,13 @@ __global__ static void transform_v_frontier_e_high_degree(
             edge_partition_dst_value_input,
             edge_partition_e_value_input,
             e_op,
-            this_key_value_first + offset_within_block + counter);
-          ++counter;
+            this_key_value_first + base_offset + intra_block_offset);
         }
+        if (threadIdx.x == transform_v_frontier_e_kernel_block_size - 1) {
+          increment = intra_block_offset + (valid ? edge_t{1} : edge_t{0});
+        }
+        __syncthreads();
+        base_offset += increment;
       }
     } else {
       for (edge_t i = threadIdx.x; i < local_degree; i += blockDim.x) {
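The high-degree path applies the same idea at block scope: each iteration, every thread feeds its `valid` flag into an exclusive `cub::BlockScan` sum to get its slot, the last thread publishes the iteration's total through the shared `increment`, and `__syncthreads()` makes it visible before the base offset advances. A self-contained sketch of that per-iteration BlockScan compaction (a toy kernel with illustrative names, not the cuGraph implementation):

```cuda
#include <cub/cub.cuh>
#include <cstdio>

constexpr int kBlockSize = 128;

// Toy kernel: one block compacts the even elements of `in` into `out` using a
// per-iteration exclusive BlockScan, mirroring the hunk above.
__global__ void block_compact_even(const int* in, int n, int* out, int* out_count) {
  using BlockScan = cub::BlockScan<int, kBlockSize>;
  __shared__ typename BlockScan::TempStorage temp_storage;
  __shared__ int increment;  // elements kept by the whole block this iteration

  int base_offset  = 0;
  int rounded_up_n = ((n + kBlockSize - 1) / kBlockSize) * kBlockSize;
  for (int i = threadIdx.x; i < rounded_up_n; i += kBlockSize) {
    bool valid             = (i < n) && (in[i] % 2 == 0);
    int intra_block_offset = 0;
    BlockScan(temp_storage).ExclusiveSum(valid ? 1 : 0, intra_block_offset);
    if (valid) { out[base_offset + intra_block_offset] = in[i]; }
    if (threadIdx.x == kBlockSize - 1) {
      increment = intra_block_offset + (valid ? 1 : 0);  // total kept this pass
    }
    __syncthreads();  // publish `increment`; also required before reusing temp_storage
    base_offset += increment;
  }
  if (threadIdx.x == 0) { *out_count = base_offset; }
}

int main() {
  constexpr int n = 10;
  int h_in[n] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  int *d_in, *d_out, *d_count;
  cudaMalloc(&d_in, sizeof(h_in));
  cudaMalloc(&d_out, sizeof(h_in));
  cudaMalloc(&d_count, sizeof(int));
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
  block_compact_even<<<1, kBlockSize>>>(d_in, n, d_out, d_count);
  int h_out[n], h_count;
  cudaMemcpy(&h_count, d_count, sizeof(int), cudaMemcpyDeviceToHost);
  cudaMemcpy(h_out, d_out, h_count * sizeof(int), cudaMemcpyDeviceToHost);
  for (int i = 0; i < h_count; ++i) { printf("%d ", h_out[i]); }  // prints: 2 4 6 8 10
  printf("\n");
  cudaFree(d_in); cudaFree(d_out); cudaFree(d_count);
  return 0;
}
```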