
Hourglass demonstration case fails on multiple MPI processes #277

Open
stvdwtt opened this issue Apr 25, 2024 · 8 comments

@stvdwtt
Collaborator

stvdwtt commented Apr 25, 2024

The hourglass demonstration case (link) hits a divide-by-zero error on multiple MPI processes. Evidently this has been a problem for a while and is independent of the problem in #278.

Screenshot of the error from @AshGannon:
[screenshot attached]

@masterleinad
Collaborator

Does it also fail when running with only one MPI process?

@Rombur
Member

Rombur commented Apr 25, 2024

No, it doesn't.

@masterleinad
Collaborator

The last deal.II function is https://github.com/dealii/dealii/blob/e9eb5ab491aab6b0e57e9b552a4e5d64e20077a6/source/base/mpi_compute_index_owner_internal.cc#L432-L454.

So owned_indices.size() is likely zero, which is only checked in Debug mode.
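For illustration only (not code from deal.II or adamantine, and the names are hypothetical): deal.II's Assert macro is active only in Debug builds, so an empty owned-index set passes silently in Release mode and can surface later as a divide-by-zero.

#include <deal.II/base/exceptions.h>

// Hypothetical sketch: n_owned plays the role of owned_indices.size().
unsigned int indices_per_chunk(unsigned int n_total, unsigned int n_owned)
{
  // Fires with a readable message in Debug builds; compiled out in Release.
  Assert(n_owned > 0,
         dealii::ExcMessage("The set of locally owned indices is empty."));
  // In Release, n_owned == 0 reaches this integer division and crashes.
  return n_total / n_owned;
}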

@Rombur
Member

Rombur commented Apr 25, 2024

Yes, the error message is misleading, but Ashley doesn't have a debug version of the code.

@stvdwtt
Collaborator Author

stvdwtt commented May 3, 2024

Ashley and I sorted out the source of this problem this morning (and I helped Ashley build a debug version). The problem is that there is no substrate for the hourglass print, so at the start of the simulation there are no activated cells. This turns out to be fine in serial, but with multiple MPI processes there is eventually a division by the number of DOFs.

There is nothing wrong with the code; this is just an odd use case. My plan is to add a check so that adamantine fails gracefully if this happens. I don't expect users to purposely run simulations with no active elements initially, but I can see this happening accidentally (e.g. a user sets the material_height parameter incorrectly).
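A minimal sketch of what such a check could look like (the helper name and the way activated cells are counted are assumptions, not adamantine's actual code):

#include <deal.II/base/exceptions.h>
#include <deal.II/base/mpi.h>

// Hypothetical guard: abort with a clear message if no cell is activated
// on any rank at the start of the simulation.
void check_some_cells_are_activated(unsigned int n_locally_activated_cells,
                                    MPI_Comm communicator)
{
  unsigned int const n_activated =
      dealii::Utilities::MPI::sum(n_locally_activated_cells, communicator);
  // AssertThrow stays active in both Debug and Release builds.
  AssertThrow(n_activated > 0,
              dealii::ExcMessage(
                  "No cell is activated at the start of the simulation. "
                  "Check the geometry inputs, e.g. material_height."));
}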

@Rombur
Member

Rombur commented May 3, 2024

but at some point for multiple MPI processes there's a division by the number of DOFs.

It's probably because we partition the mesh in such a way that each processor gets the same number of DOFs. Can you try removing these lines

_cell_weights(
_dof_handler,
dealii::parallel::CellWeights<dim>::ndofs_weighting({1, 1})),

and tell me if that fixes the issue. Without this function, the partitioning will ignore the number of DOFs.
If that fixes the issue, we could check if the number of DOFs is greater than zero to decide the type of load balancing we want to do.
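A sketch of that idea, with hypothetical names (holding the CellWeights object in a std::optional so the DoF-based weighting is only attached when there are DOFs to weight):

#include <deal.II/distributed/cell_weights.h>
#include <deal.II/dofs/dof_handler.h>

#include <optional>

// Hypothetical class standing in for the one that currently builds
// _cell_weights unconditionally in its initializer list.
template <int dim>
class LoadBalancingSketch
{
public:
  void setup_load_balancing()
  {
    if (_dof_handler.n_dofs() > 0)
    {
      // Weight cells by their number of DOFs, as in the current code.
      _cell_weights.emplace(
          _dof_handler,
          dealii::parallel::CellWeights<dim>::ndofs_weighting({1, 1}));
    }
    // Otherwise keep the default partitioning, which ignores DOF counts.
  }

private:
  dealii::DoFHandler<dim> _dof_handler;
  std::optional<dealii::parallel::CellWeights<dim>> _cell_weights;
};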

@AshGannon
Collaborator

I will look into this more when I finish my SLUG talk. This is the output after commenting out lines 276-278, @Rombur:

[screenshot of the output attached]

@Rombur
Member

Rombur commented May 4, 2024

You probably have the same issue in serial, but because the checks are disabled in release mode, the code kept running. We should probably just skip the initialization if no cell is activated.
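A minimal sketch of that approach, assuming a hypothetical initialization routine (not adamantine's actual function):

#include <deal.II/dofs/dof_handler.h>

// Hypothetical: return early instead of dividing by a zero DOF count.
template <int dim>
void initialize_state(dealii::DoFHandler<dim> const &dof_handler)
{
  if (dof_handler.n_dofs() == 0)
    return; // No activated cells yet, so there is nothing to initialize.

  // ... existing initialization that assumes n_dofs() > 0 ...
}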
