Hourglass demonstration case fails on multiple MPI processes #277
Comments
Does it also fail when running with only one MPI process?
No, it doesn't.
The last deal.II function is https://github.com/dealii/dealii/blob/e9eb5ab491aab6b0e57e9b552a4e5d64e20077a6/source/base/mpi_compute_index_owner_internal.cc#L432-L454. So
Yes, the error message is misleading, but Ashley doesn't have a debug version of the code.
Ashley and I sorted out the source of this problem this morning (and I helped Ashley build a debug version). The problem is that there is no substrate for the hourglass print, so at the start of the simulation there are no activated cells. This turns out to be fine in serial, but with multiple MPI processes there is at some point a division by the number of DOFs. There is nothing wrong with the code; this is just an odd use case. My plan is to add a check so that adamantine will fail gracefully if this happens. I don't expect users to purposefully run simulations with no active elements initially, but I can see this happening accidentally (e.g. a user sets the
It's probably because we partition the mesh in such a way that each processor gets the same number of DOFs. Can you try removing lines 276 to 278 of adamantine/source/ThermalPhysics.templates.hh (at commit 8c28e59) and tell me if that fixes the issue? Without this function, the partitioning will ignore the number of DOFs. If that fixes the issue, we could check whether the number of DOFs is greater than zero to decide which type of load balancing to use.
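To make the last suggestion concrete, here is a minimal sketch of what such a check could look like. It is not adamantine's or deal.II's actual code: `LoadBalancing`, `choose_load_balancing`, and `cell_weight` are hypothetical names, and the sketch assumes the global DOF count has already been summed over all MPI ranks.

```cpp
#include <cstddef>

// Hypothetical names for illustration only; not part of adamantine or deal.II.
enum class LoadBalancing
{
  by_cell_count, // every cell counts the same, DOFs are ignored
  by_dof_count   // cells are weighted by the DOFs they carry
};

// Pick the partitioning strategy from the global number of DOFs.
// Assumption: global_n_dofs was already reduced over all MPI ranks, so
// every rank takes the same branch.
inline LoadBalancing choose_load_balancing(std::size_t global_n_dofs)
{
  return global_n_dofs > 0 ? LoadBalancing::by_dof_count
                           : LoadBalancing::by_cell_count;
}

// Weight of a single cell under the chosen strategy.
// Assumption: a cell that has not been activated yet carries no DOFs.
inline unsigned int cell_weight(LoadBalancing mode, bool cell_is_activated,
                                unsigned int dofs_per_cell)
{
  if (mode == LoadBalancing::by_cell_count)
    return 1;
  return cell_is_activated ? dofs_per_cell : 0;
}
```

With zero DOFs globally, every cell gets weight 1 and the partitioner never has to balance a quantity that sums to zero.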
I will look into this more when I finish my SLUG talk. This is the output after commenting out lines 276-278, @Rombur:
You probably have the same issue in serial, but because the checks are disabled in release mode, the code kept running. We should probably just skip the initialization if no cell is activated.
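A rough sketch of the "skip the initialization" idea follows. The names (`InitResult`, `initialize_if_activated`) and the quantity being divided are made up for illustration; the only point is that the guard runs before any division by the number of DOFs, in release builds as well as debug builds.

```cpp
#include <cstddef>

// Hypothetical result type: whether initialization ran, and a per-DOF value
// that is only meaningful when it did.
struct InitResult
{
  bool initialized;
  double value_per_dof;
};

// Skip initialization when nothing is activated instead of dividing by zero.
// Assumption: both counts are global (already summed over all MPI ranks),
// so all ranks agree on whether to skip.
inline InitResult initialize_if_activated(std::size_t n_activated_cells,
                                          std::size_t n_dofs, double total)
{
  if (n_activated_cells == 0 || n_dofs == 0)
    return {false, 0.0}; // nothing to initialize yet

  return {true, total / static_cast<double>(n_dofs)};
}
```

A caller can test `initialized` and defer the setup until the first cells are activated.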
The hourglass demonstration case (link) hits a divide-by-zero error on multiple MPI processes. Evidently this has been a problem for a while and is independent of the problem in #278.
Screenshot of the error from @AshGannon: