Loads of issues resulting from use of VkFFT #129
Pytential's Mac CI on A number of @alexfikl's PRs had picked up similar failures, and this establishes that the issue is not specific to them. I conjecture that this is connected to the #114 merge. To help back up that conjecture, here's a run without that merge: inducer/pytential#175. If there are no failures there, then that is further evidence that #114 is causing problems. The CI runs don't necessarily back up the slowness claim.
At least the overall CI times are relatively similar. @alexfikl, could you provide a way to reproduce your Stokes run that got substantially slower? The missing barrier is something I encountered in work with @rckirby. I still have to work out what's happening, I will report back here. |
The first run of inducer/pytential#175 passed. I've just started another run as an additional data point. |
Second run passed as well. |
Just to add more information here from my side. First, for the slowdown, I noticed that on the 3D Stokes operator from inducer/pytential#29 (slightly modified and merged with Then, for the warnings on
with different Lines 560 to 565 in 4ace8ea
but the dependencies seem correctly declared at least. |
For a reproducer of my slowness claim, this should work EDIT: Just started running in pre-114 and it seems to be equally slow. I'll post the numbers once it finishes 😢 EDIT2: I take that back, something definitely seems off. Pre-114 it was giving
so about 30min to compile and evaluate the Stokeslet. But now it's over an hour. I'm running this on EDIT3: Ok, something is very fishy, just ran that script on
which is somewhere around 6h! Let me know if you get a chance to take a look or try it out, maybe there's something silly in there :\ |
Warnings about write races are false positives. See inducer/loopy#564 |
macos failure is a segfault in pocl. |
Are you able to get a backtrace? Is it in the pocl runtime or in a kernel? |
|
So does pocl miscompile vkfft? |
Is there an easy way to reproduce this? Will it reproduce on the CEESD M1? Should we start thinking about reverting #114 for now? |
Or maybe turn off vkfft on Mac? |
I can reproduce on appletini. |
Running the test with |
|
There's a warning just before the segfault due to a bad access, but I don't think they are related.
|
Using an unoptimized loopy kernel instead of vkfft makes the test pass. See #130 |
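Swapping in a slower transform to see whether a test passes is a cross-check against a reference. A minimal sketch of that idea in plain Python follows; it is not the loopy kernel from #130, and `naive_dft` and `max_abs_diff` are hypothetical helper names. It assumes the usual convention of an unnormalized forward transform and an inverse scaled by 1/n.

```python
import cmath


def naive_dft(x, inverse=False):
    """O(n^2) reference DFT: forward is unnormalized, inverse is scaled by 1/n."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for j, xj in enumerate(x):
            # Twiddle factor exp(-/+ 2*pi*i*j*k/n) for the forward/inverse transform
            acc += xj * cmath.exp(sign * 2j * cmath.pi * j * k / n)
        out.append(acc / n if inverse else acc)
    return out


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two sequences."""
    return max(abs(u - v) for u, v in zip(a, b))
```

The output of the backend under suspicion can then be compared against `naive_dft` on a small input with `max_abs_diff` and a tolerance suited to the precision in use.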
Thanks! Does this reproduce when calling pyvkfft on its own? I am asking with an eye towards potentially reporting this upstream to pocl |
Yes, running the example https://github.com/vincefn/pyvkfft/blob/master/examples/opencl-test.py segfaults. |
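For anyone scripting this check, a minimal sketch of telling a segfault apart from an ordinary Python failure is to run the script in a child process and inspect the return code. `run_and_classify` is a hypothetical helper name, and the sketch assumes a POSIX system, where a negative return code means the child was killed by that signal.

```python
import signal
import subprocess


def run_and_classify(argv, timeout=600):
    """Run argv in a child process; return (status, captured stderr)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode == 0:
        return "ok", proc.stderr
    if proc.returncode < 0:
        # POSIX: the child was terminated by signal number -returncode
        return signal.Signals(-proc.returncode).name, proc.stderr
    return f"exit {proc.returncode}", proc.stderr
```

Calling it as `run_and_classify([sys.executable, "-X", "faulthandler", "opencl-test.py"])` would additionally make CPython dump the Python-level traceback to stderr on a crash. That shows which Python call was active, not the native frames inside pocl, so a debugger is still needed for the latter.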
|
@hirish99 reported another VkFFT issue occurring on a Mac: https://gist.github.com/hirish99/16d2888092595283b0f698bf5d8106c0 I do not know whether this is Apple silicon or not. |
When I try to run helmholtz-dirichlet.py on my Mac (I don't know how much of this matters) running MacOS Catalina Version 10.15.7, Processor: 2.6 GHz 6-Core Intel Core i7, Memory: 16 GB 2400 MHz DDR4, Graphics: Radeon Pro 560X 4 GB |
Quite possibly this is another symptom of the presumed miscompilation described in pocl/pocl#1084. |
@hirish99, can you try running https://gist.github.com/isuruf/17f6b210cf4cf8c8b103c18e155e00d6? |
Thanks. What do you get when you run |
I have not modified sumpy yet btw, I updated the gist to show the output of ./a.out; echo. $ |
Thanks. Can you try the updated program at https://gist.github.com/isuruf/17f6b210cf4cf8c8b103c18e155e00d6? |
* use loopy fft
* fix inverse
* Implement broadcasting FFT
* use enum for fft backend
* Add gh-129 link
* unit test for loopy_fft
* Unit test for loopy_fft and fix warnings
* don't use vkfft only if x86 mac
* Add missing import
* Fix platform.machine()
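The commits about an enum for the FFT backend, skipping VkFFT on x86 Macs, and `platform.machine()` suggest a selection rule along the following lines. This is only a sketch: `FFTBackend` and `choose_fft_backend` are hypothetical names, and the exact condition sumpy applies is an assumption taken from the commit titles (a later comment in this thread says VkFFT is not used on macOS at all).

```python
import enum
import platform
import sys


class FFTBackend(enum.Enum):
    # Hypothetical member names, not necessarily the ones sumpy uses
    PYVKFFT = "pyvkfft"
    LOOPY = "loopy"


def choose_fft_backend(system=None, machine=None):
    """Pick an FFT backend; arguments default to the running interpreter's platform."""
    system = sys.platform if system is None else system
    machine = platform.machine() if machine is None else machine
    # Assumed rule: fall back to the loopy FFT only on Intel Macs
    if system == "darwin" and machine == "x86_64":
        return FFTBackend.LOOPY
    return FFTBackend.PYVKFFT
```

Taking `system` and `machine` as optional arguments keeps the rule testable on any host, for example `choose_fft_backend("darwin", "x86_64")`.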
Pytential picked up a failure as well on Gitlab CI: https://gitlab.tiker.net/inducer/sumpy/-/jobs/443459 Edit: That seems to be intermittent. https://gitlab.tiker.net/inducer/sumpy/-/pipelines/324267 |
Can confirm that that works around it nicely for that Stokeslet case! Just ran #132 compared to fd355eb (before pyvkfft) with the script from #129 (comment) and got
So that seems to be even faster than before. |
Yep, ran the same benchmark script and got
|
That leaves the intermittent VkFFT errors. I'm still a bit lost there. @isuruf mentioned he also can't seem to reproduce them. |
macOS errors are gone because we don't use VkFFT for macOS. As to the VkFFT OpenCL compile failure in https://gitlab.tiker.net/inducer/sumpy/-/jobs/443459, I've tried many times and failed to reproduce it. The only explanation I can think of is that the machine was particularly busy and the OS killed the pocl process compiling the OpenCL kernel. |
Let's see how often they recur. I feel like I've seen them more frequently than could be explained by heavy machine load. |
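To put a number on "how often", one option is to rerun the failing test command many times and count the non-zero exits. The sketch below assumes the failure shows up as a non-zero return code (or a timeout) of a standalone command; `count_failures` is a hypothetical helper, not part of the CI setup.

```python
import subprocess


def count_failures(argv, runs=20, timeout=600):
    """Run argv `runs` times; return (number of failed runs, list of return codes)."""
    codes = []
    for _ in range(runs):
        try:
            result = subprocess.run(argv, capture_output=True, timeout=timeout)
            codes.append(result.returncode)
        except subprocess.TimeoutExpired:
            # Record a hung run as None so it still counts as a failure
            codes.append(None)
    failures = sum(1 for code in codes if code != 0)
    return failures, codes
```

Running this once on an idle machine and once under load would give a rough check of the heavy-load explanation.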
That pipeline was running on |
Issues showed up after #114:
cc @isuruf