Hey,
I think the tokens/s calculation might be incorrect. It looks like the elapsed time is measured with the CPU clock here: https://github.com/likejazz/llama3.cuda/blob/master/llama3.cu#L789
This might give an incorrect number because the actual work runs on the GPU while the CPU only dispatches the kernels, and kernel launches are asynchronous. Hence, I suspect the time you are getting is the CPU time spent dispatching the kernels rather than the GPU execution time.
The correct way would be to use CUDA events.
Reference: https://developer.nvidia.com/blog/how-implement-performance-metrics-cuda-cc/
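For illustration, here is a minimal sketch of what event-based timing could look like. It is not taken from llama3.cu: `dummy_forward`, `measure_tokens_per_second`, and the `CUDA_CHECK` macro are hypothetical names, and the kernel is just a stand-in for the real per-token forward pass. It assumes one kernel launch per generated token on the default stream.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical stand-in for the per-token forward pass (not the real model).
__global__ void dummy_forward(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

// Launches the stand-in kernel `steps` times (one launch per "token") and
// returns tokens per second based on GPU time measured with CUDA events.
float measure_tokens_per_second(int steps, int n) {
    assert(steps > 0 && n > 0);

    float *d_x = nullptr;
    CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(float)));
    CUDA_CHECK(cudaMemset(d_x, 0, n * sizeof(float)));

    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Events are enqueued on the same (default) stream as the kernels,
    // so they are timestamped by the GPU in stream order.
    CUDA_CHECK(cudaEventRecord(start));
    for (int s = 0; s < steps; s++) {
        dummy_forward<<<blocks, threads>>>(d_x, n);
    }
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaEventRecord(stop));

    // Block the CPU until the GPU has actually reached the stop event.
    CUDA_CHECK(cudaEventSynchronize(stop));

    float elapsed_ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&elapsed_ms, start, stop));

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(d_x));

    return steps / (elapsed_ms / 1000.0f);
}
```

Calling something like `measure_tokens_per_second(256, 1 << 20)` would then report throughput based on GPU time between the two events. In the real code, the `cudaEventRecord` calls would wrap the generation loop in place of the CPU clock reads.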