feat: EMLX compiled mode #68
base: main
c_src/emlx_nif.cpp
Outdated
return output_tensors;
};

emlx::function compiled_function_ptr = mlx::core::compile(fun);
@awni I got things working via the idea we discussed yesterday! ~5x speed up vs the non-compiled version as per the benchmark in the PR description.
I do wonder what XLA is doing to get such a performance boost when compiled.
Cool, so you figured out how to pass an elixir function into C/C++?
Not exactly. It relies on that newer library I mentioned.
We're calling back into Elixir (in an asynchronous manner, mind you) and then executing the Nx AST with the tracer params provided by mlx::core::compile.
This isn't the ideal solution, since it forces copying data between processes, but given that most of the data is pointers or references, it works well enough.
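To make the shape of that round trip concrete, here is a rough stdlib-only sketch of the pattern: the native side hands its traced inputs to another execution context, blocks until that context has evaluated the function body, and returns the outputs to the caller. Everything in it is hypothetical and for illustration only. `Handle`, `Request`, and `Evaluator` are made-up names, a plain thread stands in for the Elixir process, and integers stand in for the tracer arrays that `mlx::core::compile` would provide. It is not the `nif_call` API or the code in this PR.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tracer handle. In the real setup these would be
// references to MLX arrays, which is why copying them across is cheap.
using Handle = int;
using Handles = std::vector<Handle>;

// One callback request: the inputs plus a promise used to send the reply back.
struct Request {
  Handles inputs;
  std::promise<Handles> reply;
};

// Hypothetical evaluator: a worker thread plays the role of the other process
// that walks the expression and produces output handles.
class Evaluator {
 public:
  explicit Evaluator(std::function<Handles(const Handles&)> body)
      : body_(std::move(body)), worker_([this] { run(); }) {}

  ~Evaluator() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

  // Called from the native side: enqueue the request, then block on the reply.
  Handles call(const Handles& inputs) {
    Request req;
    req.inputs = inputs;  // this copy is the cross-process data transfer
    std::future<Handles> result = req.reply.get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(req));
    }
    cv_.notify_one();
    return result.get();
  }

 private:
  void run() {
    for (;;) {
      std::unique_lock<std::mutex> lock(mutex_);
      cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
      if (queue_.empty()) return;  // shutting down and nothing left to serve
      Request req = std::move(queue_.front());
      queue_.pop();
      lock.unlock();
      // Evaluate outside the lock, then fulfil the caller's future.
      req.reply.set_value(body_(req.inputs));
    }
  }

  std::function<Handles(const Handles&)> body_;
  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<Request> queue_;
  bool done_ = false;
  std::thread worker_;  // declared last so the other members exist before it starts
};
```

Constructing an `Evaluator` with a lambda and invoking `call` from inside the function passed to a compile step would mirror the flow described above: the request goes out asynchronously, and the caller only proceeds once the reply arrives.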
This PR uses Cocoa's `nif_call` library to bridge the missing gap and implement a proper Nx.Defn compiler for EMLX.
closes #61
Benchmark of backend [EMLX, EXLA] against compiler [self, Nx.Defn.Evaluator]