
feat: EMLX compiled mode #68

Open · wants to merge 13 commits into base: main
Conversation

polvalente (Collaborator) commented Dec 12, 2024

This PR uses Cocoa's nif_call library to bridge the gap and implement a proper Nx.Defn compiler for EMLX.

closes #61

Benchmark of backend [EMLX, EXLA] against compiler [self, Nx.Defn.Evaluator]

Mix.install([{:emlx, path: __DIR__}, :benchee, :exla])

defmodule ToBench do
  def run(backend, compiler) do
    prev_backend = Nx.default_backend(backend)

    fun = fn x, y ->
      Enum.reduce(1..10, x, fn x, acc ->
        Nx.dot(x, acc)
        |> Nx.add(y)
      end)
    end

    x = Nx.iota({10, 10}, type: :f32, backend: backend)
    y = Nx.iota({10, 10}, type: :f32, backend: backend)

    Nx.Defn.jit_apply(fun, [x, y], compiler: compiler)
    |> tap(fn _ -> Nx.default_backend(prev_backend) end)
  end
end

Benchee.run(%{
  "EMLX (evaluator)" => fn -> ToBench.run(EMLX.Backend, Nx.Defn.Evaluator) end,
  "EMLX" => fn -> ToBench.run(EMLX.Backend, EMLX) end,
  "EMLX gpu" => fn -> ToBench.run({EMLX.Backend, device: :gpu}, EMLX) end,
  "EXLA (evaluator)" => fn -> ToBench.run(EXLA.Backend, Nx.Defn.Evaluator) end,
  "EXLA" => fn -> ToBench.run(EXLA.Backend, EXLA) end
})
Name                       ips        average  deviation         median         99th %
EXLA                   37.97 K       26.34 μs    ±49.15%          22 μs       94.71 μs
EMLX                   16.64 K       60.08 μs    ±43.45%       52.88 μs      204.13 μs
EMLX gpu               16.11 K       62.07 μs    ±61.75%       53.29 μs      206.79 μs
EMLX (evaluator)        3.48 K      287.49 μs    ±30.55%      280.96 μs      421.05 μs
EXLA (evaluator)        3.17 K      315.41 μs    ±28.33%      296.25 μs      540.18 μs

Comparison: 
EXLA                   37.97 K
EMLX                   16.64 K - 2.28x slower +33.75 μs
EMLX gpu               16.11 K - 2.36x slower +35.73 μs
EMLX (evaluator)        3.48 K - 10.92x slower +261.16 μs
EXLA (evaluator)        3.17 K - 11.98x slower +289.08 μs

polvalente self-assigned this Dec 12, 2024
return output_tensors;
};

emlx::function compiled_function_ptr = mlx::core::compile(fun);
polvalente (Collaborator, Author):
@awni I got things working via the idea we discussed yesterday! ~5x speed up vs the non-compiled version as per the benchmark in the PR description.

I do wonder what XLA is doing to get such a performance boost when compiled.

Reply:

Cool, so you figured out how to pass an elixir function into C/C++?

polvalente (Collaborator, Author):

It's not exactly that; it goes through the newer library I mentioned (nif_call).

We're calling back into Elixir (in an asynchronous manner, mind you) and then executing the Nx AST with the tracer params provided by mlx::core::compile.

It's not the ideal solution, as it forces copying data between processes, but since the majority of that data consists of pointers or references, it works well enough.
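To make the mechanism concrete, here is a minimal trace-and-replay sketch of how a compile step like mlx::core::compile can work conceptually: the user function is called once with tracer placeholders that record an op graph, and the recorded graph is then replayed on real inputs. This is a pure-Python illustration with made-up names, not EMLX's or MLX's actual implementation.

```python
class Tracer:
    """Placeholder value that records the operations applied to it."""

    def __init__(self, ops, ref):
        self.ops = ops  # shared op list: the recorded "graph"
        self.ref = ref  # how to recompute this value during replay

    def _binop(self, name, other):
        other_ref = other.ref if isinstance(other, Tracer) else ("const", other)
        out = Tracer(self.ops, ("op", len(self.ops)))
        self.ops.append((name, self.ref, other_ref))
        return out

    def __add__(self, other):
        return self._binop("add", other)

    def __mul__(self, other):
        return self._binop("mul", other)


def compile_fn(fun, n_args):
    """Trace `fun` once with placeholders, return a replayable function."""
    ops = []
    tracers = [Tracer(ops, ("arg", i)) for i in range(n_args)]
    result = fun(*tracers)  # trace: run the user function once

    def replay(*args):
        values = []

        def resolve(ref):
            kind, payload = ref
            if kind == "arg":
                return args[payload]
            if kind == "const":
                return payload
            return values[payload]  # previously computed op output

        for name, a, b in ops:
            x, y = resolve(a), resolve(b)
            values.append(x + y if name == "add" else x * y)
        return resolve(result.ref)

    return replay


compiled = compile_fn(lambda x, y: x * y + x, 2)
# compiled(3, 4) -> 3 * 4 + 3 == 15
```

The key point, matching the discussion above, is that the user function runs only once (here, in Elixir, via the asynchronous nif_call round-trip) with tracer parameters; subsequent invocations only replay the recorded graph.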

polvalente requested a review from cocoa-xu December 12, 2024 11:54
Development

Successfully merging this pull request may close these issues.

Use mlx::compile for the Nx.Defn compiler