25-dynamic-dependencies.md

Dynamic Dependencies

Nix and Bazel don't allow dynamic dependencies. I think there is an argument to be made that this is the reason their ergonomics are so tricky. Nix libs that are intended to build arbitrary projects in a given language rely heavily on code generation. Arguably this is itself a form of dynamic dependency.

I think it would be interesting to explore first class support for dynamic dependencies in Bramble. Maybe if they are easy to use and set up we can limit the amount of derivations that need network access. If you can generate arbitrary calls to fetch_url within a derivation, then maybe you can get away with just that.

One Idea:

There is a specific type of derivation that outputs starlark code. It is a different color than regular derivations (so we can detect it statically), and only outputs starlark code. This starlark code is run once the derivation is done building. We would need to update the dependency graph as we build.

This has some weird implications because we would still need to reference the build output. Do we just need to ensure that the generated code just outputs a single derivation?
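To make "update the dependency graph as we build" a bit more concrete, here is a rough Python sketch (Python standing in for starlark). Everything in it is hypothetical: `build_all`, the `generators` set that stands in for the differently colored derivations, and the convention that a generator returns a dict of new nodes are assumptions for illustration, not Bramble's API.

```python
from collections import deque


def build_all(graph, generators, build):
    """Build every node, patching the graph when a generator finishes.

    graph:      dict of node name -> set of dependency names (mutated in place)
    generators: names of nodes whose output is more graph (hypothetical "color")
    build:      callable(name) -> output; a generator returns {name: deps}
    """
    done = {}
    pending = deque(sorted(graph))
    stalled = 0  # consecutive nodes we had to put back in the queue
    while pending:
        name = pending.popleft()
        if any(dep not in done for dep in graph[name]):
            # Not ready yet: a dependency is unbuilt or not generated yet.
            pending.append(name)
            stalled += 1
            if stalled > len(pending):
                raise RuntimeError("unresolvable dependencies: %s" % sorted(pending))
            continue
        stalled = 0
        done[name] = build(name)
        if name in generators:
            # Patch the graph with whatever the generator emitted.
            for new_name, new_deps in done[name].items():
                if new_name not in graph:
                    graph[new_name] = set(new_deps)
                    pending.append(new_name)
    return done


def fake_build(name):
    # Hypothetical builder: "pip-numpy" emits two new nodes, the rest just build.
    if name == "pip-numpy":
        return {"numpy": {"blas"}, "blas": set()}
    return "built:" + name


graph = {"app": {"numpy"}, "pip-numpy": set()}
result = build_all(graph, {"pip-numpy"}, fake_build)
order = list(result)
```

Here `app` references `numpy` before that node exists, and simply waits in the queue until the generator has produced it. That is one possible answer to the "how do we reference the build output" question, at the cost of only detecting a missing node once nothing else can make progress.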

So let's think about numpy.

def foo():
    pip_install("numpy")

There is no real way to do this because numpy will need to download its own dependencies. So we could either:

  1. Download them within the derivations using the network, but then other dependencies might generate their own independent dependencies, which would be duplicated and might conflict.
  2. Generate code for each dependency, which totally works, but doesn't have first class support.

def foo():
    pip_install("numpy")

def pip_install(name):
    deps = fetch_url("dependency_finder.gov/"+name)
    derivation(script="""
    out = ""
    for dep in deps:
        out += "fetch_url(dep)\n"
    return out
    """)

Terrible pseudo-code, but basically this derivation returns starlark code with the deps we need to download.
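A slightly less terrible sketch of the "return starlark code" half, written in Python. The function name `render_fetches`, the `url=`/`hash=` keyword arguments on `fetch_url`, and the example URLs are all assumptions made up for illustration; the only point is that the derivation's output is plain text containing one fetch per resolved dependency.

```python
import json


def render_fetches(deps):
    """Return starlark source with one fetch_url call per (url, hash) pair."""
    lines = ["# Generated file: one fetch per resolved dependency."]
    for url, digest in sorted(deps):
        # json.dumps yields a double-quoted literal that starlark also accepts
        # for plain ASCII strings like these.
        lines.append(
            "fetch_url(url=%s, hash=%s)" % (json.dumps(url), json.dumps(digest))
        )
    return "\n".join(lines) + "\n"


generated = render_fetches([
    ("https://example.invalid/b-2.0.tar.gz", "bbbb"),
    ("https://example.invalid/a-1.0.tar.gz", "aaaa"),
])
```

Sorting the dependencies keeps the generated text stable from run to run, which matters if the output is going to be compared or checked in.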

If we go this route, we would need to be able to check whether we've already generated this derivation on the fly. If we don't do that, I think it would be very hard to do things like validating current url hashes without actually building.

We could just stick the outputted starlark into the store somewhere, but it might be better to generate code and keep it in the source of the project. If the interface is just fetch_url("numpy"), like in the example above, then what happens when a new version of numpy is published? On every rebuild the url hash would mismatch. I think ideally, if we want to replicate something like a Cargo.lock, we would need the output of the code generating derivation to be placed within the project tree. That way, the generated file could reference very specific versions of software to fetch. If the end user wanted to fetch a new version, they would simply delete the generated file.
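The Cargo.lock-style behavior could look roughly like this in Python. `load_or_generate`, the `numpy.gen.bramble` file name, and the fake generator are hypothetical; the sketch only shows "reuse the checked-in file if it exists, otherwise generate and write it", with deletion as the upgrade path.

```python
import os
import tempfile


def load_or_generate(path, generate):
    """Return the pinned generated source, creating it only if it is missing."""
    if os.path.exists(path):
        with open(path) as f:
            return f.read()  # pinned: never hits the network
    source = generate()  # the only place that would need network access
    with open(path, "w") as f:
        f.write(source)
    return source


calls = []


def fake_generator():
    # Stand-in for the code generating derivation.
    calls.append(1)
    return 'fetch_url(url="https://example.invalid/numpy-1.0.tar.gz")\n'


with tempfile.TemporaryDirectory() as project:
    lock = os.path.join(project, "numpy.gen.bramble")
    first = load_or_generate(lock, fake_generator)   # generates and writes
    second = load_or_generate(lock, fake_generator)  # reuses the pinned file
    os.remove(lock)                                  # user asks for an upgrade
    third = load_or_generate(lock, fake_generator)   # generates again
```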

This doesn't really remove the need for derivations that access the network. The code generating derivation would still need to make a request for the numpy source in order to calculate dependencies.

This kind of thing would mean that you could truly write a derivation like pip_install("numpy") without a separate code generation step that requires its own setup.

We could also prevent code generated by a code generating derivation from calling another code generating derivation, at least at the start, to limit all kinds of gnarly behavior.
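One cheap way to enforce that restriction is to scan the generated source for calls to known generator functions before running it. The sketch below uses Python's `ast` module as a stand-in for a starlark parser, on the assumption that the generated code stays within the subset of starlark that is also valid Python. `forbidden_calls` and the set of forbidden names are hypothetical.

```python
import ast


def forbidden_calls(source, forbidden):
    """Return the sorted names from `forbidden` that `source` calls directly."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in forbidden:
                found.add(node.func.id)
    return sorted(found)


ok_source = 'fetch_url("https://example.invalid/a.tar.gz")\n'
bad_source = ok_source + 'pip_install("scipy")\n'

ok_result = forbidden_calls(ok_source, {"pip_install"})
bad_result = forbidden_calls(bad_source, {"pip_install"})
```

This only catches direct calls by name. A generator reached through an alias or through a loaded module would slip past, so a real check would probably have to happen at evaluation time instead.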


More thoughts.

This might actually be a good idea. We could then move forward with limiting network functionality only to derivations that use the network. That way we could be sure that after we've processed all derivations initially we can proceed from there without ever using the network.

One complication here is that bramble libraries could use this as well, so we can't generate bramble code and then store it next to the initial source file. Generated code could also depend on passed parameters, so we wouldn't be able to check a file's lock file just by analyzing the source of that file; we'd need to be sure we were testing it with whatever parameters were passed to functions in that file.

Wait, maybe not, the generating function will always be called without arguments. So maybe we just put the code in the project next to the file that calls the function.

Ok, either way, that needs to be sorted out, and we might want to consider just adding generated code to the lockfile.


When a derivation that outputs other derivations is run, what does it output? A map of derivations that can be assembled into a tree? Yeah, probably an execModule output. So do we want people generating those from bramble code? They could also just output bramble code? Maybe we support both.

What if the bramble code that is run has dependencies? These are runtime dependencies. Would be good to track those so that we can more reliably rebuild this part, maybe they are regular dependencies maybe they aren't. Would be easy to just run everything when adding dependencies and then run derivations that output derivations to get the full list of deps. Well, easy but possibly problematic.

Hmmm

Could require deps to be loaded before the derivation runs for now. And then yes, seems like we should support outputting bramble files. Oof, what if the derivation wants to use other relative files? Maybe we go back to code generation?

Hmm, if we explore code generation (since we're already at code generation), how would that work? Generated files are marked as read only. Code is super auditable. Could generate steps be part of builds? Seems hard.

Well, if it's explicit then we can make it work. If a generate() is a thing, and that can be run within the project generating some file, and then that can be used in a later build that's not terrible. How would it be abused? Basically you need to know what will be generated and what the sources are. Then we can just require that no generate commands are run on the server. Generate commands could require network access. This also helps with the "how tf do we generate things" problem. Generation could be first class.....

Hmmm

So how does this work from a graph patching standpoint? Let's say we have a go-module, we'll need to generate a bramble file from the go-module. So we mark the generate as needing go.mod as input and having go-mod-gen.bramble as output. Pass the env var "bramble2bramble". Then we need the output file to be parsed and used as graph input. Oh this is good, so this is great, because if there are deps in the file output we'll just get a dependency error.
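One reading of the "we'll just get a dependency error" part, sketched in Python: the parsed output is merged into the graph, and any dependency it names that is neither already known nor defined in the same generated file fails the patch. `patch_graph` and the dict-of-sets graph shape are assumptions for illustration, as are the node names.

```python
def patch_graph(graph, generated):
    """Return a new graph with the generated nodes merged in.

    Both arguments map node name -> set of dependency names. A generated node
    may only depend on nodes that already exist or that arrive in the same
    generated file; anything else is reported as a dependency error.
    """
    known = set(graph) | set(generated)
    wanted = {dep for deps in generated.values() for dep in deps}
    missing = sorted(wanted - known)
    if missing:
        raise LookupError("generated file references unknown deps: %s" % missing)
    patched = {name: set(deps) for name, deps in graph.items()}
    for name, deps in generated.items():
        patched[name] = set(deps)
    return patched


base = {"go": set(), "go-mod-gen": {"go"}}
patched = patch_graph(base, {"mod-a": {"go"}, "mod-b": {"mod-a"}})
```

The original graph is left untouched, so a failed patch cannot leave the build with a half-updated graph.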

(Maybe we just want to pick versions by putting them in the load statement. Could mean generated code could pick versions. Also means that we're just go again, but this time with @3 and not /3 (and I'm sure I'll figure out why @ is problematic)) (later edit: ah yes, the @ is problematic because I'm required to put it at the level of the path that the package is at, which is hard, do I put the @ and then have a path after it? no. so the path is now ambiguous, ugh)

Ok, so we have a generate step. Generate commands can use the network if they output just a single bramble file and set a special env var. Once the build is complete, the file will be parsed and the graph will be patched. The function to run must be included in the build input.
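Those rules are simple enough to check statically. A minimal Python sketch, assuming the "special env var" is the "bramble2bramble" one mentioned earlier and that a bramble file is recognized by a `.bramble` suffix; `GenerateStep` and `check_generate_step` are hypothetical names, and the "function must be in the build input" rule is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class GenerateStep:
    inputs: list
    outputs: list
    env: dict = field(default_factory=dict)
    network: bool = False


def check_generate_step(step):
    """Return a list of rule violations (empty means the step is acceptable)."""
    problems = []
    if not step.network:
        return problems  # offline steps are not restricted in this sketch
    if len(step.outputs) != 1 or not step.outputs[0].endswith(".bramble"):
        problems.append("network generate steps must output a single bramble file")
    if "bramble2bramble" not in step.env:
        problems.append("network generate steps must set the bramble2bramble env var")
    return problems


good = GenerateStep(["go.mod"], ["go-mod-gen.bramble"], {"bramble2bramble": "1"}, True)
bad = GenerateStep(["go.mod"], ["a.bramble", "b.bramble"], {}, True)

good_problems = check_generate_step(good)
bad_problems = check_generate_step(bad)
```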

So now we:

  1. Add explicit generate pattern
  2. Add special generate pattern that patches the graph

Ok, so regular build uses the network and re-generates the file if the sources have changed. Is there a way to track this? Maybe this goes in the lockfile? Yeah, what tf do we do with the generate step? It's a derivation? It's a derivation that points at a generated file? How does that reference work? It's in the special value. So when a file changes, we re-generate. How do we know a file has changed? With a generated file, we skip it, we just trust the generated file???? Yeah this is nasty, write it out...

Ah yes, ok, the problem is, how do we know that the input files have changed?
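One common answer, offered here only as a sketch: hash the declared inputs of the generate step and record that digest (in the lockfile, say) next to the generated file. If the digest of the current inputs differs from the recorded one, re-generate; otherwise trust the generated file. `inputs_digest` and `needs_regenerate` are hypothetical helpers, and the sketch assumes the inputs are fully declared up front, as with go.mod above.

```python
import hashlib


def inputs_digest(files):
    """Hash a dict of path -> bytes content into one stable hex digest."""
    h = hashlib.sha256()
    for path in sorted(files):  # sorted so dict ordering does not matter
        data = files[path]
        # Length-prefix each field so adjacent fields cannot blur together.
        for part in (path.encode(), data):
            h.update(str(len(part)).encode() + b":")
            h.update(part)
    return h.hexdigest()


def needs_regenerate(files, recorded_digest):
    """True when the inputs no longer match what the generated file was built from."""
    return inputs_digest(files) != recorded_digest


recorded = inputs_digest({"go.mod": b"module example\n", "go.sum": b""})
unchanged = needs_regenerate({"go.sum": b"", "go.mod": b"module example\n"}, recorded)
changed = needs_regenerate({"go.mod": b"module example/v2\n", "go.sum": b""}, recorded)
```

This only detects changes to local inputs. It says nothing about upstream changes such as a new version being published, which the pinned generated file deliberately ignores.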