-
Notifications
You must be signed in to change notification settings - Fork 201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experiment package imports with git integration #1543
Comments
An addendum: using |
We use a flat hierarchy and update the python path per file. Something like: import sys
from pathlib import Path
cwd = str(Path(__file__).parent)
if cwd not in sys.path:
sys.path.insert(0, cwd)
|
I have also found this frustrating. Some points I've come across:
|
@ljstephenson You could create a blank
However, this only works for modules that import something in their own directory, or in a subdirectory. Imports across two subdirectories do not work. To allow actual Python packages within experiments, what we could do is set |
|
And this is just another case of the "subdirectory with |
Thank you, "Open experiment outside repository" is what we're currently using. I think the problem becomes a bit more complex for multiple remote users and when using experiments importing code from other files. I'm not aware of a solution that doesn't make the commit-hash misleading if another user is editing a file that is imported by an experiment I run. Admittably, this is an edge case, though one that we encounter fairly frequently. (Especially if uncommited changes are left in these files) |
Do you have a commit hash associated with experiments submitted via "Open experiment outside repository"? Or are uncommitted changes making it through BTW on Linux we could also have Nix integration, with each experiment run in a |
Not being able to import across subdirectories (basically enforcing a flat structure) probably isn't workable with ~100 experiments. I know that searching by name for experiments is possible, but having some logical grouping is still nice. The second option could work, but part of the problem is that there is some fluidity between what is an experiment and what is a library function -- for example, experiment A might do a simple measurement scanning one parameter, and experiment B runs experiment A at each value of a second parameter scan. Does experiment A belong in "library" or "experiment"? (Perhaps not the best example, since logically they would be experiments in the same subdirectory, but you get the point.)
Uncommitted changes in Is there anything stopping us just setting
I guess my use case/need is any file able to import from any other within
This is the sledgehammer but could definitely work for people on Linux - would there be a run time penalty? Personally I'm slightly less worried about capturing absolutely everything; IMO every experiment must save any raw data collected, then any analysis results can be recalculated if/when the analysis technique changes -- i.e. saving the analysis environment would be a "nice to know" for archaeology purposes but not vital. So long as I know what the pulse sequence was I'm happy. Apologies for presenting a problem without any good idea what solution it has... |
Correct.
When everything is cached in the Nix store, |
For my use case I'd be very happy with the |
We can also just set the current working directory to the repository - fewer moving parts. |
Fine by me! |
There's the issue of experiments submitted outside the repository. And ensuring that an experiment file works the same regardless if it's submitted via the repository or not is tricky. |
Is there something wrong with |
Seems this issue isn't fixable without a
You'd have:
and experiments in
Nix integration with each experiment run in a |
How about this: add some code to from pathlib import Path
import sys
from importlib.util import spec_from_file_location, module_from_spec
# This would be provided externally and would be an absolute path to the
# __init__.py file in the experiment repository root dir, or None if not present
path_to_experiment_init = Path(__file__, '../../artiq_repo/__init__.py').resolve()
if path_to_experiment_init is not None:
# Load the __init__ file if present
spec = spec_from_file_location("artiq_root_dir", path_to_experiment_init)
module = module_from_spec(spec)
spec.loader.exec_module(module)
# If the variable "__artiq_repo_name__" has been defined, save this module
# (which is actually a package) into the cache
if hasattr(module, '__artiq_repo_name__'):
sys.modules[module.__artiq_repo_name__] = module This would have the effect of
User code could then define By having this as an opt-in via the variable, it won't take anyone by surprise. We could later turn it on by default (choosing some sensible default name like "repository") if it works well. By using an |
This is complicated... |
That wouldn't address the ability of experiments in subdirectories to access libraries above their dirs |
It would, just specify an upper level directory to add to PYTHONPATH. |
True. Another few benefits of having the experiment repo be usable as a package though:
Rather than having e.g.
you'd have
which makes things explicit and allows automatic code linters like pre-commit to sort imports properly.
So with a structure like
Any package metadata stored in
I think implementing this would actually be dead easy, easier than adding config options to the GUI. Simplfying my above suggestion such that you always use the name "repo" instead of allowing the user to choose, you'd just alter exp = get_experiment(experiment_file, expid["class_name"]) in if repository_path and (repository_path / "__init__.py").exists():
sys.modules["repo"] = tools.file_import(repository_path / "__init__.py", prefix="artiq_worker_")
exp = get_experiment(experiment_file, expid["class_name"]) |
Also doable with my solution, just create a
AFAIK this would also be working with my solution.
But it's complicated and ad-hoc. Adding a few widgets to the GUI is straightforward. |
True also - if you maintain your code in a Also, I thought from the above...
that you weren't a fan of changing the path? Personally, I'd prefer to keep as much configuration in the code as possible, where it's nicely source-controlled (or, better, not to require any configuration). Adding more GUI options means you need to a) know what to set them to before code can be used and b) archive their settings to get full reproducibility of experiments. Also, is it so complicated? ARTIQ workers already work by manually importing the module containing the Experiment which is being run: this change would just be to also import the repository package at the same time. |
If it can be avoided and something better done instead...
How is "the directory" specified? Also, what about
The "user library path" would become part of expid and archived the same. |
For runs managed by
For
Sure that'd work.
Why not? |
Including for experiments submitted "outside the repository"? |
Currently, the run would just fail because nothing other than the file itself is available to the worker (AFAIK). And you can't really do this accidentally: you'd have to explicitly type I think there are two separate issues here:
My proposal for 1. is that we default to the repository passed via For 2., my proposal is to have this root directory available to import via The solutions to 1. and 2. are independent: having a GUI config for the root directory doesn't stop you from making it available as a package, and detecting an I also think we're imagining different use-cases here. I'm expecting that repository-based usage is the most common requirement, so should be the easiest to use. For external testing, I was expecting the submission of single files which currently cannot reference other files - after this PR, they'd have the option to access the existing repo if they wanted to. You seem to be imagining external submissions which pull in an entire directory of other files, so that changes to the entire repository can be tested via the "outside the repository" method - also a new feature AFAIK. That's a nice idea, and I think that it can be supported fine by the package approach. You use the PS we should probably reopen this issue since we're obviously still discussing it! |
Well, how do you test before committing? And, once we have a GUI setting for a directory, only modifying PYTHONPATH is the simplest option with the least consequences, and the most standard option wrt how the Python import system works. As I said it is even exposed in the Unix shell, and experiment files can still be imported easily without custom code that uses |
Fair enough. For "what to do with the path", I still think that the package-based approach is most explicit and that using For "how to find out the path", a GUI option for external submissions seems unavoidable but, for normal repository usage, I think the directory should be defined somehow by your code. It's an integral part of the program, in that your code won't work unless the path is set correctly, so its setting belongs in that code. By default, it should just be the repository path. It would be really nice if there was an option to set it instead to a sub-directory of the repository and to set that as the root for ARTIQ searches too. That way you could choose to structure your repo as a normal python package:
And the ARTIQ browser wouldn't prefix all your Experiments with |
PS the right-click menu for external submissions could be changed to be "Load experiment from external file" and "Load experiment from external repository". The former does the current behaviour, loading a single file with the path set as it currently is. The latter lets you select a folder, then a file. |
To give my 2 pence/cents/centavos/whatever - having everything under a As far as I'm concerned, the solution using PYTHONPATH seems like it would be usable; if I've understood correctly:
At this stage we can continue to argue about what the most elegant way of doing it is, or just pick one and do it. I think that the end result for users is the largely same whichever way it is done - files submitted outside the repository (GUI or otherwise) need to know about the python package, so we'd have to point to it somehow anyway. |
There's always the option of editing
Yes. |
Ok I got a bit late in this discussion (now referred through #1571 ), hope I did not miss any important part of your discussion. Let me know if I am completely missing a point somewhere. So first, our project structure with "shared code" is designed as shown in this example project https://gitlab.com/duke-artiq/dax-example . Shared code is located in the From my perspective, our "package structure" approach makes more sense than making a |
Even if you modify sys.path? |
Git repository discovery is a good idea, and we could use it as default for the path being added to PYTHONPATH. That would "just work" for many use cases I believe. |
I am not sure how
That sounds reasonable to me in case ARTIQ git integration is enabled. |
It does not, but it could fix your import problems. |
Ah I see. Well, my conclusion was actually that we do not have any import problems with our structure. The structure is just incompatible with the ARTIQ git integration feature, but that can be fixed easily with repository discovery. |
I don't know why you keep repeating that. If you edit sys.path in your experiment, you can make import works regardless of what the git integration does. Or is there something other than imports that would not work? |
Ok I guess I did not explain that clearly. So for the DAX example project, we start the ARTIQ master in the root directory of the Git repository where the device db is located (e.g. [nix-shell:~/dax-example]$ artiq_master -g
Traceback (most recent call last):
File "/nix/store/9vw2xqcqvcj4hsiiicwrq7gz41j3vp56-python3.8-artiq-6.7604.040aa6fd/bin/.artiq_master-wrapped", line 9, in <module>
sys.exit(main())
File "/nix/store/p180gf3r52l60kjk20nkhmhz237fvacg-python3-3.8.8-env/lib/python3.8/site-packages/artiq/frontend/artiq_master.py", line 104, in main
repo_backend = GitBackend(args.repository)
File "/nix/store/p180gf3r52l60kjk20nkhmhz237fvacg-python3-3.8.8-env/lib/python3.8/site-packages/artiq/master/experiments.py", line 195, in __init__
self.git = pygit2.Repository(root)
File "/nix/store/p180gf3r52l60kjk20nkhmhz237fvacg-python3-3.8.8-env/lib/python3.8/site-packages/pygit2/repository.py", line 1282, in __init__
path_backend = init_file_backend(path)
_pygit2.GitError: Repository not found at repository It is possible to set the experiment repository location using So that is what I meant by incompatibility with the ARTIQ git integration feature. Or do I misunderstand how to use the ARTIQ git integration feature? |
I think there are multiple points being discussed, some of which you address. I'll attempt to resummarise: The initial discussion raises the following issues:
The later discussion focuses on possible implementations that satisfy the above.
I haven't fully understood what you are doing. This sounds to me like another way of bypassing the git integration to allow imports? |
@pathfinder49 thanks for re-summarizing, I definitely did not capture all points.
In that case, I guess I did missed some points and some of my comments might have been out-of-context. My apologies in case I caused any confusion. I do not think I have anything to add to this discussion at this moment.
Yeah we bypass git integration to allow imports in the way we prefer. Our code basically has 3 levels of volatility: |
For complicated things like this, you may want the Nix environment to be managed by ARTIQ and included/referenced in expid. |
I've thrown together a rough draft PR #1805 to hopefully address this (although so much has been discussed on this issue that I'm sure I missed a few cases). I think it addresses @ljstephenson's points:
but more testing is definitely needed. |
Did anything changed about this topic? I am also interested in using the git integration with a repository that looks nearly identical to https://gitlab.com/duke-artiq/dax-example/. Unintentionally I tried the same thing that @lriesebos was saying should not work and that was running
On my machine and with my version of ARTIQ (6.7654.56b8c3c0), the master starts up without the mentioned error message and I am also able to connect to it via the dashboard allowing me to submit experiments. The master apparently does not scan only the ERROR:worker(scan,sqst.py):root:Terminating with exception (TypeError: __init__() missing 3 required keyword-only arguments: 'core', 'operation', and 'data_context') But nothing of this does affect the operation of the master as far as I can say. Only the experiments are then listed under the subfolder I tested this scheme with my own repo more in depth. My layout is:
Experiments within the Since I am not running a bare git repo here I added a #!/bin/sh
echo "POST-COMMIT: Scanning artiq repository..."
artiq_client scan-repository --async > /dev/null 2>&1 || echo "WARNING: artiq_master not running" that triggers when I do local changes in the repo. I guess one should also make it work for incoming changes from To get rid off the master skipping files, like the I don't understand the behavior of |
The solution that we seem to converge to is:
(First proposed here: #1805 (comment)) |
That seems like a good solution that also should work for us. What I don't understand is why |
I think that being able to run the master/dashboard with both a git backend (stable code) and a file backend (development/testing) makes a lot of sense, although I suspect it will require a decent bit of work since AFAICT the backend is immediately abstracted away upon creation of the experiment db.
Is that necessary? I realized it might be a somewhat redundant feature if these changes are made, but I don't think it hurts anything to keep it. And I think the latest version of #1805 handles out-of-repo experiments with a "repository path" decently enough. |
Would this not be potentially risky when files or subdirectories in the repository path have names that are the same as package names? We have already experienced such situations before with the existing population of |
Definitely. Not only is it redundant, but it is also broken as package imports would not work correctly when it it used. Adding a repository path in that situation adds complexity to the code (especially in the GUI) and does not improve UX compared to adding a file-backed repos.
Python already has plenty of such issues anyway and users need to be aware of them. Try running a Python program if you have e.g. an |
I see that you basically avoided the difficult parts and just put a "todo" comment in your code for #1805. Better just remove that redundant and messy feature entirely. |
Ah, yeah, I forgot about that part. Sounds good. |
You could just not do anything to PYTHONPATH, and add nothing to the GUI. That way you don't remove a feature that presumably some people do use. You can't use "Open experiment outside repository" to test library-dependent code, but you can't do that currently either, so nothing lost. And we're adding the multiple repository sources test option, to test library-dependent code. I don't have strong feelings about it though.
This should also support multiple branches / tags / commit ids from the same repository. That's probably the most common way you'd test a feature: by creating a feature branch, testing it out, then merging. I think going the PYTHONPATH route sounds like the most pragmatic way to get this feature live, so +1 from me! |
Question
Making the experiment repository a basic python package and trying to import from other files within it doesn't work in conjunction with git integration; I'm wondering what the canonical way of doing this is.
Category: setup
Description
To elucidate, consider the following example experiment repository:
qux.py
:bar.py
:(The reason for having such a structure is that most experiments are composed of repetitive blocks of standard operations, for example state preparation, and these should obviously be reused throughout the more complex composite experiments.)
If this is just on the filesystem, then running
artiq_master -r experiments
works perfectly,Bar
shows up in the dashboard explorer, and running it prints the expected message.However, if I follow the git integration steps (initialize a bare repository in another directory, say
experiments.git
, initialiseexperiments
as a repository, configureexperiments.git
as a remote and push to it),artiq_master -g -r experiments.git
fails to load the experiment: the import fails because the moduleexperiments
does not exist (relative imports also fail). I understand that this is expected due to the nature of python -- what I don't understand is how one is supposed to make use of the git integration, while also having a sensible codebase that doesn't have unnecessarily duplicated code.The obvious workaround is to maintain a library of the common functionality that is installed as a standard python package in the conda/nix environment; but then the "building block" type experiments are separate from the "composite" experiments, and composite experiments can easily be building blocks for yet more complicated experiments, so in fact everything may as well reside in this separate package (as we did in Oxford for the HOA2 experiment). At this point the git integration is useless, because all the functionality is no longer in the experiment repository itself, and so its git hash is irrelevant. One would have to look at the run time of the experiment, cross reference with commit times in the external package, and hope that it was in a clean unmodified state when the experiment was run - the exact problem that the git integration seeks to solve.
The text was updated successfully, but these errors were encountered: