Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix queries for gbasf2 grid files with multiple extensions #181

Merged
merged 7 commits into from
Jan 13, 2023

Conversation

meliache
Copy link
Collaborator

@meliache meliache commented Jan 12, 2023

Hotfix that hopefully resolves issue #180, if I understood the behavior of gbasf2 correctly. Currently I don't have time and no access to the grid. @schmitca, could you please check if this works?

@schmitca For installing this branch, you can use

pip3 install --upgrade "git+https://github.com/nils-braun/b2luigi#bugfix/gbasf2-correctly-handle-multi-extension-outputs" 

Resolves #180

@meliache meliache added bug Something isn't working gbasf2 Concerns the gbasf2/grid b2luigi wrapper needs changelog Entry to CHANGELOG.md is missing labels Jan 12, 2023
@github-actions github-actions bot removed the needs changelog Entry to CHANGELOG.md is missing label Jan 12, 2023
Also use latest pre-commit minor version
@codecov-commenter
Copy link

Codecov Report

Base: 58.15% // Head: 59.16% // Increases project coverage by +1.01% 🎉

Coverage data is based on head (82411af) compared to base (910cf36).
Patch coverage: 0.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #181      +/-   ##
==========================================
+ Coverage   58.15%   59.16%   +1.01%     
==========================================
  Files          22       23       +1     
  Lines        1503     1555      +52     
==========================================
+ Hits          874      920      +46     
- Misses        629      635       +6     
Impacted Files Coverage Δ
b2luigi/batch/processes/gbasf2.py 46.13% <0.00%> (ø)
b2luigi/__init__.py 88.46% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@schmitca
Copy link
Contributor

schmitca commented Jan 13, 2023

Dear @meliache, I am happy to test it. At the moment, the grid jobs with udst.root output fail, before download can be attempted. This is because running the pickled basf2 path with udst.add_skimmed_udst_output(skimDecayMode='skimname') fails with

[FATAL] identical from/to parameter value   { module: SwitchDataStore ('' -> 'skim_skimname') function: virtual void Belle2::SwitchDataStoreModule::initialize() }
[INFO] ===Error Summary================================================================
[FATAL] identical from/to parameter value   { module: SwitchDataStore ('' -> 'skim_skimname') }

This can be traced back to the fact that
basf2.pickle_path.deserialize_path( basf2.pickle_path.serialize_path(mypath)) != mypath
I will try to find/contact an expert for basf2.pickle_path.

@meliache
Copy link
Collaborator Author

meliache commented Jan 13, 2023

Dear @meliache, I am happy to test it. At the moment, the grid jobs with udst.root output fail, before download can be attempted. This is because running the pickled basf2 path with udst.add_skimmed_udst_output(skimDecayMode='skimname') fails

This seems to be a separate issue and I'm afraid if we need to add a workaround for that, that would be quite involved and probably should be a separate pull request, but not sure if I have time for that.

This can be traced back to the fact that basf2.pickle_path.deserialize_path( basf2.pickle_path.serialize_path(mypath)) != mypath I will try to find/contact an expert for basf2.pickle_path.

Honestly, good luck with that, I don't expect that to be simple. I think (though don't quote me on that) that Thomas Keck had contributed to the pickle path functions because he needed it for the FEI. Martin Ritter also new about them. But those people left the collaboration alreayd long ago. With those things you are usually more succesful by trying to dig into the code yourself and maybe even contribute.

I just checked the definition of udst.add_skimmed_udst_output and it seems that it creates an additional conditional path

    skim_path = basf2.create_path()
    [...]
    filter_path.add_independent_path(skim_path, "skim_" + skimDecayMode)
    add_udst_output(
        skim_path,
        outputFile,
        saveParticleLists,
        additionalBranches,
        dataDescription=dataDescription,
        mc=mc,
    )

I don't understand conditional paths super well, but it's well possible that serialize_path has problems with that.

basf2.pickle_path.deserialize_path( basf2.pickle_path.serialize_path(mypath)) != mypath
You could check really what's the difference by eye or looking at the diff of basf2.print_path. Sadly I don't think I have the time to dig into thhat.

For testing this pull request I would try just using udst.add_udst_output, or also just add dots to a root file name, like foo.bar.root. And try figure out the conditional paths seperately

Update: I now created #182 for the pickling issue.

@schmitca
Copy link
Contributor

schmitca commented Jan 13, 2023

I can confirm that the download is working with udst.add_udst_output and

--- a/b2luigi/batch/processes/gbasf2.py
+++ b/b2luigi/batch/processes/gbasf2.py
-        dataset_query_string = f"{output_lpn_dir}/{self.gbasf2_project_name}/sub*/{output_file_stem}_*{output_file_extensions}"
+        dataset_query_string = '.'.join([f'{output_lpn_dir}/{self.gbasf2_project_name}/sub*/{output_file_stem}_*', output_file_extensions])

It seems that I cannot push my changes, but this can be commited and closed.

@meliache
Copy link
Collaborator Author

meliache commented Jan 13, 2023

I can confirm that the download is working with udst.add_udst_output and

Nice 👍

It seems that I cannot push my changes, but this can be commited and closed.

Thanks for the patch. I added a commit with your line and added you (name and email) as the author of the commit. Git allows after all having separate authors and commiters for commits (which is used e.g. in the linux kernel where patch emails are still used). But I didn't push the commit yet because I'd like to get your permission to use your full name and email, which I found on b2mms. You might want to disagree for privacy purposes, then I'd change the commit author to your github nickname. Also, if you don't use your LMU email on github, it would be better adding your github email, so that github recognizes you as the author of the commit, even when I pushed it.

Btw, github has a feature where when the reviewer comments on a line of code, they can add a code suggestion and I could then accept this suggestion. But I don't like edits via github much, because editing in a proper editor/IDE results in better style you get syntax/style checking if you use a modern editor/IDE, which is not the case in github UI edits.

@schmitca
Copy link
Contributor

Thank you @meliache, I added a suggestion.

@meliache
Copy link
Collaborator Author

@schmitca thanks for doing that so late. Can I use your full name from B2MMS in the contriburs list in the documentation?

@schmitca
Copy link
Contributor

Sure, thank you! @meliache

If a variable is returned immediately, then there's no point creating the
temporary variable, just return the value directly.

Plus, splitting over multiple lines makes sure we stay within the character limit.
@meliache meliache merged commit 6844963 into main Jan 13, 2023
@meliache meliache deleted the bugfix/gbasf2-correctly-handle-multi-extension-outputs branch January 13, 2023 18:36
@meliache
Copy link
Collaborator Author

meliache commented Jan 13, 2023

@schmitca I now published this to PyPi as release 0.8.2, so if you install b2luigi with pip install b2luigi --upgrade, you should have the functionality available immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working gbasf2 Concerns the gbasf2/grid b2luigi wrapper
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hardcoded extension ".root" in _get_gbasf2_dataset_query in b2luigi/batch/processes/gbasf2.py
3 participants