Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganization #748

Open
wants to merge 11 commits into
base: reorganization
Choose a base branch
from

Conversation

kpwelsh
Copy link
Collaborator

@kpwelsh kpwelsh commented Jan 20, 2025

Reorganization Plan

1 - File Movement

The reorganization consists primarily of creating sub-folders under hexrd and moving most files into one of them. The goal of the first PR will be to move the files into this new location while making the minimum changes necessary to maintain functionality. The reorganization plan suggests that some files / classes be split up across workflows. During this phase, we will instead duplicate the files across workflows.

2 - Reconciliation

During this phase, we will consider in smaller chunks code changes necessary to split up the classes and files across workflows. This should be limited to approximately the following code areas:

  • hexrd/materials/
    • crystallography
    • material
    • unitcell
  • hexrd/instrument/
    • hedm_instrument
    • physics_package
  • hexrd/xrdutil
  • hexrd/config
    • instrument

3 - Migration Tooling

During this phase, we will consider what the best options are for making code migration for users as painless as possible.


Phase 1 File Migration

All of the mapping was done according to the workflow labeling spreadsheet. A complete list of the file mappings can be found in file_map.pkl . It is being kept temporarily to support migration workflows.

File Mapping

The files were mapped automatically using a new package_remapper tool. This works in a couple of stages to correctly move the files while keeping imports coherent. This tool was used to accomplish the bulk of the work and help ensure that no imports are missed, but all code changes made by the tool were reviewed and verified by hand.

File Mappings

The package_remapper requires a 1-to-many map of files from one repository to the other. It will copy the files in the source to everywhere listed in the destination . The file mapping used was generated from the spreadsheet by rolling up the workflows to the file level. Things like hexrd/valunits.py clearly map to one workflow, core . However, things like hexrd/config/instruments.py need to be split into multiple workflow sub-folders (core and hedm in this case. During the file mapping stage, we will just copy the whole file in both locations and log it for a human (me) to disentangle later. It is far too complicated to safely do this disentanglement automatically.

Example:

hexrd/config/instrument.py -> hexrd/core/config/instrument.py
                           -> hexrd/hedm/config/instrument.py

It is not necessary that all files in the source directory have an explicit destination. If files are omitted for some reason, they will be brought over with the containing module. If there is no containing module (e.g. setup.py) they will be copied over in the same location relative to the package source. When generating the list of files, the relevant .gitignore is used to filter out files we don’t care about if any are present.

Quick Git Commit

Once the files are moved and before they are patched, the move is committed so that the file history is maintained and tracked through the move. The move and the changes necessary to make the code work after the move will be attributed to me, but no existing history is affected.

As far as I can tell, when copying files to multiple destinations, one is ~arbitrarily picked as the “rename” and others are picked as new files. The rename gets the source file history. In a later stage we will determine which files should get the history, and I will fix the git history then.

Import Patching

Now that all of the files are in potentially different locations, the referencing import statements are fixed*. We do this by parsing the python files and looking for 3 things:

  1. ast.Import statements: import X.Y as Z
  2. ast.ImportFrom statements:
    1. absolute imports from hexrd.config import Config as ConfigAlias
    2. or relative from ..config import Config as ConfigAlias

*wppf/RietveldHEDM.py does not parse and will not be patched.

During the patching, we will keep any aliases that are listed, and if the ImportFrom statement is relative in the source, we will keep it relative after the transformation. Again we run into the 1-to-many file mapping issue. Here we take the same approach as the file copying, we will transform 1 import statement into multiple, one for each new location, and log the file and imports for a human (me) to resolve later.

Note: There are also ast.Name statements that need patching. Because of the low-frequency of this style of module reference (i.e. import hexrd followed by later hexrd.module.something) its just easier to do these by hand.

Note: this import mapping tool will be used later to help users automatically update their imports.

Fixing things by hand

There are several things that either were too hard to patch automatically or just couldn’t be patched automatically that needed addressing.

setup.py + Extensions

There are 4 extensions in that are built in the setup for HEXRD: Convolution, Old XF, New XF and CPP Transforms. These were changed to take their source code from their new respective folders and install the module in their new respective locations:

  • hexrd/convolution → hexrd/core/convolution
  • hexrd/transforms → hexrd/core/transforms

There is also a console script that needs to be updated. This was updated in the setup.py and in the conda.recipe/meta.yaml.

  • hexrd.cli.main:main → hexrd.hedm.cli.main:main

ast.Name

The following references were found to hexrd modules that are not direct imports and were patched as follows —

hexrd/core/
	material
		unitcell.py
			- hexrd.resources -> hexrd.core.resources
		utils.py
			- 2x hexrd.resources -> hexrd.core.resources
hexrd/hedm/
	material
		unitcell.py
			- hexrd.resources -> hexrd.core.resources
hexrd/powder/
	wppf
		wppfsupport.py
			- hexrd.instrument -> hexrd.core.instrument
tests
	config
		test_instrument.py
			- hexrd.instrument -> hexrd.core.instrument
	

Selecting duplicate imports

Duplicating the files across workflows will result in duplicate imports on dependent files. This is resolved by removing the duplicates and selecting the most appropriate file reference. As an example:

hexrd/hedm/somefile.py

# The "module" module was duplicated across both hedm and core
# Which results in dependent files importing both
from hexrd.core.module import MyClass
from hexrd.hedm.module import MyClass

# In the HEDM workflow folder, we will resolve this by selecting the hedm import
# -- from hexrd.core.module import MyClass --
from hexrd.hedm.module import MyClass

# If we are in a non-HEDM workflow folder, 
# we will resolve this by selecting the core import
from hexrd.core.module import MyClass
# -- from hexrd.hedm.module import MyClass --

During this process, it was found that several locations had non-trivially resolvable inter-workflow dependencies. A couple of core files reference HEDM, several things reference WPPF, and some others. These were all marked with TODO comments to address in the next phase once the duplicate file reconciliation is completed. If there are still issues that cannot be clearly resolved, I will solicit additional feedback from SMEs to make the decisions.

Old Module Aliasing

This falls squarely under “migration tooling”, but it is included in this initial PR because the test files contain pickled objects that reference the old structure. So to be able to load the data in https://github.com/HEXRD/examples we need to be able to use the old style references. These test files will be updated when we deprecate the old structure and switch to the new one.

Because of the limited remapping we are doing (just inserting a file layer between hexrd and the existing code), we only have to consider 2 cases.

  1. Callers try to import something directly from deeper in the library: import hexrd.config.utils
  2. Callers import hexrd and then reference things in the module later: import hexrd; hexrd.config.utils

Case 1 With Import Hooks

The python import system is quite extensible, so we can actually install special hooks to resolve old imports to the correct new imports. Note that users will not get type support or auto-complete when referencing modules imported this way.

To create our import hook, we will make use of sys.meta_path. As described in the documentation, this allows us to add a Finder that is essentially a map from fully-qualified module names to file locations. We will use it in conjunction with the file_map.pkl created during the remapping to automatically re-direct import references.

The core of our hook will be the following:

    def find_spec(self, fullname, path, target=None):
        if fullname in module_map:
            mapped_module, mapped_fp = module_map[fullname]
            sys.modules[fullname] = importlib.import_module(mapped_module)

This will simply take the full path name that the user was trying to import and figure out what we should actually import.

Fixing module references

For users that import hexrd and then reference its attributes:

import hexrd
...
# Reference some submodule here.
hexrd.xrdutil

we need to be able to intercept these accesses and redirect them to the correct object. This is nicely done with a module level __getattr__ on hexrd. When users access hexrd through hexrd.modulethey will get pointed to the location of the new module.

We are only putting a getattr like this at the top level because the structure changes only occurred just below the top level. If users get to a module inside of one of the workflows, the references are the same. Additionally, the import hooks will handle any direct references lower than hexrd.

Additional Fixes / Changes

In keeping with the spirit of this PR, the additional changes are kept to a minimum, however, I think there are a couple of things that are non-invasive and useful.

  1. There is a reference to "xrdutils.utils", which is not a real module. This will be changed to "hexrd.core.xrdutil.utils"
  2. The are no test dependencies listed for HEXRD. So I added some in the pyproject.yaml. You can install the test dependencies with: pip install hexrd[test]. Making a new virtual environment and install the optional test dependencies allows all tests to run.

Review Requests

There are a number of things that I think should review should focus on:

  • Did I miss any files? Did I delete something that I shouldn't have?
  • Did I miss a reference to a hexrd module in a non-standard format?
  • Is there something else we need to change in this PR?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this is currently listed as deleted, since it was correctly moved to hexrd/core/fitting/calibration.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was deleted by the package mapper.
But it doesn't appear to be necessary, given the ignores already in the root:


# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before. This is correctly copied to hexrd/core/instrument

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was correctly moved to hexrd/core/material

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was correctly moved to hexrd/hedm/sampleOrienations

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was correctly moved to hexrd/powder/wppf

@kpwelsh kpwelsh marked this pull request as ready for review January 21, 2025 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant