Best way to map existing files to Model instances #236
There are ways to make this work but it's not well documented. Internally, see `datafiles/manager.py`, lines 80 to 86 at `124ee31`. So, if you include part of the path in the pattern, …

Let me know if that works for you! I think the feature needs to be made more explicit and documented.
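For context, datafiles-style patterns can embed per-instance format fields such as `{self.name}`. Conceptually, resolving part of the path from the pattern looks like this stdlib-only illustration (not the library's actual implementation; `Item` and `resolve_path` are hypothetical names):

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str


def resolve_path(pattern: str, instance) -> str:
    # A pattern containing part of the path resolves to a concrete
    # file per instance by interpolating its attributes.
    return pattern.format(self=instance)


print(resolve_path("my_project/{self.name}.yaml", Item("team_one/subfile_1")))
# → my_project/team_one/subfile_1.yaml
```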
Thanks for getting back to me! I have a really basic implementation working (using a simple …):

```yaml
# my_project/project.yaml
# (the list of discovered files will always exclude the statically-located
# project.yaml at the root of the project, to avoid duplication)
include_paths:
  - '**.yaml'
plugins:
  extractors:
    - name: project-tap-1
      variant: meltano
```

```yaml
# my_project/team_one/subfile_1.yaml
plugins:
  extractors:
    - name: subfile-1-tap-1
      variant: custom
```

```yaml
# my_project/subfile_2.yaml
plugins:
  extractors:
    - name: subfile-2-tap-1
      variant: custom
```

```python
# all plain dataclasses
from dataclasses import dataclass, field
from typing import List

from .base import ConfigBase, ExtractorConfig, LoaderConfig, ScheduleConfig


@dataclass
class Plugins:
    extractors: List[ExtractorConfig] = field(default_factory=list)
    loaders: List[LoaderConfig] = field(default_factory=list)


@dataclass
class MeltanoFile:
    plugins: Plugins = Plugins()
    schedules: List[ScheduleConfig] = field(default_factory=list)
    include_paths: List[str] = field(default_factory=list)
    version: int = 1


@dataclass
class SubFile:
    plugins: Plugins = Plugins()
    schedules: List[ScheduleConfig] = field(default_factory=list)
```

I want to be able to take over responsibility for discovering the 'root' … If this is possible, we can then build a …
The way I am thinking about this is conceptually similar to how SQLAlchemy's Classical Mapper works: object and persistence defined separately and then explicitly mapped 🙂 Ideally the schema and converters would be attached to a … It looks like …
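As a rough illustration of that separation, here is a stdlib-only sketch: the dataclass knows nothing about persistence, and a separate mapper binds an instance to a concrete file. `SubFileConfig` and `FileMapper` are hypothetical names, and JSON stands in for YAML to avoid third-party dependencies:

```python
import json
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class SubFileConfig:
    name: str = ""
    version: int = 1


class FileMapper:
    """Hypothetical mapper: binds one dataclass instance to one file,
    keeping the object definition free of persistence concerns."""

    def __init__(self, instance, path: str):
        self.instance = instance
        self.path = Path(path)

    def save(self) -> None:
        self.path.write_text(json.dumps(asdict(self.instance)))

    def load(self) -> None:
        data = json.loads(self.path.read_text())
        for f in fields(self.instance):
            if f.name in data:
                setattr(self.instance, f.name, data[f.name])
```

An explicit `FileMapper(config, discovered_path).load()` call would then play the role that `mapper(Class, table)` plays in SQLAlchemy's classical mapping.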
I'd be curious to see more sample code of what you tried and the result.

For that, you could possibly use `create_model` directly:

```python
from datafiles.model import create_model

parent_config = MeltanoFile(name='project')
for pathname in _iterate_globs(parent_config.include_paths):
    model = create_model(SubFile, pattern=pathname)
    child_config = model()  # 'pattern' should only match a single file
```
Glad we are on the same lines - I tried …
Since …

To confirm that, perhaps you could try paring down …
Is there a way to map existing files with the same schema that do not match a repeatable pattern on disk to a datafiles Model instance manually? The use case is config files spread across arbitrary-depth subfolders below a top-level project directory. Using glob I can find the files I am interested in mapping, but I am not having much success creating mapped instances of those discovered files.
I have tried:

- Updating `Model.Meta.datafile_pattern` with each discovered file's path and calling `Model.objects.get()`.
- Leaving `pattern` defined and then overriding both the `instance.Meta.datafile_pattern` and `instance.datafile.path` attributes on the instance, with the correct path for the discovered file, before calling `instance.datafile.load()`.

However, in both cases this results in an odd behaviour where all instances with nested attributes contain pointers to the most recently loaded file's nested object rather than their own 🤦‍♂️
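One plausible (though unconfirmed) source of that aliasing is the shared class-level default in the dataclasses above: `plugins: Plugins = Plugins()` creates a single `Plugins` object that every instance reuses. A minimal plain-class sketch makes the symptom visible:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plugins:
    extractors: List[str] = field(default_factory=list)


class SubFile:
    # One Plugins object created at class-definition time and shared by
    # every instance -- analogous to `plugins: Plugins = Plugins()`.
    plugins = Plugins()


a = SubFile()
b = SubFile()
a.plugins.extractors.append("tap-a")

# b now "contains" a's nested data, because both instances alias the
# same Plugins object.
assert b.plugins.extractors == ["tap-a"]
```

Declaring the field as `plugins: Plugins = field(default_factory=Plugins)` gives each instance its own nested object and avoids this sharing.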
Is this a completely unsupported use-case, or is there another way to use `datafiles` to map files discovered outside of the supported 'pattern' construct to instances of a datafiles Model? Thank you!