Feature Request - Cache results of local operations for re-use #628
Comments
Related files (the photo and its JSON sidecar) are sometimes saved in different parts of the takeout. This forces processing all parts of the takeout together: the bigger your takeout is, the longer the preprocessing takes. Remember, you don't have to unzip the takeouts to process all parts. Caching the results adds a lot of complexity, and the result depends on the command-line options and on the current content of the immich server. Anyway, you can try various strategies.
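For illustration only, nothing below exists in immich-go: one way to handle the dependency on the command-line options is to key the cache on a fingerprint of those options plus the takeout parts themselves, so a cached result is only reused for an identical invocation over identical archives. The dependency on the current content of the immich server can't be captured in such a key, which is part of why caching is trickier than it looks.

```go
// Hypothetical sketch, not immich-go code.
package cache

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
)

// Key hashes the CLI arguments and the takeout part files (path, size,
// modification time) into a single fingerprint; if any option or archive
// changes, the key changes and the cached result is treated as stale.
func Key(args []string, takeoutParts []string) (string, error) {
	h := sha256.New()
	for _, a := range args {
		fmt.Fprintf(h, "arg:%s\n", a)
	}
	for _, p := range takeoutParts {
		info, err := os.Stat(p)
		if err != nil {
			return "", err
		}
		fmt.Fprintf(h, "part:%s:%d:%d\n", p, info.Size(), info.ModTime().Unix())
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}
```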
I agree that caching would be great, not only for Google Takeout but for importing folders as well. Would it also be possible to cache the Google puzzle? Once you've gone over all the takeout files, matched which file is in which zip and which other file it belongs to, the result could be stored in a file so that a next run, or a continuation of an interrupted one, could be sped up. For "normal" folders it would also be nice. It would add complexity for sure, so no pressure to implement this right now. You're already doing god's work here, and thank you for that!
In my case I am periodically pulling a takeout of my entire Google Photos collection, which amounts to 500GB split across several tgz archives. My workflow for generating the takeout, downloading it to my server, and extracting the contents is fully automated and doesn't require any manual preparation steps. The next step is using immich-go to upload the takeout to my immich server, which may be interrupted by a crash, a network issue, etc. When that interruption happens I have to start the upload again from scratch, which includes repeating all of the preprocessing work *before* any upload operations are done.

The ask here would be for the preprocessing step to take whatever data structure it has built before the upload step happens and simply serialize it to disk. Subsequently, the same command could be run and, instead of rerunning the preprocessing, it would load the serialized struct back into memory and move on to the upload stage of the run. It *could* even contain the exact command line that was being used as part of the saved data, so that restarting from the cache could be done with a single invocation.

This is 100% a wishlist feature request, but (I haven't begun reading the code) it seems like it shouldn't be overly complicated for my narrow use case. I have to imagine that others have a very similar use case where the cost/time of the preprocessing step is high enough that being able to skip it would be a huge win.
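A minimal sketch of what that serialization could look like, assuming hypothetical names (`PreprocessResult`, `MatchedAsset`, `Save`, and `Load` are not real immich-go types or functions, and the real scanning stage would build a richer structure):

```go
// Hypothetical sketch, not immich-go code.
package cache

import (
	"encoding/gob"
	"os"
)

// PreprocessResult stands in for whatever the scanning/puzzle stage builds:
// the media files, their matched JSON sidecars, and the archive each one
// lives in. The original command line is stored so a cached result is only
// reused for an identical invocation.
type PreprocessResult struct {
	CommandLine []string
	Matches     map[string]MatchedAsset
}

type MatchedAsset struct {
	Archive  string // which .tgz/.zip part the photo was found in
	PhotoKey string // path of the photo inside the archive
	JSONKey  string // path of the matched Google Takeout JSON sidecar
}

// Save serializes the preprocessing result to disk with encoding/gob.
func Save(path string, r *PreprocessResult) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return gob.NewEncoder(f).Encode(r)
}

// Load reads a previously saved result back into memory.
func Load(path string) (*PreprocessResult, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var r PreprocessResult
	if err := gob.NewDecoder(f).Decode(&r); err != nil {
		return nil, err
	}
	return &r, nil
}
```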
I'm trying to get this working reliably before trying to improve the performance. Maybe a simple improvement in the puzzle-solving algorithm will suffice. The best optimization is running less code...
This tool is really rad, except that my very large Google Takeout archive takes hours to complete the initial scanning and "puzzle"-solving stages before getting to the upload phase. While I've seen issues noting that a re-run of the upload command will not re-upload files that have already been uploaded, I would like to be able to do my upload in chunks, and I don't see the need to redo all of the local client work on every run before uploading.
It'd be really great if the result of the local steps could be serialized and saved to some sort of cache that could be re-read on subsequent runs for the same takeout directory.
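Continuing the hypothetical sketch above, the resume flow could be a simple load-or-rebuild wrapper around the existing scanning stage; `runPreprocessing` below is a stand-in for that stage, not a real function in the project:

```go
// prepared returns the preprocessing result, from the cache file if one
// exists, otherwise by running the expensive scan and persisting it before
// the upload stage starts so an interrupted run can skip straight to uploading.
func prepared(cachePath string, args, parts []string) (*PreprocessResult, error) {
	if r, err := Load(cachePath); err == nil {
		return r, nil // cache hit: skip the scanning/puzzle stage entirely
	}
	r, err := runPreprocessing(parts)
	if err != nil {
		return nil, err
	}
	r.CommandLine = args
	if err := Save(cachePath, r); err != nil {
		return nil, err
	}
	return r, nil
}

// runPreprocessing is a placeholder for the existing scan + sidecar-matching
// work; it does not exist under this name in immich-go.
func runPreprocessing(parts []string) (*PreprocessResult, error) {
	return &PreprocessResult{Matches: map[string]MatchedAsset{}}, nil
}
```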