
Feature Request - Cache results of local operations for re-use #628

Open
drawks opened this issue Jan 14, 2025 · 4 comments

Comments

@drawks

drawks commented Jan 14, 2025

This tool is really rad, except that my very large Google Takeout archive takes hours to complete the initial scanning and "puzzle"-solving stages before getting to the upload phase. I've seen issues noting that a re-run of the upload command will not re-upload files that have already been uploaded, and I would like to be able to do my upload in chunks, but I don't see the need to redo all of the local client work on every run before uploading.

It'd be really great if the result of the local steps could be serialized and saved to some sort of cache that could be re-read on subsequent runs for the same takeout directory.

@simulot
Owner

simulot commented Jan 14, 2025

Related files (photo and JSON) are sometimes saved in different parts of the takeout. This forces immich-go to process all parts of the takeout together: the bigger your takeout is, the longer the preprocessing takes... Remember, you don't have to unzip the takeouts to process all parts.
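For illustration, a minimal Go sketch of that constraint, assuming zip parts and the usual "<photo>.json" sidecar naming; the file names and pairing rule are assumptions, not immich-go's actual code:

```go
// Hypothetical sketch: pair each photo with its JSON sidecar across all zip
// parts of a takeout. Because a photo and its sidecar may sit in different
// parts, every part must be scanned before any pair is complete.
package main

import (
	"archive/zip"
	"fmt"
	"strings"
)

func pairAcrossParts(parts []string) (map[string]string, error) {
	photos := map[string]bool{}
	sidecars := map[string]bool{}

	for _, part := range parts {
		r, err := zip.OpenReader(part)
		if err != nil {
			return nil, err
		}
		for _, f := range r.File {
			if f.FileInfo().IsDir() {
				continue
			}
			if strings.HasSuffix(f.Name, ".json") {
				sidecars[f.Name] = true
			} else {
				photos[f.Name] = true
			}
		}
		r.Close()
	}

	// Assumed naming convention: "<photo>.json" is the sidecar of "<photo>".
	pairs := map[string]string{}
	for p := range photos {
		if sidecars[p+".json"] {
			pairs[p] = p + ".json"
		}
	}
	return pairs, nil
}

func main() {
	pairs, err := pairAcrossParts([]string{"takeout-001.zip", "takeout-002.zip"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("matched pairs:", len(pairs))
}
```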

Caching the results adds a lot of complexity, and the result depends on the command-line options and on the current content of the immich server.

Anyway, you can try various strategies:

  1. Ask for smaller takeouts: by year, or a bunch of albums at a time.
  2. Run immich-go on a beefier machine, with the zip files located on the same machine for better speed.
  3. Use the latest immich-go version to "archive" the takeout into a folder, and then process that folder.

@Dunky13

Dunky13 commented Jan 14, 2025

I do agree that caching would be great, not only for Google Takeout but for importing folders as well.
You could key the cache (even for takeouts) on the CLI input parameters: if they haven't changed, and the files in question haven't changed (SHA-sum matching, for example), continue from the cache.

Would it also be possible to cache the Google puzzle? Once you've gone over all takeout files, worked out which file is in which zip and which other file it matches, and stored the result in a file, the next run (or a continuation of it) could be sped up.
(Again, if and only if the CLI command & files are the same as in a previous run; any change and the process starts fresh with a new cache.)
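A rough Go sketch of that cache-key idea; everything here (file names, helper names, the choice of SHA-256) is an assumption for illustration, not anything immich-go does today:

```go
// Hypothetical sketch: derive a cache key from the exact CLI invocation plus
// a checksum of every input file, so the cache is only reused when both are
// unchanged. Any difference yields a different key and invalidates the cache.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"sort"
	"strings"
)

func cacheKey(args []string, files []string) (string, error) {
	h := sha256.New()
	io.WriteString(h, strings.Join(args, "\x00")) // the command line itself

	sort.Strings(files) // stable order so the key is deterministic
	for _, name := range files {
		f, err := os.Open(name)
		if err != nil {
			return "", err
		}
		io.WriteString(h, name)
		if _, err := io.Copy(h, f); err != nil {
			f.Close()
			return "", err
		}
		f.Close()
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	key, err := cacheKey(os.Args[1:], []string{"takeout-001.tgz", "takeout-002.tgz"})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("cache key:", key)
}
```

One caveat: SHA-summing several hundred GB of archives is itself slow, so a real implementation might key on file size and modification time instead of full checksums.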

For "normal" folders it would also be nice.
My usecase, I'm trying to import some pictures from a Backblaze B2 backup. I've mounted it using rclone, and for all intents and purposes the mount behaves as a folder. But by no means any fast, since it's a remote B2 backup. So having a cache of what has already been done would be great. I mean, right now it skips if server already contains the file. But if I could just skip to where I left off (in my case skip the first 10k pictures)

It would add complexity for sure, so I'm not pressuring you to implement this right now. You're already doing god's work here, and thank you for that!

@drawks
Author

drawks commented Jan 14, 2025

Related files (photo and JSON) are sometimes saved in different parts of the takeout. This forces immich-go to process all parts of the takeout together: the bigger your takeout is, the longer the preprocessing takes... Remember, you don't have to unzip the takeouts to process all parts.

In my case I am periodically pulling a takeout of my entire Google Photos collection; this amounts to 500GB split across several tgz archives. My workflow for generating the takeout, downloading it to my server, and extracting the contents is fully automated and doesn't require any manual preparation steps.

The next step is using immich-go to upload the takeout to my immich server, which may be interrupted by a crash, a network issue, etc. When that interruption happens I have to start the upload again from scratch, which means all of the preprocessing work is repeated /before/ any upload operations are done.

The ask here is for the preprocessing step to take whatever data structure it has built, before the upload step happens, and simply serialize it to disk. Subsequently the same command could be run and, instead of rerunning the preprocessing, it would load the serialized struct back into memory and move on to the upload stage of the run.

It /could/ even contain the exact command line that was used as part of the saved data, such that restarting from the cache could be an invocation like immich-go --resume state.blob, which would echo back the original command line and prompt you y/n whether you'd like to rerun with the preprocessing skipped in favor of the cached state.
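A minimal Go sketch of that flow, assuming encoding/gob for the serialization and made-up names like PreprocessState and state.blob; it is not a claim about how immich-go structures its data:

```go
// Hypothetical sketch: persist the result of the preprocessing step together
// with the command line that produced it; on --resume, echo that command line
// and ask for confirmation before skipping the preprocessing.
package main

import (
	"bufio"
	"encoding/gob"
	"fmt"
	"os"
	"strings"
)

// PreprocessState stands in for whatever data structure the puzzle-solving
// step builds before uploading.
type PreprocessState struct {
	CommandLine []string          // the invocation that produced this state
	JSONForFile map[string]string // e.g. media file -> matched JSON sidecar
}

func save(path string, st *PreprocessState) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return gob.NewEncoder(f).Encode(st)
}

func load(path string) (*PreprocessState, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	var st PreprocessState
	if err := gob.NewDecoder(f).Decode(&st); err != nil {
		return nil, err
	}
	return &st, nil
}

func main() {
	if len(os.Args) == 3 && os.Args[1] == "--resume" {
		st, err := load(os.Args[2])
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("Cached state was produced by: %s\n", strings.Join(st.CommandLine, " "))
		fmt.Print("Skip preprocessing and reuse it? [y/n] ")
		answer, _ := bufio.NewReader(os.Stdin).ReadString('\n')
		if strings.TrimSpace(answer) == "y" {
			fmt.Printf("Resuming upload of %d matched files...\n", len(st.JSONForFile))
			return
		}
		fmt.Println("Starting fresh.")
	}
	// Normal run: preprocess, then save the state before uploading starts.
	st := &PreprocessState{CommandLine: os.Args, JSONForFile: map[string]string{}}
	if err := save("state.blob", st); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```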

This is 100% a wishlist feature request, but (I haven't begun reading the code) it seems like it shouldn't be overly complicated for my narrow use case. I have to imagine that others have a very similar use case where the cost/time of the preprocessing step is high enough that being able to skip it would be a huge win.

@simulot
Owner

simulot commented Jan 16, 2025

I'm trying to get this working reliably before trying to improve performance.

Maybe a simple improvement in the puzzle-solving algorithm will suffice. The best optimization is running less code...
