Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature request] *partial match* on *full path* #169

Open
liar666 opened this issue Aug 30, 2024 · 14 comments
Open

[feature request] *partial match* on *full path* #169

liar666 opened this issue Aug 30, 2024 · 14 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@liar666
Copy link

liar666 commented Aug 30, 2024

Hi,

Thanks for the very nice tool. I used Nostalgy++ for a while, but I prefer your tool for the beautiful GUI and its ability to handle correctly custom key bindings.

However, there a thing that I miss a lot from Notalgy++, is the ability to match full path of directories and even with partial match.

e.g. I'm a teacher. There are several courses I teach every year. I teach courses with similar titles to various audiences. So I have a directory structure similar to: <year_number>/<course_audience>/ ; example : 2024-2025/Master/job.

When I select a target directory for batch moving emails, I would like to be able to type "25 ma jo" and get this examples directory selected over all the other ones.

@kewisch kewisch added enhancement New feature or request good first issue Good for newcomers labels Sep 11, 2024
@kewisch
Copy link
Owner

kewisch commented Sep 11, 2024

This seems reasonable. The search code is here, and would need to be modified to match on parts. Note this gets called a lot, so performance optimization is key. I'm marking this as a good first bug in case anyone wants to pick it up.

if (lowerSearchTerm) {
let searchWords = lowerSearchTerm.split(/\s+/);
for (let item of this.allItems) {
let itemText = this.getItemText(item).toLowerCase();
let mismatch = false;
for (let word of searchWords) {
if (word && !itemText.includes(word)) {
mismatch = true;
break;
}
}

micz added a commit to micz/quickmove-extension that referenced this issue Jan 6, 2025
@micz
Copy link
Contributor

micz commented Jan 6, 2025

Could it be something like this?

micz@d7fe72e

It checks if any word in the search strings matches any part of the folder path.
If this approach works for you, I can create a PR.

@micz
Copy link
Contributor

micz commented Jan 6, 2025

Maybe this

if (!hasAccent) {
    itemText = itemText.normalize("NFD").replace(DIACRITICS, "");
}

has to be applied also to the path before spitting in segments?

@micz
Copy link
Contributor

micz commented Jan 7, 2025

Perhaps it would be better to enable this new behavior as an option?

@kewisch
Copy link
Owner

kewisch commented Jan 7, 2025

Something like that might work. You might need to instead traverse the folder nodes to root instead of splitting by /, because it is unfortunately possible that folder names have a / as well (e.g. Owl).

What I am worried about is performance though. This makes the lookup O(n^2), and it is done for all folders. Ideally we'd build some sort of search tree or full text indexing, though I know that is a bit more work.

Do you see a way to optimize performance here?

@micz
Copy link
Contributor

micz commented Jan 7, 2025

Something like that might work. You might need to instead traverse the folder nodes to root instead of splitting by /, because it is unfortunately possible that folder names have a / as well (e.g. Owl).

Do you think that could be a problem? The folder matching should still occur in this case. Also, what does the '/' in the folder name represent?
How is the folder path displayed in the add-on in this case? Are you representing it differently?

What I am worried about is performance though. This makes the lookup O(n^2), and it is done for all folders. Ideally we'd build some sort of search tree or full text indexing, though I know that is a bit more work.

Given that we are handling folders and not messages, do you think the average user will notice any impact?
In my test on a very old machine (from 2011), I observed no difference between the original version and this one.
Perhaps placing this new algorithm behind a preference setting could be beneficial, not only due to the different behavior (this version displays folders when the string appears anywhere in the path, not just in the final folder name), but also to address potential performance concerns.

Do you see a way to optimize performance here?

Maybe in this line https://github.com/micz/quickmove-extension/blob/d7fe72e1b9c3f90b22d8abacb55d4991bee80b02/src/popup/baseItemList.js#L528 the use of some could be replaced with a loop that breaks at the first occurrence?

@kewisch
Copy link
Owner

kewisch commented Jan 7, 2025

Something like that might work. You might need to instead traverse the folder nodes to root instead of splitting by /, because it is unfortunately possible that folder names have a / as well (e.g. Owl).

Do you think that could be a problem? The folder matching should still occur in this case. Also, what does the '/' in the folder name represent? How is the folder path displayed in the add-on in this case? Are you representing it differently?

Yes, this has been an issue, there was an Owl issue here recently. Replacement is simple though see common/foldernode.js:

item.fullNameParts.some(segment => segment.includes(word));

It might also make sense to have an Iterator version of fullNameParts so that walking stops when a segment is found.

What I am worried about is performance though. This makes the lookup O(n^2), and it is done for all folders. Ideally we'd build some sort of search tree or full text indexing, though I know that is a bit more work.

Given that we are handling folders and not messages, do you think the average user will notice any impact? In my test on a very old machine (from 2011), I observed no difference between the original version and this one. Perhaps placing this new algorithm behind a preference setting could be beneficial, not only due to the different behavior (this version displays folders when the string appears anywhere in the path, not just in the final folder name), but also to address potential performance concerns.

I've seen a few reports saying that generating the folder list is very slow, so I do believe this is the case. Maybe when there are many many many folders on a slow machine. If we make it a default-off pref then I think we could risk it and wait until someone complains that it is slow when they turn it on.

I don't know off hand if it would be faster, but we could also maybe cache a string in the folder nodes lazily. Each node has a full text string that includes its own name and the name of the parents, so when we want to check if a folder should be in the list we just need to run one includes call on the cached string. We calculate this for each node the first time it occurs. This way we don't have an initial load penalty to cache the items. The first search would be just as slow, but as folks continue typing the subsequent searches would be faster. Would that make sense?

@micz
Copy link
Contributor

micz commented Jan 7, 2025

If we make it a default-off pref then I think we could risk it and wait until someone complains that it is slow when they turn it on.

I was thinking exactly about that.

I don't know off hand if it would be faster, but we could also maybe cache a string in the folder nodes lazily.

That's a great idea. Maybe there is no need to split the full path string either.
I'll do some tests.

@micz
Copy link
Contributor

micz commented Jan 8, 2025

I modified the code: micz@80d41af

There is no need to split the path. I tested it with 5000 folder and there is no delay.
I think it should be used a default-off pref because the behaviour changes as described here: #169 (comment)

Let me know if it's ok for you and how you want to procede.

@kewisch
Copy link
Owner

kewisch commented Jan 9, 2025

Sweet, thanks for testing, this is good news. I'm a bit reluctant to use the path though for matching. I believe there are some edge cases where the folder path does not match the display name. I was certain I had an example from a past issue but couldn't find it. In theory though, the display name can differ from the folder path. One potential case is when the folder name is localized, e.g. the path is INBOX/Drafts but in Thunderbird it is displayed in German as Entwürfe. The API docs also say "Although paths look predictable, never guess a folder’s path, as there are a number of reasons why it may not be what you think it is" which could be related.

I'm not 100% certain this is true, it might be worth asking in the add-on developer's channel if it is safe to use the path to match folder display names or if my suspicion is true. The safe path is to actually use the name and iterate upwards.

@micz
Copy link
Contributor

micz commented Jan 9, 2025

I understand your point, you're right.
Also, this is your add-on, so you're the boss. 👍 😄

I'll try to use the lazy loading you suggested to create a string iterating upwards using the display names.
This should be fastest than creating an array to traverse with every word. Then I'll test it with the 5000 folders test environment.
Is it ok for you this approach?

@kewisch
Copy link
Owner

kewisch commented Jan 9, 2025

Yep that works for me, thank you so much for persisting on this!

@micz
Copy link
Contributor

micz commented Jan 10, 2025

I added a new property to FolderNode, #_fullSearchString, and then implemented a getter, with lazy laoding of the property.
Finally, I use it in the repopulate method.

In the test with 5000 folders, it is as fast as before.

I hope I got it right this time!

@micz
Copy link
Contributor

micz commented Jan 11, 2025

If that #169 (comment) is right, the next steps could be:

  1. Sync my fork with you last commits
  2. Add the corresponding option
  3. Modify the repopulate method to use the option
  4. Submit a PR

In step 3, do you prefer a single loop with an if statement inside it, or two separate loops with the if condition outside? The former avoids code duplication, but the latter might be faster.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants