Improve default separators / Being able to modify them #160
Replies: 11 comments 9 replies
-
The same applies for
I guess this relates to https://docs.rs/unicode-segmentation/1.8.0/unicode_segmentation/trait.UnicodeSegmentation.html#tymethod.split_word_bounds which the tokeniser uses. I do understand
-
(I'm happy for this to be closed and re-raised as a feature suggestion rather than a bug if you prefer)
-
Hello @aidanhs! This is expected behavior with the current search engine, so it is not a bug but a feature request 🙂 Thanks a lot for your feedback!
-
There are two things that seem important here.
-
Does anyone know how I can customize the separators? I have Meilisearch 0.27.1 installed on CentOS from source.
Most of my pages are based on guitar tunings e.g.
Some pages have tunings like this
Another problem I have is that I have millions of pages, and bots like Google and Bing are on my site constantly. Many of the pages trigger Meili just by visiting a page where videos relevant to guitar tunings are shown. So when I get 60000 hits from Google in one day, my site grinds to a halt until the bot realises it cannot get a decent response and slows down its indexing. See my current Google crawl stats for the last
The orange line is the response time from my server to Google. Where you see the response time jump, around 25th September, is when I went live with Meilisearch.
-
Hello @gmourier @ManyTheFish Thanks so much for the suggestions and your time on this issue. I will try No.1 first, though I do not understand the implications @gmourier mentioned about it. I will have to test it out and see the difference in results and performance.
No.2 would also match tunings where there is no
No.3 sounds more doable from my end, though I am unsure how to edit the results as they get indexed into Meilisearch. Perhaps this can be done with meilisearch-php?
I use tunings in URLs but I replace the
-
Hello @gmourier @ManyTheFish I have managed to reduce my load time from
I managed to use the search and replace suggestion with Laravel Scout. You can see my solution in this comment: meilisearch/meilisearch-php#411 (comment)
Thanks for all of your suggestions!
-
Hello everyone 👋 We just released a 🧪 prototype that allows customizing tokenization and we'd love your feedback.
How to get the prototype?
Using Docker, use the following command:
From source, compile Meilisearch on the
How to use the prototype?
You can find all the details in the PR. Feedback and bug reporting when using this prototype are encouraged! Thanks in advance for your involvement. It means a lot to us ❤️
-
Hello everyone 👋 We have just released the first RC (release candidate) of Meilisearch containing this new feature! You can test it by using:
You are welcome to leave your feedback in this discussion. If you encounter any bugs, please report them here. 🎉 The official stable release containing this change will be available on September 25th, 2023.
-
Hey folks 👋 v1.4.0 has been released! 🦓 You can now customize tokenization by adding or removing tokens from the separator tokens and non-separator tokens lists. ✨ Note:
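For readers who want to try the two lists over HTTP, here is a minimal sketch that builds the requests without sending them. It assumes the `separator-tokens` and `non-separator-tokens` settings sub-routes accept a `PUT` with a JSON array, as described in the Meilisearch settings API reference; please check the reference for your version. The base URL, the index name, and the `build_settings_request` helper are illustrative assumptions, not something stated in this thread.

```python
import json
import urllib.request

# Assumptions: a local instance URL and an illustrative index name.
BASE_URL = "http://localhost:7700"
INDEX_UID = "guitar_tunings"


def build_settings_request(setting: str, tokens: list[str]) -> urllib.request.Request:
    """Build (but do not send) a PUT request that replaces one token list."""
    return urllib.request.Request(
        url=f"{BASE_URL}/indexes/{INDEX_UID}/settings/{setting}",
        data=json.dumps(tokens).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Treat '#' and '♭' as part of a word; treat '|' as an extra separator.
non_separators = build_settings_request("non-separator-tokens", ["#", "♭"])
separators = build_settings_request("separator-tokens", ["|"])
```

Each request can then be sent with `urllib.request.urlopen(...)` once the URL (and an `Authorization` header, if your instance uses an API key) matches your setup.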
-
Hello, I have finally migrated my server so I can move away from
Upon removing my custom search queries which stopped the search using
My question is should I be using a custom Separator Token instead of the default
Edit:
// set sharp/#/flat/♭ as non separator token
$client->index('guitar_tunings')->updateNonSeparatorTokens(['#','♭']);
There is an improvement in performance. Instead of "D#D#C#C#G#G#" returning 1000 results it returns only the exact matches. It is so perfect... Thank you!
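To illustrate why this setting narrows the results, here is a toy sketch. It is not Meilisearch's actual tokenizer; the `tokenize` function is a hypothetical stand-in that splits on every non-alphanumeric character unless that character is listed as a non-separator.

```python
def tokenize(text, non_separators=frozenset()):
    """Toy tokenizer: split on any non-alphanumeric character
    that is not explicitly listed as a non-separator."""
    tokens, current = [], []
    for ch in text:
        if ch.isalnum() or ch in non_separators:
            current.append(ch)  # character belongs to the current word
        elif current:
            tokens.append("".join(current))  # separator ends the word
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


tuning = "D#D#C#C#G#G#"
default_tokens = tokenize(tuning)
custom_tokens = tokenize(tuning, non_separators={"#", "♭"})
print(default_tokens)  # ['D', 'D', 'C', 'C', 'G', 'G']
print(custom_tokens)   # ['D#D#C#C#G#G#']
```

With `#` acting as a separator, the tuning breaks into single letters that match a very large number of documents. Once `#` is kept inside the word, the whole tuning is a single token, which is consistent with only exact matches being returned.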
-
Description (@gmourier)
& character for example.
node.js <- the . is part of a word.
Initial comment (@aidanhs)
Describe the bug
Searching for 'hi' in a document set containing 'hi' and '&hi' returns only one result.
To Reproduce
Expected behavior
I expect both results to be returned
MeiliSearch version: v0.20.0