I can't search for Chinese #11
Comments
Do you have a repo with an example? |
I can't search Chinese, for example "的". |
It works if you place a space after it, which is not great. I'm going to take a look. |
@yyrc can you share your repo with me? |
Will this be fixed? I notice that when I search for English words in Chinese articles, the search results are not correct, and Chinese words cannot be searched. |
I tried to clone your repo but then couldn't find it. The more data, the better. |
You can clone my repo: https://github.com/Charles7c/charles7c.github.io.git However, you need to enable |
We must add language support and make it available in the plugin options: https://github.com/MihaiValentin/lunr-languages |
var lunr = require('./lib/lunr.js');
require('./lunr.stemmer.support.js')(lunr);
require('./lunr.ru.js')(lunr);
require('./lunr.multi.js')(lunr);
var idx = lunr(function () {
// the reason "en" does not appear above is that "en" is built in into lunr js
this.use(lunr.multiLanguage('en', 'ru'));
// then, the normal lunr index initialization
// ...
});
How does the configuration take effect in the plug-in? @emersonbottero
import { defineConfig } from 'vite'
import { SearchPlugin } from 'vitepress-plugin-search'
export default defineConfig({
plugins: [
SearchPlugin({
//Add a wildcard at the end of the search
wildcard: false,
//The length of the result search preview item
previewLength: 62,
})
]
}) |
I have to change my plugin based on my last comment. |
Thanks a lot. 👍 |
Expecting |
just for reference vitejs/vite#10486 |
Just an update: due to the lack of maintenance in the lunr project, I decided to switch the index library to flexsearch. Once this is fixed, it should be possible to pass all of the library's index options to the plugin, but we can and should improve that with actual Chinese language support!
|
Is there an easy solution for now? To be honest, I'm not familiar with any of the libraries mentioned above. |
I just noticed we can download the flexsearch files. |
I did it.. 😁 |
Could someone tell me if it works? |
I upgraded the version to 1.0.4-alpha.15, and then looked at the link below, but didn't quite understand how to configure it, and finally it didn't work. |
I tried the same config; it only works when searching for a single word, like this:
|
It should be { And you will only find whole words. |
To search for partial words, there should be another setting option. |
My config:
import { SearchPlugin } from "vitepress-plugin-search";
import { defineConfig } from "vite";
export default defineConfig({
plugins: [SearchPlugin({
encode: str => str.replace(/[\x00-\x7F]/g, "").split("")
})],
}); |
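For reference, here is a minimal sketch of what the `encode` function in this config does to a string. The name `encodeNonAscii` and the sample input are made up for illustration; only the expression itself is copied from the config. Note that it strips every ASCII character, so English words and digits produce no tokens at all, and `split("")` splits by UTF-16 code unit, so characters outside the Basic Multilingual Plane would be broken apart.

```typescript
// Hypothetical standalone copy of the `encode` option from the config above.
// 1. Remove every ASCII character (letters, digits, spaces, punctuation).
// 2. Split what is left into single UTF-16 code units.
const encodeNonAscii = (str: string): string[] =>
  str.replace(/[\x00-\x7F]/g, "").split("");

// Sample input mixing English and Chinese (illustrative only).
console.log(encodeNonAscii("Vue 单元测试 guide")); // ["单", "元", "测", "试"]
```

Each Chinese character becomes its own token, which may be why only single-word searches matched with this setting alone.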
Try both settings together.
import { SearchPlugin } from "vitepress-plugin-search";
import { defineConfig } from "vite";
export default defineConfig({
plugins: [SearchPlugin({
encode: str => str.replace(/[\x00-\x7F]/g, "").split(""),
tokenize: "full"
})],
}); |
I'll take a look.. |
Please try:
SearchPlugin({
encode: false,
tokenize: function (str) {
return str.replace(/[\x00-\x7F]/g, "").split("");
},
filter:
"的 一 不 在 人 有 是 为 以 于 上 他 而 后 之 来 及 了 因 下 可 到 由 这 与 也 此 但 并 个 其 已 无 小 我 们 起 最 再 今 去 好 只 又 或 很 亦 某 把 那 你 乃 它 吧 被 比 别 趁 当 从 到 得 打 凡 儿 尔 该 各 给 跟 和 何 还 即 几 既 看 据 距 靠 啦 了 另 么 每 们 嘛 拿 哪 那 您 凭 且 却 让 仍 啥 如 若 使 谁 虽 随 同 所 她 哇 嗡 往 哪 些 向 沿 哟 用 于 咱 则 怎 曾 至 致 着 诸 自".split(
" "
),
}),
If the filter does not make sense, you can remove it. |
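As a rough illustration of what the `tokenize` and `filter` pair above is meant to achieve: split the text into single characters, then drop very common characters (stop words). This is only a sketch of the idea, not FlexSearch's internal code; `tokenizeChinese`, the shortened stop-word list, and the sample sentence are hypothetical.

```typescript
// Shortened, illustrative stop-word list (a small subset of the list in the config above).
const stopWords = new Set<string>("的 一 不 在 是".split(" "));

// Hypothetical helper: strip ASCII, split into characters, drop stop words.
function tokenizeChinese(str: string): string[] {
  return str
    .replace(/[\x00-\x7F]/g, "")
    .split("")
    .filter((ch) => !stopWords.has(ch));
}

// "这是单元测试的例子" roughly means "this is an example of a unit test".
console.log(tokenizeChinese("这是单元测试的例子"));
// ["这", "单", "元", "测", "试", "例", "子"]
```

Dropping stop words keeps the index smaller, at the cost that those characters can no longer be searched on their own.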
You put |
Surely I shouldn't have to add entries by hand every time I write something, hahaha |
all that this does |
tokenize: "full" should return a lot of results. It is really hard for me to debug because I don't know Chinese. |
nb
---Original message (Sun, Nov 6, 2022 10:18 AM)---
wow! it works!!!!
export default defineConfig({ plugins: [SearchPlugin({ encode: false, tokenize: "full" })], });
Entering 单, 单元, or 单元测试 will list 单元测试 (which means "unit test" 😁)
|
Thanks a lot, @emersonbottero. Using the configuration that @li-zheng-hao tested, it worked. 😁
// vite.config.ts
import { defineConfig } from 'vite'
import { SearchPlugin } from 'vitepress-plugin-search'
export default defineConfig({
plugins: [
SearchPlugin({
encode: false,
tokenize: 'full'
})
]
}) |
Uhuuuu 🎉 |
It works for me now. Thank you @emersonbottero @Charles7c @li-zheng-hao. Strange, I don't know why I couldn't find this thread at first. Thanks. |
tokenize: 'full' |
You can try forward. |
Thanks for your reply. If I change tokenize to "forward", that reduces the number of results: I can only find results where the search word is located at the start of the whole sentence. I think this is because words in CJK languages are not separated by spaces but by semantics. |
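The difference described here can be sketched as follows. This approximates the documented idea behind the "forward" and "full" tokenizers (index every prefix of a term versus every substring of a term); it is not FlexSearch's actual implementation, and the helper names `forwardParts` and `fullParts` are hypothetical. Because a Chinese sentence has no spaces, the whole run of characters behaves like one term.

```typescript
// Hypothetical helper: every prefix of a term ("forward"-style partial matching).
function forwardParts(term: string): string[] {
  const parts: string[] = [];
  for (let end = 1; end <= term.length; end++) {
    parts.push(term.slice(0, end));
  }
  return parts;
}

// Hypothetical helper: every contiguous substring of a term ("full"-style partial matching).
function fullParts(term: string): string[] {
  const parts: string[] = [];
  for (let start = 0; start < term.length; start++) {
    for (let end = start + 1; end <= term.length; end++) {
      parts.push(term.slice(start, end));
    }
  }
  return parts;
}

const term = "单元测试"; // "unit test", written without spaces

console.log(forwardParts(term)); // ["单", "单元", "单元测", "单元测试"]
console.log(fullParts(term).length); // 10
console.log(forwardParts(term).indexOf("测试") !== -1); // false: not a prefix
console.log(fullParts(term).indexOf("测试") !== -1); // true: found as a substring
```

For a term of n characters, the prefix approach yields n entries while the substring approach yields n(n+1)/2, which would be consistent with "full" finding more results but producing a much larger index on long unsegmented sentences.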
Finally, I think I found the solution: a word segmenter for Chinese text, https://github.com/leizongmin/node-segment. I installed it. However, I have to separate the keywords with spaces manually in the search box in the nav bar; otherwise I get nothing if two words in the search box are not separated by a space. (Can this be automatic?) Now the size of the index file is reduced to 1,662 KB (83 MB+ -> 1.6 MB). Really great progress. If I change the tokenizer to "full", it is about 2,581 KB.
// docs/vite.config.ts
import { SearchPlugin } from "vitepress-plugin-search";
import { defineConfig } from "vite";
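// Word segmenter sources:
// https://wenjiangs.com/article/segment.html
// https://github.com/leizongmin/node-segment
// Install:
// yarn add segment -D
// Example below
// Load the module
var Segment = require('segment');
// Create an instance
var segment = new Segment();
// Use the default recognition modules and dictionaries; loading the dictionary files takes about 1 second and only needs to run once at initialization
segment.useDefault();
// Start segmenting
// console.log(segment.doSegment('这是一个基于Node.js的中文分词模块。'));
var options = {
// Use the word segmenter to optimize the index
encode: function (str) {
return segment.doSegment(str, {simple: true});
},
tokenize: "forward", // Fixes searching for Chinese characters. Source: https://github.com/emersonbottero/vitepress-plugin-search/issues/11
// The following settings return perfect results, but memory and disk usage are huge; the index file reaches 80 MB+
// encode: false,
// tokenize: "full",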
};
export default defineConfig({
plugins: [SearchPlugin(options)],
}); |
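When a base setting exists in VitePress (namely this one) and the site is deployed to GitHub Pages, pressing Enter after searching drops the base, which leads to a 404. This does not happen when running locally. |
May I ask: can it only search the headings contained in an article, and not the article titles? |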
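To make the expected behaviour concrete, here is a minimal sketch of prefixing a search result path with the site's base. The helper `joinBase` and the sample paths are hypothetical and only illustrate the expected URL; VitePress itself exports a `withBase` helper for this purpose, and how the plugin builds its links is not shown in this thread.

```typescript
// Hypothetical helper: join a VitePress-style base (e.g. "/docs/") with a page path.
function joinBase(base: string, path: string): string {
  const trimmedBase = base.replace(/\/+$/, ""); // drop trailing slashes
  const trimmedPath = path.replace(/^\/+/, ""); // drop leading slashes
  return `${trimmedBase}/${trimmedPath}`;
}

// With a base, the link keeps the prefix needed on GitHub Pages.
console.log(joinBase("/docs/", "/guide/search.html")); // "/docs/guide/search.html"

// Locally, with the default base "/", nothing changes.
console.log(joinBase("/", "/guide/search.html")); // "/guide/search.html"
```

If the prefix is omitted, the link works locally (base "/") but points outside the deployed site, which would match the 404 described.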
I hope to be able to search for Chinese