Large Git repos aren't able to be indexed #3715

Open
realhackcraft opened this issue Jan 16, 2025 · 10 comments

Comments

@realhackcraft

Describe the bug

I'm not sure whether this is an issue with my device, because it isn't very powerful.
When I use Tabby to index repos, it always prints a number of warnings like these:

2025-01-16T14:10:36.837159Z  WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/example-showcase/src/main.rs","language":"rust","git_hash":"49afb15c2048c316cdc37d27390618f7a9f90055"}': Failed to embed chunk text: error decoding response body
2025-01-16T14:10:36.878975Z  WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/example-showcase/src/main.rs","language":"rust","git_hash":"49afb15c2048c316cdc37d27390618f7a9f90055"}': Failed to embed chunk text: error decoding response body
2025-01-16T14:10:36.889699Z  WARN tabby_index::indexer: crates/tabby-index/src/indexer.rs:90: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/example-showcase/src/main.rs","language":"rust","git_hash":"49afb15c2048c316cdc37d27390618f7a9f90055"}': Failed to embed chunk text: error decoding response body

This is especially apparent on https://github.com/bevyengine/bevy: I left it running overnight and it still didn't finish indexing. As stated, I'm not sure whether my computer is simply too slow to index it.

Information about your version
tabby 0.22.0

Information about your GPU
Apple M2 (4+4) @ 3.50 GHz

Additional context
Every repo produces these warnings, but the others still finish indexing. For the bevy repo, indexing gets stuck on the example-showcase main file (tools/example-showcase/src/main.rs).

@realhackcraft
Author

I'm using this to run: tabby serve --model Qwen2.5-Coder-3B --parallelism 16 --device metal

@wsxiaoys
Member

We have set a timeout for each embedding model request to keep the indexing time manageable. As a result, some requests may fail and trigger warnings.

Please note that these warnings can be safely ignored. Tabby's background indexing job will incrementally re-index any failed chunks.
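
To picture the behavior described in this comment (a per-request timeout, with failed chunks picked up again on a later pass), here is a minimal sketch. It is not Tabby's implementation: `index_pass`, `embed`, and the default `timeout_s` value are hypothetical names and numbers chosen for illustration, and `embed` is assumed to raise `TimeoutError` or `ConnectionError` when a request fails.

```python
def index_pass(chunks, embed, indexed, timeout_s=5.0):
    """Embed each chunk missing from `indexed`; return the ids that failed.

    chunks:  dict mapping a chunk id to its text
    embed:   hypothetical callable (text, timeout_s) -> embedding vector
    indexed: dict of already-embedded chunks, updated in place
    """
    failed = []
    for chunk_id, text in chunks.items():
        if chunk_id in indexed:
            continue  # incremental: finished chunks are skipped on later passes
        try:
            indexed[chunk_id] = embed(text, timeout_s)
        except (TimeoutError, ConnectionError) as err:
            # Warn and move on so one slow request does not block the whole job.
            print(f"WARN failed to build chunk {chunk_id!r}: {err}")
            failed.append(chunk_id)
    return failed
```

Under these assumptions, calling `index_pass` repeatedly with the same `indexed` dict only retries the chunks that failed before, which is why a warning on one pass does not necessarily mean the chunk is lost.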

@realhackcraft
Author

> some requests may fail and trigger warnings

That's what I thought, but even after 8 hours it's still not finished. Do you reckon it has something to do with the base M2 GPU?

@wsxiaoys
Member

If you have a really large repo, that's somewhat expected. The only way to speed it up is to use a more powerful model serving backend (and more powerful hardware) to reduce the time of a cold start.

The good news is that once the first run finishes, subsequent incremental indexing should be much faster.

@realhackcraft
Author

realhackcraft commented Jan 19, 2025

I've been running the indexing on bevy again, for 24 hours this time, and it's still not done. I have noticed that while it is running, it's not using much GPU or CPU.

Indexing my repo, ~1k lines:

Image

Indexing bevy:

Image

Bevy's indexing spike is a bit shorter than my repo's, even though bevy definitely has more lines than mine. The indexing also doesn't finish after the spike drops.

I also noticed that the warning logs stopped at the same time as the GPU activity, which leads me to believe that the indexing stopped partway through without reporting it.

@wsxiaoys
Member

Thanks for offering to help debug! I ran a quick test by indexing 'bevy' in Tabby's demo instance, and it worked as expected, so I was unable to reproduce the issue. Here is the job log for reference: https://demo.tabbyml.com/jobs/detail?id=Gp3WX1

Could you try turning on debug logging (e.g. RUST_LOG=debug) to see if there's anything suspicious?

@realhackcraft
Author

In the log, I found something like this:

2025-01-20T20:19:35.631776Z WARN tabby_index::indexer:crates/tabby-index/src/indexer.rs: Failed to build chunk for document 'git:V1xYGx:::{"path":"/Users/me/.tabby/repositories/https_github.com_bevyengine_bevy/tools/ci/src/ci.rs","language":"rust","git_hash":"043a887e37ec704dbe97981a7bdfb6ad534d6d5b"}': Failed to embed chunk text: error sending request for url (http://127.0.0.1:30888/embedding)

Also, in the system tab, I see:

Image

I think it has something to do with the embedding, although I'm not sure how to set it up. Maybe that's the problem?

@wsxiaoys
Member

See my earlier comment in this issue: #3715 (comment).

The error can be safely disregarded if it occurs infrequently, especially when considered in the context of the total number of chunks processed.

Can you share the command you used to start tabby?

@realhackcraft
Author

I'm using tabby serve --model Qwen2.5-Coder-3B --parallelism 32 --device metal to start.

> The error can be safely disregarded if it occurs infrequently, especially when considered in the context of the total number of chunks processed.

I'm not sure whether that counts as frequent, but there are 42 warnings like that for a repo of ~500 lines.

@realhackcraft
Author

I did the same with bevy, and up until the point of failure, there are 255 Failed to build chunk for document warnings.
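
For anyone who wants to check whether these warnings are spread across many files or concentrated on a few, a small script along the following lines can tally them from a saved log. This is a sketch that assumes the warning format quoted in this thread (`Failed to build chunk for document '<source>:::<json>': <reason>`); the function name `summarize_warnings` is hypothetical and not part of Tabby.

```python
import json
import re
from collections import Counter

# Matches the warning format quoted in this thread:
#   ... Failed to build chunk for document '<source>:::<json>': <reason>
WARNING_RE = re.compile(
    r"Failed to build chunk for document '(?P<doc>.*?)': (?P<reason>.*)$"
)


def summarize_warnings(lines):
    """Count chunk-build warnings per file path and per failure reason."""
    by_path = Counter()
    by_reason = Counter()
    for line in lines:
        match = WARNING_RE.search(line)
        if match is None:
            continue  # not a chunk-build warning
        # The document id looks like 'git:<id>:::{json}'; the JSON holds the path.
        _, _, payload = match.group("doc").partition(":::")
        try:
            path = json.loads(payload)["path"]
        except (ValueError, KeyError, TypeError):
            path = "<unparsed>"
        # Drop a trailing "(http://...)" so request errors group together.
        reason = re.sub(
            r"\s*\(https?://[^)]*\)\s*$", "", match.group("reason").strip()
        )
        by_path[path] += 1
        by_reason[reason] += 1
    return by_path, by_reason
```

Passing an open log file to `summarize_warnings` and printing `by_path.most_common(5)` shows which files fail most often, and `by_reason` separates "error decoding response body" from "error sending request for url", the two reasons that appear in the logs above.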
