I benchmarked document embeddings of ollama==0.4.2 vs. llama_cpp_python==0.2.69.
I used pretrained LLM models to create document embeddings and scikit-learn LogisticRegression to classify the documents.
The Llama results are in the same general ballpark, but Qwen2.5 1.5b in particular performs much worse in ollama-python than in llama-cpp-python.
The classification code is exactly the same for both libraries, and I assume the models pulled are similar too. I don't know what causes the difference: it could be a difference in pooling, in quantization, or just random sampling error.
On a separate note, llama-cpp-python is also 4x faster than ollama-python.
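For context, the ollama side of the benchmark looks roughly like this: embed a list of documents with ollama.embed and fit a LogisticRegression on the vectors. This is a minimal sketch rather than the actual benchmark script; the model tag, the toy documents, and the labels are placeholders.

```python
import ollama
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def embed_docs(docs, model_name="qwen2.5:1.5b"):
    # ollama.embed accepts a list of inputs and returns one vector per document
    response = ollama.embed(model=model_name, input=docs)
    return response["embeddings"]

# Placeholder corpus standing in for the real benchmark dataset
docs = ["cats purr", "dogs bark", "kittens meow", "puppies howl"]
labels = [0, 1, 0, 1]

X = embed_docs(docs)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=0, stratify=labels
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```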
These are my results:
ollama==0.4.2, using ollama.embed(model=model_name, input=[...]):
llama_cpp_python==0.2.69:
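For comparison, the llama-cpp-python side used the embedding API along these lines. Again a sketch: the GGUF path is a placeholder, and the defensive mean-pooling only matters if the model returns per-token embeddings.

```python
import numpy as np
from llama_cpp import Llama

# embedding=True enables the embedding API; the GGUF path is a placeholder
llm = Llama(
    model_path="./qwen2.5-1.5b-instruct-q4_k_m.gguf",
    embedding=True,
    verbose=False,
)

docs = ["cats purr", "dogs bark"]

# create_embedding returns an OpenAI-style response with one entry per input.
# Depending on the model's pooling metadata, each entry is either a single
# pooled vector or a list of per-token vectors, so mean-pool defensively.
vectors = []
for item in llm.create_embedding(input=docs)["data"]:
    emb = np.asarray(item["embedding"], dtype=np.float32)
    vectors.append(emb.mean(axis=0) if emb.ndim == 2 else emb)

print(len(vectors), vectors[0].shape)
```

If the two backends pool token embeddings differently (e.g. last-token vs. mean pooling), that alone could explain part of the accuracy gap.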
To check for a potential model mismatch, I also pulled the same models as used in llama-cpp-python and ran them through ollama-python:
In the above run, Qwen2-1.5B does much better, but Qwen2.5-3B still performs much worse.
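For reference, one way to make sure both backends see the same weights is to pull the exact GGUF file from Hugging Face into ollama. The repo and quantization tag below are assumptions, and pulling hf.co references requires a reasonably recent ollama server.

```python
import ollama

# Illustrative hf.co reference; adjust the repo and quantization tag to match
# the GGUF file that llama-cpp-python loads from disk.
model_ref = "hf.co/Qwen/Qwen2-1.5B-Instruct-GGUF:Q4_K_M"
ollama.pull(model_ref)

response = ollama.embed(model=model_ref, input=["sanity-check document"])
print("embedding dimension:", len(response["embeddings"][0]))
```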
Full source code below: