whisper local LLM #10

Closed
mecattaf opened this issue Oct 7, 2024 · 7 comments

@mecattaf
Owner

mecattaf commented Oct 7, 2024

Using this plugin:
Robitx/gp.nvim#122

We want a Whisper server always running locally, most likely whisper.cpp.

Note that the turbo models run very fast:
https://github.com/openai/whisper/pull/2361/files

@mecattaf
Owner Author

mecattaf commented Dec 1, 2024

We decided not to move forward with attempts at using the NPU, as it is too early. However, we can leverage OpenVINO CPU acceleration for Whisper.

Example Whisper pipeline in Python:
https://github.com/openvinotoolkit/openvino.genai?tab=readme-ov-file#speech-to-text-processing-using-whisper-pipeline
Jupyter notebook:
https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/whisper-asr-genai/whisper-asr-genai.ipynb

Full Python recorder file, Python speech recognition sample, and README, which mentions converting a model for OpenVINO:
https://github.com/openvinotoolkit/openvino.genai/tree/master/samples/python/whisper_speech_recognition

Note that the final goal is to have a whisper.cpp server running locally, which can then be accessed by gp.nvim:
Robitx/gp.nvim#224
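
For testing the local server independently of gp.nvim, a small client can help. The sketch below assumes the whisper.cpp example server is listening on 127.0.0.1:8080 and accepts a multipart upload with a `file` field on `/inference`; the URL, the form field values, and the function names are assumptions for illustration, not something gp.nvim itself uses.

```python
import json
import urllib.request
import uuid
from pathlib import Path

# Assumed address of the local whisper.cpp example server; adjust as needed.
SERVER_URL = "http://127.0.0.1:8080/inference"


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode form fields plus one file as multipart/form-data.

    Returns a (body_bytes, content_type_header) tuple.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f'Content-Type: application/octet-stream\r\n\r\n'.encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(wav_path, url=SERVER_URL, timeout=120):
    """POST a WAV file to the local server and return the decoded JSON reply."""
    wav = Path(wav_path)
    body, content_type = build_multipart(
        {"response_format": "json", "temperature": "0.0"},
        "file",
        wav.name,
        wav.read_bytes(),
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `transcribe("samples/jfk.wav")` against a running server should return the parsed JSON reply. Only the standard library is used, so the check does not pull extra dependencies into the image.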

whisper.cpp is available on Fedora but appears not to have OpenVINO support built in, so we have to package it ourselves.
The latest releases are here: https://github.com/ggerganov/whisper.cpp/releases
The spec file to draw inspiration from is here:
https://src.fedoraproject.org/rpms/whisper-cpp/blob/rawhide/f/whisper-cpp.spec
However, it does not seem to have OpenVINO turned on by default?
https://github.com/ggerganov/whisper.cpp?tab=readme-ov-file#openvino-support

If this is the case, we will also need the OpenVINO runtime, which can be installed from a yum repo:
https://docs.openvino.ai/2024/get-started/install-openvino/install-openvino-yum.html

Finally, we will need to convert the model to be OpenVINO-compatible:
https://medium.com/openvino-toolkit/how-to-run-whisper-automatic-speech-recognition-system-locally-on-cpu-or-gpu-with-openvino-a6dc0c000ada

@mecattaf
Owner Author

mecattaf commented Dec 1, 2024

Clarification:
The original post conflated two separate approaches to using OpenVINO with Whisper. The Python implementation built on the OpenVINO GenAI library (with its model conversion via Optimum-CLI) is a completely separate solution from whisper.cpp's built-in OpenVINO support. Both achieve hardware acceleration through OpenVINO, but they use different toolchains and starting points: the Python approach converts the PyTorch model directly to OpenVINO IR format, while whisper.cpp keeps its GGML model and uses its own conversion script to produce an OpenVINO IR version of the encoder.
For the goal of running an accelerated whisper.cpp server locally, we should focus solely on whisper.cpp's native OpenVINO support. This means building whisper.cpp with OpenVINO support enabled, installing the OpenVINO runtime from Intel's repository, and using whisper.cpp's own model conversion script (found in the models/ directory) to generate the OpenVINO IR encoder files. The Python implementation and its conversion process described in the Medium article are not relevant to this specific use case. Working assumption, still to be verified: we may not have to convert models at all, and it may be sufficient to build a whisper.cpp COPR package with OpenVINO support and then feed the GGML models to it.
We first measure performance without the OpenVINO backend to see whether it is acceptable.
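
One way to make "acceptable" concrete is the real-time factor: processing time divided by audio length. The sketch below is a minimal harness under stated assumptions; `transcribe` stands for any callable that takes a WAV path (for example a wrapper around the local server or the CLI), and both function names are hypothetical.

```python
import time
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(transcribe, wav_path):
    """Time one call to transcribe(wav_path) and divide by the audio length.

    Values below 1.0 mean transcription ran faster than real time.
    """
    start = time.perf_counter()
    transcribe(wav_path)
    elapsed = time.perf_counter() - start
    return elapsed / wav_duration_seconds(wav_path)
```

Running the same clip through the plain build and later through an OpenVINO build gives two comparable numbers. The first run on an OpenVINO device is typically slower than later ones, so it is worth timing a second run as well.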

@mecattaf
Owner Author

mecattaf commented Dec 1, 2024

models/convert-whisper-to-openvino.py:

Looking at the code and whisper.cpp's implementation, this conversion is NOT automatic. It needs to be done manually before you can use OpenVINO acceleration. Here's the workflow:

1. First, build and install whisper.cpp with OpenVINO support.
2. Then, before running the server with OpenVINO acceleration, you need to:
   - Set up a Python environment with the required dependencies (whisper, torch, openvino).
   - Run this conversion script for your model (e.g., python convert-whisper-to-openvino.py --model base.en).
   - The script will create ggml-base.en-encoder-openvino.xml and .bin files.
3. Finally, when running the server, it will look for these OpenVINO IR model files alongside your GGML/GGUF models.

This is a one-time setup per model: once you've converted a model, you can reuse the OpenVINO version. The script converts the encoder part of the model to OpenVINO's format, which is what enables the hardware acceleration.
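
The workflow above can be written down as an ordered list of commands, which is handy when turning it into a spec file or a setup script. This is a sketch, not a verified build recipe: the helper names are hypothetical, the cmake invocation is assumed from the whisper.cpp README, and any environment setup the OpenVINO runtime needs is left out.

```python
import shlex
import subprocess
from pathlib import Path


def openvino_setup_steps(model="base.en", python="python3"):
    """Return (subdirectory, argv) pairs for the build and conversion steps."""
    return [
        # Configure and build whisper.cpp with the OpenVINO backend enabled.
        (".", ["cmake", "-B", "build", "-DWHISPER_OPENVINO=1"]),
        (".", ["cmake", "--build", "build", "-j", "--config", "Release"]),
        # One-time encoder conversion, run from the models/ directory.
        ("models", [python, "convert-whisper-to-openvino.py", "--model", model]),
    ]


def run_steps(repo_root, steps, dry_run=True):
    """Describe each step and, unless dry_run is set, execute it in order."""
    lines = []
    for subdir, argv in steps:
        cwd = Path(repo_root) / subdir
        lines.append(f"(cd {cwd} && {shlex.join(argv)})")
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(argv, cwd=cwd, check=True)
    return lines
```

With the default `dry_run=True`, `run_steps("whisper.cpp", openvino_setup_steps())` only returns the command lines, so the plan can be reviewed before anything is executed.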
So your previous plan needs to include this conversion step if you want to use OpenVINO acceleration. However, your approach of testing without OpenVINO first is still valid: you can decide whether the conversion effort is worth it based on your baseline performance tests.

Follow-up question: how do we then run the server with acceleration?

Looking at the whisper.cpp server implementation, once you have:

1. Built whisper.cpp with OpenVINO support (-DWHISPER_OPENVINO=ON)
2. Converted your model using the Python script above
3. Placed both the original GGML/GGUF model and the converted OpenVINO files in your models directory

the suggested way to run the server with OpenVINO acceleration is:

    ./server -m models/ggml-base.en.bin --openvino

Caveat: the --openvino flag comes from the pasted answer and is unverified. The OpenVINO section of the whisper.cpp README runs its example without any extra flag, so confirm against ./server --help before relying on it.
The server will look for the corresponding OpenVINO files (ggml-base.en-encoder-openvino.xml and .bin) in the same directory as your GGML model.
You can verify that OpenVINO is being used by checking the server startup logs, which should mention loading the OpenVINO encoder model.
So, per this answer, the only difference from your current server command is the added --openvino flag.
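
Because the encoder files have to sit next to the GGML model, a small pre-flight check can catch a missing conversion before the server starts. The sketch below derives the expected file names from the naming shown above (ggml-base.en.bin to ggml-base.en-encoder-openvino.xml and .bin); the function names are hypothetical.

```python
from pathlib import Path


def openvino_encoder_files(ggml_model):
    """Derive the encoder IR paths expected next to a GGML model file."""
    model = Path(ggml_model)
    # Path.stem drops only the final suffix: ggml-base.en.bin -> ggml-base.en
    base = f"{model.stem}-encoder-openvino"
    return model.with_name(base + ".xml"), model.with_name(base + ".bin")


def missing_encoder_files(ggml_model):
    """Return the expected encoder files that do not exist on disk."""
    return [p for p in openvino_encoder_files(ggml_model) if not p.is_file()]
```

An empty list from `missing_encoder_files("models/ggml-base.en.bin")` means both converted files are in place; anything else names the files the conversion step still has to produce.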

@mecattaf
Owner Author

mecattaf commented Dec 9, 2024

@mecattaf
Owner Author

mecattaf commented Dec 9, 2024

@mecattaf
Owner Author

Deprecated in favor of #20.
