disable kv cache compression for fp vlm #1080
Conversation
Force-pushed from ede24e1 to 33cef0f
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@echarlaix @IlyasMoutawwakil could you please take a look? The OpenVINO 2024.6 release happened a couple of hours ago, so this minicpmv test failure should be visible on the main branch.
Thanks for the rapid fix @eaidova
* Support AWQ models
* Add tests
* Add dependencies
* Fix tests
* enable awq export only if ov support it
* fix style (#2)
* disable awq and gptq install for old torch (#3)
* fix style
* disable autogptq and autoawq install for old transformers testing
* separate common quant models patching and gptq (#4)
* disable windows install (#5)
* separate common quant models patching and gptq
* disable awq windows
* skip logits check for quantized models (#6)
* fix test after rebase
* fix testing condition for 2024.6 and unpatch in case if failed
* Fix qwen2-vl tests (#1084)
* Skip private model loading test for external contributors (#1082)
* Fix reshaping unet if timestep is 0d tensor (#1083)
* Disable kv cache compression for fp vlm (#1080)
* add necessary packages in test_openvino_full
* fix code style after rebase (#7)

Co-authored-by: eaidova <[email protected]>
Co-authored-by: Nikita Savelyev <[email protected]>
Co-authored-by: Ella Charlaix <[email protected]>
What does this PR do?
Fixes the minicpmv failure seen with the OpenVINO nightly build. Starting from 2024.6, OpenVINO enables KV cache compression by default, which may impact model accuracy. Whether compression should be disabled for a given model cannot be determined at the runtime level, so we proposed adding a specific hint for such models (by our agreement, this should be done for non-compressed models only). This PR extends that approach to handle language models that are part of visual language models.
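The hint described above can be sketched as a small helper that builds the OpenVINO runtime config for the VLM's language model. This is a minimal illustration only: the property name `KV_CACHE_PRECISION` and the `f16` value are assumptions about how the hint is expressed, not a quote of the PR diff, so verify against the actual implementation.

```python
def build_ov_config(model_is_compressed: bool) -> dict:
    """Build a runtime config dict for a VLM's language model.

    For full-precision (non-compressed) models, pin the KV cache
    precision to f16 so that the u8 KV cache compression enabled by
    default in OpenVINO 2024.6 does not silently affect accuracy.
    For already-compressed models, keep the runtime default.

    Note: property name and value are illustrative assumptions.
    """
    config: dict = {}
    if not model_is_compressed:
        config["KV_CACHE_PRECISION"] = "f16"
    return config


# Full-precision model: the hint disables KV cache compression.
print(build_ov_config(model_is_compressed=False))
# Quantized model: leave the runtime default untouched.
print(build_ov_config(model_is_compressed=True))
```

The key design point from the PR is that the decision is made at model-loading time (where the library knows whether weights are compressed), not at the runtime level, which cannot predict it.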
Before submitting