
Adds DonutSwin to models exportable with ONNX #19401

Closed

Conversation


@WaterKnight1998 WaterKnight1998 commented Oct 7, 2022

What does this PR do?

Fixes #16308

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@lewtun & @chainyo for ONNX and @NielsRogge for Donut and Document Question Answering.

@WaterKnight1998 changed the title from "Adds Donut to models exportable with ONNX" to "Adds DonutSwin to models exportable with ONNX" on Oct 7, 2022
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Contributor

@chainyo chainyo left a comment

Hi @WaterKnight1998,

Thanks for your PR. It looks clean.

Nice catch for the model-type variable that could be tricky to find: https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa/blob/main/config.json#L138

First DocumentQuestionAnswering model added. It's pretty cool!

@WaterKnight1998
Author

Hi @WaterKnight1998,

Thanks for your PR. It looks clean.

Nice catch for the model-type variable that could be tricky to find: https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa/blob/main/config.json#L138

First DocumentQuestionAnswering model added. It's pretty cool!

I don't see the comment. Do I need to solve anything?

However, when testing locally with the code below, I can't export the model :(

I exported just the encoder like this:

from transformers import VisionEncoderDecoderModel
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
model.encoder.save_pretrained("./swin")

Then, when trying to convert it to ONNX, I get:

python -m transformers.onnx --model=./swin onnx/
Local PyTorch model found.
Framework not requested. Using torch to export to ONNX.
/home/david/.local/lib/python3.10/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Using framework PyTorch: 1.12.1+cu116
Traceback (most recent call last):
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/onnx/__main__.py", line 115, in <module>
    main()
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/onnx/__main__.py", line 97, in main
    onnx_inputs, onnx_outputs = export(
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/onnx/convert.py", line 337, in export
    return export_pytorch(preprocessor, model, config, opset, output, tokenizer=tokenizer, device=device)
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/onnx/convert.py", line 144, in export_pytorch
    model_inputs = config.generate_dummy_inputs(preprocessor, framework=TensorType.PYTORCH)
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/onnx/config.py", line 348, in generate_dummy_inputs
    raise ValueError(
ValueError: Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor.

Do I need to add more code?

@chainyo
Contributor

chainyo commented Oct 7, 2022

Do I need to add more code?

Yes, you need to override the generate_dummy_inputs() function. As with the LayoutLMv3 model, you need to define how the processor builds the dummy inputs: the ONNX conversion runs one batch (even of random dummy data) through the model to follow the data flow through the graph layers.

Check this here:

def generate_dummy_inputs(
    self,
    processor: "ProcessorMixin",
    batch_size: int = -1,
    seq_length: int = -1,
    is_pair: bool = False,
    framework: Optional["TensorType"] = None,
    num_channels: int = 3,
    image_width: int = 40,
    image_height: int = 40,
) -> Mapping[str, Any]:
    """
    Generate inputs to provide to the ONNX exporter for the specific framework

    Args:
        processor ([`ProcessorMixin`]):
            The processor associated with this model configuration.
        batch_size (`int`, *optional*, defaults to -1):
            The batch size to export the model for (-1 means dynamic axis).
        seq_length (`int`, *optional*, defaults to -1):
            The sequence length to export the model for (-1 means dynamic axis).
        is_pair (`bool`, *optional*, defaults to `False`):
            Indicate if the input is a pair (sentence 1, sentence 2).
        framework (`TensorType`, *optional*, defaults to `None`):
            The framework (PyTorch or TensorFlow) that the processor will generate tensors for.
        num_channels (`int`, *optional*, defaults to 3):
            The number of channels of the generated images.
        image_width (`int`, *optional*, defaults to 40):
            The width of the generated images.
        image_height (`int`, *optional*, defaults to 40):
            The height of the generated images.

    Returns:
        Mapping[str, Any]: holding the kwargs to provide to the model's forward function
    """
    # A dummy image is used so OCR should not be applied
    setattr(processor.feature_extractor, "apply_ocr", False)

    # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
    batch_size = compute_effective_axis_dimension(
        batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
    )
    # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
    token_to_add = processor.tokenizer.num_special_tokens_to_add(is_pair)
    seq_length = compute_effective_axis_dimension(
        seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
    )
    # Generate dummy inputs according to compute batch and sequence
    dummy_text = [[" ".join([processor.tokenizer.unk_token]) * seq_length]] * batch_size

    # Generate dummy bounding boxes
    dummy_bboxes = [[[48, 84, 73, 128]]] * batch_size

    # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
    # batch_size = compute_effective_axis_dimension(batch_size, fixed_dimension=OnnxConfig.default_fixed_batch)
    dummy_image = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)

    inputs = dict(
        processor(
            dummy_image,
            text=dummy_text,
            boxes=dummy_bboxes,
            return_tensors=framework,
        )
    )

    return inputs

This can help too; it's the base generate_dummy_inputs() function:

def generate_dummy_inputs(
    self,
    preprocessor: Union["PreTrainedTokenizerBase", "FeatureExtractionMixin"],
    batch_size: int = -1,
    seq_length: int = -1,
    num_choices: int = -1,
    is_pair: bool = False,
    framework: Optional[TensorType] = None,
    num_channels: int = 3,
    image_width: int = 40,
    image_height: int = 40,
    tokenizer: "PreTrainedTokenizerBase" = None,
) -> Mapping[str, Any]:
    """
    Generate inputs to provide to the ONNX exporter for the specific framework

    Args:
        preprocessor: ([`PreTrainedTokenizerBase`] or [`FeatureExtractionMixin`]):
            The preprocessor associated with this model configuration.
        batch_size (`int`, *optional*, defaults to -1):
            The batch size to export the model for (-1 means dynamic axis).
        num_choices (`int`, *optional*, defaults to -1):
            The number of candidate answers provided for multiple choice task (-1 means dynamic axis).
        seq_length (`int`, *optional*, defaults to -1):
            The sequence length to export the model for (-1 means dynamic axis).
        is_pair (`bool`, *optional*, defaults to `False`):
            Indicate if the input is a pair (sentence 1, sentence 2)
        framework (`TensorType`, *optional*, defaults to `None`):
            The framework (PyTorch or TensorFlow) that the tokenizer will generate tensors for.
        num_channels (`int`, *optional*, defaults to 3):
            The number of channels of the generated images.
        image_width (`int`, *optional*, defaults to 40):
            The width of the generated images.
        image_height (`int`, *optional*, defaults to 40):
            The height of the generated images.

    Returns:
        Mapping[str, Tensor] holding the kwargs to provide to the model's forward function
    """
    from ..feature_extraction_utils import FeatureExtractionMixin
    from ..tokenization_utils_base import PreTrainedTokenizerBase

    if isinstance(preprocessor, PreTrainedTokenizerBase) and tokenizer is not None:
        raise ValueError("You cannot provide both a tokenizer and a preprocessor to generate dummy inputs.")
    if tokenizer is not None:
        warnings.warn(
            "The `tokenizer` argument is deprecated and will be removed in version 5 of Transformers. Use"
            " `preprocessor` instead.",
            FutureWarning,
        )
        logger.warning("Overwriting the `preprocessor` argument with `tokenizer` to generate dummmy inputs.")
        preprocessor = tokenizer
    if isinstance(preprocessor, PreTrainedTokenizerBase):
        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
        )
        # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
        token_to_add = preprocessor.num_special_tokens_to_add(is_pair)
        seq_length = compute_effective_axis_dimension(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )
        # Generate dummy inputs according to compute batch and sequence
        dummy_input = [" ".join([preprocessor.unk_token]) * seq_length] * batch_size
        if self.task == "multiple-choice":
            # If dynamic axis (-1) we forward with a fixed dimension of 4 candidate answers to avoid optimizations
            # made by ONNX
            num_choices = compute_effective_axis_dimension(
                num_choices, fixed_dimension=OnnxConfig.default_fixed_num_choices, num_token_to_add=0
            )
            dummy_input = dummy_input * num_choices
            # The shape of the tokenized inputs values is [batch_size * num_choices, seq_length]
            tokenized_input = preprocessor(dummy_input, text_pair=dummy_input)
            # Unflatten the tokenized inputs values expanding it to the shape [batch_size, num_choices, seq_length]
            for k, v in tokenized_input.items():
                tokenized_input[k] = [v[i : i + num_choices] for i in range(0, len(v), num_choices)]
            return dict(tokenized_input.convert_to_tensors(tensor_type=framework))
        return dict(preprocessor(dummy_input, return_tensors=framework))
    elif isinstance(preprocessor, FeatureExtractionMixin) and preprocessor.model_input_names[0] == "pixel_values":
        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(batch_size, fixed_dimension=OnnxConfig.default_fixed_batch)
        dummy_input = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)
        return dict(preprocessor(images=dummy_input, return_tensors=framework))
    else:
        raise ValueError(
            "Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor."
        )

def patch_ops(self):
    for spec in self._patching_specs:
        custom_op = spec.custom_op if spec.op_wrapper is None else spec.op_wrapper(spec.custom_op)
        setattr(spec.o, spec.name, custom_op)

def restore_ops(self):
    for spec in self._patching_specs:
        orig_op = spec.orig_op if spec.op_wrapper is None else spec.op_wrapper(spec.orig_op)
        setattr(spec.o, spec.name, orig_op)

@classmethod
def flatten_output_collection_property(cls, name: str, field: Iterable[Any]) -> Dict[str, Any]:
    """
    Flatten any potential nested structure expanding the name of the field with the index of the element within the
    structure.

    Args:
        name: The name of the nested structure
        field: The structure to, potentially, be flattened

    Returns:
        (Dict[str, Any]): Outputs with flattened structure and key mapping this new structure.
    """
    from itertools import chain

    return {f"{name}.{idx}": item for idx, item in enumerate(chain.from_iterable(field))}
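
For a vision-only encoder like DonutSwin, which only consumes pixel_values, a minimal override could look like the sketch below. This is a sketch only: it assumes the config subclasses OnnxConfig and is handed a feature extractor (or a processor exposing one), and the names, axes, and defaults are illustrative rather than the final implementation.

from typing import Any, Mapping, Optional

from transformers.onnx import OnnxConfig
from transformers.onnx.utils import compute_effective_axis_dimension
from transformers.utils import TensorType


class DonutSwinOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        # The encoder only consumes images.
        return {"pixel_values": {0: "batch", 1: "num_channels", 2: "height", 3: "width"}}

    def generate_dummy_inputs(
        self,
        preprocessor,  # assumed to be a DonutFeatureExtractor (or equivalent)
        batch_size: int = -1,
        num_channels: int = 3,
        image_width: int = 40,
        image_height: int = 40,
        framework: Optional[TensorType] = None,
        **kwargs,
    ) -> Mapping[str, Any]:
        # If dynamic axis (-1), forward a fixed batch of 2 samples so ONNX does not fold the axis away.
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch
        )
        # Random PIL images produced by the base-class helper.
        dummy_images = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)
        return dict(preprocessor(images=dummy_images, return_tensors=framework))

Because the Swin encoder has no tokenizer, only the image branch of the base implementation is relevant here.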

Member

@lewtun lewtun left a comment

Thanks for adding support for this new model @WaterKnight1998 and welcome to the 🤗 Transformers community!

As suggested by @chainyo, you'll need to override the function that generates dummy data. I also left a nit regarding one of the imports.

src/transformers/models/donut/configuration_donut_swin.py (review comment, outdated and resolved)
@WaterKnight1998
Author

@chainyo @lewtun The relative imports are fixed, and I also added the function to generate dummy inputs. But when I convert the model to ONNX like this:

import transformers
from pathlib import Path


from transformers import VisionEncoderDecoderModel
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
model.encoder.save_pretrained("./swin")

from transformers.onnx import export
from transformers import AutoConfig
from transformers.models.donut import *

onnx_config = AutoConfig.from_pretrained("./swin")
onnx_config = DonutSwinOnnxConfig(onnx_config)

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
onnx_inputs, onnx_outputs = export(processor, model.encoder, onnx_config, onnx_config.default_onnx_opset, Path("model.onnx"))

I get the following warnings:

/home/david/.local/lib/python3.10/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:230: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if num_channels != self.num_channels:
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:220: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if width % self.patch_size[1] != 0:
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:223: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if height % self.patch_size[0] != 0:
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:536: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if min(input_resolution) <= self.window_size:
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:136: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  batch_size, height // window_size, window_size, width // window_size, window_size, num_channels
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:147: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  batch_size = math.floor(windows.shape[0] / (height * width / window_size / window_size))
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:148: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  windows = windows.view(batch_size, height // window_size, width // window_size, window_size, window_size, -1)
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:622: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  was_padded = pad_values[3] > 0 or pad_values[5] > 0
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:623: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if was_padded:
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:411: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  batch_size // mask_shape, mask_shape, self.num_attention_heads, dim, dim
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:682: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  height_downsampled, width_downsampled = (height + 1) // 2, (width + 1) // 2
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:266: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  should_pad = (height % 2 == 1) or (width % 2 == 1)
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:267: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if should_pad:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
[the WARNING line above is repeated about 40 times in total]

Is it ok?

@chainyo
Contributor

chainyo commented Oct 10, 2022

Is it ok?

Hi @WaterKnight1998,
Do you get onnx files locally when you export the model?
Did you try to load the file with https://netron.app ?
Could you try to load an InferenceSession with Optimum or Onnx and use the model to see if it works?
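
For example, a quick smoke test with onnxruntime could look roughly like this (a sketch only: it assumes the encoder was exported to model.onnx with a single input actually named pixel_values, and it just checks that the session runs and prints the output shape):

import numpy as np
import onnxruntime as ort
from PIL import Image
from transformers import DonutProcessor

# Load the exported encoder and the processor that was used for the export.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")

# Push one random image through the graph and inspect the first output.
image = Image.fromarray(np.random.randint(0, 255, (1280, 960, 3), dtype=np.uint8))
pixel_values = processor(image, return_tensors="np").pixel_values
outputs = session.run(None, {"pixel_values": pixel_values})
print(outputs[0].shape)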

@WaterKnight1998
Author

WaterKnight1998 commented Oct 10, 2022

Hi @WaterKnight1998, Do you get onnx files locally when you export the model?

Yes, I get the files

Did you try to load the file with https://netron.app ?

Yes, model loaded

Could you try to load an InferenceSession with Optimum or Onnx and use the model to see if it works?

I am testing:

from transformers.onnx import validate_model_outputs

validate_model_outputs(
    onnx_config, processor, model.encoder, Path("model.onnx"), onnx_outputs, onnx_config.atol_for_validation
)

But the Python process is killed at this point on my machine: https://github.com/huggingface/transformers/blob/main/src/transformers/onnx/convert.py#L392

Maybe the model is too big for CPU?

@WaterKnight1998
Author

Hi, I tested in Databricks and got this error:


ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.05213117599487305
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<command-489655835555725> in <module>
     32 
     33 from transformers.onnx import validate_model_outputs
---> 34 validate_model_outputs(
     35     onnx_config, processor, model.encoder, Path("model.onnx"), onnx_outputs, onnx_config.atol_for_validation
     36 )

/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/onnx/convert.py in validate_model_outputs(config, preprocessor, reference_model, onnx_model, onnx_named_outputs, atol, tokenizer)
    440         if not np.allclose(ref_value, ort_value, atol=atol):
    441             logger.info(f"\t\t-[x] values not close enough (atol: {atol})")
--> 442             raise ValueError(
    443                 "Outputs values doesn't match between reference model and ONNX exported model: "
    444                 f"Got max absolute difference of: {np.amax(np.abs(ref_value - ort_value))}"

ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.05213117599487305

Do I need to update anything, @chainyo & @lewtun? Or is it OK?

@WaterKnight1998 WaterKnight1998 requested review from lewtun and chainyo and removed request for lewtun and chainyo October 10, 2022 15:17
@chainyo
Contributor

chainyo commented Oct 11, 2022

Hi, I tested in Databricks and got this error:


ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got a max absolute difference of: 0.05213117599487305
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<command-489655835555725> in <module>
     32 
     33 from transformers.onnx import validate_model_outputs
---> 34 validate_model_outputs(
     35     onnx_config, processor, model.encoder, Path("model.onnx"), onnx_outputs, onnx_config.atol_for_validation
     36 )

/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/onnx/convert.py in validate_model_outputs(config, preprocessor, reference_model, onnx_model, onnx_named_outputs, atol, tokenizer)
    440         if not np.allclose(ref_value, ort_value, atol=atol):
    441             logger.info(f"\t\t-[x] values not close enough (atol: {atol})")
--> 442             raise ValueError(
    443                 "Outputs values doesn't match between reference model and ONNX exported model: "
    444                 f"Got max absolute difference of: {np.amax(np.abs(ref_value - ort_value))}"

ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got a max absolute difference of: 0.05213117599487305

Do I need to update anything, @chainyo & @lewtun? Or is it OK?

I didn't think about this, but do you have enough RAM locally? If the model is 20 GB, you need roughly double that (~40 GB) to convert it, because the script needs to load both models simultaneously.

The error I see on Databricks is about the absolute tolerance, which is 1e-5 by default. There are two possibilities:

  • You selected the wrong --feature in your conversion command (maybe try something other than the default one)
  • You need to pass the --atol argument to your conversion command with a suitable value, even if 0.052 seems too high IMO (never go above 1e-3). See the example command below.
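
For reference, a conversion command with both flags spelled out could look like this (the local model path, feature name, and tolerance value are illustrative only):

python -m transformers.onnx --model=./swin --feature=default --atol 1e-3 onnx/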

@WaterKnight1998
Author

WaterKnight1998 commented Oct 11, 2022

Hi, I tested in Databricks and got this error:


ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got a max absolute difference of: 0.05213117599487305
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<command-489655835555725> in <module>
     32 
     33 from transformers.onnx import validate_model_outputs
---> 34 validate_model_outputs(
     35     onnx_config, processor, model.encoder, Path("model.onnx"), onnx_outputs, onnx_config.atol_for_validation
     36 )

/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/onnx/convert.py in validate_model_outputs(config, preprocessor, reference_model, onnx_model, onnx_named_outputs, atol, tokenizer)
    440         if not np.allclose(ref_value, ort_value, atol=atol):
    441             logger.info(f"\t\t-[x] values not close enough (atol: {atol})")
--> 442             raise ValueError(
    443                 "Outputs values doesn't match between reference model and ONNX exported model: "
    444                 f"Got max absolute difference of: {np.amax(np.abs(ref_value - ort_value))}"

ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got a max absolute difference of: 0.05213117599487305

Do I need to update anything, @chainyo & @lewtun? Or is it OK?

I didn't think about this, but do you have enough RAM locally? If the model is 20 GB, you need roughly double that (~40 GB) to convert it, because the script needs to load both models simultaneously.

Good point, I only have 32 GB of RAM locally, so that's probably it.

The error I see on Databricks is about the absolute tolerance, which is 1e-5 by default. There are two possibilities:

  • You selected the wrong --feature in your conversion command (maybe try something other than the default one)

I tested with this:

import transformers
from pathlib import Path


from transformers import VisionEncoderDecoderModel
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
model.encoder.save_pretrained("./swin")

from transformers.onnx import export
from transformers import AutoConfig
from transformers.models.donut import *

onnx_config = AutoConfig.from_pretrained("./swin")
onnx_config = DonutSwinOnnxConfig(onnx_config)

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
onnx_inputs, onnx_outputs = export(processor, model.encoder, onnx_config, onnx_config.default_onnx_opset, Path("model.onnx"))

from transformers.onnx import validate_model_outputs

validate_model_outputs(
    onnx_config, processor, model.encoder, Path("model.onnx"), onnx_outputs, onnx_config.atol_for_validation
)
  • You need to pass the --atol argument to your conversion command with a suitable value, even if 0.052 seems too high IMO (never go above 1e-3).

In my config it is set to:

@property
def atol_for_validation(self) -> float:
    return 1e-4

Should I test with 1e-3? But I am getting 0.05.

I don't get why the difference is so big; maybe it's related to the warnings I mentioned in the other comment?

/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:230: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if num_channels != self.num_channels:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:220: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if width % self.patch_size[1] != 0:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:223: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if height % self.patch_size[0] != 0:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:536: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if min(input_resolution) <= self.window_size:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:136: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  batch_size, height // window_size, window_size, width // window_size, window_size, num_channels
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:147: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  batch_size = math.floor(windows.shape[0] / (height * width / window_size / window_size))
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:148: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  windows = windows.view(batch_size, height // window_size, width // window_size, window_size, window_size, -1)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:622: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  was_padded = pad_values[3] > 0 or pad_values[5] > 0
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:623: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if was_padded:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:411: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  batch_size // mask_shape, mask_shape, self.num_attention_heads, dim, dim
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:682: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  height_downsampled, width_downsampled = (height + 1) // 2, (width + 1) // 2
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:266: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  should_pad = (height % 2 == 1) or (width % 2 == 1)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:267: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

@WaterKnight1998
Author

Hi again @chainyo & @lewtun, I tested validate_model_outputs in different setups:

  • Nvidia T4: 0.01 difference
  • Nvidia V100: 0.06 difference
  • CPU: 16 Cores & 56GB RAM: 0.04 difference

I don't know where the problem is. What can I look at?

@chainyo
Contributor

chainyo commented Oct 11, 2022

I don't know where the problem is. What can I look at?

I think it just means that it's a bit random. I don't think it's linked to the hardware; to be sure, you'd have to test the atol something like 10k times per hardware.

IMO it seems evident that atol=1e-2 could do the trick, but it looks bad to accept atol > 1e-3.

To return to the warnings you had earlier while converting the model: did you check whether all layers are implemented in ONNX?

@lewtun
Member

lewtun commented Oct 11, 2022

Hey @WaterKnight1998, I recently implemented a fix in #19475 for a bug that was causing all the Swin models to have incorrect ONNX graphs. Could you first try rebasing on main and checking the tolerance again?

Added document question answering task to onnx features.


Adding the necessary changes to the donut module init.


Black formatting.


Imports are now relative.


Added a function to generate dummy inputs for DonutSwin tracing.


Black formatting.


Reordering imports.


Sorting imports.
@WaterKnight1998
Author

Hey @WaterKnight1998, I recently implemented a fix in #19475 for a bug that was causing all the Swin models to have incorrect ONNX graphs. Could you first try rebasing on main and checking the tolerance again?

Hi @lewtun, as you can see in the PR, I rebased and tested again, and I am seeing the same issue:

ValueError                                Traceback (most recent call last)
<command-489655835555726> in <module>
      1 from transformers.onnx import validate_model_outputs
----> 2 validate_model_outputs(
      3     onnx_config, processor, model.encoder, Path("model.onnx"), onnx_outputs, onnx_config.atol_for_validation
      4 )

/local_disk0/.ephemeral_nfs/envs/pythonEnv-f0e538e7-c99a-4698-9d4a-c04070b5c780/lib/python3.8/site-packages/transformers/onnx/convert.py in validate_model_outputs(config, preprocessor, reference_model, onnx_model, onnx_named_outputs, atol, tokenizer)
    453             bad_indices = np.logical_not(np.isclose(ref_value, ort_value, atol=atol))
    454             logger.info(f"\t\t-[x] values not close enough (atol: {atol})")
--> 455             raise ValueError(
    456                 "Outputs values doesn't match between reference model and ONNX exported model: "
    457                 f"Got max absolute difference of: {np.amax(np.abs(ref_value - ort_value))} for "

ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.06693840026855469 for [ -2.359991    4.654682  -14.478863  ...   5.7127304   1.8854475
   0.7024307] vs [ -2.3598232   4.65485   -14.47826   ...   5.712929    1.8853188
   0.7022476]

@WaterKnight1998
Author

Hi again @lewtun & @chainyo, I have compared this implementation with the original Swin Transformer; the only difference is that the normalization layer is not present. Maybe that's the reason?

@lewtun
Member

lewtun commented Oct 13, 2022

Hi again @lewtun & @chainyo, I have compared this implementation with the original Swin Transformer; the only difference is that the normalization layer is not present. Maybe that's the reason?

Thanks for that insight @WaterKnight1998, although I'd be surprised if that's the source of the issue. I'll take a closer look at the dummy data generation ASAP

@lewtun
Member

lewtun commented Oct 14, 2022

Hi @WaterKnight1998 now that #19254 has been merged, can't you export the Donut checkpoints directly using this feature:

python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx

My understanding is that Donut falls under the general class of vision encoder-decoder models, so a separate ONNX export might not be needed

@WaterKnight1998
Author

WaterKnight1998 commented Oct 17, 2022

Hi @WaterKnight1998 now that #19254 has been merged, can't you export the Donut checkpoints directly using this feature:

python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx

My understanding is that Donut falls under the general class of vision encoder-decoder models, so a separate ONNX export might not be needed

Hi @lewtun, I tested this, but it is not working owing to the tolerance issue. In addition, maybe some users just want to export only the encoder part. Adding @NielsRogge, as he implemented this in #18488.

@BakingBrains
Contributor

BakingBrains commented Oct 17, 2022

Hi @WaterKnight1998 now that #19254 has been merged, can't you export the Donut checkpoints directly using this feature:

python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx

My understanding is that Donut falls under the general class of vision encoder-decoder models, so a separate ONNX export might not be needed

@lewtun While converting, I am facing an output value error (for the same command mentioned above):

Validating ONNX model...
	-[✓] ONNX model output names match reference model ({'last_hidden_state'})
	- Validating ONNX Model output "last_hidden_state":
		-[✓] (3, 1200, 1024) matches (3, 1200, 1024)
		-[x] values not close enough (atol: 1e-05)
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/transformers/onnx/__main__.py", line 180, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/transformers/onnx/__main__.py", line 113, in main
    args.atol if args.atol else encoder_onnx_config.atol_for_validation,
  File "/usr/local/lib/python3.7/dist-packages/transformers/onnx/convert.py", line 456, in validate_model_outputs
    "Outputs values doesn't match between reference model and ONNX exported model: "
ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.0018157958984375 for [  1.5980988   0.5988426 -14.8206215 ...  -5.1114273   4.5024166
   2.8833218] vs [  1.5982218    0.59886694 -14.820812   ...  -5.1115417    4.502474
   2.883381  ]

But separately, I am able to convert the encoder and decoder models to ONNX and have verified the output shapes; that went well. However, I don't know how to implement model.generate() instead of model.run for the decoder part.

@lewtun @WaterKnight1998 Any suggestions here? (I can share the Colab if required.)

Thanks and Regards.

@WaterKnight1998
Author

WaterKnight1998 commented Oct 17, 2022

But separately, I am able to convert the encoder and decoder models to ONNX and have verified the output shapes; that went well. However, I don't know how to implement model.generate() instead of model.run for the decoder part.

@BakingBrains Are you using the code from my PR to do the encoder conversion?

@BakingBrains
Contributor

BakingBrains commented Oct 19, 2022

@lewtun and @WaterKnight1998, any updates on the decoder? I am able to convert the decoder model, but I am not sure if that's the right method (the output shapes from the Donut decoder and the ONNX decoder do match, though).

@WaterKnight1998
Author

Hi, @lewtun @chainyo @BakingBrains any news on this? I need this to get the model into production :(

@WaterKnight1998
Author

@sgugger could you help us? We are looking forward to this feature 🙂

@lewtun
Member

lewtun commented Oct 28, 2022

Hey @WaterKnight1998, I'm taking a look at this, but it's turning out to be tricky to figure out where the discrepancy between the ONNX graph and the PyTorch model arises.

@WaterKnight1998
Author

Hey @WaterKnight1998, I'm taking a look at this, but it's turning out to be tricky to figure out where the discrepancy between the ONNX graph and the PyTorch model arises.

Thank you very much for looking at it 😊

@lewtun
Member

lewtun commented Oct 28, 2022

FYI if you need a temporary workaround and are willing to tolerate some error on the decoder, you can export one of the donut checkpoints on the main branch with:

python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx --atol 3e-3

This will produce two ONNX files (encoder_model.onnx and decoder_model.onnx) that you can then run inference with.

@lewtun
Member

lewtun commented Oct 28, 2022

But separately, I am able to convert the encoder and decoder models to ONNX and have verified the output shapes; that went well. However, I don't know how to implement model.generate() instead of model.run for the decoder part.

Good question @BakingBrains! As of now, you'll have to roll your own generation loop with onnxruntime. An alternative would be to implement an ORTModelForVisionSeq2Seq in optimum, similar to how @mht-sharma is doing this for Whisper: https://github.com/huggingface/optimum/pull/420/files#diff-77c4bfa5fbc9262eda15bbbc01d9796a0daa33e6725ca41e1cfe600a702d0bfc
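
A bare-bones greedy loop over the two exported files might look roughly like the sketch below. This is a sketch only: it assumes the export produced encoder_model.onnx and decoder_model.onnx, that the decoder graph takes inputs named input_ids and encoder_hidden_states and returns the logits as its first output, and it ignores past key values, so every step re-runs the full sequence.

import numpy as np
import onnxruntime as ort
from transformers import DonutProcessor

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")
encoder = ort.InferenceSession("scratch/onnx/encoder_model.onnx")
decoder = ort.InferenceSession("scratch/onnx/decoder_model.onnx")

def greedy_generate(pixel_values, prompt_ids, max_length=128):
    # Encode the image once, then grow the decoded sequence token by token.
    encoder_hidden_states = encoder.run(None, {"pixel_values": pixel_values})[0]
    input_ids = list(prompt_ids)
    eos_token_id = processor.tokenizer.eos_token_id
    for _ in range(max_length):
        logits = decoder.run(
            None,
            {
                "input_ids": np.array([input_ids], dtype=np.int64),
                "encoder_hidden_states": encoder_hidden_states,
            },
        )[0]
        next_token = int(logits[0, -1].argmax())  # greedy pick for the last position
        input_ids.append(next_token)
        if next_token == eos_token_id:
            break
    return input_ids

Here prompt_ids would be the tokenized task prompt (for example the <s_cord-v2> start token), and pixel_values comes from the processor with return_tensors="np".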

@BakingBrains
Contributor

But separately, I am able to convert the encoder and decoder models to ONNX and have verified the output shapes; that went well. However, I don't know how to implement model.generate() instead of model.run for the decoder part.

Good question @BakingBrains ! As of now, you'll have to roll your own generation loop with onnxruntime. An alternative would be to implement an ORTModelForVisionSeq2Seq in optimum, similar to how @mht-sharma is doing this for Whisper: https://github.com/huggingface/optimum/pull/420/files#diff-77c4bfa5fbc9262eda15bbbc01d9796a0daa33e6725ca41e1cfe600a702d0bfc

Thank you @lewtun. Got it.

@WaterKnight1998
Author

FYI if you need a temporary workaround and are willing to tolerate some error on the decoder, you can export one of the donut checkpoints on the main branch with:

python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx --atol 3e-3

This will produce two ONNX files (encoder_model.onnx and decoder_model.onnx) that you can then run inference with.

Ok, thank you very much. I hope you find a solution and we can merge this branch.

@lewtun
Member

lewtun commented Oct 31, 2022

I've created an issue to track the problem with exporting Donut checkpoints specifically: #19983

@WaterKnight1998 can you please share some code snippets on how you currently use the DonutSwin models for document QA and image classification? If I'm not mistaken, inference with these models is only supported via the VisionEncoderDecoder model, so once the above issue is resolved you should be able to use the export without needing the new tasks included in this PR

@WaterKnight1998
Copy link
Author

I've created an issue to track the problem with exporting Donut checkpoints specifically: #19983

@WaterKnight1998 can you please share some code snippets on how you currently use the DonutSwin models for document QA and image classification? If I'm not mistaken, inference with these models is only supported via the VisionEncoderDecoder model, so once the above issue is resolved you should be able to use the export without needing the new tasks included in this PR

Yes, you are right, maybe we can remove those tasks. However, I think it would be good to allow users to export the encoder independently. Maybe someone wants to re-use it in a different model or architecture.
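
For reference, document QA with the full checkpoint goes through VisionEncoderDecoderModel.generate(), roughly like the sketch below, which follows the usual Donut DocVQA pattern (the image path and question are placeholders):

import re

from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")

image = Image.open("document.png").convert("RGB")
question = "What is the invoice total?"
prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"

pixel_values = processor(image, return_tensors="pt").pixel_values
decoder_input_ids = processor.tokenizer(prompt, add_special_tokens=False, return_tensors="pt").input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
    return_dict_in_generate=True,
)

# Decode, strip special tokens and the task start token, then convert to a JSON-like answer.
sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
print(processor.token2json(sequence))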

@github-actions

github-actions bot commented Dec 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Dec 10, 2022
@WaterKnight1998
Author

@lewtun reopen

Successfully merging this pull request may close this issue: ONNXConfig: Add a configuration for all available models