
Adds DonutSwin to models exportable with ONNX #19401

Closed

Conversation


@WaterKnight1998 WaterKnight1998 commented Oct 7, 2022

What does this PR do?

Fixes #16308

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@lewtun & @chainyo for ONNX and @NielsRogge for Donut and Document Question Answering.

@WaterKnight1998 changed the title from "Adds Donut to models exportable with ONNX" to "Adds DonutSwin to models exportable with ONNX" on Oct 7, 2022
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Contributor

@chainyo chainyo left a comment

Hi @WaterKnight1998,

Thanks for your PR. It looks clean.

Nice catch for the model-type variable that could be tricky to find: https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa/blob/main/config.json#L138

First DocumentQuestionAnswering model added. It's pretty cool!

@WaterKnight1998
Author

Hi @WaterKnight1998,

Thanks for your PR. It looks clean.

Nice catch for the model-type variable that could be tricky to find: https://huggingface.co/naver-clova-ix/donut-base-finetuned-docvqa/blob/main/config.json#L138

First DocumentQuestionAnswering model added. It's pretty cool!

I don't see the comment. Do I need to solve anything?

However, when testing locally with the code below, I can't export the model :(

I exported just the encoder like this:

from transformers import VisionEncoderDecoderModel
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
model.encoder.save_pretrained("./swin")

Then, when trying to convert it to ONNX, I get:

python -m transformers.onnx --model=./swin onnx/
Local PyTorch model found.
Framework not requested. Using torch to export to ONNX.
/home/david/.local/lib/python3.10/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Using framework PyTorch: 1.12.1+cu116
Traceback (most recent call last):
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/onnx/__main__.py", line 115, in <module>
    main()
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/onnx/__main__.py", line 97, in main
    onnx_inputs, onnx_outputs = export(
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/onnx/convert.py", line 337, in export
    return export_pytorch(preprocessor, model, config, opset, output, tokenizer=tokenizer, device=device)
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/onnx/convert.py", line 144, in export_pytorch
    model_inputs = config.generate_dummy_inputs(preprocessor, framework=TensorType.PYTORCH)
  File "/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/onnx/config.py", line 348, in generate_dummy_inputs
    raise ValueError(
ValueError: Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor.

Do I need to add more code?

@chainyo
Contributor

chainyo commented Oct 7, 2022

Do I need to add more code?

Yes, you need to override the generate_dummy_inputs() function. As with the LayoutLMv3 model, you need to define how the processor builds the dummy inputs: the ONNX conversion runs one batch (even of random dummy data) through the model to follow the data flow through the graph layers.

Check this here:

def generate_dummy_inputs(
    self,
    processor: "ProcessorMixin",
    batch_size: int = -1,
    seq_length: int = -1,
    is_pair: bool = False,
    framework: Optional["TensorType"] = None,
    num_channels: int = 3,
    image_width: int = 40,
    image_height: int = 40,
) -> Mapping[str, Any]:
    """
    Generate inputs to provide to the ONNX exporter for the specific framework

    Args:
        processor ([`ProcessorMixin`]):
            The processor associated with this model configuration.
        batch_size (`int`, *optional*, defaults to -1):
            The batch size to export the model for (-1 means dynamic axis).
        seq_length (`int`, *optional*, defaults to -1):
            The sequence length to export the model for (-1 means dynamic axis).
        is_pair (`bool`, *optional*, defaults to `False`):
            Indicate if the input is a pair (sentence 1, sentence 2).
        framework (`TensorType`, *optional*, defaults to `None`):
            The framework (PyTorch or TensorFlow) that the processor will generate tensors for.
        num_channels (`int`, *optional*, defaults to 3):
            The number of channels of the generated images.
        image_width (`int`, *optional*, defaults to 40):
            The width of the generated images.
        image_height (`int`, *optional*, defaults to 40):
            The height of the generated images.

    Returns:
        Mapping[str, Any]: holding the kwargs to provide to the model's forward function
    """
    # A dummy image is used so OCR should not be applied
    setattr(processor.feature_extractor, "apply_ocr", False)

    # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
    batch_size = compute_effective_axis_dimension(
        batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
    )
    # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
    token_to_add = processor.tokenizer.num_special_tokens_to_add(is_pair)
    seq_length = compute_effective_axis_dimension(
        seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
    )
    # Generate dummy inputs according to compute batch and sequence
    dummy_text = [[" ".join([processor.tokenizer.unk_token]) * seq_length]] * batch_size

    # Generate dummy bounding boxes
    dummy_bboxes = [[[48, 84, 73, 128]]] * batch_size

    # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
    # batch_size = compute_effective_axis_dimension(batch_size, fixed_dimension=OnnxConfig.default_fixed_batch)
    dummy_image = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)

    inputs = dict(
        processor(
            dummy_image,
            text=dummy_text,
            boxes=dummy_bboxes,
            return_tensors=framework,
        )
    )

    return inputs

This can help too; it's the base generate_dummy_inputs() function:

def generate_dummy_inputs(
    self,
    preprocessor: Union["PreTrainedTokenizerBase", "FeatureExtractionMixin"],
    batch_size: int = -1,
    seq_length: int = -1,
    num_choices: int = -1,
    is_pair: bool = False,
    framework: Optional[TensorType] = None,
    num_channels: int = 3,
    image_width: int = 40,
    image_height: int = 40,
    tokenizer: "PreTrainedTokenizerBase" = None,
) -> Mapping[str, Any]:
    """
    Generate inputs to provide to the ONNX exporter for the specific framework

    Args:
        preprocessor: ([`PreTrainedTokenizerBase`] or [`FeatureExtractionMixin`]):
            The preprocessor associated with this model configuration.
        batch_size (`int`, *optional*, defaults to -1):
            The batch size to export the model for (-1 means dynamic axis).
        num_choices (`int`, *optional*, defaults to -1):
            The number of candidate answers provided for multiple choice task (-1 means dynamic axis).
        seq_length (`int`, *optional*, defaults to -1):
            The sequence length to export the model for (-1 means dynamic axis).
        is_pair (`bool`, *optional*, defaults to `False`):
            Indicate if the input is a pair (sentence 1, sentence 2)
        framework (`TensorType`, *optional*, defaults to `None`):
            The framework (PyTorch or TensorFlow) that the tokenizer will generate tensors for.
        num_channels (`int`, *optional*, defaults to 3):
            The number of channels of the generated images.
        image_width (`int`, *optional*, defaults to 40):
            The width of the generated images.
        image_height (`int`, *optional*, defaults to 40):
            The height of the generated images.

    Returns:
        Mapping[str, Tensor] holding the kwargs to provide to the model's forward function
    """
    from ..feature_extraction_utils import FeatureExtractionMixin
    from ..tokenization_utils_base import PreTrainedTokenizerBase

    if isinstance(preprocessor, PreTrainedTokenizerBase) and tokenizer is not None:
        raise ValueError("You cannot provide both a tokenizer and a preprocessor to generate dummy inputs.")
    if tokenizer is not None:
        warnings.warn(
            "The `tokenizer` argument is deprecated and will be removed in version 5 of Transformers. Use"
            " `preprocessor` instead.",
            FutureWarning,
        )
        logger.warning("Overwriting the `preprocessor` argument with `tokenizer` to generate dummmy inputs.")
        preprocessor = tokenizer
    if isinstance(preprocessor, PreTrainedTokenizerBase):
        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch, num_token_to_add=0
        )
        # If dynamic axis (-1) we forward with a fixed dimension of 8 tokens to avoid optimizations made by ONNX
        token_to_add = preprocessor.num_special_tokens_to_add(is_pair)
        seq_length = compute_effective_axis_dimension(
            seq_length, fixed_dimension=OnnxConfig.default_fixed_sequence, num_token_to_add=token_to_add
        )
        # Generate dummy inputs according to compute batch and sequence
        dummy_input = [" ".join([preprocessor.unk_token]) * seq_length] * batch_size
        if self.task == "multiple-choice":
            # If dynamic axis (-1) we forward with a fixed dimension of 4 candidate answers to avoid optimizations
            # made by ONNX
            num_choices = compute_effective_axis_dimension(
                num_choices, fixed_dimension=OnnxConfig.default_fixed_num_choices, num_token_to_add=0
            )
            dummy_input = dummy_input * num_choices
            # The shape of the tokenized inputs values is [batch_size * num_choices, seq_length]
            tokenized_input = preprocessor(dummy_input, text_pair=dummy_input)
            # Unflatten the tokenized inputs values expanding it to the shape [batch_size, num_choices, seq_length]
            for k, v in tokenized_input.items():
                tokenized_input[k] = [v[i : i + num_choices] for i in range(0, len(v), num_choices)]
            return dict(tokenized_input.convert_to_tensors(tensor_type=framework))
        return dict(preprocessor(dummy_input, return_tensors=framework))
    elif isinstance(preprocessor, FeatureExtractionMixin) and preprocessor.model_input_names[0] == "pixel_values":
        # If dynamic axis (-1) we forward with a fixed dimension of 2 samples to avoid optimizations made by ONNX
        batch_size = compute_effective_axis_dimension(batch_size, fixed_dimension=OnnxConfig.default_fixed_batch)
        dummy_input = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)
        return dict(preprocessor(images=dummy_input, return_tensors=framework))
    else:
        raise ValueError(
            "Unable to generate dummy inputs for the model. Please provide a tokenizer or a preprocessor."
        )

def patch_ops(self):
    for spec in self._patching_specs:
        custom_op = spec.custom_op if spec.op_wrapper is None else spec.op_wrapper(spec.custom_op)
        setattr(spec.o, spec.name, custom_op)

def restore_ops(self):
    for spec in self._patching_specs:
        orig_op = spec.orig_op if spec.op_wrapper is None else spec.op_wrapper(spec.orig_op)
        setattr(spec.o, spec.name, orig_op)

@classmethod
def flatten_output_collection_property(cls, name: str, field: Iterable[Any]) -> Dict[str, Any]:
    """
    Flatten any potential nested structure expanding the name of the field with the index of the element within the
    structure.

    Args:
        name: The name of the nested structure
        field: The structure to, potentially, be flattened

    Returns:
        (Dict[str, Any]): Outputs with flattened structure and key mapping this new structure.
    """
    from itertools import chain

    return {f"{name}.{idx}": item for idx, item in enumerate(chain.from_iterable(field))}
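
For a vision-only encoder like DonutSwin, which only consumes pixel_values, a minimal override could look like the sketch below. This is a sketch only: it assumes the config subclasses OnnxConfig and is handed a feature extractor (or a processor exposing one), and the names, axes, and defaults are illustrative rather than the final implementation.

from typing import Any, Mapping, Optional

from transformers.onnx import OnnxConfig
from transformers.onnx.utils import compute_effective_axis_dimension
from transformers.utils import TensorType


class DonutSwinOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        # The encoder only consumes images.
        return {"pixel_values": {0: "batch", 1: "num_channels", 2: "height", 3: "width"}}

    def generate_dummy_inputs(
        self,
        preprocessor,  # assumed to be a DonutFeatureExtractor (or equivalent)
        batch_size: int = -1,
        num_channels: int = 3,
        image_width: int = 40,
        image_height: int = 40,
        framework: Optional[TensorType] = None,
        **kwargs,
    ) -> Mapping[str, Any]:
        # If dynamic axis (-1), forward a fixed batch of 2 samples so ONNX does not fold the axis away.
        batch_size = compute_effective_axis_dimension(
            batch_size, fixed_dimension=OnnxConfig.default_fixed_batch
        )
        # Random PIL images produced by the base-class helper.
        dummy_images = self._generate_dummy_images(batch_size, num_channels, image_height, image_width)
        return dict(preprocessor(images=dummy_images, return_tensors=framework))

Because the Swin encoder has no tokenizer, only the image branch of the base implementation is relevant here.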

Member

@lewtun lewtun left a comment

Thanks for adding support for this new model @WaterKnight1998 and welcome to the 🤗 Transformers community!

As suggested by @chainyo, you'll need to override the function that generates dummy data. I also left a nit regarding one of the imports.

src/transformers/models/donut/configuration_donut_swin.py (review comment, outdated and resolved)
@WaterKnight1998
Author

@chainyo @lewtun The relative imports are fixed, and I also added the function to generate dummy inputs. But when I convert the model to ONNX like this:

import transformers
from pathlib import Path


from transformers import VisionEncoderDecoderModel
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
model.encoder.save_pretrained("./swin")

from transformers.onnx import export
from transformers import AutoConfig
from transformers.models.donut import *

onnx_config = AutoConfig.from_pretrained("./swin")
onnx_config = DonutSwinOnnxConfig(onnx_config)

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
onnx_inputs, onnx_outputs = export(processor, model.encoder, onnx_config, onnx_config.default_onnx_opset, Path("model.onnx"))

I get the following warnings:

/home/david/.local/lib/python3.10/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:230: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if num_channels != self.num_channels:
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:220: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if width % self.patch_size[1] != 0:
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:223: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if height % self.patch_size[0] != 0:
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:536: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if min(input_resolution) <= self.window_size:
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:136: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  batch_size, height // window_size, window_size, width // window_size, window_size, num_channels
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:147: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  batch_size = math.floor(windows.shape[0] / (height * width / window_size / window_size))
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:148: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  windows = windows.view(batch_size, height // window_size, width // window_size, window_size, window_size, -1)
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:622: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  was_padded = pad_values[3] > 0 or pad_values[5] > 0
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:623: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if was_padded:
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:411: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  batch_size // mask_shape, mask_shape, self.num_attention_heads, dim, dim
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:682: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  height_downsampled, width_downsampled = (height + 1) // 2, (width + 1) // 2
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:266: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  should_pad = (height % 2 == 1) or (width % 2 == 1)
/home/david/micromamba/envs/huggingface/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py:267: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if should_pad:
WARNING: The shape inference of prim::Constant type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
[the WARNING line above is repeated about 40 times in total]

Is it ok?

@chainyo
Contributor

chainyo commented Oct 10, 2022

Is it ok?

Hi @WaterKnight1998,
Do you get onnx files locally when you export the model?
Did you try to load the file with https://netron.app ?
Could you try to load an InferenceSession with Optimum or Onnx and use the model to see if it works?
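
For example, a quick smoke test with onnxruntime could look roughly like this (a sketch only: it assumes the encoder was exported to model.onnx with a single input actually named pixel_values, and it just checks that the session runs and prints the output shape):

import numpy as np
import onnxruntime as ort
from PIL import Image
from transformers import DonutProcessor

# Load the exported encoder and the processor that was used for the export.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")

# Push one random image through the graph and inspect the first output.
image = Image.fromarray(np.random.randint(0, 255, (1280, 960, 3), dtype=np.uint8))
pixel_values = processor(image, return_tensors="np").pixel_values
outputs = session.run(None, {"pixel_values": pixel_values})
print(outputs[0].shape)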

@WaterKnight1998
Author

WaterKnight1998 commented Oct 10, 2022

Hi @WaterKnight1998, Do you get onnx files locally when you export the model?

Yes, I get the files

Did you try to load the file with https://netron.app ?

Yes, model loaded

Could you try to load an InferenceSession with Optimum or Onnx and use the model to see if it works?

I am testing:

from transformers.onnx import validate_model_outputs

validate_model_outputs(
    onnx_config, processor, model.encoder, Path("model.onnx"), onnx_outputs, onnx_config.atol_for_validation
)

But the Python process is killed at this point on my machine: https://github.com/huggingface/transformers/blob/main/src/transformers/onnx/convert.py#L392

Maybe the model is too big for CPU?

@WaterKnight1998
Author

Hi, I tested in Databricks and got this error:


ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.05213117599487305
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<command-489655835555725> in <module>
     32 
     33 from transformers.onnx import validate_model_outputs
---> 34 validate_model_outputs(
     35     onnx_config, processor, model.encoder, Path("model.onnx"), onnx_outputs, onnx_config.atol_for_validation
     36 )

/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/onnx/convert.py in validate_model_outputs(config, preprocessor, reference_model, onnx_model, onnx_named_outputs, atol, tokenizer)
    440         if not np.allclose(ref_value, ort_value, atol=atol):
    441             logger.info(f"\t\t-[x] values not close enough (atol: {atol})")
--> 442             raise ValueError(
    443                 "Outputs values doesn't match between reference model and ONNX exported model: "
    444                 f"Got max absolute difference of: {np.amax(np.abs(ref_value - ort_value))}"

ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.05213117599487305

Do I need to update anything, @chainyo & @lewtun? Or is it OK?

@WaterKnight1998 WaterKnight1998 requested review from lewtun and chainyo and removed request for lewtun and chainyo October 10, 2022 15:17
@chainyo
Contributor

chainyo commented Oct 11, 2022

Hi, I tested in Databricks and got this error:


ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got a max absolute difference of: 0.05213117599487305
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<command-489655835555725> in <module>
     32 
     33 from transformers.onnx import validate_model_outputs
---> 34 validate_model_outputs(
     35     onnx_config, processor, model.encoder, Path("model.onnx"), onnx_outputs, onnx_config.atol_for_validation
     36 )

/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/onnx/convert.py in validate_model_outputs(config, preprocessor, reference_model, onnx_model, onnx_named_outputs, atol, tokenizer)
    440         if not np.allclose(ref_value, ort_value, atol=atol):
    441             logger.info(f"\t\t-[x] values not close enough (atol: {atol})")
--> 442             raise ValueError(
    443                 "Outputs values doesn't match between reference model and ONNX exported model: "
    444                 f"Got max absolute difference of: {np.amax(np.abs(ref_value - ort_value))}"

ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got a max absolute difference of: 0.05213117599487305

Do I need to update anything, @chainyo & @lewtun? Or is it OK?

I didn't think about this, but do you have enough RAM locally? If the model is 20 GB, you need roughly double that (~40 GB) to convert it, because the script needs to load both models simultaneously.

The error I see on Databricks is about the absolute tolerance, which is 1e-5 by default. There are two possibilities:

  • You selected the wrong --feature in your conversion command (maybe try something other than the default one)
  • You need to pass the --atol argument to your conversion command with a suitable value, even if 0.052 seems too high IMO (never go above 1e-3). See the example command below.
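
For reference, a conversion command with both flags spelled out could look like this (the local model path, feature name, and tolerance value are illustrative only):

python -m transformers.onnx --model=./swin --feature=default --atol 1e-3 onnx/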

@WaterKnight1998
Author

WaterKnight1998 commented Oct 11, 2022

Hi, I tested in Databricks and got this error:


ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got a max absolute difference of: 0.05213117599487305
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<command-489655835555725> in <module>
     32 
     33 from transformers.onnx import validate_model_outputs
---> 34 validate_model_outputs(
     35     onnx_config, processor, model.encoder, Path("model.onnx"), onnx_outputs, onnx_config.atol_for_validation
     36 )

/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/onnx/convert.py in validate_model_outputs(config, preprocessor, reference_model, onnx_model, onnx_named_outputs, atol, tokenizer)
    440         if not np.allclose(ref_value, ort_value, atol=atol):
    441             logger.info(f"\t\t-[x] values not close enough (atol: {atol})")
--> 442             raise ValueError(
    443                 "Outputs values doesn't match between reference model and ONNX exported model: "
    444                 f"Got max absolute difference of: {np.amax(np.abs(ref_value - ort_value))}"

ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got a max absolute difference of: 0.05213117599487305

Do I need to update anything, @chainyo & @lewtun? Or is it OK?

I didn't think about this, but do you have enough RAM locally? If the model is 20 GB, you need roughly double that (~40 GB) to convert it, because the script needs to load both models simultaneously.

Good point, I only have 32 GB of RAM locally, so that's probably it.

The error I see on Databricks is about the absolute tolerance, which is 1e-5 by default. There are two possibilities:

  • You selected the wrong --feature in your conversion command (maybe try something other than the default one)

I tested with this:

import transformers
from pathlib import Path


from transformers import VisionEncoderDecoderModel
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
model.encoder.save_pretrained("./swin")

from transformers.onnx import export
from transformers import AutoConfig
from transformers.models.donut import *

onnx_config = AutoConfig.from_pretrained("./swin")
onnx_config = DonutSwinOnnxConfig(onnx_config)

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
onnx_inputs, onnx_outputs = export(processor, model.encoder, onnx_config, onnx_config.default_onnx_opset, Path("model.onnx"))

from transformers.onnx import validate_model_outputs

validate_model_outputs(
    onnx_config, processor, model.encoder, Path("model.onnx"), onnx_outputs, onnx_config.atol_for_validation
)
  • You need to pass the --atol argument to your conversion command with a suitable value, even if 0.052 seems too high IMO (never go above 1e-3).

In my config it is set to:

@property
def atol_for_validation(self) -> float:
    return 1e-4

Should I test with 1e-3? But I am getting 0.05.

I don't get why the difference is so big; maybe it's related to the warnings I mentioned in the other comment?

/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:230: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if num_channels != self.num_channels:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:220: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if width % self.patch_size[1] != 0:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:223: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if height % self.patch_size[0] != 0:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:536: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if min(input_resolution) <= self.window_size:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:136: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  batch_size, height // window_size, window_size, width // window_size, window_size, num_channels
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:147: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  batch_size = math.floor(windows.shape[0] / (height * width / window_size / window_size))
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:148: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  windows = windows.view(batch_size, height // window_size, width // window_size, window_size, window_size, -1)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:622: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  was_padded = pad_values[3] > 0 or pad_values[5] > 0
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:623: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if was_padded:
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:411: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  batch_size // mask_shape, mask_shape, self.num_attention_heads, dim, dim
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:682: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  height_downsampled, width_downsampled = (height + 1) // 2, (width + 1) // 2
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:266: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  should_pad = (height % 2 == 1) or (width % 2 == 1)
/local_disk0/.ephemeral_nfs/envs/pythonEnv-b455b6d8-06c3-4a9e-9af6-0fd82d764878/lib/python3.8/site-packages/transformers/models/donut/modeling_donut_swin.py:267: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

@WaterKnight1998
Author

Hi again @chainyo & @lewtun, I tested validate_model_outputs in different setups:

  • Nvidia T4: 0.01 difference
  • Nvidia V100: 0.06 difference
  • CPU: 16 Cores & 56GB RAM: 0.04 difference

I don't know where the problem is. What can I look at?

@chainyo
Contributor

chainyo commented Oct 11, 2022

I don't know where the problem is. What can I look at?

I think it just means that it's a bit random. I don't think it's linked to the hardware; to be sure, you'd have to test the atol something like 10k times per hardware.

IMO it seems evident that atol=1e-2 could do the trick, but it looks bad to accept atol > 1e-3.

To return to the warnings you had earlier while converting the model: did you check whether all layers are implemented in ONNX?

@lewtun
Member

lewtun commented Oct 11, 2022

Hey @WaterKnight1998, I recently implemented a fix in #19475 for a bug that was causing all the Swin models to have incorrect ONNX graphs. Could you first try rebasing on main and checking the tolerance again?

Added document question answering task to onnx features.


Adding the necessary changes to the donut module init.


Black formatting.


Imports are now relative.


Added a function to generate dummy inputs for DonutSwin tracing.


Black formatting.


Reordering imports.


Sorting imports.
@WaterKnight1998
Author

Hey @WaterKnight1998, I recently implemented a fix in #19475 for a bug that was causing all the Swin models to have incorrect ONNX graphs. Could you first try rebasing on main and checking the tolerance again?

Hi @lewtun, as you can see in the PR, I rebased and tested again, and I am seeing the same issue:

ValueError                                Traceback (most recent call last)
<command-489655835555726> in <module>
      1 from transformers.onnx import validate_model_outputs
----> 2 validate_model_outputs(
      3     onnx_config, processor, model.encoder, Path("model.onnx"), onnx_outputs, onnx_config.atol_for_validation
      4 )

/local_disk0/.ephemeral_nfs/envs/pythonEnv-f0e538e7-c99a-4698-9d4a-c04070b5c780/lib/python3.8/site-packages/transformers/onnx/convert.py in validate_model_outputs(config, preprocessor, reference_model, onnx_model, onnx_named_outputs, atol, tokenizer)
    453             bad_indices = np.logical_not(np.isclose(ref_value, ort_value, atol=atol))
    454             logger.info(f"\t\t-[x] values not close enough (atol: {atol})")
--> 455             raise ValueError(
    456                 "Outputs values doesn't match between reference model and ONNX exported model: "
    457                 f"Got max absolute difference of: {np.amax(np.abs(ref_value - ort_value))} for "

ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.06693840026855469 for [ -2.359991    4.654682  -14.478863  ...   5.7127304   1.8854475
   0.7024307] vs [ -2.3598232   4.65485   -14.47826   ...   5.712929    1.8853188
   0.7022476]

@WaterKnight1998
Author

Hi again @lewtun & @chainyo, I have compared this implementation with the original Swin Transformer; the only difference is that the normalization layer is not present. Maybe that's the reason?

@lewtun
Member

lewtun commented Oct 13, 2022

Hi again @lewtun & @chainyo, I have compared this implementation with the original Swin Transformer; the only difference is that the normalization layer is not present. Maybe that's the reason?

Thanks for that insight @WaterKnight1998, although I'd be surprised if that's the source of the issue. I'll take a closer look at the dummy data generation ASAP

@lewtun
Member

lewtun commented Oct 14, 2022

Hi @WaterKnight1998 now that #19254 has been merged, can't you export the Donut checkpoints directly using this feature:

python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx

My understanding is that Donut falls under the general class of vision encoder-decoder models, so a separate ONNX export might not be needed

@WaterKnight1998
Author

WaterKnight1998 commented Oct 17, 2022

Hi @WaterKnight1998 now that #19254 has been merged, can't you export the Donut checkpoints directly using this feature:

python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx

My understanding is that Donut falls under the general class of vision encoder-decoder models, so a separate ONNX export might not be needed

Hi @lewtun, I tested this, but it is not working owing to the tolerance issue. In addition, maybe some users just want to export only the encoder part. Adding @NielsRogge, as he implemented this in #18488.

@BakingBrains
Contributor

BakingBrains commented Oct 17, 2022

Hi @WaterKnight1998 now that #19254 has been merged, can't you export the Donut checkpoints directly using this feature:

python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx

My understanding is that Donut falls under the general class of vision encoder-decoder models, so a separate ONNX export might not be needed

@lewtun While converting, I am facing an output value error (for the same command mentioned above):

Validating ONNX model...
	-[✓] ONNX model output names match reference model ({'last_hidden_state'})
	- Validating ONNX Model output "last_hidden_state":
		-[✓] (3, 1200, 1024) matches (3, 1200, 1024)
		-[x] values not close enough (atol: 1e-05)
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.7/dist-packages/transformers/onnx/__main__.py", line 180, in <module>
    main()
  File "/usr/local/lib/python3.7/dist-packages/transformers/onnx/__main__.py", line 113, in main
    args.atol if args.atol else encoder_onnx_config.atol_for_validation,
  File "/usr/local/lib/python3.7/dist-packages/transformers/onnx/convert.py", line 456, in validate_model_outputs
    "Outputs values doesn't match between reference model and ONNX exported model: "
ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.0018157958984375 for [  1.5980988   0.5988426 -14.8206215 ...  -5.1114273   4.5024166
   2.8833218] vs [  1.5982218    0.59886694 -14.820812   ...  -5.1115417    4.502474
   2.883381  ]

But separately, I am able to convert the encoder and decoder models to ONNX and have verified the output shapes; that went well. However, I don't know how to implement model.generate() instead of model.run for the decoder part.

@lewtun @WaterKnight1998 Any suggestions here? (I can share the Colab if required.)

Thanks and Regards.

@WaterKnight1998
Author

WaterKnight1998 commented Oct 17, 2022

But separately, I am able to convert the encoder and decoder models to ONNX and have verified the output shapes; that went well. However, I don't know how to implement model.generate() instead of model.run for the decoder part.

@BakingBrains Are you using the code from my PR to do the encoder conversion?

@BakingBrains
Contributor

BakingBrains commented Oct 19, 2022

@lewtun and @WaterKnight1998, any updates on the decoder? I am able to convert the decoder model, but I am not sure if that's the right method (the output shapes from the Donut decoder and the ONNX decoder do match, though).

@WaterKnight1998
Author

Hi, @lewtun @chainyo @BakingBrains any news on this? I need this to get the model into production :(

@WaterKnight1998
Author

@sgugger could you help us? We are looking forward to this feature 🙂

@lewtun
Member

lewtun commented Oct 28, 2022

Hey @WaterKnight1998, I'm taking a look at this, but it's turning out to be tricky to figure out where the discrepancy between the ONNX graph and the PyTorch model arises.

@WaterKnight1998
Author

Hey @WaterKnight1998, I'm taking a look at this, but it's turning out to be tricky to figure out where the discrepancy between the ONNX graph and the PyTorch model arises.

Thank you very much for looking at it 😊

@lewtun
Member

lewtun commented Oct 28, 2022

FYI if you need a temporary workaround and are willing to tolerate some error on the decoder, you can export one of the donut checkpoints on the main branch with:

python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx --atol 3e-3

This will produce two ONNX files (encoder_model.onnx and decoder_model.onnx) that you can then run inference with.

@lewtun
Member

lewtun commented Oct 28, 2022

But separately, I am able to convert the encoder and decoder models to ONNX and have verified the output shapes; that went well. However, I don't know how to implement model.generate() instead of model.run for the decoder part.

Good question @BakingBrains! As of now, you'll have to roll your own generation loop with onnxruntime. An alternative would be to implement an ORTModelForVisionSeq2Seq in optimum, similar to how @mht-sharma is doing this for Whisper: https://github.com/huggingface/optimum/pull/420/files#diff-77c4bfa5fbc9262eda15bbbc01d9796a0daa33e6725ca41e1cfe600a702d0bfc
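
A bare-bones greedy loop over the two exported files might look roughly like the sketch below. This is a sketch only: it assumes the export produced encoder_model.onnx and decoder_model.onnx, that the decoder graph takes inputs named input_ids and encoder_hidden_states and returns the logits as its first output, and it ignores past key values, so every step re-runs the full sequence.

import numpy as np
import onnxruntime as ort
from transformers import DonutProcessor

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-cord-v2")
encoder = ort.InferenceSession("scratch/onnx/encoder_model.onnx")
decoder = ort.InferenceSession("scratch/onnx/decoder_model.onnx")

def greedy_generate(pixel_values, prompt_ids, max_length=128):
    # Encode the image once, then grow the decoded sequence token by token.
    encoder_hidden_states = encoder.run(None, {"pixel_values": pixel_values})[0]
    input_ids = list(prompt_ids)
    eos_token_id = processor.tokenizer.eos_token_id
    for _ in range(max_length):
        logits = decoder.run(
            None,
            {
                "input_ids": np.array([input_ids], dtype=np.int64),
                "encoder_hidden_states": encoder_hidden_states,
            },
        )[0]
        next_token = int(logits[0, -1].argmax())  # greedy pick for the last position
        input_ids.append(next_token)
        if next_token == eos_token_id:
            break
    return input_ids

Here prompt_ids would be the tokenized task prompt (for example the <s_cord-v2> start token), and pixel_values comes from the processor with return_tensors="np".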

@BakingBrains
Contributor

But separately, I am able to convert the encoder and decoder models to ONNX and have verified the output shapes; that went well. However, I don't know how to implement model.generate() instead of model.run for the decoder part.

Good question @BakingBrains ! As of now, you'll have to roll your own generation loop with onnxruntime. An alternative would be to implement an ORTModelForVisionSeq2Seq in optimum, similar to how @mht-sharma is doing this for Whisper: https://github.com/huggingface/optimum/pull/420/files#diff-77c4bfa5fbc9262eda15bbbc01d9796a0daa33e6725ca41e1cfe600a702d0bfc

Thank you @lewtun. Got it.

@WaterKnight1998
Author

FYI if you need a temporary workaround and are willing to tolerate some error on the decoder, you can export one of the donut checkpoints on the main branch with:

python -m transformers.onnx --model=naver-clova-ix/donut-base-finetuned-cord-v2 --feature=vision2seq-lm scratch/onnx --atol 3e-3

This will produce two ONNX files (encoder_model.onnx and decoder_model.onnx) that you can then run inference with.

Ok, thank you very much. I hope you find a solution and we can merge this branch.

@lewtun
Member

lewtun commented Oct 31, 2022

I've created an issue to track the problem with exporting Donut checkpoints specifically: #19983

@WaterKnight1998 can you please share some code snippets on how you currently use the DonutSwin models for document QA and image classification? If I'm not mistaken, inference with these models is only supported via the VisionEncoderDecoder model, so once the above issue is resolved you should be able to use the export without needing the new tasks included in this PR

@WaterKnight1998
Copy link
Author

I've created an issue to track the problem with exporting Donut checkpoints specifically: #19983

@WaterKnight1998 can you please share some code snippets on how you currently use the DonutSwin models for document QA and image classification? If I'm not mistaken, inference with these models is only supported via the VisionEncoderDecoder model, so once the above issue is resolved you should be able to use the export without needing the new tasks included in this PR

Yes, you are right, maybe we can remove those tasks. However, I think it would be good to allow users to export the encoder independently. Maybe someone wants to re-use it in a different model or architecture.
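
For reference, document QA with the full checkpoint goes through VisionEncoderDecoderModel.generate(), roughly like the sketch below, which follows the usual Donut DocVQA pattern (the image path and question are placeholders):

import re

from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base-finetuned-docvqa")

image = Image.open("document.png").convert("RGB")
question = "What is the invoice total?"
prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"

pixel_values = processor(image, return_tensors="pt").pixel_values
decoder_input_ids = processor.tokenizer(prompt, add_special_tokens=False, return_tensors="pt").input_ids

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
    return_dict_in_generate=True,
)

# Decode, strip special tokens and the task start token, then convert to a JSON-like answer.
sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
print(processor.token2json(sequence))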

@github-actions

github-actions bot commented Dec 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Dec 10, 2022
@WaterKnight1998
Author

@lewtun reopen

Successfully merging this pull request may close this issue: ONNXConfig: Add a configuration for all available models