Hashes to identify input, outputs and output_annotations data entries #155

oplatek · 2024-11-27T10:41:43Z

Our dataset management can be illustrated by the dependencies that determine how the entries are generated.

input(dataset, split) -> NLG process -> output(NLG_system_id)  \
   output(NLG_system_id) -> ANNOTATION_PROCESS -> annotations_of_output(campaign_details, ...)

Since many properties could identify input, output, and output_annotations, I think it is best to use hashes to identify inputs, outputs, and list_of_example_annotations.

I imagine that each data entry will have a hash:

input
  - input_idx  # determines the dataset, split, and particular example, how the example was preprocessed/rendered by factgenie, etc.

output
  - input_idx  # reference to the exact input which was used for generation
  - output_idx  # uniquely identifying the output

annotations_list
  - output_idx  # uniquely identifying which output was annotated
  - annotations_idx  # uniquely identifying the annotation list
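
The scheme above could be sketched roughly as follows. This is only an illustration, not existing factgenie code: the helper names (`stable_hash`, `make_input`, `make_output`, `make_annotations_list`), the choice of SHA-256 over canonical JSON, and the property names (`rendering`, `campaign_id`, ...) are all assumptions. Each entry's hash covers its own properties plus the hash of its parent, so an output is tied to the exact input it was generated from, and an annotation list to the exact output it annotates.

```python
import hashlib
import json


def stable_hash(properties: dict) -> str:
    """SHA-256 hex digest of a dict, independent of key order (assumed scheme)."""
    canonical = json.dumps(
        properties, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_input(dataset: str, split: str, example_idx: int, rendering: str) -> dict:
    # Hypothetical set of properties that determine an input entry.
    props = {
        "dataset": dataset,
        "split": split,
        "example_idx": example_idx,
        "rendering": rendering,
    }
    return {"input_idx": stable_hash(props), **props}


def make_output(input_entry: dict, nlg_system_id: str, text: str) -> dict:
    # The parent's hash is part of the hashed properties, which chains the entries.
    props = {
        "input_idx": input_entry["input_idx"],
        "nlg_system_id": nlg_system_id,
        "text": text,
    }
    return {"output_idx": stable_hash(props), **props}


def make_annotations_list(output_entry: dict, campaign_id: str, annotations: list) -> dict:
    props = {
        "output_idx": output_entry["output_idx"],
        "campaign_id": campaign_id,
        "annotations": annotations,
    }
    return {"annotations_idx": stable_hash(props), **props}


if __name__ == "__main__":
    inp = make_input("dataset_a", "test", 0, "html")
    out = make_output(inp, "system_a", "Some generated text.")
    ann = make_annotations_list(
        out, "campaign_1", [{"start": 0, "end": 4, "type": "error"}]
    )
    print(inp["input_idx"], out["output_idx"], ann["annotations_idx"], sep="\n")
```

One consequence of this design is that changing anything upstream (for example, how an input is rendered) yields a new `input_idx`, so outputs and annotations produced for the old rendering keep pointing at the old input rather than being silently attached to the new one.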


oplatek added the enhancement (New feature or request) label Nov 27, 2024
