This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

OR_4243_Labelstudio_Change_prediction_upload_algorithm #15


@rostyslavhereha rostyslavhereha commented Nov 2, 2023

PR fulfills these requirements

  • Commit message(s) and PR title follow the format [fix|feat|ci|chore|doc]: TICKET-ID: Short description of change made, e.g. fix: DEV-XXXX: Removed inconsistent code usage causing intermittent errors
  • Tests for the changes have been added/updated (for bug fixes/features)
  • Docs have been added/updated (for bug fixes/features)
  • Best efforts were made to ensure docs/code are concise and coherent (checked for spelling/grammatical errors, commented out code, debug logs etc.)
  • Self-reviewed and ran all changes on a local instance (for bug fixes/features)

Change has impacts in these area(s)

(check all that apply)

  • Product design
  • Backend (Database)
  • Backend (API)
  • Frontend

Describe the reason for change

We found that 90% of the time during prediction generation is spent uploading predictions to the Label Studio backend:

  • this is caused by posting each prediction individually to the backend
  • this probably also causes a separate database update for each prediction

Please rewrite the algorithm to use bulk_create to significantly increase the data upload speed.

What does this fix?

  • In PredictionAPI, the create() method has been added, which parses the incoming request into individual Prediction objects and saves them using bulk_create().
  • The post_save() signal was changed to optimize work with the database
  • In AnnotationAPI the post() method has been added, which parses the incoming request into individual Annotation objects and saves them using transaction.atomic.
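
For illustration only, the difference between per-item saves and a single batched write can be sketched outside Django. This is not the code from this PR: it uses Python's stdlib sqlite3 in place of the Django ORM, and the `prediction` table, its columns, and all function names are hypothetical. `executemany` inside one transaction stands in for the idea behind bulk_create() and transaction.atomic.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical prediction table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prediction ("
        "id INTEGER PRIMARY KEY, task_id INTEGER NOT NULL, result TEXT NOT NULL)"
    )
    return conn


def save_one_by_one(conn, rows):
    """Baseline: one INSERT and one commit per prediction."""
    for task_id, result in rows:
        conn.execute(
            "INSERT INTO prediction (task_id, result) VALUES (?, ?)",
            (task_id, result),
        )
        conn.commit()


def save_in_bulk(conn, rows):
    """Batched: all INSERTs run in a single transaction, committed once."""
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(
            "INSERT INTO prediction (task_id, result) VALUES (?, ?)",
            rows,
        )


def count_rows(conn):
    """Return the number of stored predictions."""
    return conn.execute("SELECT COUNT(*) FROM prediction").fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    save_in_bulk(conn, [(i, "{}") for i in range(1300)])
    print(count_rows(conn))
```

Because the batched version runs in one transaction, a single invalid row rolls back the whole batch, which is a behavior change worth keeping in mind compared with saving items one at a time.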

What is the new behavior?

On my test dataset of 1300 images, predictions are now saved to the database in 7 seconds instead of 1 minute 30 seconds, and the same number of annotations uploads in ~15 seconds instead of ~2 minutes.
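
As a rough back-of-the-envelope check of these timings, the sketch below computes the speedup factors. The end-to-end estimate uses Amdahl's law and assumes that the 90% upload share quoted above applies to this same kind of run, which has not been measured separately; the annotation timings are approximate. The function names are illustrative only.

```python
def speedup(before_s, after_s):
    """Ratio of old to new wall-clock time."""
    return before_s / after_s


def overall_speedup(upload_fraction, upload_speedup):
    """Amdahl's law: only the upload share of the run gets faster."""
    return 1.0 / ((1.0 - upload_fraction) + upload_fraction / upload_speedup)


prediction_speedup = speedup(90, 7)    # 1 min 30 s -> 7 s
annotation_speedup = speedup(120, 15)  # ~2 min -> ~15 s (approximate)
end_to_end = overall_speedup(0.9, prediction_speedup)  # assumes 90% upload share

print(f"prediction upload: {prediction_speedup:.1f}x")  # 12.9x
print(f"annotation upload: {annotation_speedup:.1f}x")  # 8.0x
print(f"estimated end-to-end: {end_to_end:.1f}x")       # 5.9x
```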

Does this PR introduce a breaking change?

(check only one)

  • Yes, and covered entirely by feature flag(s)
  • Yes, and covered partially by feature flag(s)
  • No
  • Not sure (briefly explain the situation below)

@rostyslavhereha rostyslavhereha changed the title Or 4243 labelstudio change prediction annotation upload algorithm OR_4243_Labelstudio_Change_prediction_upload_algorithm Nov 2, 2023
@rostyslavhereha rostyslavhereha merged commit ecf41d8 into develop Nov 15, 2023
14 of 28 checks passed