Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow to translate dataset text on chart #648

Merged
merged 16 commits into from
Mar 13, 2024
Merged

Conversation

Ian2012
Copy link
Contributor

@Ian2012 Ian2012 commented Mar 11, 2024

Description

This PR allows the translation of text on datasets such as fixed strings by:

  • Implementing a jinja filter that converts fixed text (e.g 'audit') to its translation. It uses a CASE clause in the following format:
CASE
   {{column_name}} = {{fixed_string}} THEN {{translation}}
   ...
   ELSE {{column_name}}
END

The jinja filter can be used like this in the Chart dimensions: {% raw %}{{translate_column('enrollment_mode')}}{% endraw %}. You can do it visually and export it (asset serialization will try to keep the proper format):
image

This is an example of an Spanish translation working.
image

This PR also updates some metrics names to match what they mean.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Mar 11, 2024
@openedx-webhooks
Copy link

Thanks for the pull request, @Ian2012! Please note that it may take us up to several weeks or months to complete a review and merge your PR.

Feel free to add as much of the following information to the ticket as you can:

  • supporting documentation
  • Open edX discussion forum threads
  • timeline information ("this must be merged by XX date", and why that is)
  • partner information ("this is a course on edx.org")
  • any other information that can help Product understand the context for the PR

All technical communication about the code itself will be done via the GitHub pull request interface. As a reminder, our process documentation is here.

Please let us know once your PR is ready for our review and all tests are green.

@bmtcril bmtcril requested review from SoryRawyer and bmtcril March 11, 2024 17:41
@Ian2012 Ian2012 force-pushed the cag/dataset-translation branch from 4284120 to 859c901 Compare March 11, 2024 17:53
Copy link
Contributor

@bmtcril bmtcril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall, just some questions and thoughts on performance

case_format = """CASE \n {cases} \n ELSE {column_name} \n END"""
single_case_format = "WHEN {column_name} = '{string}' THEN '{translation}'"
cases = "\n".join(
single_case_format.format(column_name=column_name, string=string, translation=get_translation(string, lang))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When get_translation is called, localization.py will load all of the translations for all languages into memory. I'm not sure how long that will stick around for, but loading a bunch of files off disk repeatedly might cause performance issues. I doubt we'll have so many strings that it will make much of a dent in server memory, but we might want to compare before and after this change to see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, it's using: 347.1MiB. Before it was: 307Mi. We could load the file only when needed but the disk operation is slower. The file content will be permanently in memory when the first request ask for translations

Base automatically changed from cag/translations to main March 11, 2024 19:44
@Ian2012 Ian2012 force-pushed the cag/dataset-translation branch from fe71f9c to 9541153 Compare March 11, 2024 20:43
@@ -47,7 +47,7 @@ elif [[ "${1}" == "beat" ]]; then
celery --app=superset.tasks.celery_app:app beat --pidfile /tmp/celerybeat.pid -l INFO -s "${SUPERSET_HOME}"/celerybeat-schedule
elif [[ "${1}" == "app" ]]; then
echo "Starting web app..."
flask run -p 8088 --with-threads --reload --debugger --host=0.0.0.0
flask run -p 8088 --with-threads --reload --debugger --debug --host=0.0.0.0
Copy link
Contributor

@SoryRawyer SoryRawyer Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the --debug flag meant to stay in?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this only will affect dev environment, local en k8s works the same as before.

@SoryRawyer
Copy link
Contributor

I'm assuming the changes to the chart and dataset configurations are due to re-exporting charts that now have translated column names. Is that right?

@Ian2012
Copy link
Contributor Author

Ian2012 commented Mar 12, 2024

@SoryRawyer Yes, also some metrics were moved from the charts to the dataset and assigned a proper name to be translated.

@Ian2012 Ian2012 requested review from bmtcril and SoryRawyer March 12, 2024 21:23
Copy link
Contributor

@bmtcril bmtcril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So excited to see this problem solved!

@Ian2012 Ian2012 merged commit 59ce454 into main Mar 13, 2024
10 checks passed
@Ian2012 Ian2012 deleted the cag/dataset-translation branch March 13, 2024 15:36
@openedx-webhooks
Copy link

@Ian2012 🎉 Your pull request was merged! Please take a moment to answer a two question survey so we can improve your experience in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
open-source-contribution PR author is not from Axim or 2U
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants