dataverse version checks
I have checked that the issue still exists on the latest version of dataverse.
Location of the documentation
Setting Configuration
Documentation problem
When developers run a Spark job, the executor and driver settings are very important.
Cost and execution time both depend on how many executors are used and how much memory each consumes.
For deduplication in particular, the number of executors and the memory allocation are critical when processing a huge dataset.
Suggestion
The documentation should explicitly show how developers can control executor resources, and what cost the default settings will incur.
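For instance, the docs could include a snippet like the one below. This is a minimal sketch using plain PySpark options; the application name and all resource values are hypothetical placeholders, and dataverse's own configuration layer may expose these settings under different keys.

```python
from pyspark.sql import SparkSession

# All values here are illustrative, not dataverse defaults.
spark = (
    SparkSession.builder
    .appName("dataverse-dedup")  # hypothetical app name
    # Driver resources: hosts the job plan and collects small results.
    .config("spark.driver.memory", "8g")
    # Executor resources: these dominate both cost and execution time,
    # especially for memory-hungry stages such as deduplication.
    .config("spark.executor.instances", "10")
    .config("spark.executor.memory", "16g")
    .config("spark.executor.cores", "4")
    .getOrCreate()
)
```

Documenting the default values for each of these options, together with a rough cost estimate for those defaults, would address the problem above.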
Hello @Taekyoon! I apologize for the delayed response. You've made an excellent point about the importance of Spark job configuration. I agree that it is crucial to provide clear guidance on resource management and cost implications based on default settings. We will strive to offer more detailed guidelines on this matter.
Although we aim to update the documentation by mid-April, please be aware that there might be a slight delay 😅. However, we will do our best to expedite the process. Thank you for your valuable input! We look forward to your continued interest and advice on dataverse. Have a great day.
I recommend documenting these settings separately for two environments: one for cloud and one for local.
Developers can be confused by settings that differ between environments.
Also, local environments vary from developer to developer, so the docs should describe a reference local test environment.
Including this in the docs would make it easier for them to use :)
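To illustrate, the two environments could be documented with a switch like the sketch below. The helper name, the `env` parameter, and every resource value are hypothetical; only the Spark option keys themselves are standard.

```python
from pyspark.sql import SparkSession

# Illustrative helper: `env` and all values below are assumptions for
# this sketch, not dataverse's actual API or defaults.
def build_session(env: str) -> SparkSession:
    builder = SparkSession.builder.appName("dataverse")
    if env == "local":
        # Local testing: run inside a single JVM and size memory
        # to the developer's machine.
        builder = (
            builder.master("local[*]")
            .config("spark.driver.memory", "4g")
        )
    else:
        # Cloud cluster (e.g. EMR on YARN): request explicit executor
        # resources, since they drive the bill.
        builder = (
            builder
            .config("spark.executor.instances", "20")
            .config("spark.executor.memory", "16g")
            .config("spark.executor.cores", "4")
        )
    return builder.getOrCreate()

spark = build_session("local")  # or "cloud"
```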