📝 [Docs] - Guides to use Spark Job #44

Taekyoon · 2024-03-29T02:47:41Z · edited by Kimyungi

dataverse version checks

I have checked that the issue still exists on the latest versions of the dataverse.

Location of the documentation

Setting Configuration

Documentation problem

When developers run a Spark job, the executor and driver settings are very important.
Cost and execution time vary with the number of executors used and the amount of memory they consume.
For deduplication in particular, the number of executors and their memory consumption are critical when processing a huge dataset.
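
To make the settings in question concrete, here is a minimal sketch of the sizing knobs and a rough estimate of the memory a job requests. The property names are standard Spark configuration keys; the helper names and the example numbers are hypothetical, and how dataverse passes these values to Spark is not shown here. The overhead estimate assumes Spark's documented default for JVM jobs (10% of executor memory, with a 384 MiB minimum) and ignores driver overhead.

```python
def executor_settings(instances: int, cores: int, memory_gib: int,
                      driver_memory_gib: int) -> dict:
    """Build the standard Spark properties that size executors and the driver."""
    return {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": f"{memory_gib}g",
        "spark.driver.memory": f"{driver_memory_gib}g",
    }


def requested_memory_gib(instances: int, memory_gib: float,
                         driver_memory_gib: float,
                         overhead_factor: float = 0.10,
                         min_overhead_gib: float = 0.375) -> float:
    """Rough total memory request: executors (heap + overhead) plus driver heap.

    Assumption: overhead per executor is max(10% of heap, 384 MiB).
    Driver overhead is ignored to keep the estimate simple.
    """
    overhead = max(memory_gib * overhead_factor, min_overhead_gib)
    return instances * (memory_gib + overhead) + driver_memory_gib


if __name__ == "__main__":
    # Hypothetical example: 10 executors with 4 cores and 8 GiB each, 4 GiB driver.
    print(executor_settings(10, 4, 8, 4))
    print(requested_memory_gib(10, 8, 4))
```

Multiplying an estimate like this by the hourly price of the chosen instances gives a first approximation of what a default configuration costs.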

Suggestion

The docs need to show explicitly how developers can control executor resources, and how much the default settings cost.

41ow1ives · 2024-04-03T01:54:26Z

Hello @Taekyoon! I apologize for the delayed response. You've made an excellent point about the importance of Spark job configuration. I agree that it is crucial to provide clear guidance on resource management and cost implications based on default settings. We will strive to offer more detailed guidelines on this matter.
Although we aim to update the documentation by mid-April, please be aware that there might be a slight delay 😅. However, we will do our best to expedite the process. Thank you for your valuable input! We look forward to your continued interest and advice on dataverse. Have a great day.

Taekyoon · 2024-04-03T07:02:02Z

I recommend splitting this settings guide in two: one part for cloud and one for local.
Developers can get confused when settings differ between environments.
Also, local environments vary from developer to developer, so the docs should describe the local test environments.
Including this in the docs would make dataverse easier for them to use :)
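
One way the cloud/local split could look, as a minimal sketch: the function names, the environment labels, and every number below are hypothetical, not dataverse's actual configuration. The property keys and the `--conf key=value` form of `spark-submit` are standard Spark. In local mode the executors run inside the driver JVM, so driver memory is the setting that matters there.

```python
def spark_properties(env: str) -> dict:
    """Return illustrative Spark properties for a given environment."""
    if env == "local":
        # Local mode: one JVM on the developer's machine, using all cores.
        return {
            "spark.master": "local[*]",
            "spark.driver.memory": "8g",
        }
    if env == "cloud":
        # Cluster mode: the master is set by the cluster manager,
        # so only executor and driver sizing is listed here.
        return {
            "spark.executor.instances": "10",
            "spark.executor.cores": "4",
            "spark.executor.memory": "8g",
            "spark.driver.memory": "4g",
        }
    raise ValueError(f"unknown environment: {env!r}")


def to_submit_args(props: dict) -> list:
    """Turn properties into `--conf key=value` arguments for spark-submit."""
    args = []
    for key, value in sorted(props.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    for env in ("local", "cloud"):
        print(env, to_submit_args(spark_properties(env)))
```

Documenting the two profiles side by side like this would show at a glance which settings apply in which environment.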

41ow1ives added the Docs Improvements or additions to documentation label Apr 3, 2024
