This repository has been archived by the owner on Oct 21, 2024. It is now read-only.

📝 [Docs] - Guides to use Spark Job #44

Open
1 task done
Taekyoon opened this issue Mar 29, 2024 · 2 comments
Labels
Docs Improvements or additions to documentation

Comments

@Taekyoon

Taekyoon commented Mar 29, 2024

dataverse version checks

  • I have checked that the issue still exists on the latest versions of the dataverse.

Location of the documentation

Setting Configuration

Documentation problem

When developers run a Spark job, the executor and driver settings are very important.
Cost and execution time vary with the number of executors and the amount of memory they consume.
For deduplication in particular, the number of executors and their memory consumption are critical when processing a huge dataset.

Suggestion

The docs need to show explicitly how developers can control executor resources, and what cost to expect under the default settings.
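
As a rough illustration of the kind of example the docs could include, here is a minimal sketch that builds the standard Spark sizing properties and sums up what they request. The property names (`spark.executor.instances`, `spark.executor.cores`, `spark.executor.memory`, `spark.driver.memory`) are standard Spark settings; the helper names `executor_conf` and `requested_totals` and the numbers used are hypothetical, and this is not dataverse's own configuration API.

```python
def executor_conf(instances, cores, memory_gb, driver_memory_gb):
    """Build standard Spark properties for executor and driver sizing.

    Hypothetical helper for illustration; not part of dataverse.
    """
    return {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores),
        "spark.executor.memory": f"{memory_gb}g",
        "spark.driver.memory": f"{driver_memory_gb}g",
    }


def requested_totals(conf):
    """Sum the cores and executor heap memory requested across all executors.

    Memory overhead added by the cluster manager is not included.
    """
    instances = int(conf["spark.executor.instances"])
    cores = int(conf["spark.executor.cores"])
    memory_gb = int(conf["spark.executor.memory"].rstrip("g"))
    return {
        "cores": instances * cores,
        "executor_memory_gb": instances * memory_gb,
    }


# Example values are assumptions, not dataverse defaults.
conf = executor_conf(instances=4, cores=2, memory_gb=8, driver_memory_gb=4)
totals = requested_totals(conf)
```

Each key/value pair could then be passed to `SparkSession.builder.config(key, value)` or to `spark-submit --conf key=value`. Multiplying the totals by an instance price would give a first cost estimate, though the actual pricing depends on the cloud provider.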

@41ow1ives
Collaborator

Hello @Taekyoon! I apologize for the delayed response. You've made an excellent point about the importance of Spark job configuration. I agree that it is crucial to provide clear guidance on resource management and cost implications based on default settings. We will strive to offer more detailed guidelines on this matter.
Although we aim to update the documentation by mid-April, please be aware that there might be a slight delay 😅. However, we will do our best to expedite the process. Thank you for your valuable input! We look forward to your continued interest and advice on dataverse. Have a great day.

@41ow1ives 41ow1ives added the Docs Improvements or additions to documentation label Apr 3, 2024
@Taekyoon
Author

Taekyoon commented Apr 3, 2024

I recommend splitting these settings into two sections, one for cloud and one for local.
Developers can get confused by settings that differ across environments.
Also, local environments vary from developer to developer, so the docs need to describe the local test environment.
If this is included in the docs, it would be easier for developers to use :)
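
To make the local/cloud split concrete, a sketch along these lines might help. It assumes plain Spark properties; the function name `spark_profile` and every value in it are hypothetical placeholders, not dataverse defaults. In local mode Spark runs executors inside the driver process, so `spark.driver.memory` and the `local[N]` core count are the settings that matter there.

```python
def spark_profile(env, local_cores=4):
    """Return example Spark properties for a given environment.

    Hypothetical helper; all values are placeholders to be tuned.
    """
    if env == "local":
        # Local mode: a single JVM, so only master and driver memory apply.
        return {
            "spark.master": f"local[{local_cores}]",
            "spark.driver.memory": "8g",
        }
    if env == "cloud":
        # Cluster mode: the master is normally supplied by the cluster
        # manager or spark-submit, so it is not set here.
        return {
            "spark.executor.instances": "10",
            "spark.executor.cores": "4",
            "spark.executor.memory": "16g",
            "spark.driver.memory": "8g",
        }
    raise ValueError(f"unknown environment: {env!r}")


local_conf = spark_profile("local")
cloud_conf = spark_profile("cloud")
```

Keeping the two profiles separate makes it obvious which settings are ignored locally, which is the confusion I mentioned above.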
