Can we do multiprocessing or distributed computing through PyFlink? #21
@abhalawat Flink processes the data in a distributed way. Distributed execution is a capability of the Flink runtime.
How do I achieve that? Will PyFlink help me do that?
Yes. When you submit a PyFlink job to a remote cluster (e.g. YARN, K8s), it executes in a distributed manner, provided you set the parallelism to a value greater than 1. See https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/cli/#submitting-pyflink-jobs for more details on how to submit PyFlink jobs to a remote cluster.
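A minimal sketch of setting the parallelism from inside the job script, assuming PyFlink 1.14 and the Table API in batch mode. `parallelism_settings`, `create_table_env`, `job.py` and the value 4 are hypothetical illustrations. The same default can alternatively be passed at submission time with the Flink CLI's `-p`/`--parallelism` option, e.g. `flink run -p 4 -py job.py`.

```python
# Sketch for a hypothetical job script (job.py), assuming PyFlink 1.14 and the Table API.


def parallelism_settings(parallelism: int) -> dict:
    """Config entries that set the default parallelism for every operator in the job."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return {"parallelism.default": str(parallelism)}


def create_table_env(parallelism: int):
    """Create a batch TableEnvironment whose operators default to `parallelism` instances."""
    # Imported inside the function so the helper above can be used without PyFlink installed.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())
    config = t_env.get_config().get_configuration()
    for key, value in parallelism_settings(parallelism).items():
        config.set_string(key, value)
    return t_env
```

In the job script, call `t_env = create_table_env(4)` before registering tables and running the query. The cluster also needs enough free task slots to run that many parallel instances.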
Could you post the full exception stack?
@abhalawat According to the exception stack, the job failed to create the output directory, which most likely indicates a permission issue. You could change the sink connector to other connectors such as
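One way to confirm the permission hypothesis, sketched under the assumption that the sink writes to a local filesystem path (it does not apply to HDFS or S3 paths): run a small check on each host, as the same OS user that runs the Flink TaskManagers. `ensure_writable_dir` and the default path `/tmp/flink-output` are hypothetical.

```python
import os
import sys


def ensure_writable_dir(path: str) -> bool:
    """Create `path` if it is missing and report whether the current user can write to it."""
    try:
        os.makedirs(path, exist_ok=True)
    except OSError:  # includes PermissionError, or `path` existing as a regular file
        return False
    return os.access(path, os.W_OK)


if __name__ == "__main__":
    # Hypothetical default: pass the sink's real output directory as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "/tmp/flink-output"
    if ensure_writable_dir(target):
        print(f"OK: {target} is writable")
    else:
        print(f"Cannot create or write to {target}; check its permissions")
```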
Should I change this command:
Also, the connector here has to be `filesystem` because I am reading a file in the SQL query.
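A minimal sketch of declaring such a file-backed table from Python, assuming the PyFlink 1.14 Table API and a CSV input. The helper names, table name, columns and path are hypothetical placeholders; the generated DDL uses the `'connector'`, `'path'` and `'format'` options of Flink's filesystem SQL connector.

```python
# Hypothetical helpers: table name, columns and path are placeholders.


def filesystem_table_ddl(name, columns, path, fmt="csv"):
    """Build a CREATE TABLE statement for Flink's filesystem connector."""
    cols = ",\n  ".join(f"{col} {sql_type}" for col, sql_type in columns)
    return (
        f"CREATE TABLE {name} (\n  {cols}\n) WITH (\n"
        "  'connector' = 'filesystem',\n"
        f"  'path' = '{path}',\n"
        f"  'format' = '{fmt}'\n"
        ")"
    )


def register_filesystem_table(t_env, name, columns, path, fmt="csv"):
    """Register the table on an existing PyFlink TableEnvironment so SQL queries can use it."""
    t_env.execute_sql(filesystem_table_ddl(name, columns, path, fmt))
```

For example, `register_filesystem_table(t_env, "input_table", [("id", "BIGINT"), ("name", "STRING")], "file:///path/to/input.csv")`. On a remote cluster, a `file://` path must exist on every TaskManager host; a shared filesystem such as HDFS or S3 avoids that requirement.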
I am trying to process about 14 million records and want this job to run faster. Is there any way PyFlink could help?