Error while using Stemming #2
Comments
@evangeliazve - thanks for reporting this. I'm not sure how to fix the issue. Can you please send me your exact code and the full error stack trace, so I can try to replicate the issue on my machine? Thanks!
Hello @MrPowers, thanks for your reply.
When I execute the following code, everything works fine:
actual_df = df_txts.withColumn("list_of_words_stem", ceja.porter_stem(col("list_of_words")))
However, even though the object's class is DataFrame, when I call the .show() function to display the resulting table I get the following error message:
PythonException Traceback (most recent call last)
/databricks/spark/python/pyspark/sql/dataframe.py in show(self, n, truncate, vertical)
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in call(self, *args)
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
/databricks/spark/python/pyspark/sql/utils.py in raise_from(e)
PythonException: An exception was thrown from a UDF: 'TypeError: str argument expected'. Full traceback below:
I then converted it to string array format and tried to use it with the following code:
TFcv = CountVectorizer(inputCol="list_of_words_stem", outputCol="raw_features", vocabSize=5000, minDF=10.0)
IDFidf = IDF(inputCol="raw_features", outputCol="features")
And I obtained the following message:
Py4JJavaError Traceback (most recent call last)
/databricks/spark/python/pyspark/ml/base.py in fit(self, dataset, params)
/databricks/spark/python/pyspark/ml/wrapper.py in _fit(self, dataset)
/databricks/spark/python/pyspark/ml/wrapper.py in _fit_java(self, dataset)
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in call(self, *args)
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
Py4JJavaError: An error occurred while calling o1117.fit.
Could you please help me?
Best,
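A possible reading of the 'TypeError: str argument expected' message, offered only as a guess: the stemming UDF may receive a whole list of words per row, while the underlying stemmer accepts a single string. The sketch below illustrates that idea in plain Python, without Spark. It assumes `list_of_words` holds an array of strings; `stem_tokens` and `toy_stem` are hypothetical names, and `toy_stem` is only a stand-in that mimics a string-only stemmer, not the library's real implementation.

```python
from typing import Callable, List, Optional


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: accepts only str input."""
    if not isinstance(word, str):
        # Mirrors the kind of error reported in the traceback above
        raise TypeError("str argument expected")
    # Toy rule for illustration only: drop a trailing "s"
    return word[:-1] if word.endswith("s") else word


def stem_tokens(
    tokens: Optional[List[Optional[str]]],
    stem: Callable[[str], str],
) -> Optional[List[str]]:
    """Apply a string-only stemmer to each token, passing None rows through."""
    if tokens is None:
        return None
    # Skip None entries so the stemmer only ever sees strings
    return [stem(token) for token in tokens if token is not None]


if __name__ == "__main__":
    print(stem_tokens(["cats", "dogs", None, "fish"], toy_stem))
    try:
        toy_stem(["cats", "dogs"])  # a list instead of a string
    except TypeError as err:
        print("TypeError:", err)
```

If this guess matches the situation, a function like `stem_tokens` could be wrapped with `pyspark.sql.functions.udf` using a return type of `ArrayType(StringType())` and applied to the array column, so the stemmer is called once per word instead of once per list. The error only surfacing at `.show()` would be consistent with Spark evaluating the UDF lazily.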
I have the exact same issue with Databricks on AWS. I'm trying to use this library inside a UDF. I'll get
Hello,
I am facing issues when trying to apply stemming to text data on AWS with PySpark. Here is the error message I'm getting:
PythonException: An exception was thrown from a UDF: 'pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
How can I resolve this?
Thank you for your support.
Best,
Evangelia