Describe the bug
I'm trying to convert a dataset from English into IPA using Hugging Face's datasets package, but when I ask it to use more than 4 processes, it crashes with:
Traceback (most recent call last):
  File "/users/PAS2836/ajencks/workspace/epitran-transcription/./main.py", line 33, in <module>
    tokenized = dataset.map(
  File "/users/PAS2836/ajencks/.conda/envs/epitran/lib/python3.10/site-packages/datasets/dataset_dict.py", line 886, in map
    {
  File "/users/PAS2836/ajencks/.conda/envs/epitran/lib/python3.10/site-packages/datasets/dataset_dict.py", line 887, in <dictcomp>
    k: dataset.map(
  File "/users/PAS2836/ajencks/.conda/envs/epitran/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 560, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/users/PAS2836/ajencks/.conda/envs/epitran/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 3147, in map
    for rank, done, content in iflatmap_unordered(
  File "/users/PAS2836/ajencks/.conda/envs/epitran/lib/python3.10/site-packages/datasets/utils/py_utils.py", line 711, in iflatmap_unordered
    raise RuntimeError(
RuntimeError: One of the subprocesses has abruptly died during map operation.To debug the error, disable multiprocessing.
I haven't had this problem with other packages I've tried. The code looks something like this:
Phonemizer version
The output of phonemize --version from the command line, very helpful!
System
Your OS (Linux distribution, Windows, ...) and, if relevant, your Python version.
To reproduce
A short example (Python script or command) reproducing the bug.
See the script I've supplied above. It requires the packages datasets and phonemizer.
Expected behavior
A clear and concise description of what you expected to happen.
The program runs, converts the desired dataset into IPA, and saves it to disk, using parallelization to speed up transcription.
Additional context
Add any other context about the problem here.
The error only occurs in the Slurm job environment. When I run the code on my own machine, it works just fine.
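Since the crash only shows up under Slurm, one thing that may be worth checking is whether the job is actually allowed as many CPUs as the number of processes requested. The sketch below is only a guess at a possible cause, not a confirmed diagnosis. The helper names allowed_cpus and safe_num_proc are hypothetical and are not part of datasets or phonemizer.

```python
import os


def allowed_cpus() -> int:
    """Return how many CPUs this process is allowed to run on.

    On Linux, os.sched_getaffinity reflects CPU restrictions applied to the
    process (for example by a job scheduler), whereas os.cpu_count reports
    every CPU on the node. Fall back to os.cpu_count where affinity is not
    available (macOS, Windows).
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def safe_num_proc(requested: int) -> int:
    """Clamp a requested worker count to the range [1, allowed_cpus()]."""
    return max(1, min(requested, allowed_cpus()))


if __name__ == "__main__":
    # Compare what the node has with what this job may actually use.
    print("CPUs on node:      ", os.cpu_count())
    print("CPUs allowed here: ", allowed_cpus())
    print("num_proc to try:   ", safe_num_proc(8))
```

The value returned by safe_num_proc could then be passed as num_proc to dataset.map. If the allowed CPU count already matches the request, the cause is likely elsewhere; a worker being killed for exceeding the job's memory limit is another possibility that this check does not cover.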