Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RecursionError when writing SRT from TTML input #244

Closed
davemevans opened this issue Mar 29, 2021 · 6 comments
Closed

RecursionError when writing SRT from TTML input #244

davemevans opened this issue Mar 29, 2021 · 6 comments
Assignees
Labels
bug Something isn't working c-isd

Comments

@davemevans
Copy link

I've been testing ttconv and have come across a strange issue when reading TTML and writing to SRT. When I write as TTML, everything works fine.

The input file is around eighty minutes in duration and has around 13000 <p> elements. Cutting this file to eg 100 allows the process to complete with no issue.

$ python src/main/python/ttconv/tt.py convert -i /mnt/c/work/tmp/test.ttml -o test.srt
Input file is /mnt/c/work/tmp/test.ttml
Output file is test.srt
Reading: |██████████████████████████████████████████████████| 100% Complete
Writing: |██------------------------------------------------|   5% CompleteTraceback (most recent call last):
  File "src/main/python/ttconv/tt.py", line 382, in <module>
    main(sys.argv[1:])
  File "src/main/python/ttconv/tt.py", line 378, in main
    args.func(args)
  File "src/main/python/ttconv/tt.py", line 346, in convert
    srt_document = srt_writer.from_model(model, writer_config, progress_callback_write)
  File "/home/user/ttconv/src/main/python/ttconv/srt/writer.py", line 189, in from_model
    isds = ISD.generate_isd_sequence(doc, _isd_progress, is_multithreaded=isd_config.multi_thread if isd_config is not None else True)
  File "/home/user/ttconv/src/main/python/ttconv/isd.py", line 292, in generate_isd_sequence
    int(len(sig_times)/multiprocessing.cpu_count()))
  File "/home/user/.pyenv/versions/3.7.7/lib/python3.7/multiprocessing/pool.py", line 325, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/home/user/.pyenv/versions/3.7.7/lib/python3.7/multiprocessing/pool.py", line 748, in next
    raise value
  File "/home/user/.pyenv/versions/3.7.7/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/home/user/.pyenv/versions/3.7.7/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/user/.pyenv/versions/3.7.7/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
RecursionError: maximum recursion depth exceeded while pickling an object
@palemieux
Copy link
Contributor

@davemevans Can you try with multithreading disabled?

{ "isd" : { "multi_thread": false } }

... or even better, can you share the test file privately?

@palemieux
Copy link
Contributor

palemieux commented Mar 29, 2021

I replicated the error by setting doc_size = 13000 in the following test:

It looks like this a consequence of using multithreading, which in turns requires the pickle module, which requires sys.setrecursionlimit(limit) to be set higher since the ISD trees are self-referential, e.g. each node references the root.

Potential immediate-term solutions:

  • increase sys.setrecursionlimit(limit) (more memory + platform limitations)
  • disable multithreading (slower)

Potential short-term solutions:

  • serialize ISDs to TTML documents before sending to each thread
  • custom pickling of ISDs
  • improve performance to remove the need for multithreading

@palemieux
Copy link
Contributor

palemieux commented Mar 30, 2021

@davemevans See #246 . Performance has been significantly improved for many common use cases. Hopefully yours falls into that category (let me know). This should allow ttconv to be run without multithreading until the pickling issue is fixed.

python src/main/python/ttconv/tt.py convert -i xxx -o xxx --config '{ "isd" : { "multi_thread": false } }'

@palemieux palemieux self-assigned this Mar 31, 2021
@palemieux palemieux added bug Something isn't working c-isd labels Mar 31, 2021
@davemevans
Copy link
Author

Turning multithreading off, and applying #246, allows the test case to pass - thanks!

I haven't tested without #246 - it takes 8m30 to process the doc above on my laptop already 😄

@palemieux palemieux added enhancement New feature or request bug Something isn't working and removed bug Something isn't working enhancement New feature or request labels Apr 1, 2021
@palemieux
Copy link
Contributor

@davemevans Thanks for the review.

@palemieux
Copy link
Contributor

Solved by #248

Additional potential improvements at #250

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working c-isd
Projects
None yet
Development

No branches or pull requests

2 participants