ttconv is a library and command line application, written in pure Python, for converting between timed text formats used in the presentation of captions, subtitles, karaoke, etc.

ttconv works by mapping the input document, whatever its format, into an internal canonical model, from which the output document is derived. The canonical model closely follows the TTML 2 data model, as constrained by the IMSC 1.1 Text Profile specification.
ttconv currently supports TTML (IMSC) and SCC documents as input, and TTML (IMSC) and SRT documents as output. Additional input and output formats are planned, and suggestions/contributions are welcome.
pip install ttconv
tt.py convert -i <input .scc file> -o <output .ttml file>
tt.py convert [-h] -i INPUT -o OUTPUT [--itype ITYPE] [--otype OTYPE] [--config CONFIG] [--config_file CONFIG_FILE]
- `--itype`: `TTML` or `SCC` (extrapolated from the filename, if omitted)
- `--otype`: `TTML` or `SRT` (extrapolated from the filename, if omitted)
- `--config` and `--config_file`: JSON dictionaries with the following members:
  - `"general"."progress_bar"`: `"true"` | `"false"`: whether a progress bar is displayed
  - `"general"."log_level"`: `"INFO"` | `"WARN"` | `"ERROR"`: logging level
  - `"imsc_writer"."time_format"`: `"frames"` | `"clock_time"`: whether output TTML time expressions are written in clock time (seconds) or in frames
  - `"imsc_writer"."fps"`: `"<num>/<denom>"`: the frame rate num/denom used when writing TTML time expressions in frames
Example:
tt.py convert -i <.scc file> -o <.ttml file> --itype SCC --otype TTML --config '{"general": {"progress_bar":false, "log_level":"WARN"}}'
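The same settings can be kept in a file and passed with `--config_file`, which points to a JSON document with the members listed above (the file name below is illustrative):

tt.py convert -i <.scc file> -o <.ttml file> --itype SCC --otype TTML --config_file ttconv_config.json

where ttconv_config.json contains:

{"general": {"progress_bar": false, "log_level": "WARN"}}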
The overall architecture of the library is as follows:
- Reader modules validate and convert input files into instances of the canonical model (see `ttconv.imsc.reader.to_model()` for example);
- Filter modules transform instances of the canonical data model, e.g. all text styling and positioning might be removed from an instance of the canonical model to match the limited capabilities of downstream devices; and
- Writer modules convert instances of the canonical data model into output files (an end-to-end sketch follows this list).
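The following is a minimal sketch of that flow, assuming an IMSC/TTML input and an SRT output. `ttconv.imsc.reader.to_model()` is cited above; the `ttconv.srt.writer` module and its `from_model()` signature are assumptions based on the reader/writer naming convention and may differ from the actual API.

```python
# Sketch of the reader -> canonical model -> writer flow.
import xml.etree.ElementTree as et

import ttconv.imsc.reader as imsc_reader
import ttconv.srt.writer as srt_writer  # assumed module path and API

# read and validate the input TTML document into the canonical model
model_doc = imsc_reader.to_model(et.parse("input.ttml"))

# (optional) filter modules could transform model_doc here

# write the canonical model out as an SRT document
with open("output.srt", "w", encoding="utf-8") as srt_file:
    srt_file.write(srt_writer.from_model(model_doc))
```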
Processing shared across multiple reader and writer modules is factored out into common modules whenever possible. For example, several output formats require an instance of the canonical data model to be transformed into a sequence of discrete temporal snapshots – a process called ISD generation.
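As a rough illustration of what ISD generation means, a single temporal snapshot of a document could be obtained as follows; the `ttconv.isd.ISD.from_model()` call is an assumption about the library's internals and may not match the actual API.

```python
# Hypothetical illustration of ISD generation: an ISD is a snapshot of the
# document at a single point in time.
from fractions import Fraction
import xml.etree.ElementTree as et

import ttconv.imsc.reader as imsc_reader
from ttconv.isd import ISD  # assumed module path and API

model_doc = imsc_reader.to_model(et.parse("input.ttml"))

# snapshot of the document 2 seconds into its timeline
snapshot = ISD.from_model(model_doc, Fraction(2))
```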
The library uses the Python `logging` module to report non-fatal events.
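When using ttconv as a library, the verbosity of these events can therefore be controlled with the standard `logging` configuration, for example:

```python
# Configure Python's standard logging; ttconv reports non-fatal events
# through the logging module, so this sets which of them are displayed.
import logging

logging.basicConfig(level=logging.WARNING)
```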
Unit tests illustrate the use of the library, e.g. `ReaderWriterTest.test_imsc_1_test_suite` at `src/test/python/test_imsc_writer.py`.
Detailed documentation, including reference documents, can be found under `doc`.
The project uses pipenv to manage dependencies.
- run `pipenv install --dev`
- set the `PYTHONPATH` environment variable to `src/main/python`, e.g. `export PYTHONPATH=src/main/python`

`pipenv run` can then be used to run commands inside the resulting virtual environment.
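For example, a conversion can be run without installing the package (the placeholders match the quick start above):

pipenv run python src/main/python/ttconv/tt.py convert -i <input .scc file> -o <output .ttml file>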
The project can also be built and run inside a Docker container:

docker build --rm -f Dockerfile -t ttconv:latest .

docker run -it --rm ttconv:latest bash
From the root directory of the project:
mkdir build
export PYTHONPATH=src/main/python
python src/main/python/ttconv/tt.py convert -i src/test/resources/scc/mix-rows-roll-up.scc -o build/mix-rows-roll-up.ttml
Unit test code coverage is provided by the script at `scripts/coverage.sh`.

Automated testing is provided by the script at `scripts/ci.sh`, which can be run in several ways:
- locally: run `./scripts/ci.sh`
- GitHub Actions: see `.github/workflows/main.yml`
- Docker: run `docker run -it --rm ttconv:latest /bin/sh scripts/ci.sh`