tokenize on sentence, not all document #28

Open
ngawangtrinley opened this issue Apr 27, 2020 · 0 comments
ngawangtrinley (Collaborator) commented Apr 27, 2020

Description

Tokenization currently runs over the entire text on every trigger, which makes the editor very slow, and the waiting time grows exponentially as the document gets longer. The maximum typing latency should be around ##; right now it is much higher.

The tokenizer is triggered by the syllable marker '་', the phrase marker '།', and the return character. At the moment it processes the whole document in one go; it should instead tokenize one phrase at a time. A phrase is a string of characters delimited by '།'.
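
A minimal sketch of what per-phrase retokenization could look like. `tokenize` here is a placeholder for whatever tokenizer the editor already uses, not the project's actual API:

```python
# Sketch: retokenize only the phrase under the cursor instead of the
# whole document. `tokenize` is a stand-in for the editor's tokenizer.

PHRASE_MARKER = '།'  # shad, the phrase delimiter described above

def phrase_span(text, cursor):
    """Return the (start, end) offsets of the phrase containing `cursor`."""
    start = text.rfind(PHRASE_MARKER, 0, cursor) + 1  # 0 when no shad before
    end = text.find(PHRASE_MARKER, cursor)
    if end == -1:                                     # last phrase in the text
        end = len(text)
    return start, end

def retokenize_at(text, cursor, tokenize):
    """Tokenize just the current phrase; callers re-highlight that span only."""
    start, end = phrase_span(text, cursor)
    return start, end, tokenize(text[start:end])
```

Only the returned span then needs to be re-highlighted, so the cost per keystroke is bounded by the phrase length rather than the document length.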

How to reproduce

  1. Paste the text from here and try to edit it
  2. Paste the same text sentence by sentence and compare the waiting time

Proposed solution

  • find a way to tokenize and highlight in the background (is it even possible with Python? see the threading sketch after this list)
  • find the phrase span and resegment only that phrase at each trigger character
  • get an n-syllable context around the cursor and segment only that context at each trigger
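
On the first point: background work is possible with the standard library's threading module. A pure-Python tokenizer still holds the GIL, so this is not true parallelism, but it keeps the UI event loop responsive as long as results are handed back to the GUI thread (e.g. via Tk's after() or Qt signals). A minimal sketch, where `tokenize` and `on_done` are placeholders for the editor's own functions:

```python
import queue
import threading

jobs = queue.Queue()

def _worker(tokenize, on_done):
    # Drain tokenization jobs one at a time, off the UI thread.
    while True:
        text, start, end = jobs.get()
        tokens = tokenize(text[start:end])
        on_done(start, end, tokens)  # must hand off to the GUI thread safely

def start_background_tokenizer(tokenize, on_done):
    t = threading.Thread(target=_worker, args=(tokenize, on_done), daemon=True)
    t.start()
    return t

# On each trigger character ('་', '།', or return), enqueue only the
# phrase span around the cursor instead of the whole document:
#   jobs.put((text, *phrase_span(text, cursor)))
```

This combines naturally with the second bullet: the UI thread computes the phrase span cheaply and enqueues it, and the worker does the expensive tokenization.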