"Paragraphs" -> Lines ? #10
Comments
P.S. I hope that didn't sound cranky/complainy! I don't know the history of this; there are probably good reasons for the "paragraphs" terminology. And it makes sense that an editor object would supply line numbers. I'm just looking at indenters and lines and asking whether that could be supplied more simply/directly.
Mainly just commenting on the one bullet:
I agree that the paragraphs/lines terminology in I would suggest that, if we move away from the paragraph terminology, we pick a word that isn't "line" so as to avoid confusion with text%. The rest of this sounds really great!
That's a great point about "paragraphs", and soft/visual line breaks vs. hard line breaks. In the specific case of an indenter, AFAICT it's only concerned with hard line breaks; "lines" in the sense of e.g. Would it help if the method names were something like
To be clear, I don't really care if the new methods have "paragraph" in the name. So if you prefer (say) The main point for me is the method signature
It seems to me that “paragraph” means “visually-separated-block” (of hard and soft lines). A long name (not for @rfindler) but perhaps more intuitive than paragraph.
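To make the hard vs. soft distinction concrete, here is a small sketch using hypothetical helpers over plain strings and a fixed display width (nothing here is a text% API). The same text has one count of hard lines, which is what an indenter cares about, and a different count of visual rows once long lines wrap.

```racket
#lang racket/base
;; Hypothetical helpers for illustration only.

;; Split only on hard line breaks (#\newline).
(define (hard-lines str)
  (let loop ([start 0] [ix 0] [acc '()])
    (cond [(= ix (string-length str))
           (reverse (cons (substring str start ix) acc))]
          [(char=? (string-ref str ix) #\newline)
           (loop (add1 ix) (add1 ix) (cons (substring str start ix) acc))]
          [else (loop start (add1 ix) acc)])))

;; Visual rows one hard line occupies when soft-wrapped at `width`
;; (an empty hard line still occupies one row).
(define (soft-line-count line width)
  (max 1 (quotient (+ (string-length line) width -1) width)))

;; Total visual rows for the whole text.
(define (visual-line-count str width)
  (for/sum ([line (in-list (hard-lines str))])
    (soft-line-count line width)))
```

With a width of 4, the text "aaaaaaaaaa\nbb" has 2 hard lines but occupies 4 visual rows.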
What I propose (though not necessarily the names) is really just this:
I'm not sure how to "port" this to an implementation for... I'm not even sure which class it is,
I think the change would be at For names, I would go with I can make these changes, but I'd prefer to wait until later in the day so we can first deal with any fallout from the changes so far (as discovered by DrDr, for example).
Any name is good for me, subject to the constraints discussed. I agree that
the name isn't the main point, and I also agree there appear to be
performance benefits to the revised API (not when you are already in a
text% object but c'est la vie).
Robby
I started looking at this, and now I'm not so sure. The existing navigation and indentation code could work in terms of line starts, but it's messy in at least one place (where the shrubbery indenter really does want to iterate over lines), and I worry about breaking things in other places. I think it would be a similar amount of work and tidier overall to help make
Thank you for looking at this. I had planned to look at it myself, expecting it would be straightforward for the racket and at-exp indenters. But if I had gotten as far as the shrubbery indenter, it sounds like I would have hit that thicket (pun kind of intended) you mention, and if you find it messy, I'd have been really challenged. That sounds discouraging. I guess my intuition might be unduly influenced by things like text editors, where, AFAIK, people creating indenters historically don't want to rely on line numbers because it could be too slow. So, for example, if someone wanted to create an indenter for a shrubbery lang for (say) neovim, and couldn't use Even if that's correct, I don't know if that kind of scenario changes your opinion at all. Assuming not: I can look at implementing line numbers with an efficient way to handle the case where I stop re-tokenizing because the new tokens have converged to be the old tokens merely shifted in position. I had started to do that, before pausing to ask "hey, is this even really necessary?", but I can resume that.
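One way to picture the "converged, merely shifted" case for line numbers: keep a sorted vector of line-start positions, and after an edit keep the starts before it, add any starts introduced by the inserted text, and shift everything after it by the change in length. A minimal sketch, assuming every edit arrives as "replace old positions [start, end) with new-text"; the function names are hypothetical, not an existing syntax-color API.

```racket
#lang racket/base
;; Hypothetical sketch. Line starts are a sorted vector whose first
;; element is 0; a line start s is the position just after a #\newline.

;; Positions just after each #\newline in `str`, offset by `base`.
(define (newline-starts str base)
  (for/list ([c (in-string str)]
             [i (in-naturals)]
             #:when (char=? c #\newline))
    (+ base i 1)))

(define (update-line-starts starts start end new-text)
  (define delta (- (string-length new-text) (- end start)))
  (list->vector
   (append
    ;; Lines starting at or before the edit: unaffected.
    (for/list ([s (in-vector starts)] #:when (<= s start)) s)
    ;; Lines introduced by newlines in the inserted text.
    (newline-starts new-text start)
    ;; Lines after the edit: the same lines, merely shifted by delta.
    ;; (Starts in (start, end] came from deleted newlines and are dropped.)
    (for/list ([s (in-vector starts)] #:when (> s end)) (+ s delta)))))
```

This still rebuilds the whole vector, so it is O(n) per edit; a tree with lazily applied offsets could avoid touching every later entry, but the bookkeeping is the same.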
From racket/expeditor#10 it sounds like the "traditional" approach isn't desirable after all; remove it. Make the paragraph numbering "moderately" optimized: on do-udpate! we invalidate it, and then the paragraph methods recalculate it on demand. This is a midpoint on the design spectrum between "naively scan from 0 every single time" and "do some minimal rebuild on every update". Note that I'm not 100% confident about the concurrency safety. But I'm committing anyway, for now, because I plan to take a hard look at concurrency for the class soon, including this as well as the potential for update generations arriving on command threads out of order and needing to be queued, something like TCP packets.
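The "invalidate on update, recompute on demand" approach in that commit message can be sketched as follows. This is a hypothetical, single-threaded illustration over a plain struct (the names are made up, and it is not the class the commit changes); as the message itself notes, concurrent access would need more care.

```racket
#lang racket/base
;; Hypothetical sketch; not thread-safe.

;; line-starts: vector of paragraph start positions, or #f when stale.
(struct doc (text line-starts) #:mutable)

(define (make-doc text) (doc text #f))

;; For illustration only: counts how often the cache is rebuilt.
(define rebuild-count 0)

;; An update is O(1): replace the text and drop the cache.
(define (doc-update! d new-text)
  (set-doc-text! d new-text)
  (set-doc-line-starts! d #f))

;; Rebuild the cache only when a query needs it and it is stale.
(define (ensure-line-starts! d)
  (or (doc-line-starts d)
      (let ([starts (list->vector
                     (cons 0 (for/list ([c (in-string (doc-text d))]
                                        [i (in-naturals)]
                                        #:when (char=? c #\newline))
                               (add1 i))))])
        (set! rebuild-count (add1 rebuild-count))
        (set-doc-line-starts! d starts)
        starts)))

(define (paragraph-start-position d para)
  (vector-ref (ensure-line-starts! d) para))

(define (last-paragraph d)
  (sub1 (vector-length (ensure-line-starts! d))))
```

Edits stay cheap; the first paragraph query after a burst of edits pays one O(n) rebuild, and later queries reuse the cache until the next update.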
If I understand, the "like Overall, since at least one indenter needs line numbers, and since it's a general concept that seems likely useful elsewhere, I'm still in favor of having "like It's surprising to me that a position<->line number lookup is not generally available in editors, so I can understand where you're starting from, but it doesn't seem like that has to drive our decision here.
Maybe

```racket
;; A 5,000 line file, about 300,000 chars long
(define content
  (for/fold ([s ""])
            ([n (in-range 5000)])
    (string-append s "\nafd;lkajsdf;lkajsfd;lkajsdf;lakjdsfl;kajsdfadsfasdfasdfasfd")))

;; Naive: count newlines from position 0 up to `pos` on every call
(define (position-paragraph pos)
  (define len (string-length content))
  (let loop ([ix 0] [num 0])
    (cond [(or (= ix pos)
               (= ix len))
           num]
          [(char=? (string-ref content ix) #\newline)
           (loop (add1 ix) (add1 num))]
          [else
           (loop (add1 ix) num)])))

(time (position-paragraph 4000))
;; cpu time: 0 real time: 0 gc time: 0
;; 67
(time (position-paragraph 30000))
;; cpu time: 0 real time: 0 gc time: 0
;; 500
(time (position-paragraph (sub1 (string-length content))))
;; cpu time: 2 real time: 2 gc time: 0
;; 5000
```

The second example is a style guide size file and the third is something like And then
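For comparison with the naive scan, here is a sketch (hypothetical names, plain strings) that builds a vector of line starts once in O(n) and then answers each position-to-paragraph query with a binary search in O(log n):

```racket
#lang racket/base
;; Hypothetical sketch: O(n) preprocessing, then O(log n) per query.

;; Vector of positions where each paragraph (line) starts; index 0 -> 0.
(define (line-starts str)
  (list->vector
   (cons 0 (for/list ([c (in-string str)]
                      [i (in-naturals)]
                      #:when (char=? c #\newline))
             (add1 i)))))

;; Largest k with (vector-ref starts k) <= pos, found by binary search.
;; Like the naive loop, this equals the number of newlines before pos.
(define (position-paragraph/starts starts pos)
  (let loop ([lo 0] [hi (sub1 (vector-length starts))])
    (if (>= lo hi)
        lo
        (let ([mid (quotient (+ lo hi 1) 2)])
          (if (<= (vector-ref starts mid) pos)
              (loop mid hi)
              (loop lo (sub1 mid)))))))

;; Same 5,000-line content as the naive example, built without quadratic appends.
(define content
  (apply string-append
         (for/list ([n (in-range 5000)])
           "\nafd;lkajsdf;lkajsfd;lkajsdf;lakjdsfl;kajsdfadsfasdfasdfasfd")))
(define starts (line-starts content))

(position-paragraph/starts starts 4000)                           ; => 67
(position-paragraph/starts starts 30000)                          ; => 500
(position-paragraph/starts starts (sub1 (string-length content))) ; => 5000
```

The catch, and the reason for the discussion in this thread, is keeping `starts` correct as the text is edited.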
Just because I was curious, I adapted the code above to use
FWIW, here's a text data structure that has similar performance characteristics to
@rfindler Good to know. @mflatt Thank you for that data structure. I didn't mean to be unresponsive to your offer to help like that. At the moment I'm cycling around various aspects of this, trying to herd the cats so none gets left too far behind. I also have some fussy things on the Emacs integration side to resolve. And I had a concurrency bug, with updates from Emacs arriving at the back end out of generation order (much like TCP packets), that I needed to fix. I'll look at Position Paragraph Cat again soon. Thanks!
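Updates arriving out of generation order is essentially the TCP reassembly problem: park early arrivals, and apply updates only once the next expected generation is available. A minimal single-threaded sketch; the struct and function names are hypothetical, not Racket Mode's actual code.

```racket
#lang racket/base
;; Hypothetical sketch; real code would guard this state with a lock
;; or confine it to one thread.

(struct reorderer (next pending on-update) #:mutable)
;; next:      generation number expected next
;; pending:   mutable hash, generation -> update, for early arrivals
;; on-update: called with each update, strictly in generation order

(define (make-reorderer first-gen on-update)
  (reorderer first-gen (make-hasheqv) on-update))

(define (receive! r gen update)
  (cond
    [(< gen (reorderer-next r)) (void)] ; already applied: ignore duplicate
    [else
     (hash-set! (reorderer-pending r) gen update)
     ;; Drain every consecutive generation now available.
     (let loop ()
       (define g (reorderer-next r))
       (when (hash-has-key? (reorderer-pending r) g)
         ((reorderer-on-update r) (hash-ref (reorderer-pending r) g))
         (hash-remove! (reorderer-pending r) g)
         (set-reorderer-next! r (add1 g))
         (loop)))]))
```

If generations 2 and 3 arrive before 1, nothing is applied until 1 arrives, and then all three are applied in order.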
To report back:
TL;DR the best available pieces were in Most recently I've shifted back to the Emacs side to test and dog-food. Assuming things settle down, I'll look again at what, if anything, I really have to contribute back to
Thank you, @greghendershott. It sounds like Scott Owens (the original author of that code, and currently a researcher in the UK working on the verified CakeML, among other things) knew what he was doing! If there is a way in which we (Emacs mode and DrRacket) could share code, I would be into that. In addition to the natural benefit of a reduced maintenance burden, I would sleep better knowing that Emacs and DrRacket are coloring the same things the same ways!
I think he did an excellent job on the optimization! IIRC a year ago, I think my eyeballs snagged on the documentation:
plus since I was mentally primed to think of this in terms of intervals, I detoured. ✔️ re sharing code.
To report back again: Things seem to be stabilizing for me around this sort of There's a little bit of bridge/shim code for how it gets used in Emacs, here, which may help give some idea how small the actual "public surface area" is. What I'm not sure about, from another glance at expeditor-lib, is how useful any of this would be for that. For example, is my design too infected with ideas about concurrency that would just be an annoying PITA for expeditor? Am I solving a problem it does not have, and not solving problems it does have? I'm more confident that what I have is in the right ballpark as the "back end" to be used by a separate, possibly even remote, "front end" program like Emacs or Vim (or maybe even LSP racket-langserver?). Maybe it's worth contributing to syntax-color-lib on that basis. But I'm not sure. And if no one else would actually use it from syntax-color-lib, my life would be simpler if it continued to live in the Racket Mode source -- where I could continue to refine it without breaking other things. So I don't have an agenda either way -- and certainly I won't feel delighted/insulted if other people do/don't want to make use of it. I'm also open to discussing how it could change to be more useful for other people. For now I just wanted to touch base, share what I've done, and see if anyone has any thoughts. Not urgent from my POV.
Hi @greghendershott, I suspect your code would be useful in the long run in Meanwhile, I've been updating the shrubbery indenter to work right in DrRacket's interactions area, and I've discovered that I need the So, does that change to
If we're not getting too far afield from the original topic here, one thought I had was whether adding some part of racket-mode to the main distribution would make sense. Would that make it easier to track multiple versions, since you could reliably count on the racket code to be code that comes with that latest version?
@mflatt Adding a (The
@greghendershott Thanks! I've pushed that change.
Somewhat OT: I'm still early in my experience using the hash-lang stuff with REPL a.k.a. interactions buffers (as opposed to edit buffers). I thought about using multiple regions as does the framework A possible riff on "the top-level is hopeless" is "top-level output is a hopeless mix" --- of REPL prompts, user program printing-module-begin, and user program output (which itself can be a mix of So I'm kind of ruminating on ways to keep these "streams" delineated so they can be treated differently (including but not limited to color-lexing some vs. not). I don't have any great ideas to report yet. I've sort of set that aside to think about at random intervals, while I mainly work on other things for a while.
@rfindler Currently Racket Mode supports Racket versions as old as 6.9. That could probably be bumped a bit newer. Moving some code into the main distribution as of (say) "8.5" wouldn't really help until N years from now, when I drop support for pre-"8.5" versions. Meanwhile I'd still need to "polyfill" (ship a use-this-if-not-found copy of the code) for pre-8.5. And if I fixed/added something for (say) "9.0", it would still be broken/missing before then, so I'd still need to "polyfill", so... I'm not sure how it would help much, ever? I think it's just a fundamentally different release concept than for DrRacket, which is associated with a release of Racket, and the older releases of Racket+DrRacket are available to get.
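As an illustration of the "polyfill" idea, one way to do it in Racket (not necessarily what Racket Mode does) is to try the installed module with `dynamic-require` and fall back to a bundled copy when the module or binding is missing. The helper name is hypothetical, and the module path in the usage line is a deliberately nonexistent placeholder.

```racket
#lang racket/base
;; Hypothetical helper: prefer the installed binding, else use `fallback`
;; (e.g. a copy of the code shipped with the package).
(define (lookup-or-fallback mod name fallback)
  (with-handlers ([exn:fail? (λ (_e) fallback)])
    (dynamic-require mod name)))

;; Present in every supported Racket: uses the installed binding.
(define my-first (lookup-or-fallback 'racket/list 'first car))

;; Placeholder module that does not exist: uses the bundled fallback.
(define frob (lookup-or-fallback 'no-such-collection-xyz/hypothetical 'frob
                                 (λ (x) x)))
```

Catching every `exn:fail` is blunt: it also hides genuine errors raised while loading the installed module, so real code may want a narrower predicate.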
JFYA, I took the same approach in adapting the shrubbery indenter (on the client side, in that case). It uses a wrapper around
DrRacket isn't doing anything a lot more complicated, if I'm understanding correctly. Specifically, it makes the regions that aren't after the latest prompt uneditable. So it colors those (once) and then can count on just leaving the colors. It also ensures that any IO that might show up is always put before the prompt. So the region that's actively being colored might move forward, but it always starts at some point and extends from there to the end. In short, it can control where changes to the buffer happen, so it does, in order to make this simpler.
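A toy model of the rule just described, assuming a single prompt string and ignoring evaluation and new prompts: text before the end of the prompt is frozen, program output is always inserted just before the prompt (pushing it forward), so a lexer only ever needs to recolor from the prompt end to the end of the buffer. All names are hypothetical; this is not DrRacket's implementation.

```racket
#lang racket/base
;; Hypothetical toy model, not DrRacket code.

(struct repl (text prompt-pos prompt-end) #:mutable)
;; prompt-pos: where the prompt string starts
;; prompt-end: first editable position (just after the prompt)

(define prompt "> ")
(define (make-repl) (repl prompt 0 (string-length prompt)))

(define (editable? r pos) (>= pos (repl-prompt-end r)))

;; User typing: only allowed at or after the prompt.
(define (user-insert! r pos str)
  (unless (editable? r pos)
    (error 'user-insert! "position ~a is before the prompt" pos))
  (define t (repl-text r))
  (set-repl-text! r (string-append (substring t 0 pos) str (substring t pos))))

;; Program output: always inserted just before the prompt, which shifts forward.
(define (output-insert! r str)
  (define t (repl-text r))
  (define p (repl-prompt-pos r))
  (define n (string-length str))
  (set-repl-text! r (string-append (substring t 0 p) str (substring t p)))
  (set-repl-prompt-pos! r (+ p n))
  (set-repl-prompt-end! r (+ (repl-prompt-end r) n)))

;; The only region a lexer would need to (re)color.
(define (active-region r)
  (values (repl-prompt-end r) (string-length (repl-text r))))
```

After typing "(+ 1 2)" and then receiving the output "hi\n", the buffer is "hi\n> (+ 1 2)" and the active region is [5, 12).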
I agree it's not a lot more complicated. And definitely I didn't mean a criticism that
Oh, no problem! I somehow got the impression that you guys thought it was more complicated than it was. I was trying to say that it isn't really using the full power of that method, and trying to explain more exactly what it was doing. Sorry for the confusion. Thanks, @greghendershott.
Although this is maybe the wrong repo, hopefully that's OK because this is more of a quick question? Depending on the answer, I can close this or move it to `drracket`.

Could we:

- have `beginning-of-line` and `end-of-line` methods that are `position -> position`, and
- have the `drracket:indentation`/`drracket:range-indentation` functions use those?

Why I ask:

- The "paragraphs" terminology has always been confusing to me. AFAICT, in practice it actually means "lines".
- `position-paragraph` is a performance bottleneck. Effectively it's "give me a line number" -- which is inherently slow -- but AFAICT indenters only need it to give to `paragraph-{start end}-position`. If indenters have a position, they could give that to a `{beginning end}-of-line`, which could efficiently just look backward/forward for a newline. No need to count "paragraph" numbers from the beginning to N, and no need to maintain and smartly update some auxiliary data structure to make that less horribly slow.
- Some existing indenters might compare "paragraph numbers", but AFAICT they're really asking "are these two positions within the same line?" Instead, you could answer that by comparing `beginning-of-line` for the two positions.

I'm thinking of this because in my "like `text%`" so far I've punted on attempting an update of "paragraphs" to be as smart/fast as that for the tokens. Although "dread" is too strong, I have a hunch this will be non-trivial -- probably a lot of effort/bugs vs. giving indenters line methods that directly answer what they want to ask?