I'm not extremely familiar with the details of incremental parsing, but I have used Cursorless [0], a VSCode extension built on tree-sitter for voice-controlled structured editing, and it is pretty powerful. You can use the structured editing when you want and fall back to normal editing in between. Occasionally the parser will get things wrong and only change/take/select part of a function or what have you, but in general it's very useful, and I tend to miss it now that I am no longer voice coding much. I seem to remember there was a similar extension for Emacs (sans voice control), treemacs or something? Anyone used that?
[0] https://www.cursorless.org/
Does anything similar exist for JetBrains IDEs, but fully open source? (Open source plugin, and open source voice recognition model running locally.)
I think simpler is better when it comes to structured editing. Recursive teXt has the advantage that it proposes a simple block structure built into the text itself [1]. Of course, you will need to design your language to take advantage of this block structure.
[1] http://recursivetext.com
Since Lisp has been around since 1960... Congratulations, you're only about 64 years late.
No doubt, brackets of course also convey structure. But I think indentation is better for visualising block structure. Inside these blocks, you can still use brackets, and errors like missing opening or closing brackets will not spill over into other blocks.
And yeah, I am definitely coming for Lisp.
Ten minutes ago I wrote a Racket function that would need 35 levels of indentation. Whitespace isn't coming for Lisps until we figure out 12-dimensional displays.
Assuming that depth of indentation is really necessary (which I doubt; maybe a few auxiliary definitions are in order?), only the first few levels would be represented as their own Recursive teXt blocks anyway.
Is a function that would need 35 levels of indentation a good idea? I have seen C code with about 12 levels of indentation and that was not too great.
What other languages use syntax for, Lisps use function application for.
E.g. the array reference a[0] in Algol-family languages is a function application in Lisps: (vector-ref a 0).
The same is true for everything else. Semantic whitespace in such a language is a terrible idea, as everyone eventually finds out.
Recursive teXt (RX) would be a great fit for Lisp, although I am more interested in replacing Lisp entirely with a simpler language rooted in abstraction logic.
Note that RX is not like normal semantic whitespace, but simpler. It is hard-coded into the text without taking its content into consideration. RX is basically nested records of text, making it very robust, but encoded as plain text instead of JSON or something like that.
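To make "nested records of text" concrete, here is a minimal sketch of the general idea (my own illustration in TypeScript, not the actual RX format or its rules): nesting is derived purely from indentation, never from the content of a line.

    // Sketch only: nested blocks derived purely from indentation, in the
    // spirit of "nested records of text". Not the actual RX format.
    interface Block {
      line: string;      // the block's own text, with indentation stripped
      children: Block[]; // more-indented lines nested beneath it
    }

    function parseBlocks(text: string): Block[] {
      const roots: Block[] = [];
      // Stack of (indent, block) pairs along the current nesting path.
      const stack: { indent: number; block: Block }[] = [];
      for (const raw of text.split("\n")) {
        if (raw.trim() === "") continue; // blank lines carry no structure here
        const indent = raw.length - raw.trimStart().length;
        const block: Block = { line: raw.trim(), children: [] };
        // Pop back out to the enclosing block, based only on indentation,
        // never on brackets, keywords, or anything else in the line.
        while (stack.length > 0 && stack[stack.length - 1].indent >= indent) stack.pop();
        if (stack.length === 0) roots.push(block);
        else stack[stack.length - 1].block.children.push(block);
        stack.push({ indent, block });
      }
      return roots;
    }

A missing bracket inside one of these blocks can't change where any other block begins or ends, because the nesting is fixed before any language-level parsing happens.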
No, it won't be. S-expressions are the simplest way to linearize a tree.
Everyone thinks there's something better, and the very motivated write an interpreter for their ideal language in Lisp.
The ideas inevitably have so much jank when used in anger that you always come back to sexps.
Now, if you discover a way to linearize DAGs or arbitrary graphs without having to keep a table of symbols, I'd love to hear it.
An everyday example is the difference between JSON.stringify(data, null, 2) and a pretty printer like IPython's or Deno.inspect. Having only one node per line reduces the amount of data that can be displayed. It's the same with code.
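For instance, with a small object (the Deno.inspect output is shown roughly, from memory, so treat it as an approximation):

    // The same small object, serialized two ways. JSON.stringify with an
    // indent argument puts every node on its own line; a compact pretty
    // printer keeps small structures on one line and fits far more on screen.
    const data = { point: { x: 1, y: 2 }, tags: ["a", "b"] };

    console.log(JSON.stringify(data, null, 2));
    // {
    //   "point": {
    //     "x": 1,
    //     "y": 2
    //   },
    //   "tags": [
    //     "a",
    //     "b"
    //   ]
    // }

    // Under Deno, Deno.inspect(data) prints something like:
    // { point: { x: 1, y: 2 }, tags: [ "a", "b" ] }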
Trying to use any kind of syntax highlighter with TeX is a pain in the butt. I didn't mean LaTeX there. I mean TeX, which can rewrite its own lexer, and a lot of libraries work by doing so. I move in and out of Texinfo syntax and it basically just causes most editors to sit there screaming that everything is broken.
Yes, it's pretty funny when you realise what a tiny corner of the design space of programs most users inhabit, such that they think things like lsp are an amazing tool instead of a weekend throwaway project.
What's even funnier is how much they attack anyone who points this out.
Perhaps the "attacks" relate to the condescending tone with which you relate your superior skills.
I think most people's amazement with lsp relates to the practical benefits of such a project _not_ being thrown away but taken that last 10% (which is 90% of the work) to make it suitable for so many use cases and selling people on the idea of doing so.
What's amazing about lsp isn't the polish, it's that we've hobbled ourselves so much that a tool like it is even useful.
Only having exposure to the Algol family of languages does for your mental capabilities what a sugar-only diet does for your physical capabilities. It used to be the case that all programmers had exposure to assembly/machine code, which broke them out of the worst habits Algols instill. No longer.
Pointing out that the majority of programmers today have the mental equivalent of scurvy is somehow condescending, but the corp selling false teeth along with their sugar buckets is somehow commendable.
Knowing non-Algol languages won't make editor actions any less useful for Algol-likes. If anything, it'll just make you pretend you don't need them, and as a result you'll end up less productive than you could be.
And editor actions can be useful for any language that either allows you to edit things or has more than one way to do the same thing (among a bunch of other things), which includes basically everything. Of course editor functionality isn't a thing that'd be 100% beneficial 100% of the time, but it's plenty above 0% if you don't purposefully ignore it.
> it is now clear to me that there is ongoing work on structured editing which either doesn’t know about incremental parsing in general, or Tim’s algorithms specifically. I hope this post serves as a useful advert to such folk
I'm curious about this unnamed ongoing work (that is unaware of incremental parsing).
Anyone know what he is referring to?
I don't know specifically - but even now, I still end up having to explain to people that incremental parsing/lexing (particularly without error recovery) is not hard, it is not really complicated, and as the author here said, Tim (et al.) have made beautiful algorithms that make this stuff easy.
Heck, incremental lexing is even easy to explain. For each token, track where the lexer actually looked in the input stream to make decisions. Any time that part of the input stream changes, every token that actually looked at the changed portion of the input stream is re-lexed, and if the result changes, keep re-lexing until the before/after token streams sync up again or you run out of input. That's it.
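Here is a toy sketch of that scheme in TypeScript (my own illustration, not Tim Wagner's or tree-sitter's actual implementation): each token records how far the lexer peeked, an edit invalidates only the tokens whose examined span touches the edited range, and re-lexing stops as soon as the new stream lines up with the old one again.

    // Toy token language: identifiers, integers, whitespace runs, and
    // single-char punctuation. lookedTo records how far the lexer peeked:
    // one char past the token whenever it had to check that a run really ended.
    type Token = { kind: string; start: number; end: number; lookedTo: number };

    function lexOne(text: string, pos: number): Token {
      const isLetter = (c: string) => /[A-Za-z_]/.test(c);
      const isDigit = (c: string) => /[0-9]/.test(c);
      const isSpace = (c: string) => /\s/.test(c);
      const c = text[pos];
      const cont = isLetter(c) ? isLetter : isDigit(c) ? isDigit : isSpace(c) ? isSpace : null;
      const kind = cont === isLetter ? "ident" : cont === isDigit ? "number" : cont ? "space" : "punct";
      let end = pos + 1;
      if (cont) while (end < text.length && cont(text[end])) end++;
      return { kind, start: pos, end, lookedTo: Math.min(end + (cont ? 1 : 0), text.length) };
    }

    function lexAll(text: string): Token[] {
      const out: Token[] = [];
      for (let pos = 0; pos < text.length; pos = out[out.length - 1].end) out.push(lexOne(text, pos));
      return out;
    }

    // Re-lex after text[editStart, editEnd) was replaced, changing the length by delta.
    function relex(newText: string, oldTokens: Token[],
                   editStart: number, editEnd: number, delta: number): Token[] {
      // Keep every old token that never looked at the edited region.
      let first = oldTokens.findIndex(t => t.lookedTo > editStart);
      if (first === -1) first = oldTokens.length;
      const out = oldTokens.slice(0, first);
      let pos = out.length > 0 ? out[out.length - 1].end : 0;
      let oldIdx = first;
      while (pos < newText.length) {
        // Skip old tokens the new position has already passed.
        while (oldIdx < oldTokens.length && oldTokens[oldIdx].start + delta < pos) oldIdx++;
        // Synced up: an old token beyond the edit starts exactly here (shifted),
        // so the rest of the old stream can be reused as-is.
        if (oldIdx < oldTokens.length && oldTokens[oldIdx].start >= editEnd &&
            oldTokens[oldIdx].start + delta === pos) {
          for (const t of oldTokens.slice(oldIdx))
            out.push({ ...t, start: t.start + delta, end: t.end + delta, lookedTo: t.lookedTo + delta });
          return out;
        }
        const t = lexOne(newText, pos); // otherwise keep re-lexing forward
        out.push(t);
        pos = t.end;
      }
      return out;
    }

lexAll builds the initial stream; after an edit you hand relex the old stream plus the edit coordinates, and only the handful of tokens around the edit actually get re-lexed.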
You can also make a dumber version that statically calculates the maximum lookahead (lookbehind too, if you support that) of the entire grammar, or the maximum possible lookahead per token, and uses that instead of tracking the actual lookahead used. In practice, this is often harder than just tracking the actual lookahead used.
In an LL system like ANTLR, incremental parsing is very similar - since it generates top-down parsers, it's the same basic theory - track what token ranges were looked at as you parse. During incremental update, only descend into portions of the parse tree where the token ranges looked at contain modified tokens.
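A rough sketch of that descent (again my own illustration, not ANTLR's actual incremental mode): each parse-tree node remembers the token range it consumed and the range it examined including lookahead, and after an edit you harvest the subtrees whose examined range misses the modified tokens, so the parser can splice them in instead of re-deriving them.

    // Sketch only: a top-down parser's nodes annotated with token ranges.
    interface PNode {
      rule: string;
      children: PNode[];
      start: number;    // first token index this node consumed
      end: number;      // one past the last token index consumed
      lookedTo: number; // one past the last token index examined (>= end)
    }

    // Collect reusable subtrees, keyed by "rule@startTokenIndex".
    // dirtyFrom/dirtyTo is the half-open range of token indices touched by the edit.
    function collectReusable(node: PNode, dirtyFrom: number, dirtyTo: number,
                             out: Map<string, PNode> = new Map()): Map<string, PNode> {
      const clean = node.lookedTo <= dirtyFrom || node.start >= dirtyTo;
      if (clean) {
        out.set(`${node.rule}@${node.start}`, node);
        return out; // no need to descend: the whole subtree survives the edit
      }
      for (const child of node.children) collectReusable(child, dirtyFrom, dirtyTo, out);
      return out;
    }

    // Inside the recursive-descent parser, before deriving rule R at token
    // position p, a lookup like reusable.get(`${R}@${p}`) lets it adopt the
    // old subtree (with token indices shifted by however many tokens the edit
    // added or removed) and skip straight past it.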
Bottom up is trickier. Error recovery is the meaningfully tricky part in all of this.
Before tree-sitter, I was constantly explaining this stuff to people (I followed the projects these algorithms came out of - ENSEMBLE, HARMONIA, etc). Nowadays more people get that there are ways of doing this, but you still run into people who are re-creating things we solved in pretty great ways many years ago.